Merge pull request #2260 from billhollings/mtl3-arg-buff-support
Improvements to bindless resources and descriptor indexing
billhollings committed Jul 3, 2024
2 parents c6373b8 + bfb35bd commit 94f5ff8
Showing 22 changed files with 775 additions and 641 deletions.
12 changes: 2 additions & 10 deletions Docs/MoltenVK_Configuration_Parameters.md
@@ -616,22 +616,14 @@ cleared via a call to the `vkTrimCommandPoolKHR()` command.
---------------------------------------
#### MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS

##### Type: Enumeration
- `0`: Don't use _Metal_ Argument Buffers.
- `1`: Use _Metal_ Argument Buffers for all pipelines.
- `2`: Use _Metal_ Argument Buffers only if the `VK_EXT_descriptor_indexing` extension is enabled.

##### Default: `0`
##### Type: Boolean
##### Default: `1`

Controls whether **MoltenVK** should use _Metal_ argument buffers for resources defined in descriptor sets,
if _Metal_ argument buffers are supported on the platform. Using _Metal_ argument buffers dramatically
increases the number of buffers, textures and samplers that can be bound to a pipeline shader, and in most
cases improves performance.

_**NOTE:**_ Currently, _Metal_ argument buffer support is in beta stage, and is only supported on _macOS 11.0+_,
or on older versions of _macOS_ using an _Intel_ GPU. _Metal_ argument buffers support is not available on _iOS_ or _tvOS_.
Development to support _iOS_ and _tvOS_ and a wider combination of GPUs on older _macOS_ versions is under way.
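
As a rough illustration of the Boolean form of this parameter, the sketch below sets the environment variable from within a process. It assumes the variable must be in place before MoltenVK first reads its configuration, so the helper would be called before any Vulkan call. The helper names `setUseMetalArgumentBuffers()` and `argumentBuffersEnvValue()` are hypothetical and are not part of MoltenVK; `setenv()` is the standard POSIX call.

```cpp
#include <cassert>
#include <cstdlib>    // std::getenv
#include <stdlib.h>   // setenv (POSIX)
#include <string>

// Name of the configuration parameter documented above.
static const char* const kArgBuffVar = "MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS";

// Hypothetical helper: writes "1" or "0" to the environment variable.
// Assumption: this runs before MoltenVK reads its configuration.
inline void setUseMetalArgumentBuffers(bool enable) {
    int rc = setenv(kArgBuffVar, enable ? "1" : "0", /*overwrite=*/1);
    assert(rc == 0 && "setenv failed");
    (void)rc;   // silence unused-variable warnings when NDEBUG is defined
}

// Hypothetical helper: returns the raw value currently in the environment,
// or an empty string if the variable is unset.
inline std::string argumentBuffersEnvValue() {
    const char* value = std::getenv(kArgBuffVar);
    return value ? std::string(value) : std::string();
}
```

Calling `setUseMetalArgumentBuffers(false)` at the start of `main()` would opt out of the new default of `1`. The same effect can be had by exporting the variable in the shell that launches the application.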


---------------------------------------
#### MVK_CONFIG_USE_MTLHEAP
9 changes: 9 additions & 0 deletions Docs/Whats_New.md
@@ -18,10 +18,19 @@ MoltenVK 1.2.10

Released TBD

- Improvements to bindless resources and descriptor indexing:
  - Add support for Metal3 argument buffers.
  - Support argument buffers on all platforms, when Metal 3 is available.
  - Support argument buffers on macOS when Metal3 is not available.
  - Use Metal argument buffers by default when they are available.
  - Revert MVKConfiguration::useMetalArgumentBuffers and env var
    `MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS` to a boolean value, and enable it by default.
  - Update max number of bindless buffers and textures per stage to 1M, per Apple Docs.
- Add option to generate a GPU capture via a temporary named pipe from an external process.
- Fix shader conversion failure when using native texture atomics.
- MSL shader conversion, only pass resource bindings that apply to current shader stage.
- Update documentation for minimum runtime OS requirements to indicate _macOS 10.15_, _iOS 13_, or _tvOS 13_.
- Update `MVK_PRIVATE_API_VERSION` to version `42`.
- Update to latest SPIRV-Cross:
  - MSL: Add option to force depth write in fragment shaders
  - MSL: Improve handling of padded descriptors with argument buffers
2 changes: 1 addition & 1 deletion ExternalRevisions/SPIRV-Cross_repo_revision
@@ -1 +1 @@
d47a140735cb44e511d0188a6318c365789e4699
6fd1f75636b1c424b809ad8a84804654cf5ae48b
2 changes: 1 addition & 1 deletion MoltenVK/MoltenVK/API/mvk_config.h
@@ -32,7 +32,7 @@ extern "C" {
/**
* This header is obsolete and deprecated, and is provided for legacy compatibility only.
*
* To configure MoltenVK, use one of the following mechanisms,
* To configure MoltenVK, use one of the following mechanisms,
* as documented in MoltenVK_Configuration_Parameters.md:
*
* - The standard Vulkan VK_EXT_layer_settings extension (layer name "MoltenVK").
17 changes: 5 additions & 12 deletions MoltenVK/MoltenVK/API/mvk_private_api.h
@@ -44,7 +44,7 @@ typedef unsigned long MTLArgumentBuffersTier;
*/


#define MVK_PRIVATE_API_VERSION 41
#define MVK_PRIVATE_API_VERSION 42


#pragma mark -
@@ -140,14 +140,6 @@ typedef enum MVKConfigAdvertiseExtensionBits {
} MVKConfigAdvertiseExtensionBits;
typedef VkFlags MVKConfigAdvertiseExtensions;

/** Identifies the use of Metal Argument Buffers. */
typedef enum MVKUseMetalArgumentBuffers {
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_NEVER = 0, /**< Don't use Metal Argument Buffers. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_ALWAYS = 1, /**< Use Metal Argument Buffers for all pipelines. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_DESCRIPTOR_INDEXING = 2, /**< Use Metal Argument Buffers only if VK_EXT_descriptor_indexing extension is enabled. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_MAX_ENUM = 0x7FFFFFFF
} MVKUseMetalArgumentBuffers;

/** Identifies the Metal functionality used to support Vulkan semaphore functionality (VkSemaphore). */
typedef enum MVKVkSemaphoreSupportStyle {
MVK_CONFIG_VK_SEMAPHORE_SUPPORT_STYLE_SINGLE_QUEUE = 0, /**< Limit Vulkan to a single queue, with no explicit semaphore synchronization, and use Metal's implicit guarantees that all operations submitted to a queue will give the same result as if they had been run in submission order. */
@@ -240,7 +232,7 @@ typedef struct {
uint32_t apiVersionToAdvertise; /**< MVK_CONFIG_API_VERSION_TO_ADVERTISE */
MVKConfigAdvertiseExtensions advertiseExtensions; /**< MVK_CONFIG_ADVERTISE_EXTENSIONS */
VkBool32 resumeLostDevice; /**< MVK_CONFIG_RESUME_LOST_DEVICE */
MVKUseMetalArgumentBuffers useMetalArgumentBuffers; /**< MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS */
VkBool32 useMetalArgumentBuffers; /**< MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS */
MVKConfigCompressionAlgorithm shaderSourceCompressionAlgorithm; /**< MVK_CONFIG_SHADER_COMPRESSION_ALGORITHM */
VkBool32 shouldMaximizeConcurrentCompilation; /**< MVK_CONFIG_SHOULD_MAXIMIZE_CONCURRENT_COMPILATION */
float timestampPeriodLowPassAlpha; /**< MVK_CONFIG_TIMESTAMP_PERIOD_LOWPASS_ALPHA */
@@ -352,8 +344,8 @@ typedef struct {
uint32_t minSubgroupSize; /**< The minimum number of threads in a SIMD-group. */
VkBool32 textureBarriers; /**< If true, texture barriers are supported within Metal render passes. Deprecated. Will always be false on all platforms. */
VkBool32 tileBasedDeferredRendering; /**< If true, this device uses tile-based deferred rendering. */
VkBool32 argumentBuffers; /**< If true, Metal argument buffers are supported. */
VkBool32 descriptorSetArgumentBuffers; /**< If true, a Metal argument buffer can be assigned to a descriptor set, and used on any pipeline and pipeline stage. If false, a different Metal argument buffer must be used for each pipeline-stage/descriptor-set combination. */
VkBool32 argumentBuffers; /**< If true, Metal argument buffers are supported on the platform. */
VkBool32 descriptorSetArgumentBuffers; /**< If true, Metal argument buffers can be used for descriptor sets. */
MVKFloatRounding clearColorFloatRounding; /**< Identifies the type of rounding Metal uses for MTLClearColor float to integer conversions. */
MVKCounterSamplingFlags counterSamplingPoints; /**< Identifies the points where pipeline GPU counter sampling may occur. */
VkBool32 programmableSamplePositions; /**< If true, programmable MSAA sample positions are supported. */
@@ -364,6 +356,7 @@ typedef struct {
VkBool32 dynamicVertexStride; /**< If true, VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is supported. */
VkBool32 needsCubeGradWorkaround; /**< If true, sampling from cube textures with explicit gradients is broken and needs a workaround. */
VkBool32 nativeTextureAtomics; /**< If true, atomic operations on textures are supported natively. */
VkBool32 needsArgumentBufferEncoders; /**< If true, Metal argument buffer encoders are needed to populate argument buffer content. */
} MVKPhysicalDeviceMetalFeatures;
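
To make the relationship between the Boolean setting and the feature flags concrete, here is a minimal sketch of how a caller might combine them. The structs are simplified stand-ins that copy only the field names shown in this diff, and the gating rule in `wouldUseArgumentBuffers()` is an assumption drawn from the documentation text ("use Metal argument buffers ... if Metal argument buffers are supported on the platform"). It is not MoltenVK's actual decision logic.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t VkBool32;   // same underlying type as the Vulkan typedef

// Simplified stand-in for the one MVKConfiguration field changed in this commit.
struct ConfigSketch {
    VkBool32 useMetalArgumentBuffers;
};

// Simplified stand-in for the MVKPhysicalDeviceMetalFeatures fields touched here.
struct MetalFeaturesSketch {
    VkBool32 argumentBuffers;               // supported on the platform
    VkBool32 descriptorSetArgumentBuffers;  // usable for descriptor sets
    VkBool32 needsArgumentBufferEncoders;   // encoders needed to populate content
};

// Hypothetical rule: the setting only takes effect when the platform reports support.
inline bool wouldUseArgumentBuffers(const ConfigSketch& cfg, const MetalFeaturesSketch& mf) {
    return cfg.useMetalArgumentBuffers && mf.argumentBuffers && mf.descriptorSetArgumentBuffers;
}

// Usage: with the new default of 1, an unsupported platform still ends up not using them.
inline void exampleUsage() {
    ConfigSketch defaults{1};
    MetalFeaturesSketch unsupported{0, 0, 0};
    assert(!wouldUseArgumentBuffers(defaults, unsupported));
}
```

Under this reading, the default of `1` expresses a preference, and the feature flags decide whether that preference can be honored.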


58 changes: 12 additions & 46 deletions MoltenVK/MoltenVK/Commands/MVKCommandEncoderState.mm
@@ -651,7 +651,7 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl

_boundDescriptorSets[descSetIndex] = descSet;

if (descSet->isUsingMetalArgumentBuffers()) {
if (descSet->hasMetalArgumentBuffer()) {
// If the descriptor set has changed, track new resource usage.
if (dsChanged) {
auto& usageDirty = _metalUsageDirtyDescriptors[descSetIndex];
@@ -671,46 +671,22 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
}
}

// Encode the dirty descriptors to the Metal argument buffer, set the Metal command encoder
// usage for each resource, and bind the Metal argument buffer to the command encoder.
// Encode the Metal command encoder usage for each resource,
// and bind the Metal argument buffer to the command encoder.
void MVKResourcesCommandEncoderState::encodeMetalArgumentBuffer(MVKShaderStage stage) {
if ( !_cmdEncoder->isUsingMetalArgumentBuffers() ) { return; }

bool useDescSetArgBuff = _cmdEncoder->isUsingDescriptorSetMetalArgumentBuffers();

MVKPipeline* pipeline = getPipeline();
uint32_t dsCnt = pipeline->getDescriptorSetCount();
for (uint32_t dsIdx = 0; dsIdx < dsCnt; dsIdx++) {
auto* descSet = _boundDescriptorSets[dsIdx];
if ( !descSet ) { continue; }
if ( !(descSet && descSet->hasMetalArgumentBuffer()) ) { continue; }

auto* dsLayout = descSet->getLayout();

// The Metal arg encoder can only write to one arg buffer at a time (it holds the arg buffer),
// so we need to lock out other access to it while we are writing to it.
auto& mvkArgEnc = useDescSetArgBuff ? dsLayout->getMTLArgumentEncoder() : pipeline->getMTLArgumentEncoder(dsIdx, stage);
lock_guard<mutex> lock(mvkArgEnc.mtlArgumentEncodingLock);

id<MTLBuffer> mtlArgBuffer = nil;
NSUInteger metalArgBufferOffset = 0;
id<MTLArgumentEncoder> mtlArgEncoder = mvkArgEnc.getMTLArgumentEncoder();
if (useDescSetArgBuff) {
mtlArgBuffer = descSet->getMetalArgumentBuffer();
metalArgBufferOffset = descSet->getMetalArgumentBufferOffset();
} else {
// TODO: Source a different arg buffer & offset for each pipeline-stage/desccriptors set
// Also need to only encode the descriptors that are referenced in the shader.
// MVKMTLArgumentEncoder could include an MVKBitArray to track that and have it checked below.
}

if ( !(mtlArgEncoder && mtlArgBuffer) ) { continue; }

auto& argBuffDirtyDescs = descSet->getMetalArgumentBufferDirtyDescriptors();
auto& resourceUsageDirtyDescs = _metalUsageDirtyDescriptors[dsIdx];
auto& shaderBindingUsage = pipeline->getDescriptorBindingUse(dsIdx, stage);

bool mtlArgEncAttached = false;
bool shouldBindArgBuffToStage = false;

uint32_t dslBindCnt = dsLayout->getBindingCount();
for (uint32_t dslBindIdx = 0; dslBindIdx < dslBindCnt; dslBindIdx++) {
auto* dslBind = dsLayout->getBindingAt(dslBindIdx);
@@ -719,32 +695,22 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
uint32_t elemCnt = dslBind->getDescriptorCount(descSet);
for (uint32_t elemIdx = 0; elemIdx < elemCnt; elemIdx++) {
uint32_t descIdx = dslBind->getDescriptorIndex(elemIdx);
bool argBuffDirty = argBuffDirtyDescs.getBit(descIdx, true);
bool resourceUsageDirty = resourceUsageDirtyDescs.getBit(descIdx, true);
if (argBuffDirty || resourceUsageDirty) {
// Don't attach the arg buffer to the arg encoder unless something actually needs
// to be written to it. We often might only be updating command encoder resource usage.
if (!mtlArgEncAttached && argBuffDirty) {
[mtlArgEncoder setArgumentBuffer: mtlArgBuffer offset: metalArgBufferOffset];
mtlArgEncAttached = true;
}
if (resourceUsageDirtyDescs.getBit(descIdx, true)) {
auto* mvkDesc = descSet->getDescriptorAt(descIdx);
mvkDesc->encodeToMetalArgumentBuffer(this, mtlArgEncoder,
dsIdx, dslBind, elemIdx,
stage, argBuffDirty, true);
mvkDesc->encodeResourceUsage(this, dslBind, stage);
}
}
}
}
descSet->encodeAuxBufferUsage(this, stage);

// If the arg buffer was attached to the arg encoder, detach it now.
if (mtlArgEncAttached) { [mtlArgEncoder setArgumentBuffer: nil offset: 0]; }

// If it is needed, bind the Metal argument buffer itself to the command encoder,
if (shouldBindArgBuffToStage) {
auto& mvkArgBuff = descSet->getMetalArgumentBuffer();
MVKMTLBufferBinding bb;
bb.mtlBuffer = descSet->getMetalArgumentBuffer();
bb.offset = descSet->getMetalArgumentBufferOffset();
bb.mtlBuffer = mvkArgBuff.getMetalArgumentBuffer();
bb.offset = mvkArgBuff.getMetalArgumentBufferOffset();
bb.index = dsIdx;
bindMetalArgumentBuffer(stage, bb);
}
@@ -753,7 +719,7 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
// the contents of Metal argument buffers. Triggering an extraction of the arg buffer
// contents here, after filling it, seems to correct that.
// Sigh. A bug report has been filed with Apple.
if (getDevice()->isCurrentlyAutoGPUCapturing()) { [descSet->getMetalArgumentBuffer() contents]; }
if (getDevice()->isCurrentlyAutoGPUCapturing()) { [descSet->getMetalArgumentBuffer().getMetalArgumentBuffer() contents]; }
}
}
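
The simplified loop in the new version of `encodeMetalArgumentBuffer()` relies on a read-and-clear pattern: `resourceUsageDirtyDescs.getBit(descIdx, true)` appears to return the dirty flag and reset it in the same call, so each descriptor's resource usage is encoded once per change. The sketch below illustrates that pattern in isolation. `DirtyBitsSketch` and `encodeDirtyUsage()` are hypothetical stand-ins, and the meaning of the second `getBit()` argument ("clear after reading") is an assumption about the real bit array type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-descriptor-set dirty bit array.
class DirtyBitsSketch {
public:
    explicit DirtyBitsSketch(std::size_t count) : _bits(count, false) {}

    // Marks a descriptor as needing its resource usage encoded.
    void setBit(std::size_t index) {
        assert(index < _bits.size());
        _bits[index] = true;
    }

    // Returns the bit, and resets it when shouldClear is true (assumed semantics).
    bool getBit(std::size_t index, bool shouldClear) {
        assert(index < _bits.size());
        bool wasSet = _bits[index];
        if (shouldClear) { _bits[index] = false; }
        return wasSet;
    }

private:
    std::vector<bool> _bits;
};

// Visits every descriptor and counts those whose usage would be encoded.
// The counter stands in for the call to encodeResourceUsage() in the diff.
inline std::size_t encodeDirtyUsage(DirtyBitsSketch& dirty, std::size_t descCount) {
    std::size_t encoded = 0;
    for (std::size_t descIdx = 0; descIdx < descCount; descIdx++) {
        if (dirty.getBit(descIdx, true)) { encoded++; }
    }
    return encoded;
}
```

Because the bits are cleared as they are read, a second pass over an unchanged descriptor set does no encoding work, which matches the intent stated in the comment "If the descriptor set has changed, track new resource usage."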

2 changes: 1 addition & 1 deletion MoltenVK/MoltenVK/Commands/MVKMTLBufferAllocation.h
@@ -45,7 +45,7 @@ class MVKMTLBufferAllocation : public MVKBaseObject, public MVKLinkableMixin<MVK
* Returns a pointer to the beginning of this allocation memory, taking into
* consideration this allocation's offset into the underlying MTLBuffer.
*/
inline void* getContents() const { return (void*)((uintptr_t)_mtlBuffer.contents + _offset); }
void* getContents() const { return (void*)((uintptr_t)_mtlBuffer.contents + _offset); }

/** Returns the pool whence this object was created. */
MVKMTLBufferAllocationPool* getPool() const { return _pool; }