WIP: Add: KHR_maintenance4 #2116

spnda · 2024-01-06T11:24:09Z

This adds support for KHR_maintenance4:

Allow the application to destroy their VkPipelineLayout object immediately after it was used to create another object. It is no longer necessary to keep its handle valid while the created object is in use.

this was already possible with the current code, as the layout defining data gets copied into MVKPipeline when creating.
Add a new maxBufferSize implementation-defined limit for the maximum size VkBuffer that can be created.
Add support for the SPIR-V 1.2 LocalSizeId execution mode, which can be used as an alternative to LocalSize to specify the local workgroup size with specialization constants.

SPIRV-Cross already handles this.
Add a guarantee that images created with identical creation parameters will always have the same alignment requirements.
Add new vkGetDeviceBufferMemoryRequirementsKHR, vkGetDeviceImageMemoryRequirementsKHR, and vkGetDeviceImageSparseMemoryRequirementsKHR to allow the application to query the image memory requirements without having to create an image object and query it.

This took me a bit to implement, as I initially wanted separate code for evaluating the memory requirements. However, this would duplicate a lot of code and it's hard to directly unify it because of static contexts, so I decided to go with this simple approach of just creating an implicit MVKBuffer or MVKImage object. Not sure I like this that much but it does pass CTS. I want to hear what others have to say how this could be done in the MVK codebase.
Relax the requirement that push constants must be initialized before they are dynamically accessed.

I think this is fine with Metal, as the buffers in the argument table are initially nullptr afaik. So its valid to not set the buffers if they're not used (which is what my interpretation of this sentence is).
Relax the interface matching rules to allow a larger output vector to match with a smaller input vector, with additional values being discarded.

SPIRV-Cross already handles this case as far as I can tell. I used an example Vulkan application where the vertex outputs a vec4 and the fragment accepts a vec2 or vec3, and it is handled correctly in the conversion process. It correctly replaces the float2/float3 with a float4 and then uses vector swizzling to discard the unused values.
Add a guarantee for buffer memory requirement that the size memory requirement is never greater than the result of aligning create size with the alignment memory requirement.

Unrelated, but I did some maintenance in the MVKImage and MVKDevice files in code related to this change. Mainly around the getExtent3D, getBytesPerRow, and getBytesPerLayer functions as they were repeating function calls multiple times. getBytesPerLayer would call getExtent3D and getBytesPerRow, which also calls getExtent3D. Additionally, the caller of getBytesPerLayer would also call getExtent3D. My changes drastically reduce the amount of unnecessary function calls made in that area.

billhollings

Thanks for submitting this.

A couple of changes needed, and some clarification of some of the changes.

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm

billhollings · 2024-01-10T00:22:23Z

MoltenVK/MoltenVK/GPUObjects/MVKImage.h

-	VkDeviceSize getBytesPerLayer(uint8_t planeIndex, uint32_t mipLevel);
-    
+	VkDeviceSize getBytesPerLayer(uint8_t planeIndex, VkExtent3D mipExtent);
+	


What is the need for making the changes to getBytesPerRow() and getBytesPerLayer()?

I might be missing something, but I don't see how it changes the use of these functions in the code in this PR.

If the change is required, neither of these functions are connected to MIP levels now, so the function documentation should be changed, at the very least.

I tried explaining it at the bottom of the PR message. When I originally transformed the code to be all in MVKDevice I came across these functions and it just bugged me that it would call getExtent3D 3 times and getBytesPerRow 2 times, when every caller of getBytesPerLayer would need the values themselves and could instead just pass them as a parameter. So I decided to refactor this a bit just to remove some wasted function calls.

Not entirely related to the current state of the PR, but it was relevant to the original change where I re-implemented the memory requirement functions in the MVKDevice object.

MoltenVK/MoltenVK/GPUObjects/MVKImage.h

cdavis5e · 2024-01-10T07:01:20Z

Not sure I like this that much

I don't like it either, considering that the whole point of the new memory requirements functions is that you don't have to create a resource just to figure how much memory it needs. I think it should be possible to refactor MVKBuffer::getMemoryRequirements() and MVKImage::getMemoryRequirements() to be methods on the MVKDevice object; then you could have MVKBuffer and MVKImage call back to MVKDevice, passing the appropriate information. This maps better to Metal anyway: the corresponding methods in Metal are MTLDevice methods.

spnda · 2024-01-10T09:11:20Z

Not sure I like this that much

I don't like it either, considering that the whole point of the new memory requirements functions is that you don't have to create a resource just to figure how much memory it needs. I think it should be possible to refactor MVKBuffer::getMemoryRequirements() and MVKImage::getMemoryRequirements() to be methods on the MVKDevice object; then you could have MVKBuffer and MVKImage call back to MVKDevice, passing the appropriate information. This maps better to Metal anyway: the corresponding methods in Metal are MTLDevice methods.

I originally had it setup this way but it required a lot of code to be duplicated (especially on images with computing usage, formats, atomics, mips, layers, ...), and I wasn't sure I had everything setup correctly. The only thing I came up with that I would like is having a separate MVKBufferInfo/MVKImageInfo struct which contains basic information used to compute the memory requirements which gets created in every MVKBuffer/MVKImage but also the memory requirement functions on MVKDevice. Those structs would then have the getMemoryRequirement functions.

I originally decided to just open the PR like this and then talk a bit about a possible better implementation. Though I will definitely look into shifting the functions into MVKDevice again.

billhollings · 2024-01-11T16:57:40Z

Full CTS testing of this PR caused at least one CTS crash, because vkGetDeviceImageMemoryRequirements() tried to access MVKImage::_memoryBindings out of bounds.

This exposed the fragility of how we were mapping plane indexes to memory bindings. PR #2125 fixes that fragility, and we should rebase this PR to after #2125, once it's merged (note that PR has been pushed to a temporary branch 1.2.8-temp until after the current SDK process is complete).

In particular, any references in this PR to _memoryBindings should use getMemoryBinding() instead. In particular, the new getMemoryRequirements(VkMemoryRequirements2...) should become:

VkResult MVKImage::getMemoryRequirements(VkMemoryRequirements2 *pMemoryRequirements, uint8_t planeIndex) {
	VkResult rslt = getMemoryRequirements(&pMemoryRequirements->memoryRequirements, planeIndex);
	if (rslt != VK_SUCCESS) { return rslt; }
	return getMemoryBinding(planeIndex)->getMemoryRequirements(nullptr, pMemoryRequirements);
}

spnda · 2024-01-11T17:02:05Z

Full CTS testing of this PR caused at least one CTS crash, because vkGetDeviceImageMemoryRequirements() tried to access MVKImage::_memoryBindings out of bounds.

This exposed the fragility of how we were mapping plane indexes to memory bindings. PR #2125 fixes that fragility, and we should rebase this PR to after #2125, once it's merged (note that PR has been pushed to a temporary branch 1.2.8-temp until after the current SDK process is complete).

In particular, any references in this PR to _memoryBindings should use getMemoryBinding() instead. In particular, the new getMemoryRequirements(VkMemoryRequirements2...) should become:
VkResult MVKImage::getMemoryRequirements(VkMemoryRequirements2 *pMemoryRequirements, uint8_t planeIndex) {
	VkResult rslt = getMemoryRequirements(&pMemoryRequirements->memoryRequirements, planeIndex);
	if (rslt != VK_SUCCESS) { return rslt; }
	return getMemoryBinding(planeIndex)->getMemoryRequirements(nullptr, pMemoryRequirements);
}

Interesting. I only ever tested some parts of the CTS test because I couldn't figure out if there is any txt file that contained all of the tests related to KHR_maintenance4 or any specific extension for that matter. For future reference, how do I figure out an exhaustive list of all CTS tests related to a feature, an extension, or a specific Vulkan core version (say Vulkan 1.0 conformance)?

billhollings · 2024-01-11T18:01:50Z

Interesting. I only ever tested some parts of the CTS test because I couldn't figure out if there is any txt file that contained all of the tests related to KHR_maintenance4 or any specific extension for that matter. For future reference, how do I figure out an exhaustive list of all CTS tests related to a feature, an extension, or a specific Vulkan core version (say Vulkan 1.0 conformance)?

It's complicated. Sometimes an extension will include a large number of CTS test that have part of the extension name embedded in the test name. In this case, AFAICT there are only 3 CTS tests that have maintenance4 in their name, but there are hundreds of other tests that have code like if(mtce4) embedded in them, to decide between two code paths that may or may not be particularly connected to KHR_maintenance4.

If the extension has decent CTS coverage, I don't always run full CTS tests on every extension addition, but since we were close to SDK release, I was doing that, in case we could squeeze this PR into the current SDK release.

BTW...that full CTS run also exposed a couple of hundred new test failures along the lines of:

Test case 'dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_member_of_block_vert_out_frag_in'..
  NotSupported (VK_KHR_maintenance4 is not supported at vktTestCase.cpp:824)

becomes:

Test case 'dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_member_of_block_vert_out_frag_in'..
[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Render pipeline compile failed (Error code 3):
Fragment input(s) `user(locn1)` mismatching vertex shader output type(s) or not written by vertex shader.
  Fail (retcode: VK_ERROR_INITIALIZATION_FAILED at vkPipelineConstructionUtil.cpp:186)

From the error text, it doesn't look like it's related to VK_KHR_maintenance4 specifically, but the extension seems to be exposing a code path in the test that is triggering a shader translation matching error.

In such a case, it might be good to pull the KHR_maintenance4 extension in once it's ready, and then address these in a separate PR, after.

spnda · 2024-01-11T18:41:29Z

Yes this should probably be a WIP PR. I didn't fully test it and the implementation of the memory requirements functions wasnt yet "finished". But thanks, I'll look into the CTS tests more and see where the issues are coming from.

billhollings · 2024-01-11T19:46:24Z

Yes this should probably be a WIP PR. I didn't fully test it and the implementation of the memory requirements functions wasnt yet "finished". But thanks, I'll look into the CTS tests more and see where the issues are coming from.

The important thing when adding any new extension is to try to avoid CTS regressions (ie. Pass -> Fail).

For Not Supported -> Fail, it depends on whether the now-failing functionality is caused by the new extension, or it's just exposing more examples of existing failures occurring on other tests (ie- dups of the existing 11K test failures). If the former, then it should be fixed in the extension PR. If the latter, then it depends on the priority relative to the extension (is the extension still useful). Otherwise, we risk it being asymptotically hard to add a useful extension because some existing (unrelated & non-critical) bug is still waiting to be addressed.

In this case, the mismatching vertex shader output type(s) or not written by vertex shader error does not seem to appear without KHR_maintenance4, so it should probably be addressed here.

spnda · 2024-01-17T22:12:53Z

I just want to put some of this information here for tracking. I've tried to find all CTS tests related to the extension, and ran them. There's a few things that fail with the currently pushed code:

Images with transient usage (so lazily allocated memory) seem to have different memory types depending on the image usage flags. (one has INPUT_ATTACHMENT_BIT, while the other has COLOR_ATTACHMENT_BIT)
SPIRV-Cross fails to generate valid MSL for some of the vector conversions, which seem fixable, but I think they should be fixed before merging this as it's directly related to the extension. These seem to occur for all tessellation tests, where the vector length differs, such as dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_loose_variable_vert_out_tesc_in_tese_frag.

[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Shader library compile failed (Error code 3):
program_source:22:38: error: member reference base type 'const device unsigned int' is not a structure or union
    float _34 = float(gl_in[0].m_39.x.x == 4u) * float(gl_in[0].m_39.y.x == 16u);
                      ~~~~~~~~~~~~~~~^~
program_source:22:71: error: member reference base type 'const device unsigned int' is not a structure or union
    float _34 = float(gl_in[0].m_39.x.x == 4u) * float(gl_in[0].m_39.y.x == 16u);
                                                       ~~~~~~~~~~~~~~~^~

Related to the same tests, SPIRV-Cross does handle some vector conversions correctly, but obviously needs to cover more cases as required by this extension. The cases I tested with direct values (so layout(location = 0) vec2 xy all pass, but as soon as there's a type of a different vector length in an array or a structure they fail with the error you posted already.

Firstly, I will refactor the memory requirement code and then look at SPIRV-Cross. I currently have a concept which adds a MVKBufferInfo and MVKImageInfo which both inherit from a MVKResourceInfo. These classes are then also used by their respective resource classes for calculating the memory requirements, as well as be created by an MVKDevice for the static memory requirement functions. I initially tried to tackle this with virtual inheritance but that quickly became a shitshow, so I'm just placing the class objects as members into the resource classes.

cdavis5e · 2024-01-17T23:46:21Z

Images with transient usage (so lazily allocated memory) seem to have different memory types depending on the image usage flags. (one has INPUT_ATTACHMENT_BIT, while the other has COLOR_ATTACHMENT_BIT)

To fix this properly, you need to implement input attachments as shader framebuffer fetch. SPIRV-Cross is already done; you just need to wire it up in MoltenVK. The hard part is mapping Vulkan input attachments to Metal render target indices.

spnda requested a review from billhollings January 6, 2024 11:24

Add: KHR_maintenance4

232a125

spnda force-pushed the KHR_maintenance4 branch from 3fb74dc to 232a125 Compare January 6, 2024 11:30

billhollings requested changes Jan 10, 2024

View reviewed changes

billhollings mentioned this pull request Jan 11, 2024

Fix potential crash when using multi-planar images. #2125

Merged

billhollings changed the title ~~Add: KHR_maintenance4~~ WIP: Add: KHR_maintenance4 Jan 16, 2024

This was referenced Jan 23, 2024

Vulkan 1.3 requirements #1930

Open

MoltenVK Enhancement Roadmap #1975

Open

spnda mentioned this pull request Mar 1, 2024

OpAccessChain swizzling itself is not handled correctly when input type is widened KhronosGroup/SPIRV-Cross#2287

Closed

spnda mentioned this pull request Apr 2, 2024

MSL: Shader input overrides are not applied for block, array, and struct types KhronosGroup/SPIRV-Cross#2305

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add: KHR_maintenance4 #2116

WIP: Add: KHR_maintenance4 #2116

spnda commented Jan 6, 2024 •

edited

Loading

billhollings left a comment

billhollings Jan 10, 2024

spnda Jan 10, 2024

cdavis5e commented Jan 10, 2024

spnda commented Jan 10, 2024

billhollings commented Jan 11, 2024

spnda commented Jan 11, 2024

billhollings commented Jan 11, 2024

spnda commented Jan 11, 2024

billhollings commented Jan 11, 2024 •

edited

Loading

spnda commented Jan 17, 2024 •

edited

Loading

cdavis5e commented Jan 17, 2024

		VkDeviceSize getBytesPerLayer(uint8_t planeIndex, uint32_t mipLevel);

		VkDeviceSize getBytesPerLayer(uint8_t planeIndex, VkExtent3D mipExtent);

WIP: Add: KHR_maintenance4 #2116

Are you sure you want to change the base?

WIP: Add: KHR_maintenance4 #2116

Conversation

spnda commented Jan 6, 2024 • edited Loading

billhollings left a comment

Choose a reason for hiding this comment

billhollings Jan 10, 2024

Choose a reason for hiding this comment

spnda Jan 10, 2024

Choose a reason for hiding this comment

cdavis5e commented Jan 10, 2024

spnda commented Jan 10, 2024

billhollings commented Jan 11, 2024

spnda commented Jan 11, 2024

billhollings commented Jan 11, 2024

spnda commented Jan 11, 2024

billhollings commented Jan 11, 2024 • edited Loading

spnda commented Jan 17, 2024 • edited Loading

cdavis5e commented Jan 17, 2024

spnda commented Jan 6, 2024 •

edited

Loading

billhollings commented Jan 11, 2024 •

edited

Loading

spnda commented Jan 17, 2024 •

edited

Loading