Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation issue with partially bound texture arrays #2205

Open
mbechard opened this issue Apr 13, 2024 · 16 comments
Open

Validation issue with partially bound texture arrays #2205

mbechard opened this issue Apr 13, 2024 · 16 comments
Labels

Comments

@mbechard
Copy link
Contributor

mbechard commented Apr 13, 2024

Hey, not sure if this is the right spot to put this (should it go in discussions?). I'm currently diagnosing some issues with partially bound texture arrays and MoltenVK. Looking for some guidance to diagnose it and I'll make a PR once fixed.
I am not using argument buffers.

I have an array of size 128, but the shader only uses 1 element, and I'm only binding one texture on binding index = 0. This means that indexes 1-127 are also allocated for that texture array, but unused.
If a previous operation has binded a texture to index 1, I may get a metal validation errors such as:

-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5446: failed assertion `Draw Errors Validation
Fragment Function(main0): The pixel format (MTLPixelFormatRG16Uint) of the texture (name:uniqueID: 7359) bound at index 1 is incompatible with the data type (MTLDataTypeFloat) of the texture parameter (sTDInstanceTextures [[texture(1)]]). MTLPixelFormatRG16Uint is compatible with the data type(s) (
    uint,
    ushort

Shouldn't any currently bound indices for that array get unbound to avoid stale bindings being checked during validation?
Does that only need to happen when validation is enabled, and can be ignored otherwise, with the presumption the shader isn't going to access that stale binding?

Thanks

@mbechard mbechard changed the title Issues with partially bound texture arrays Validation issue with partially bound texture arrays Apr 13, 2024
@billhollings
Copy link
Contributor

Shouldn't any currently bound indices for that array get unbound to avoid stale bindings being checked during validation?

By "binding", are you referring to vkUpdateDescriptorSets()?

If so, vkUpdateDescriptorSets() is specifically designed by Vulkan to update only the descriptors that you indicate, and leave the others unchanged. Descriptor elements are not automatically unbound.

@mbechard
Copy link
Contributor Author

mbechard commented Apr 15, 2024

Sorry, I mean the Metal bindings set with setFragmentTexture. I guess the correct term is entry in the argument table.
Those are left as-is, but Metal's validation seems to check those entries vs the shader, and can throw that assert, even though the shader does not dynamically use that entry.

@cdavis5e
Copy link
Collaborator

So in other words, vkCmdBindDescriptorSets().

The Vulkan standard says only that, if a pipeline with a new layout is bound, and descriptor set n is then bound when there are already sets bound, then:

  • If descriptor set m < n is already bound, and the pipeline layouts are not compatible for set m, then set m is disturbed; and
  • If the pipeline layouts are not compatible for set n, then all sets following n are disturbed.

It also says that, once descriptor sets are disturbed, the descriptors become undefined, with everything that entails.

Does what you're doing work with the validation layer?

@mbechard
Copy link
Contributor Author

mbechard commented Apr 16, 2024

Thanks for the feedback. I don't have validation layers running on macOS because I link with MoltenVK directly, but the same codepath doesn't trigger validation issues on Windows.

However I think the issue is at the Metal level, not at a Vulkan usage level.
Say I have a shader that uses two integer textures at Metal bindings 0 and 1. Following that I have a shader that has two floating point textures declared at Metal bindings 0 and 1, but only index 0 is dynamically used by the shader.

In Vulkan I would provide a descriptors for the integer textures and they'd get bound to metal indices 0 and 1. Then for the next shader I could provide a PARTIALLY_BOUND descriptor that only has an entry for index 0. When the geometry for that shader is rendered the binding at 1 is still leftover as an integer texture, triggering this metal assert (even though what I did was legal from a Vulkan standpoint).

This is my understanding of what is happening from the outside atleast, without being a Metal expert.

@billhollings
Copy link
Contributor

Hmmm. MoltenVK doesn't do anything with the VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT.

The simplest thing to do would be for MoltenVK to disable descriptorBindingPartiallyBound feature flag, but it is a required feature for EXT_descriptor_indexing and Vulkan 1.2, in which case, if we can't implement this within MoltenVK, we would have to treat it as a portability limitation, similar to the way we do with robustBufferAccess

@mbechard
Copy link
Contributor Author

mbechard commented Apr 17, 2024

I don't think it's a required feature of EXT_descriptor_indexing. Just if it's not supported then all descriptors need to be assigned a valid resource.
It would be a shame to remove it though, as it does seem like Metal supports it in it's own fashion.
I think a fix to this case would be to ensure every binding the shader expects either points to something specified by the descriptorset, or is unbound in the case of it not being specified (size too small), or is an invalid resource.
The question is if that is only required for when Metal validation is enabled, and can safely be avoided for regular builds, with the assumption the shader isn't using this invalid/nil resources.

@billhollings
Copy link
Contributor

I don't think it's a required feature of EXT_descriptor_indexing.

The Vulkan spec has wording that requires it:

All Vulkan graphics implementations must support the following features:
...

  • If the descriptorIndexing feature is supported, or if the VK_EXT_descriptor_indexing extension is supported:
    ...
    • descriptorBindingPartiallyBound

I think a fix to this case would be to ensure every binding the shader expects either points to something specified by the descriptorset, or is unbound in the case of it not being specified (size too small), or is an invalid resource.

Hmmm...but doesn't your situation have a texture in index 1, but that the shader is not using it, and validation complains anyway? MoltenVK would have to somehow know the shader doesn't ever access element 1, and remove it as being set. How would MoltenVK know what indexes the shader might be accessing?

The question is if that is only required for when Metal validation is enabled, and can safely be avoided for regular builds, with the assumption the shader isn't using this invalid/nil resources.

This point is valid. We could adopt a stance that the validation is overly sensitive, and that the practical effect is that it works anyway.

@alyssarosenzweig
Copy link

alyssarosenzweig commented Apr 17, 2024

Hmmm. MoltenVK doesn't do anything with the VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT.

The simplest thing to do would be for MoltenVK to disable descriptorBindingPartiallyBound feature flag, but it is a required feature for EXT_descriptor_indexing and Vulkan 1.2, in which case, if we can't implement this within MoltenVK, we would have to treat it as a portability limitation, similar to the way we do with robustBufferAccess

It is required for the EXT, but I don't think it is for Vulkan 1.2. descriptorIndexing and hence EDI features are optional in core, see spec section "Feature requirements". No portability bit needed.

@billhollings
Copy link
Contributor

It is required for the EXT, but I don't think it is for Vulkan 1.2. descriptorIndexing and hence EDI features are optional in core, see spec section "Feature requirements". No portability bit needed.

Agreed that descriptorIndexing is optional, but MoltenVK does support it, along with EXT_descriptor_indexing.

The OP indicates use of VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT, which requires descriptorBindingPartiallyBound, and both are provided by descriptor indexing.

If descriptorIndexing is supported, that "Feature requirements" section says that we can't disable descriptorBindingPartiallyBound.

The portability change would be to include spec wording that descriptorBindingPartiallyBound is not required if descriptorIndexing is supported, but portability is active.

@mbechard
Copy link
Contributor Author

Ah my mistake, I thought descriptorBindingPartiallyBound was optional. Anyways I think a solution should maintain support for that, so that's a moot point.

Hmmm...but doesn't your situation have a texture in index 1, but that the shader is not using it, and validation complains anyway? MoltenVK would have to somehow know the shader doesn't ever access element 1, and remove it as being set. How would MoltenVK know what indexes the shader might be accessing?

Right, neither MoltenVK nor Metal can know what the shader is actually using. The Metal validation issue occurs because there is a leftover incompatible binding at index 1, and the shader may use that binding, so it complains.
I would think MoltenVK can know all the bindings the shader may use, and compare that with the descriptors that are provided. If any binding indices have missing or invalid entries, then it should clear out the Metal bindings.

@billhollings
Copy link
Contributor

billhollings commented Apr 17, 2024

I would think MoltenVK can know all the bindings the shader may use, and compare that with the descriptors that are provided. If any binding indices have missing or invalid entries, then it should clear out the Metal bindings.

Hmmm...so MoltenVK would query the shader for texture formats, check the descriptor texture formats at runtime, and refuse to bind them to Metal if they are different. This is possible with effort (both dev and runtime cost), but also does seem expensive for an edge case.

Also, this will probably all become very different, and likely moot, once MoltenVK moves to a fully bindless model soon.

@mbechard
Copy link
Contributor Author

I wouldn't expect it to compare formats. If the descriptor has a resource for a binding, it's up to the programmer to ensure it's a compatible format. This would only be for bindings that don't have an entry or have an invalid entry, since that results in 'leftover' bindings in the argument table from previous operations, that may be incompatible.

If this isn't going to be a long term issue due to a different workflow coming in though, then I totally agree we can just ignore it until then.

@billhollings
Copy link
Contributor

billhollings commented Apr 17, 2024

I wouldn't expect it to compare formats.

Isn't that the Metal validation error you're getting above? The shader is expecting the texture format type to be float, but your descriptor has a texture with an uint format.

@mbechard
Copy link
Contributor Author

The descriptor only has an entry for binding 0, which is a float texture. There is nothing specified for binding 1, so it's left as-is in the Metal argument table. In this case it happens to be an int texture from a previous shader+render operation

@billhollings
Copy link
Contributor

Ah. Okay. Just check that the second DS is only setting one texture, but the array is larger and so all the other elements need to be nilled.

Can you post the Metal function declarations for your two respective shaders, so we can see how the arguments are being declared, please? We don't need the shader body, just the

fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[texture(0)]], sampler texSmplr [[sampler(0)]])

part for each.

You can generate those by setting a MVK_CONFIG_DEBUG=1 environment variable when you run.

@mbechard
Copy link
Contributor Author

Here is the shader that is used beforehand

fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> uTDCurveTexture [[texture(0)]], texture2d<uint> uTDBandTexture [[texture(1)]], sampler uTDCurveTextureSmplr [[sampler(0)]], sampler uTDBandTextureSmplr [[sampler(1)]])

The shader with this declaration causes the validation error to occur when trying to render, due to a uint texture being left bound to binding 1.

fragment main0_out main0(main0_in in [[stage_in]], constant _RESERVED_IDENTIFIER_FIXUP_gl_DefaultUniformBlock& _35 [[buffer(0)]], array<texture2d<float>, 128> sTDInstanceTextures [[texture(0)]], sampler sTDInstanceTextureSampler [[sampler(0)]])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants