Crash with partially bound texture arrays with variable descriptor count #2206
Comments
Thinking about this more: I think that when an MVK resource is destroyed, it needs to know all of the partially bound descriptors that reference it and clear out those references. Otherwise, dangling references to resources can occur when converting the MVK state into Metal calls.
Perhaps I'm missing something here.
What behaviour are you seeing on other Vulkan platforms? What kind of crash are you seeing here? Do you have a call trace, or a small app that can demonstrate the issue you're encountering?
This is existing behaviour. The resources are retained until they are removed from the descriptors. However, Vulkan does require that an app not destroy resources while they are in-flight in an executing command buffer, and that a descriptor holding a destroyed resource must not be used further while it holds that resource.
It works fine on Windows on all GPUs there. I've seen two different crashes so far.
Here is the call stack from the second. This one is much rarer. In this one the MVKImageViewPlane::_imageView is NULL.
I also notice that if I use MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS_STYLE_IMMEDIATE_ENCODING, or ensure all my resource destructions happen on my main thread (instead of a worker thread), I can't make the crash occur. So that's another hint that it's a race condition. What I think is happening is that the worker thread is cleaning up resources at the same time as the descriptor binding operations are occurring inside MoltenVK. In the first case, the MVKGraphicsResourcesCommandEncoderState was given an MTLTexture that has since been deleted. In the second case, the MVKImageView had its _planes object cleaned up between the start and the end of getMTLTexture(). It shouldn't be possible to get into MVKImageViewPlane::getMTLTexture() when _planes.size() == 0, yet that is the size I see when I inspect the stack at this point. So the question is: is it legal to destroy resources pointed to by a descriptor while that descriptor is part of a descriptor set that is being used, even though that particular descriptor isn't referenced by the shader?
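For illustration, the app-side workaround hinted at above (not destroying resources from the worker thread while a frame may still use them) could be sketched as a deferred-destroy queue. This is only a sketch under assumptions: `DeferredDestroyQueue`, `retire`, `collect`, and the frame-index scheme are hypothetical names invented here, not part of MoltenVK or Vulkan, and the destroy callbacks stand in for whatever `vkDestroy*` calls the app would make once it knows, for example from a fence, that a frame has completed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical app-side helper; nothing here is MoltenVK or Vulkan API.
class DeferredDestroyQueue {
public:
    // May be called from any thread. Records a destroy callback together with
    // the index of the last frame whose command buffers may use the resource.
    void retire(uint64_t lastUsedFrame, std::function<void()> destroy) {
        assert(destroy && "destroy callback must be callable");
        std::lock_guard<std::mutex> lock(_mutex);
        _pending.push_back({lastUsedFrame, std::move(destroy)});
    }

    // Call once the app knows every frame up to and including completedFrame
    // has finished executing. Runs the callbacks that are now safe and
    // returns how many were run.
    size_t collect(uint64_t completedFrame) {
        std::vector<Entry> ready;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            // Entries that must keep waiting are moved to the front.
            auto firstReady = std::partition(
                _pending.begin(), _pending.end(),
                [&](const Entry& e) { return e.lastUsedFrame > completedFrame; });
            ready.assign(std::make_move_iterator(firstReady),
                         std::make_move_iterator(_pending.end()));
            _pending.erase(firstReady, _pending.end());
        }
        // Run the callbacks outside the lock so they may call retire() again.
        for (auto& e : ready) e.destroy();
        return ready.size();
    }

    size_t pendingCount() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _pending.size();
    }

private:
    struct Entry {
        uint64_t lastUsedFrame;
        std::function<void()> destroy;
    };
    std::mutex _mutex;
    std::vector<Entry> _pending;
};
```

With something like this, the worker thread would call `retire()` instead of destroying the resource directly, and the thread that waits on the frame fence would call `collect()`. It does not answer the legality question; it only keeps destruction away from the window in which the descriptor set may still be encoded.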
As I mention above, Vulkan does require that an app not destroy resources while they are in-flight in an executing command buffer, and that a descriptor holding a destroyed resource must not be used further while it holds that destroyed resource:
It's possible that you may not be encountering this on other platforms, because the representation within the platform command encoder may not depend on the Vulkan resource object the way it does in MoltenVK and Metal. To improve performance, MoltenVK deliberately uses a Having said that, MoltenVK does retain the resource objects within descriptors, but if you also destroy the descriptor set on your worker thread, then it might all disappear too early.
Thanks for your continued attention to this. The spec also states:
This clarification was only added recently, in spec v1.3.210. There is also this validation layer issue that mentions a similar workflow: I do understand why this would be problematic for MoltenVK though, as translating Vulkan state into Metal calls requires the CPU to access resources that other implementations would never touch. Is my understanding correct? If so, I can try to think of a solution.
If I have a texture array that has been sized to 10, but I'm only providing 8 elements in a call to vkUpdateDescriptorSets(), it doesn't seem like the other elements in the array in MVKDescriptorSet get cleared out. This can cause a crash with deleted resources, since bindings for elements 8 and 9 will still try to get bound in MVKDescriptorSetLayoutBinding::bind. That function loops over the entire length of the variableDescriptorCount, and doesn't seem to account for partially bound sizes.
Happy to do a PR, just looking for confirmation that my assumptions are correct.
I would think the solution is to call MVKImageDescriptor::reset() on every element of the array that isn't set in the update. Is that correct?
Thanks
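To make the proposal concrete, here is a toy model of the idea. It is a sketch only: `ToyTexture`, `ToyDescriptor`, and `ToyDescriptorArray` are hypothetical stand-ins invented for illustration and do not mirror the real MVKDescriptorSet or MVKImageDescriptor code. Note also that vkUpdateDescriptorSets() itself only writes the range of elements it is given, so whether resetting the remaining elements is appropriate is exactly the open question here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; not MoltenVK types.
struct ToyTexture {
    int id;
};

struct ToyDescriptor {
    std::shared_ptr<ToyTexture> texture;  // retained while the slot holds it
    void reset() { texture.reset(); }     // drop the reference
};

class ToyDescriptorArray {
public:
    explicit ToyDescriptorArray(size_t capacity) : _slots(capacity) {}

    // Writes the given textures into the first slots, then resets every slot
    // past the written range so no stale reference is left behind.
    void update(const std::vector<std::shared_ptr<ToyTexture>>& textures) {
        assert(textures.size() <= _slots.size());
        for (size_t i = 0; i < textures.size(); ++i) {
            _slots[i].texture = textures[i];
        }
        for (size_t i = textures.size(); i < _slots.size(); ++i) {
            _slots[i].reset();
        }
    }

    // Loops over the full array, as a bind step would, but skips empty
    // slots instead of touching a missing resource.
    std::vector<int> boundIds() const {
        std::vector<int> ids;
        for (const auto& slot : _slots) {
            if (slot.texture) ids.push_back(slot.texture->id);
        }
        return ids;
    }

private:
    std::vector<ToyDescriptor> _slots;
};
```

In this model, updating a 10-slot array with 8 textures leaves slots 8 and 9 empty, so the textures they previously held can be released without the array still pointing at them.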