Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: MVKCommandBuffer: Optimize management of command storage. #1678

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

js6i
Copy link
Collaborator

@js6i js6i commented Aug 11, 2022

Note: it's a large-ish change and probably would require some more clean up. For example, I think it no longer makes sense to have a separate MVKCmdRenderPassBase. Still, I'm submitting for overall review and suggestions.

The main motivation behind this commit is to reduce the time needed to free or
reset a command buffer. With the current scheme, we need to loop over all
recorded commands and return them to their pools. This can take a significant
amount of time in games, of the order of a millisecond.

With this change, instead of pooling individual commands by their type, we
instead keep chunks of memory in each command buffer, from which we bump
allocate memory for the commands to be recorded. MVKCommandVector is introduced
to use the same memory pool for dynamic arrays of data associated with the
commands.

Pointers to commands that actually have non-trivial destructors (i.e. they keep
references to other objects) are stored separately, so that we can call their
destructors when reseting the command buffer. All the memory simply gets
discarded by setting the current chunk's index and offset to zero.

As a side note, specific applications might opt to not retain pipeline layouts
in bind/push descriptor commands (introduced by 30113fc)
if it's known not to be needed, to further reduce time spent resetting command buffers.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much boilerplate... I'm amazed you had the patience to slog through all those code changes!

Comment on lines 62 to 66
Allocator alc;


public:
Allocator alc;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you shouldn't add a constructor to MVKSmallVector<T> and its base to initialize the allocator. Then you wouldn't have to reach into the allocator in order to set the command buffer for the bump allocator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, to me that seems to add complication for no clear reason, thus I'm leaving it for others to decide. Are you thinking doing something of the sort:
_bindings = MVKCommandVector(MVKCommandVectorAllocator<BlahBlah>(cmdBuff));
in setContent functions, or passing the command buffer to all MVKCommand constructors and putting
_bindings(MVKCommandVector(MVKCommandVectorAllocator<BlahBlah>(cmdBuff)))
in the inializer list, or something else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdavis5e, @billhollings, any preferences on this? I'm largely indifferent, other than doing the first is less intrusive.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@js6i , I highly discourage this change to MVKSmallVector. The allocator should not be exposed at all and the overall design wasn't ment to be used with a specific constructor as well. The allocator type is a pure template argument.
I think the problem is, that MVKCommandVectorAllocator is already bad by design, it should not have the member cmdBuffer and it should not call mvkPushCommandMemory directly.
There are multiple ways of redesigning this, but since I am not really familiar with MoltenVK, I can't suggest a good solution right now, since I am afraid I will brake other things.
But please do not change MVKSmallVector with the above change, it will introduce all kind of issues in the future.

Copy link
Collaborator Author

@js6i js6i Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerofly, I don't see what's the argument you're making, other than saying it's bad. Setting the allocator with accompanying data for an expandable array is a very simple thing to do, and I don't understand how that's going to be a source of issues.

One thing that I'd probably be in favor of changing is removing the allocator as a template argument, and making a common interface for it, i.e. some sort of struct Allocator.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't mean to be rude here. What I tried to say is, that exposing the allocator and actually accessing allocator elements from another class is basically mixing separate concepts. The allocator as well as the actual vector should not be intertwined like this.

I have added MVKSmallVector to MoltenVK to have a std::vector compatible container that allows the easy use of pre allocated stack elements to be used, to avoid new/delete's.
The way it is used now would not be STL compatible.

Now for this special case, I recommend to write a special class that encapsulates everything that is intended here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the first point, I still don't get it. If you want to use some allocator for an array or whatever, you gotta set it and its things. I don't see how that's mixing concepts. If I put the pointer to command buffer directly into the vector class, then I'd agree it would be mixing unrelated things.

I don't know if we need STL compatibility for anything. If we do, that's probably a valid argument.

That said, I also don't mind having dedicated code for this.

@billhollings
Copy link
Contributor

Thanks for submitting this. I'll review this next week, after the SDK release this weekend.

@italomandara
Copy link
Contributor

italomandara commented Aug 18, 2022

Bad news: compiled libMoltenVK.dylib from this branch via xcodebuild build -project MoltenVKPackaging.xcodeproj -scheme "MoltenVK Package (macOS only)" -configuration "Debug" -arch "x86_64", even though it works fine for most games, it causes one particular game to freeze randomly. I wasn't able to trace the error i followed the instructions to enable moltenvk logging but it doesn't work with this build.
I'd love to help but i need to understand why logging doesn't work on crossover. (any help appreciated)

@billhollings billhollings changed the title MVKCommandBuffer: Optimize management of command storage. WIP: MVKCommandBuffer: Optimize management of command storage. Aug 18, 2022
@js6i
Copy link
Collaborator Author

js6i commented Aug 19, 2022

Bad news: compiled libMoltenVK.dylib from this branch via xcodebuild build -project MoltenVKPackaging.xcodeproj -scheme "MoltenVK Package (macOS only)" -configuration "Debug" -arch "x86_64", even though it works fine for most games, it causes one particular game to freeze randomly. I wasn't able to trace the error i followed the instructions to enable moltenvk logging but it doesn't work with this build. I'd love to help but i need to understand why logging doesn't work on crossover. (any help appreciated)

What is the game? You can save logs in CrossOver with the "run with options" dialog, which I think should include MoltenVK errors by default, but you can also set MVK_CONFIG_LOG_LEVEL=3 environment variable. It may be useful to also set METAL_DEVICE_WRAPPER_TYPE=1 to enable Metal API validation.

@italomandara
Copy link
Contributor

italomandara commented Aug 19, 2022

@js6i it worked, i was able to enable the logs, thanks.
Game: "Elite dangerous: Odyssey"

please find the log attached:
edody.7z.txt

@js6i
Copy link
Collaborator Author

js6i commented Aug 25, 2022

@italomandara Do you use any workarounds to launch this game with CrossOver? I'm getting an error saying "Unhandled Exception: mono-io-layer-error" on startup.

@italomandara
Copy link
Contributor

italomandara commented Aug 25, 2022

@italomandara Do you use any workarounds to launch this game with CrossOver? I'm getting an error saying "Unhandled Exception: mono-io-layer-error" on startup.

ok to install Elite dangerous you need to install microsoft .net framework 4.8
Also you need dxvk, it crashes immediately on wined3d (this time not because of this WIP changes)

@phush0
Copy link

phush0 commented Aug 26, 2022

This WIP also crash Guild Wars 2. The easiest way to reproduce is fast travel. Upon appear on new spot, loading textures and assets will make game crash or freeze entirely.

@js6i
Copy link
Collaborator Author

js6i commented Aug 26, 2022

@italomandara, how to trigger this freeze? Can I expect to hit it by playing for a few minutes through the tutorial?

@phush0
Copy link

phush0 commented Aug 26, 2022

you have to unlock fast travel point and travel from somewhere to it that's it for me triggers every time

@italomandara
Copy link
Contributor

italomandara commented Aug 26, 2022

@js6i the freeze is random in Elite dangerous odyssey, it frequently happens in the main menu, but also while playing, I'm using M1Max, can't confirm you get the same on x86 machines and discrete GPUs

@phush0
Copy link

phush0 commented Aug 26, 2022

I am also with m1 max

@js6i
Copy link
Collaborator Author

js6i commented Sep 15, 2022

@italomandara I think I'll need more info about your setup. On my M1 Air Elite Dangerous seems to work fine with this change. Which version of CrossOver are you using? Do you use the version of DXVK that ships with it? Did you change any potentially important registry settings or environment variables? Esync? High res/retina mode? Fullscreen, windowed fullscreen, other graphics settings in game? Unfortunately I don't have direct access to M1 Max specifically..

@js6i
Copy link
Collaborator Author

js6i commented Sep 16, 2022

I didn't reproduce the freeze in GW2 either, but I spotted a possible problem. This PR is now updated and also rebased against current master. Feel free to retest..
@italomandara @phush0

@phush0
Copy link

phush0 commented Sep 16, 2022

yes now it will not crash, also seems that helps on fps in group events. I have played with patch like hour - hour and a half and traveled several times in very populated zones where before will hang and is ok

@italomandara
Copy link
Contributor

@js6i It works now i also experimented with -Ofast still no crashes at all.
Also, noticeable speed improvements.

@phush0
Copy link

phush0 commented Sep 23, 2022

Sadly after prolonged gaming Guild Wars 2 hangs again, last time it stall whole laptop. I don't have such problems with current master.

edit: mistake is mine because I have forced MTLEvents for Rosetta if use single queue every think is working as expected

EDIT 2: It seems that problem is triggered if I use async version of DXVK with "DXVK_ASYNC" = "1" it bottle settings

@italomandara
Copy link
Contributor

When I tested it all went well, i turned on async in dxvk but never forced MTLEvents, so this might narrow down the issue to MTLEvents?

@phush0
Copy link

phush0 commented Oct 1, 2022

It happens with single queue also when async is enabled

@js6i
Copy link
Collaborator Author

js6i commented Oct 5, 2022

Rebased on current master and some minor clean ups.

@phush0, are you getting any error messages from DXVK or MoltenVK? What about Metal runtime? (run with METAL_DEVICE_WRAPPER_TYPE=1) Isn't dxvk-async just compiling shaders in a thread? That in principle shouldn't care about how commands are stored in buffers.

@phush0
Copy link

phush0 commented Oct 5, 2022

Will check and will report back. Basically display just hangs. Actually async will skip draw at first and introduce it later as it is compiled in background, so it is more a hack than solution. May be this skip will lead to crash I am not entirely sure.

@Gcenx
Copy link

Gcenx commented Oct 5, 2022

I’d applied the first version of this on 1.11.1 and haven’t seen any issue with Witcher3.

Tested in the training grounds where you’d usually get a lockup when hit (need to have MVK_CONFIG_RESUME_LOST_DEVICE=1 set) completed it without problem.

edit:
Also tested this on master and it's still working well for me, needed to also pull-in #1735

Also tested on Lego Starwars the skywalker saga still no issues both Witcher3 and this have async enabled.

@CLAassistant
Copy link

CLAassistant commented Oct 6, 2022

CLA assistant check
All committers have signed the CLA.

@Gcenx
Copy link

Gcenx commented Oct 9, 2022

@js6i any reason to still consider this WIP?

@italomandara
Copy link
Contributor

I tested this on my entire game library and it works

@phush0
Copy link

phush0 commented Oct 12, 2022

with latest version I don't have hangs also

@Gcenx
Copy link

Gcenx commented Oct 12, 2022

While it may be working for everyone it’s possible that @js6i still considering this wip for some issue they’ve encountered, that’s why I was asking.

@oscarbg
Copy link

oscarbg commented Oct 13, 2022

sadly seems will miss the next Vulkan SDK feature cutoff (#1736)
still interesting if can be merged soon after, seeing it works now..

@Gcenx
Copy link

Gcenx commented Oct 13, 2022

@oscarbg agreed but we don’t know if there’s a reason it’s still marked as WIP

Once the next release is marked I’ll provide the usual stock package along with another that includes this PR and a coupe of other interesting patches/hacks.

@oscarbg
Copy link

oscarbg commented Oct 13, 2022

@Gcenx nice.. it's one of these additional patches the UE4 black screen fix: https://github.com/nastys/MoltenVK/releases/tag/ue4-workaround-1?
I just opened an issue (#1741) asking if this workaround makes sense to be integrated into MoltenVK and enabled via some env variable given it fixes lots of games on MacOS..

@phush0
Copy link

phush0 commented Oct 19, 2022

I still have hangs in Guild Wars 2 even with the last patch. This is what I have in log as errors

[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader


[mvk-error] VK_ERROR_INVALID_SHADER_NV: Vertex shader function could not be compiled into pipeline. See previous logged error.
err:   DxvkGraphicsPipeline: Failed to compile pipeline
err:     vs  : VS_7628e2c6c3fd5510b370e8166966c9f45137cad7
err:     fs  : FS_6dbf94ee014a451d8ba108e51e15e38b602ce07a
err:     attr 0 : location 0, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 0
err:     attr 1 : location 1, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 12
err:     attr 2 : location 2, binding 0, format VK_FORMAT_R8G8B8A8_UINT, offset 0
err:     attr 3 : location 3, binding 0, format VK_FORMAT_R16G16_SFLOAT, offset 48
err:     binding 0 : binding 0, stride 52, rate 0, divisor 0
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader


[mvk-error] VK_ERROR_INVALID_SHADER_NV: Vertex shader function could not be compiled into pipeline. See previous logged error.
err:   DxvkGraphicsPipeline: Failed to compile pipeline
err:     vs  : VS_f89fed9e2085bf1ab8b74645384b6177cf16fa67
err:     fs  : FS_847fb2fbf55b3d81b0d05c9786d0576f485e7be9
err:     attr 0 : location 0, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 0
err:     attr 1 : location 1, binding 0, format VK_FORMAT_R32G32B32_SFLOAT, offset 12
err:     attr 2 : location 2, binding 0, format VK_FORMAT_R8G8B8A8_UINT, offset 0
err:     attr 3 : location 3, binding 0, format VK_FORMAT_R32G32_SFLOAT, offset 24
err:     attr 4 : location 7, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 48
err:     attr 5 : location 6, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 32
err:     attr 6 : location 5, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 16
err:     attr 7 : location 4, binding 1, format VK_FORMAT_R32G32B32A32_SFLOAT, offset 0
err:     binding 0 : binding 0, stride 32, rate 0, divisor 0
err:     binding 1 : binding 1, stride 64, rate 1, divisor 1
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Unable to convert SPIR-V to MSL:
MSL conversion error: Vertex attribute type mismatch between host and shader

after several such hangs of the app I reach the point where whole laptop freeze and reboots itself

everything other is info - create this create that nothing more, no errors logged or smth

@Gcenx
Copy link

Gcenx commented Oct 19, 2022

If your using CrossOver-22.0.1 did you use the shipped DXVK or replace it with a third-party build?

@phush0
Copy link

phush0 commented Oct 19, 2022

I am using latest build from your page and yes it is CrossOver-22.0.1

@Gcenx
Copy link

Gcenx commented Oct 19, 2022

I am using latest build from your page and yes it is CrossOver-22.0.1

The latest DXVK-macOS build would be 1.10.3 that includes DXVK-CX hacks (that are still relevant) along with DXVK-async that’s only auto enabled on select titles.

@phush0
Copy link

phush0 commented Oct 19, 2022

I don't use async for guild wars. Easiest way to reproduce e port to big city from somewhere. Loading objects will lead to hang.

@italomandara
Copy link
Contributor

italomandara commented Oct 20, 2022

That seems a dxvk error not a moltenvk one... still strange that you don't get errors with the stock MoltenVK tho...

@phush0
Copy link

phush0 commented Oct 20, 2022

this don't change the fact that Guild Wars hangs with this pr and not without it

@phush0
Copy link

phush0 commented Oct 27, 2022

-[MTLDebugBlitCommandEncoder dealloc]:89: failed assertion `Command encoder released without endEncoding.'
0174:02fc:trace:seh:dispatch_exception code=c000001d flags=0 addr=00007FF8004E0D5D ip=00007FF8004E0D5D tid=02fc
code=c000001d flags=0 addr=0x689d768a ip=689d768a tid=02fc

This is what I found when game hang. I can upload detailed log just it is 170 MB

last shader before hang:

Estimated original GLSL:
#version 450

layout(constant_id = 1) const bool cb0_bound = true;
layout(constant_id = 10) const bool t0_bound = true;
layout(constant_id = 11) const bool t1_bound = true;
layout(constant_id = 12) const bool t2_bound = true;
layout(constant_id = 13) const bool t3_bound = true;
layout(constant_id = 14) const bool t4_bound = true;
layout(constant_id = 15) const bool t5_bound = true;
layout(constant_id = 16) const bool t6_bound = true;
layout(constant_id = 17) const bool t12_bound = true;
layout(constant_id = 1216) const uint omap0 = 12816u;

layout(set = 0, binding = 1, std140) uniform cb0_t
{
    vec4 m[21];
} cb0;

layout(set = 0, binding = 2) uniform sampler s0;
layout(set = 0, binding = 3) uniform sampler s1;
layout(set = 0, binding = 4) uniform sampler s2;
layout(set = 0, binding = 5) uniform sampler s3;
layout(set = 0, binding = 6) uniform sampler s4;
layout(set = 0, binding = 7) uniform sampler s5;
layout(set = 0, binding = 8) uniform sampler s6;
layout(set = 0, binding = 9) uniform sampler s12;
layout(set = 0, binding = 10) uniform texture2D t0;
layout(set = 0, binding = 11) uniform texture2D t1;
layout(set = 0, binding = 12) uniform texture2D t2;
layout(set = 0, binding = 13) uniform texture2D t3;
layout(set = 0, binding = 14) uniform texture2D t4;
layout(set = 0, binding = 15) uniform texture2D t5;
layout(set = 0, binding = 16) uniform texture2D t6;
layout(set = 0, binding = 17) uniform texture2D t12;

layout(location = 0) in vec4 v0;
layout(location = 1) in vec4 v1;
layout(location = 2) in vec4 v2;
layout(location = 3) in vec3 v3;
layout(location = 4) in vec4 v4;
layout(location = 0, index = 0) out vec4 o0;
vec4 shader_in[6];
vec4 r0;
vec4 r1;
vec4 r2;
vec4 r3;

void ps_main()
{
    vec2 _80 = vec2(cb0.m[1].x) * vec2(cb0.m[12].y, cb0.m[12].x);
    r0 = vec4(_80.x, _80.y, r0.z, r0.w);
    vec2 _87 = fract(r0.xy);
    r0 = vec4(_87.x, _87.y, r0.z, r0.w);
    vec4 _97 = r0;
    _97.x = fma(r0.x, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r0 = _97;
    vec4 _109 = r0;
    _109.z = (-cb0.m[12].z) + cb0.m[12].w;
    r0 = _109;
    vec4 _120 = r0;
    _120.x = fma(abs(r0.x), r0.z, cb0.m[12].z);
    r0 = _120;
    vec4 _131 = r1;
    _131.z = fma(shader_in[0].y, r0.x, r0.y);
    r1 = _131;
    vec4 _139 = r1;
    _139.x = r0.x * shader_in[0].x;
    r1 = _139;
    vec2 _154 = mix(vec2(0.0), texture(sampler2D(t2, s2), r1.xzxx.xy).xy, bvec2(t2_bound, t2_bound));
    r0 = vec4(_154.x, _154.y, r0.z, r0.w);
    vec2 _164 = fma(r0.xy, uintBitsToFloat(uvec2(1073741824u)), uintBitsToFloat(uvec2(3212836864u)));
    r0 = vec4(_164.x, _164.y, r0.z, r0.w);
    vec2 _178 = vec2(cb0.m[1].x) * vec2(cb0.m[13].y, cb0.m[13].x);
    r0 = vec4(r0.x, r0.y, _178.x, _178.y);
    vec2 _183 = fract(r0.zw);
    r0 = vec4(r0.x, r0.y, _183.x, _183.y);
    vec4 _191 = r0;
    _191.z = fma(r0.z, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r0 = _191;
    vec4 _201 = r1;
    _201.x = (-cb0.m[13].z) + cb0.m[13].w;
    r1 = _201;
    vec4 _212 = r0;
    _212.z = fma(abs(r0.z), r1.x, cb0.m[13].z);
    r0 = _212;
    vec4 _222 = r1;
    _222.z = fma(shader_in[0].y, r0.z, r0.w);
    r1 = _222;
    vec4 _230 = r1;
    _230.x = r0.z * shader_in[0].x;
    r1 = _230;
    vec2 _241 = mix(vec2(0.0), texture(sampler2D(t3, s3), r1.xzxx.xy).xy, bvec2(t3_bound, t3_bound));
    r0 = vec4(r0.x, r0.y, _241.x, _241.y);
    vec2 _248 = fma(r0.zw, uintBitsToFloat(uvec2(1073741824u)), uintBitsToFloat(uvec2(3212836864u)));
    r0 = vec4(r0.x, r0.y, _248.x, _248.y);
    vec2 _255 = r0.zw + r0.xy;
    r0 = vec4(_255.x, _255.y, r0.z, r0.w);
    vec2 _265 = r0.xy * vec2(cb0.m[14].x);
    r0 = vec4(r0.x, r0.y, _265.x, _265.y);
    vec2 _279 = vec2(cb0.m[1].x) * vec2(cb0.m[10].y, cb0.m[10].x);
    r1 = vec4(_279.x, _279.y, r1.z, r1.w);
    vec2 _284 = fract(r1.xy);
    r1 = vec4(_284.x, _284.y, r1.z, r1.w);
    vec4 _292 = r1;
    _292.x = fma(r1.x, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r1 = _292;
    vec4 _302 = r1;
    _302.z = (-cb0.m[10].z) + cb0.m[10].w;
    r1 = _302;
    vec4 _313 = r1;
    _313.x = fma(abs(r1.x), r1.z, cb0.m[10].z);
    r1 = _313;
    vec4 _324 = r2;
    _324.z = fma(shader_in[0].y, r1.x, r1.y);
    r2 = _324;
    vec4 _332 = r2;
    _332.x = r1.x * shader_in[0].x;
    r2 = _332;
    vec4 _344 = r1;
    _344.x = fma(cb0.m[11].x, cb0.m[1].x, uintBitsToFloat(3204448256u));
    r1 = _344;
    vec4 _349 = r1;
    _349.x = fract(r1.x);
    r1 = _349;
    vec2 _357 = fma(r0.zw, r1.xx, r2.xz);
    r1 = vec4(r1.x, _357.x, _357.y, r1.w);
    vec4 _369 = r1;
    _369.y = t1_bound ? texture(sampler2D(t1, s1), r1.yzyy.xy).y : 0.0;
    r1 = _369;
    vec4 _378 = r1;
    _378.z = cb0.m[1].x * cb0.m[11].x;
    r1 = _378;
    vec4 _383 = r1;
    _383.z = fract(r1.z);
    r1 = _383;
    vec2 _391 = fma(r1.zz, r0.zw, r2.xz);
    r0 = vec4(r0.x, r0.y, _391.x, _391.y);
    vec4 _403 = r0;
    _403.z = t1_bound ? texture(sampler2D(t1, s1), r0.zwzz.xy).y : 0.0;
    r0 = _403;
    vec4 _411 = r0;
    _411.w = (-r0.z) + r1.y;
    r0 = _411;
    vec4 _418 = r1;
    _418.y = fma(r1.z, uintBitsToFloat(1073741824u), uintBitsToFloat(3212836864u));
    r1 = _418;
    vec4 _428 = r0;
    _428.z = fma(abs(r1.y), r0.w, r0.z);
    r0 = _428;
    vec3 _446 = mix(vec3(0.0), texture(sampler2D(t6, s6), vec4(cb0.m[4].x, cb0.m[4].y, cb0.m[4].x, cb0.m[4].x).xy).xyz, bvec3(t6_bound, t6_bound, t6_bound));
    r2 = vec4(_446.x, _446.y, _446.z, r2.w);
    vec3 _461 = fma(r2.xyz, vec3(cb0.m[5].x), vec3(cb0.m[6].x));
    r3 = vec4(_461.x, _461.y, _461.z, r3.w);
    vec4 _474 = r0;
    _474.w = dot(r2.xyz, uintBitsToFloat(uvec3(1050253722u, 1058474557u, 1038174126u)));
    r0 = _474;
    vec4 _480 = r1;
    _480.w = dot(r3.xyz, uintBitsToFloat(uvec3(1050253722u, 1058474557u, 1038174126u)));
    r1 = _480;
    vec3 _487 = (-r1.www) + r3.xyz;
    r2 = vec4(_487.x, _487.y, _487.z, r2.w);
    vec3 _499 = fma(vec3(cb0.m[7].x), r2.xyz, r1.www);
    r2 = vec4(_499.x, _499.y, _499.z, r2.w);
    vec3 _506 = r0.www * r2.xyz;
    r2 = vec4(_506.x, _506.y, _506.z, r2.w);
    vec3 _520 = r0.www * vec3(cb0.m[8].x, cb0.m[8].y, cb0.m[8].z);
    r3 = vec4(_520.x, _520.y, _520.z, r3.w);
    vec3 _527 = r3.xyz + r3.xyz;
    r3 = vec4(_527.x, _527.y, _527.z, r3.w);
    vec3 _534 = r2.xyz + r2.xyz;
    r2 = vec4(_534.x, _534.y, _534.z, r2.w);
    vec3 _541 = mix(mix(max(r3.xyz, r2.xyz), r2.xyz, isnan(r3.xyz)), r3.xyz, isnan(r2.xyz));
    r3 = vec4(_541.x, _541.y, _541.z, r3.w);
    vec3 _558 = fma(r2.xyz, vec3(cb0.m[9].x, cb0.m[9].y, cb0.m[9].z), -r3.xyz);
    r2 = vec4(_558.x, _558.y, _558.z, r2.w);
    vec3 _567 = fma(r0.zzz, r2.xyz, r3.xyz);
    r2 = vec4(_567.x, _567.y, _567.z, r2.w);
    vec4 _575 = r0;
    _575.z = r0.z + uintBitsToFloat(1028443341u);
    r0 = _575;
    float _581 = r0.z * uintBitsToFloat(1073892526u);
    float _992 = isnan(0.0) ? _581 : (isnan(_581) ? 0.0 : max(_581, 0.0));
    vec4 _584 = r0;
    _584.z = isnan(1.0) ? _992 : (isnan(_992) ? 1.0 : min(_992, 1.0));
    r0 = _584;
    vec4 _593 = r0;
    _593.w = fma(r0.z, uintBitsToFloat(3221225472u), uintBitsToFloat(1077936128u));
    r0 = _593;
    vec4 _600 = r0;
    _600.z = r0.z * r0.z;
    r0 = _600;
    vec4 _607 = r3;
    _607.w = r0.z * r0.w;
    r3 = _607;
    vec3 _613 = r2.xyz * r3.www;
    r2 = vec4(_613.x, _613.y, _613.z, r2.w);
    vec3 _621 = r2.xyz * uintBitsToFloat(uvec3(1072064102u));
    r3 = vec4(_621.x, _621.y, _621.z, r3.w);
    r2 = r3 * vec4(cb0.m[15].x);
    vec4 _641 = r0;
    _641.z = t4_bound ? texture(sampler2D(t4, s4), shader_in[1].xyxx.xy).x : 0.0;
    r0 = _641;
    vec4 _652 = r0;
    _652.w = (-cb0.m[16].x) + cb0.m[16].y;
    r0 = _652;
    vec4 _662 = r0;
    _662.z = fma(r0.z, r0.w, cb0.m[16].x);
    r0 = _662;
    vec2 _668 = r0.zz * r0.xy;
    r0 = vec4(_668.x, _668.y, r0.z, r0.w);
    vec2 _678 = r0.xy * vec2(cb0.m[17].x);
    r0 = vec4(_678.x, _678.y, r0.z, r0.w);
    vec2 _688 = fma(r0.xy, r1.xx, shader_in[0].zw);
    r0 = vec4(r0.x, r0.y, _688.x, _688.y);
    vec2 _698 = fma(r0.xy, r1.zz, shader_in[0].zw);
    r0 = vec4(_698.x, _698.y, r0.z, r0.w);
    vec4 _710 = r0;
    _710.x = t0_bound ? texture(sampler2D(t0, s0), r0.xyxx.xy).y : 0.0;
    r0 = _710;
    vec4 _721 = r0;
    _721.y = t0_bound ? texture(sampler2D(t0, s0), r0.zwzz.xy).y : 0.0;
    r0 = _721;
    vec4 _729 = r0;
    _729.y = (-r0.x) + r0.y;
    r0 = _729;
    float _738 = fma(abs(r1.y), r0.y, r0.x);
    float _1003 = isnan(0.0) ? _738 : (isnan(_738) ? 0.0 : max(_738, 0.0));
    vec4 _740 = r0;
    _740.x = isnan(1.0) ? _1003 : (isnan(_1003) ? 1.0 : min(_1003, 1.0));
    r0 = _740;
    r0 = r0.xxxx * r2;
    vec4 _755 = r1;
    _755.x = dot(shader_in[3].xyz, shader_in[2].yzw);
    r1 = _755;
    float _759 = abs(r1.x);
    float _761 = uintBitsToFloat(1065353216u);
    vec4 _763 = r1;
    _763.x = isnan(_761) ? _759 : (isnan(_759) ? _761 : min(_759, _761));
    r1 = _763;
    vec4 _770 = r1;
    _770.x = (-r1.x) + uintBitsToFloat(1065353216u);
    r1 = _770;
    vec4 _775 = r1;
    _775.x = log2(r1.x);
    r1 = _775;
    vec4 _784 = r1;
    _784.x = r1.x * cb0.m[18].x;
    r1 = _784;
    vec4 _789 = r1;
    _789.x = exp2(r1.x);
    r1 = _789;
    vec4 _796 = r1;
    _796.x = (-r1.x) + uintBitsToFloat(1065353216u);
    r1 = _796;
    float _800 = uintBitsToFloat(0u);
    vec4 _802 = r1;
    _802.x = isnan(_800) ? r1.x : (isnan(r1.x) ? _800 : max(r1.x, _800));
    r1 = _802;
    r0 *= r1.xxxx;
    vec2 _815 = shader_in[5].xy + vec2(cb0.m[3].x);
    r1 = vec4(_815.x, _815.y, r1.z, r1.w);
    vec2 _826 = r1.xy * vec2(cb0.m[0].x, cb0.m[0].y);
    r1 = vec4(_826.x, _826.y, r1.z, r1.w);
    vec4 _838 = r1;
    _838.x = t12_bound ? texture(sampler2D(t12, s12), r1.xyxx.xy).x : 0.0;
    r1 = _838;
    vec4 _850 = r1;
    _850.x = fma(r1.x, cb0.m[0].w, -cb0.m[0].z);
    r1 = _850;
    vec4 _859 = r1;
    _859.x = r1.x + (-shader_in[2].x);
    r1 = _859;
    float _867 = r1.x / cb0.m[19].x;
    float _1024 = isnan(0.0) ? _867 : (isnan(_867) ? 0.0 : max(_867, 0.0));
    vec4 _869 = r1;
    _869.x = isnan(1.0) ? _1024 : (isnan(_1024) ? 1.0 : min(_1024, 1.0));
    r1 = _869;
    vec4 _876 = r1;
    _876.y = fma(r1.x, uintBitsToFloat(3221225472u), uintBitsToFloat(1077936128u));
    r1 = _876;
    vec4 _883 = r1;
    _883.x = r1.x * r1.x;
    r1 = _883;
    vec4 _890 = r1;
    _890.x = r1.x * r1.y;
    r1 = _890;
    r0 *= r1.xxxx;
    vec4 _906 = r1;
    _906.x = t5_bound ? texture(sampler2D(t5, s5), shader_in[1].zwzz.xy).x : 0.0;
    r1 = _906;
    r0 *= r1.xxxx;
    r0 *= vec4(cb0.m[20].x);
    vec3 _929 = r0.xyz * vec3(cb0.m[2].x, cb0.m[2].y, cb0.m[2].z);
    r0 = vec4(_929.x, _929.y, _929.z, r0.w);
    r0 *= vec4(cb0.m[2].w);
    vec4 _944 = r1;
    _944.x = (-shader_in[4].w) + uintBitsToFloat(1065353216u);
    r1 = _944;
    o0 = r0 * r1.xxxx;
}

void main()
{
    shader_in[0] = v0;
    shader_in[1] = v1;
    shader_in[2] = v2;
    shader_in[3] = vec4(v3.x, v3.y, v3.z, shader_in[3].w);
    shader_in[4] = v4;
    vec4 _964 = gl_FragCoord;
    _964.w = 1.0 / _964.w;
    shader_in[5] = vec4(_964.xy.x, _964.xy.y, shader_in[5].z, shader_in[5].w);
    ps_main();
    o0 = vec4(o0[bitfieldExtract(omap0, int(0u), int(4u))], o0[bitfieldExtract(omap0, int(4u), int(4u))], o0[bitfieldExtract(omap0, int(8u), int(4u))], o0[bitfieldExtract(omap0, int(12u), int(4u))]);
}


End GLSL


[mvk-info] Compiling Metal shader with FastMath enabled.

@Gcenx
Copy link

Gcenx commented Oct 31, 2022

@phush0 I noticed at the end of the log [mvk-info] Compiling Metal shader with FastMath enabled. have you tried disabling FastMath?, on discord nas had shown some titles having graphical issues with this enabled so maybe that's a contributing factor.

@phush0
Copy link

phush0 commented Oct 31, 2022

@phush0 I noticed at the end of the log [mvk-info] Compiling Metal shader with FastMath enabled. have you tried disabling FastMath?, on discord nas had shown some titles having graphical issues with this enabled so maybe that's a contributing factor.

yes no change same assert and hang

@Gcenx
Copy link

Gcenx commented Dec 10, 2022

@js6i could you please rebase this

@marzent
Copy link

marzent commented Dec 12, 2022

Took the liberty of rebasing here https://github.com/marzent/MoltenVK/tree/cmdbuffer-storage2 if anyone needs it

The main motivation behind this commit is to reduce the time needed to free or
reset a command buffer. With the current scheme, we need to loop over all
recorded commands and return them to their pools. This can take a significant
amount of time in games, of the order of a millisecond.

With this change, instead of pooling individual commands by their type, we
instead keep chunks of memory in each command buffer, from which we bump
allocate memory for the commands to be recorded. MVKCommandVector is introduced
to use the same memory pool for dynamic arrays of data associated with the
commands.

Pointers to commands that actually have non-trivial destructors (i.e. they keep
references to other objects) are stored separately, so that we can call their
destructors when reseting the command buffer. All the memory simply gets
discarded by setting the current chunk's index and offset to zero.

As a side note, specific applications might opt to not retain pipeline layouts
in bind/push descriptor commands (introduced by
30113fc) if it's known not to be needed, to
further reduce time spent resetting command buffers.
@italomandara
Copy link
Contributor

italomandara commented Dec 14, 2022

@js6i this is not from your code changes but maybe your changes revealed some existing issues
in MVKCmdTransfer.mm this line and this line i can see [mtlRendEnc popDebugGroup] comes before [mtlRendEnc endEncoding]
now if the error is failed assertion 'Command encoder released without endEncoding.' could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?
Correct me if I'm wrong.

@cdavis5e
Copy link
Collaborator

could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?

No. [mtlRendEnc popDebugGroup] doesn't release the command encoder. It undoes the effect of a prior -pushDebugGroup:.

@italomandara
Copy link
Contributor

could it be because [mtlRendEnc endEncoding] should be called before releasing the command encoder instead?

No. [mtlRendEnc popDebugGroup] doesn't release the command encoder. It undoes the effect of a prior -pushDebugGroup:.

right it's even done in the same order in the apple docs, sorry.

@Gcenx
Copy link

Gcenx commented Dec 14, 2022

@js6i is there and issues you’ve encountered when using this that’s caused it to be still marked as WIP?

@Gcenx
Copy link

Gcenx commented Mar 23, 2023

@js6i could you rebase this PR to master, also is there any reason this is sitting in limbo?

Gcenx pushed a commit to The-Wineskin-Project/MoltenVK that referenced this pull request Mar 24, 2023
@js6i
Copy link
Collaborator Author

js6i commented May 19, 2023

I'll most likely rebase on a newer version soon, the problem is that I haven't been able to reproduce the issue @phush0 was getting, and understandably @billhollings is not eager to merge regressive changes. Perhaps it's something that will go away on its own after rebasing.

@Gcenx
Copy link

Gcenx commented May 19, 2023

@js6i here is an easy example of a game crashing quickly with this applied.

Jedi: Fallen Order, fails to load a save without the game will load the save though the game will eventually crash out when remembering an ability (first one being wall running) when using DXVK not the case when using WineD3D Vulkan though it’s unplayable with that. The issue is probably with spirv-cross.

Though the above may have been resolved via 75b4d26 as something fails to compile causing the pipeline to get locked up.

I suspect this PR was just uncovering these issues much sooner than without.

@phush0
Copy link

phush0 commented May 19, 2023

Just rebase will test again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants