Rigidbody sticking to mesh like fly paper

Posted: Tue Jan 21, 2020 6:28 pm
by phr00t
I'm having a very weird issue that only seems to happen on one tester's machine (out of 4 or 5 others where it works fine): any time a rigidbody touches a static mesh, all rigidbodies lock up, stutter and "stick". Here is a video of it running on the tester's machine:

https://www.youtube.com/watch?v=ZHmkl_37K08

To rule out anything odd in my own code as the cause, I reverted all of my multithreading modifications to the bepuphysics engine and had the player shoot a simple ball to roll around the level. Still, when either the ball or the player touches the static mesh level, BOTH rigidbodies consistently get stuck and lag terribly like in the video (even if other rigidbodies were in the air).

I haven't been able to reproduce this lag / sticking behavior on any of my test machines or with any other testers with the same builds. However, it seems like a huge issue that might still affect a small population. Any thoughts or ideas are really appreciated!

Re: Rigidbody sticking to mesh like fly paper

Posted: Tue Jan 21, 2020 10:19 pm
by Norbo
Given the fact that it happens inconsistently and only on some testers, environmental issues or nondeterministic issues are most likely. I'd recommend tracking down a few leads:
1) Gather information on operating system/CPU. If possible, find another tester with the same configuration to see if it reproduces.
2) Check .NET versions. If using self-contained deployment, make sure it's a stable release and not a beta (very unlikely unless you've been trying out nightly builds or something). May want to try updating your SDK and redeploying with a target of that more recent SDK. Low likelihood of helping; this would imply a bug in the runtime. They do happen (and several were hit in v2's development), but they're not common.
3) If deployment isn't already self-contained (shipping with its own runtime), it could be launching with a client-installed SDK which could be old enough that some problems persist. Unlikely, but worth ruling out.
4) Confirm that absolutely no asynchronous access of any kind is happening. Completely sequentialize all logic. A failure that doesn't reproduce consistently and is sensitive to environmental factors is a big red flag for sneaky race conditions.

Unfortunately, there's not much else I can come up with without more information. As always, reproducing the problem in a minimal stripped down demo (preferably just a small snippet in the Demos project) for me to look at is the quickest way for me to figure it out.

Re: Rigidbody sticking to mesh like fly paper

Posted: Wed Jan 22, 2020 2:30 am
by phr00t
1) i7-4790 CPU, Windows 7 Home 64 bit, 16GB RAM. NVIDIA GTX 770.
2 & 3) It is not self-contained. I'll check that the client didn't install any weird SDK. It is interesting to note that the machine is running a rather old OS... but still should be compatible.
4) The funny thing is, the problem is very reliable and consistent on this tester's machine. 100% of the time any contact happens, there is a huge CPU spike and physics processing gets backed up and laggy. If it was related to asynchronous access or race conditions, I'd expect it to be much less predictable.

Seems to be CPU spike related: https://www.youtube.com/watch?v=DbjIrPC5Mck

Will post as I gather more information...

Re: Rigidbody sticking to mesh like fly paper

Posted: Wed Jan 22, 2020 5:40 pm
by phr00t
Solution found!

The tester's machine did not seem to have enough free cores to handle contact processing alongside everything else I was doing with the 3D engine. I had to reduce the thread count and thread overhead of the engine itself, which freed up enough cores for physics to run properly.
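For anyone hitting the same thing, here is a rough sketch of the kind of thread budgeting described above. It is not the actual engine or bepuphysics code; the function name, the 50/50 split, and the one-core reserve are all assumptions made up for illustration.

```python
import os


def split_thread_budget(logical_cores, engine_fraction=0.5, reserve_for_os=1):
    """Split logical cores between engine workers and physics workers.

    Hypothetical helper: engine_fraction and reserve_for_os are illustrative
    defaults, not values taken from any real engine.
    """
    # Leave some cores for the OS and drivers, but never go below one usable core.
    usable = max(1, logical_cores - reserve_for_os)
    # Give the engine its share, rounded down, with a floor of one thread.
    engine = max(1, int(usable * engine_fraction))
    # Physics gets the remainder, also with a floor of one thread.
    # On a single-core machine this still yields 1 + 1 threads, so some
    # oversubscription is unavoidable there.
    physics = max(1, usable - engine)
    return engine, physics


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    engine_threads, physics_threads = split_thread_budget(cores)
    print(f"{cores} logical cores: {engine_threads} engine, {physics_threads} physics")
```

The idea is simply to derive both worker counts from the reported core count instead of letting each subsystem assume it owns every core.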

What led to this breakthrough was limiting the game to just 3 cores on my own machine (using Windows 10 Task Manager), which reproduced the lagging issue.
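The same restriction can be written as a processor affinity bitmask, where bit i stands for logical core i. A small sketch of how such a mask is built (the helper names are made up for illustration, and game.exe in the comment is a placeholder):

```python
def affinity_mask(core_indices):
    """Build an affinity bitmask with one bit set per allowed logical core."""
    mask = 0
    for index in core_indices:
        if index < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << index
    return mask


def first_n_cores_mask(n):
    """Mask that allows only logical cores 0 .. n-1."""
    return affinity_mask(range(n))


if __name__ == "__main__":
    mask = first_n_cores_mask(3)
    # The Windows "start" command takes the mask in hexadecimal, for example:
    #   start /affinity 7 game.exe
    print(f"first 3 cores: decimal {mask}, hex {mask:X}, binary {mask:b}")
```

Launching with a fixed mask makes the reduced-core test repeatable without having to set affinity by hand each run.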

Re: Rigidbody sticking to mesh like fly paper

Posted: Thu Jan 23, 2020 12:29 am
by Norbo
Hmm... careful, monsters may still lurk.

Oversubscription can definitely hurt performance, but it typically manifests as a gradual degradation. For example, if I launch 8 jobs that each use every reported hardware thread (so a total of 16 threads per physical core on 2-way SMT), it'll be worse than 8x slower per job, but not 500x worse.
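To spell out the arithmetic in that example, here is a small sketch. The function names are made up for illustration, and the 4-core machine in the demo is an arbitrary assumption; the result does not depend on the core count.

```python
def threads_per_physical_core(jobs, physical_cores, smt_ways=2):
    """Software threads competing for each physical core when every job
    spawns one thread per reported hardware thread."""
    hardware_threads = physical_cores * smt_ways
    total_software_threads = jobs * hardware_threads
    return total_software_threads / physical_cores


def oversubscription_factor(jobs, physical_cores, smt_ways=2):
    """Ratio of software threads to hardware threads."""
    hardware_threads = physical_cores * smt_ways
    return (jobs * hardware_threads) / hardware_threads


if __name__ == "__main__":
    # 8 jobs on an assumed 4-core machine with 2-way SMT.
    print(threads_per_physical_core(8, 4))   # 16 threads per physical core
    print(oversubscription_factor(8, 4))     # 8x more threads than hardware threads
```

An 8x oversubscription factor suggests a slowdown somewhat worse than 8x per job once scheduling overhead is included, which is a long way from the 500x cliff.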

For the kind of massive performance cliff shown in the video, I would guess something really nasty is going on, especially given how simple the simulation is. It's hard to say exactly what: overprioritized threads stealing timeslices, synchronization primitives joining into a dark ritual circle, or even more indirect causes like oversubscription->increased frequency of race conditions->corruption (I keep harping on it, but such things always have to be kept in mind when dealing with 'weird inexplicable behavior').