Relatively empty BEPU scene, 100% CPU, possibly my own fault

mcmonkey
Posts: 92
Joined: Fri Apr 17, 2015 11:42 pm

Post by mcmonkey »

So I have a scene entirely devoid of entities.

Instead, it has a ton of StaticMeshes.

However, my i7's CPU usage is at 100%... when I trace things with SlimTune, I catch this:

[Image: SlimTune profiler trace]

37% of all calculation time is... BEPU. Doing... nothing?

So when I said I had a ton of StaticMeshes, I meant I had a few thousand, each with a few thousand triangles.

So... millions of triangles.

However, they are /static/ triangles. Not worth so much as a quick and easy loop-through!

I'm currently putting off the 'make my own physics objects for BEPU so I don't have static mesh objects everywhere eating RAM' plan... in part because that would theoretically create /more/ static objects, which BEPU seems to loop through needlessly or something like that.


(Here's the best bit: Almost all the rest of the time used is Thread.Sleep(). Don't ask me how sleeping the majority of the time creates 100% CPU usage.)

When I reduce the StaticMesh count to a few hundred, BEPU takes ~8% of the tick time, and CPU usage reverts to sane amounts (around 2%).

So, it's something about BEPU not liking big numbers of StaticMeshes.



NOTE: "a few thousand" and "a few hundred" are terrible and probably massively overestimated guesses.
It's probably actually closer to less than one thousand and less than one hundred. I haven't bothered doing the math to figure it out. Surface area of a giant cuboid...
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Post by Norbo »

1) That trace seems to imply a very low per-timestep cost. For example, the deactivation manager uses almost as much time as the broad phase despite there being no entities. That isn't something you would see in a time step lasting nontrivial amounts of time.

Further, if Space.Update is using 8% of execution time at 2% cpu usage and 37% execution time at 100% cpu usage, then physics CPU usage would be jumping from 0.16% to 37%, a factor of 230. Going from a 'few hundred' to a 'few thousand' static meshes might explain a factor of 10, but not a factor of 200.
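To spell out that arithmetic, here is a rough sketch in Python. The function name is made up for illustration; the only inputs are the percentages quoted in the thread.

```python
def physics_cpu_share(space_update_fraction, total_cpu_fraction):
    """Share of the whole CPU spent in physics: the profile fraction times total CPU usage."""
    return space_update_fraction * total_cpu_fraction


# Figures quoted in the thread: 8% of execution time at 2% CPU, 37% at 100% CPU.
small_scene = physics_cpu_share(0.08, 0.02)
large_scene = physics_cpu_share(0.37, 1.00)
factor = large_scene / small_scene

print(f"small scene: {small_scene:.2%}, large scene: {large_scene:.0%}, factor: about {factor:.0f}")
```

The ratio comes out a little above 230, far more than a tenfold increase in mesh count would explain by itself.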

Check for anything that might be running the update more often than it should. That applies to both the Space.Update and the game's update too.

2) Update(dt) is being called, which performs internal time stepping. Make sure the time being passed in is correct, otherwise the engine could end up trying to simulate much more time than it should. A common case is passing in milliseconds instead of seconds, essentially asking the engine to simulate 1000x real time. In practice, the Space.TimeStepSettings.MaximumTimeStepsPerFrame limits this, but it will still cause performance issues.
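As an illustration of why a wrong unit hurts, here is a minimal sketch of a generic fixed-timestep accumulator in Python. It is not BEPUphysics' actual implementation; the function name, the 1/60 second step, and the cap of 3 are all assumptions chosen for the example.

```python
def steps_for_frame(dt, time_step=1 / 60, max_steps=3):
    """Count the fixed steps a generic accumulator would run for one frame of length dt.

    time_step and max_steps are assumed example values, not engine defaults.
    """
    accumulated = dt
    steps = 0
    while accumulated >= time_step and steps < max_steps:
        accumulated -= time_step
        steps += 1
    return steps


correct = steps_for_frame(1 / 60)   # dt passed in seconds
wrong_unit = steps_for_frame(16.67)  # same frame, dt mistakenly passed in milliseconds
uncapped = steps_for_frame(16.67, max_steps=10**6)

print(correct, wrong_unit, uncapped)
```

With the correct unit the frame runs a single step. With milliseconds, the uncapped loop would run roughly a thousand steps, and the capped one still runs the maximum number of steps every single frame.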

3) I measure 10,000 StaticMeshes at around 3-7 milliseconds per time step on a single thread of my 3 year old desktop. 1,024 StaticMeshes is around 0.8-1ms per time step on a single thread. (The number of triangles in each mesh is irrelevant in this scenario since nothing is being tested against them. In this case, each mesh happens to have over 20,000 triangles.)

If you observe per timestep times significantly higher than this on a desktop computer, something weird may be happening. Note that measuring the time of Space.Update(dt) will measure the cost of potentially multiple time steps, so you may want to change it to the parameterless Space.Update() to measure individual time steps.

If unusual times are confirmed, check for weird stuff like a few hundred objects all overlapping each other. For example, a mere 1,024 mutually overlapping objects cause many hundreds of thousands of bounds tests every time step, eating around 10ms per time step on a single thread of my desktop.
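The "hundreds of thousands" figure follows from counting pairs. A small Python sketch (the function name is hypothetical, and it only counts overlapping pairs, it does not model how the broad phase actually finds them):

```python
def mutual_overlap_pairs(n):
    """Number of distinct pairs among n objects that all overlap each other."""
    return n * (n - 1) // 2


pairs = mutual_overlap_pairs(1024)
print(pairs)
```

When every object overlaps every other one, no amount of spatial partitioning can discard pairs, so the pair count grows quadratically with the number of objects.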

4) The reason that the times in #3 are above zero at all is that the broad phase doesn't know that the static meshes can't collide until it gets to them. That's a weakness of the current broad phase that the StaticGroup is designed to work around. If all the StaticMeshes are in a StaticGroup, the broad phase will only see the upper level StaticGroup, eliminating essentially all the broad phase-related cost.

Quote:
(Here's the best bit: Almost all the rest of the time used is Thread.Sleep(). Don't ask me how sleeping the majority of the time creates 100% CPU usage.)

The OS won't schedule a sleeping thread again until the Thread.Sleep duration has elapsed, but if 0 is passed in for the duration, it basically just tries to sacrifice the current time slice. If there are no other threads to run, it'll continue immediately without actually going idle. (It is similar to Thread.Yield, with some subtle differences about which threads can be scheduled.)

So, if Thread.Sleep(0) is in a tight loop (the kind you'd often see in game loops waiting around to start the next frame), Thread.Sleep would show up in the profile results. Also, in the Windows Task Manager, a tight loop on Thread.Sleep(0) will show up as pegging the thread running the loop at 100% usage.

Calling Thread.Sleep with a parameter of 1 or more will keep the thread from being scheduled for at least a millisecond. If you have a tight loop on Thread.Sleep(1), CPU usage will be pretty close to zero because that thread won't be scheduled frequently enough to saturate the CPU. (Importantly, you can't count on it lasting exactly as long as you specify; the OS has some wiggle room related to clock tick rates.)
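
The same effect can be reproduced outside .NET. Below is a rough Python analogue, assuming that time.sleep(0) returns more or less immediately on your platform the way Thread.Sleep(0) does when nothing else is runnable. The helper name is made up, and the exact numbers will vary by machine and OS.

```python
import time


def cpu_seconds_while_waiting(sleep_seconds, wall_seconds=0.2):
    """Loop on time.sleep(sleep_seconds) for wall_seconds and return the CPU time consumed."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    while time.perf_counter() - start_wall < wall_seconds:
        time.sleep(sleep_seconds)
    return time.process_time() - start_cpu


spin_cpu = cpu_seconds_while_waiting(0.0)    # analogue of a tight loop on Thread.Sleep(0)
idle_cpu = cpu_seconds_while_waiting(0.001)  # analogue of a tight loop on Thread.Sleep(1)

print(f"sleep(0) loop used {spin_cpu:.3f}s of CPU, sleep(1ms) loop used {idle_cpu:.3f}s")
```

Both loops wait the same wall-clock time, but the zero-duration version should burn CPU for most of it while the 1 ms version stays close to idle.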