Performance question

Post by **ruggy** » Tue Sep 11, 2012 10:11 am

Hi,
I'm trying to do a bit of optimisation on Xbox because I seem to be getting quite a lot of slowdowns coming from Space.Update(). Let me describe my simulation. I'm using the v1.2 source compiled for release.

There is one character controller.

There are 8 other characters. Each character has 3 'hit zones', which are kinematic spheres (so about 27 spheres, including another set on the character controller). Each sphere has CollisionInformation.CollisionRules.Personal = CollisionRule.NoBroadPhase; because all I need them for is raycasting, which the weapons do using Space.RayCast(ray, range, rayCastResults);. The positions of these 'hit zones' are updated manually by setting their Position each frame.

When a character dies, they ragdoll using 9 boxes and a sphere for the head. These don't collide with the character controller, only with the map mesh (by using a CollisionGroupPair with the CharacterController body cylinder).

The main map collision mesh is 956 polys, it's indoors so this includes a roof. It's fed to Bepu using TriangleMesh.GetVerticesAndIndicesFromModel and added to the Space.

So when the level is being played, characters are occasionally ragdolling, and there are some weapon raycasts going off, I'm seeing around 50 active objects, 60 collision pairs, and a Space.Update time of roughly 10ms (it varies a lot, between 4ms and 20ms). When the number of collision pairs reaches ~100, I occasionally see the Space.Update time take upwards of 100ms, and the whole game slows down massively. The spike is weird and doesn't always happen.

I've tried adding 3 or 4 threads to the thread manager but it didn't make much difference.
I tried the ConfigurationHelper stuff but it made my ragdoll look bad (which is broken at the moment anyway so it's hard to know exactly the effect).
I tried passing in 0.016667f to Space.Update() so that it never needs to catch up but it still spiked to ~100ms when there were 70-100 collision pairs.
Don't think it's garbage related since I monitor that already and there's no correlation between that and slow downs.

Does that sound as expected or should it be faster? Ideally I'd like to smooth out the spikes and reduce the overall time if possible.

Post by **Norbo** » Tue Sep 11, 2012 5:08 pm

> Does that sound as expected or should it be faster? Ideally I'd like to smooth out the spikes and reduce the overall time if possible.

It sounds like the non-spiking performance should be better than it is, and spikes should not exist.

Here's a few observations:
-Simulation cost should be roughly linear with the number of collision pairs and active objects. If 50 objects and 60 pairs takes 10 ms, then 100 pairs should take a bit less than 20 ms. Internal time stepping when using Space.Update(dt) could definitely cause a spike when the simulation is too large to handle in the specified time. There's a maximum number of timesteps per update which defaults to 3, though; if multiple time steps were at fault I wouldn't expect to see above 3x20 ms per frame.

-Calling Space.Update(1/60f) with a Space.TimeStepDuration of 1/60f should never spike since it is, on average, equivalent to just calling Space.Update() with no parameter which performs one time step.

-The WallDemo, with 100 boxes and around 230 collision pairs, runs at around 10-12 milliseconds on the Xbox360. The performance is fairly consistent so long as activity is consistent. This simulation is two or three times more stressful than the one described in the game. It sounds like there's something eating up more time than expected even when it's not spiking.

-If adding threads doesn't help, it implies a nonstandard source of slowdown (assuming the threads were allocated between the hardware threads appropriately).
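
To make the time stepping cap mentioned above concrete, here is a minimal sketch of a fixed-step accumulator with a per-update step limit. It is a hypothetical stand-in written for illustration, not the engine's own code: the class name `FixedStepAccumulator` and its members are assumptions, and keeping the leftover time for later frames is a design choice of this sketch.

```csharp
using System;

// Hypothetical illustration of internal time stepping; not BEPUphysics code.
public sealed class FixedStepAccumulator
{
    private readonly double stepDuration;
    private readonly int maxStepsPerUpdate;
    private double accumulated;

    public FixedStepAccumulator(double stepDuration, int maxStepsPerUpdate)
    {
        this.stepDuration = stepDuration;
        this.maxStepsPerUpdate = maxStepsPerUpdate;
    }

    // Adds the elapsed time and returns how many fixed steps would run
    // for this update. The count never exceeds maxStepsPerUpdate.
    public int Update(double elapsedSeconds)
    {
        accumulated += elapsedSeconds;
        int steps = 0;
        while (steps < maxStepsPerUpdate && accumulated >= stepDuration)
        {
            accumulated -= stepDuration; // leftover time carries to later updates
            steps++;
        }
        return steps;
    }
}
```

With a step duration of 1/60 s and a cap of 3, passing exactly 1/60 s yields one step, while a 100 ms hitch yields three steps rather than six. If a single step costs about 20 ms, that bounds the update at roughly 60 ms, which is why a 100 ms spike points at something other than catch-up stepping.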

Given all of this, I'd start by looking at nonstandard sources of slowdowns. Since it's easy to test, I'd recommend starting with NaNs and infinities. Sometimes, when they don't crash the program completely, invalid values can just cause things to slow down a whole lot. Compile BEPUphysics with the CHECKMATH symbol. It will watch the common access points of the engine and throw an exception if some NaN or infinite value is used. Performance problems due to NaNs in the simulation usually persist for more than a frame, though; I'd expect it to stay slow until it either crashed or you restarted it.
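
Since the hit zone positions are set by hand every frame, the same kind of check can also be applied on the game side before values reach the engine. The following is a minimal sketch of such a guard. `MathCheck`, `IsFinite`, and `Validate` are hypothetical names made up for this example; this is not what the CHECKMATH symbol compiles in, only the same idea applied to your own inputs.

```csharp
using System;

// Hypothetical game-side guard; not part of BEPUphysics.
public static class MathCheck
{
    // True when the value is neither NaN nor positive/negative infinity.
    public static bool IsFinite(float value)
    {
        return !float.IsNaN(value) && !float.IsInfinity(value);
    }

    // Throws as soon as an invalid value shows up, so the call stack
    // points at the code that produced it.
    public static void Validate(float value, string name)
    {
        if (!IsFinite(value))
            throw new ArithmeticException(name + " is not a finite number: " + value);
    }
}
```

Calling `Validate` on each component of a position just before assigning it to an entity would catch a bad animation or bone transform at its source rather than somewhere inside Space.Update().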

If that fails to find anything, then try compiling BEPUphysics with the PROFILE symbol. This inserts timing code around most of the processing stages. Check to see if any stage or set of stages is using more time than expected. What is expected depends on the simulation, but in general, you can expect the NarrowPhase and Solver to typically be the biggest costs. For a very sparse simulation with a lot of objects but not many collisions, the BroadPhase can become a significant chunk of time, possibly outdoing even the NarrowPhase and Solver. The other stages like BoundingBoxUpdater and PositionUpdater should all be quite tiny. If one of the supposedly tiny stages is taking a huge amount of time or the NarrowPhase is taking 10x longer than the Solver or something equivalently strange, take note of it.
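
Once per-stage times are available, the comparison described above can be automated with something like the sketch below. Everything here is an assumption for illustration: `StageTimeCheck` and `FindSuspects` are hypothetical names, the stage times are expected to be copied by hand into a dictionary keyed by stage name, and both thresholds are arbitrary values you pick for your own simulation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for eyeballing per-stage timings; not part of BEPUphysics.
public static class StageTimeCheck
{
    // Returns the stages that look out of proportion:
    //  - any stage listed as "expected small" whose share of the total
    //    exceeds maxSmallStageShare, and
    //  - the NarrowPhase, if it exceeds the Solver time by more than
    //    maxNarrowPhaseToSolverRatio.
    public static List<string> FindSuspects(
        IDictionary<string, double> stageMilliseconds,
        IEnumerable<string> expectedSmallStages,
        double maxSmallStageShare,
        double maxNarrowPhaseToSolverRatio)
    {
        List<string> suspects = new List<string>();

        double total = 0;
        foreach (KeyValuePair<string, double> pair in stageMilliseconds)
            total += pair.Value;
        if (total <= 0)
            return suspects;

        foreach (string stage in expectedSmallStages)
        {
            double ms;
            if (stageMilliseconds.TryGetValue(stage, out ms) && ms / total > maxSmallStageShare)
                suspects.Add(stage);
        }

        double narrow, solver;
        if (stageMilliseconds.TryGetValue("NarrowPhase", out narrow) &&
            stageMilliseconds.TryGetValue("Solver", out solver) &&
            narrow > solver * maxNarrowPhaseToSolverRatio)
        {
            suspects.Add("NarrowPhase");
        }

        return suspects;
    }
}
```

Logging the returned names only on frames where the total update time exceeds some budget keeps the output small and ties each spike to a stage.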

If that fails to find any productive hints, try to reproduce it on the PC. If you can get it to show up there, you'll have more powerful tools to debug it. Grab a profiler and see where it's actually slowing down. Assuming it even happens, the spike could be hidden by the PC's immense speed relative to the Xbox360.

If the PC can't reproduce it, then you don't have many options left. Try chopping out chunks of the game until it goes away, and then add pieces back until it starts happening again. Try to figure out exactly what part is responsible for triggering the spike, if there is such an individual chunk.

Post by **ruggy** » Tue Sep 11, 2012 6:45 pm

Tried CHECKMATH but didn't get anything, unfortunately.
It doesn't seem to show up on PC either.
I tried adding one player and one enemy so there were 8 active objects (2x3 hit zones (no broadphase) and 2x character controllers) and 2 collision pairs normally; Space.Update was about 1-2ms. Then when the enemy is killed and the ragdoll enabled, there are 18 active objects, ~20-24 collision pairs, and the update takes ~10ms. Not sure if this means it's specific to the ragdoll or not; the base update seems slow for only 2 collision pairs? Yet 10ms for one ragdoll seems rather a lot too.

I tried removing the CollisionGroup that stops the character controller and ragdoll colliding (just in case it was causing weirdness) and when I walked into the ragdoll I managed to get the update to take 139ms!

I couldn't find any mention of PROFILE in the Bepu code, so I'm not sure where these timers are?

Post by **Norbo** » Tue Sep 11, 2012 7:13 pm

> I tried adding one player and one enemy so there were 8 active objects (2x3 hit zones (no broadphase) and 2x character controllers) and 2 collision pairs normally; Space.Update was about 1-2ms. Then when the enemy is killed and the ragdoll enabled, there are 18 active objects, ~20-24 collision pairs, and the update takes ~10ms. Not sure if this means it's specific to the ragdoll or not; the base update seems slow for only 2 collision pairs? Yet 10ms for one ragdoll seems rather a lot too.

There will be a little overhead regardless of the number of objects in the simulation if multithreading is enabled. This generally ranges from <0.5 ms to 2 ms depending on the platform. This isn't necessarily a flat additive baseline, but more like a minimum based on the threads not being used effectively. In other words, a range of small simulations will run at almost the same speed with multithreading on.

If multithreading is completely disabled, I would expect the overhead to be a handful of microseconds. I don't actually remember the non-multithreading baseline on the Xbox360 though. If you still see 1-2 ms with a basically empty simulation without multithreading, it might just be related to the resolution of the timer being used.
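
One way to rule out timer resolution is to accumulate raw `Stopwatch` ticks over many frames and look at the average, minimum, and maximum instead of a single reading. The sketch below assumes nothing about the engine; `FrameTimeStats` and its members are hypothetical names invented for this example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical per-frame timing accumulator; not part of BEPUphysics.
public sealed class FrameTimeStats
{
    private long totalTicks;
    private long minTicks = long.MaxValue;
    private long maxTicks;
    private int samples;

    // Records one measurement, expressed in Stopwatch ticks.
    public void Add(long elapsedTicks)
    {
        totalTicks += elapsedTicks;
        if (elapsedTicks < minTicks) minTicks = elapsedTicks;
        if (elapsedTicks > maxTicks) maxTicks = elapsedTicks;
        samples++;
    }

    public int Samples { get { return samples; } }

    public double AverageMilliseconds
    {
        get { return samples == 0 ? 0.0 : ToMilliseconds(totalTicks) / samples; }
    }

    public double MinMilliseconds
    {
        get { return samples == 0 ? 0.0 : ToMilliseconds(minTicks); }
    }

    public double MaxMilliseconds
    {
        get { return samples == 0 ? 0.0 : ToMilliseconds(maxTicks); }
    }

    // Length of one Stopwatch tick; a coarse value here means single
    // readings of a millisecond or two are not trustworthy.
    public static double TickLengthMilliseconds
    {
        get { return 1000.0 / Stopwatch.Frequency; }
    }

    private static double ToMilliseconds(long ticks)
    {
        return ticks * 1000.0 / Stopwatch.Frequency;
    }
}
```

In use, take `Stopwatch.GetTimestamp()` immediately before and after Space.Update() and pass the difference to `Add`. An average over a few hundred frames gives a baseline that does not depend on the resolution of any single reading, and the maximum records the spikes that are hard to catch in a screenshot.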

10ms for one ragdoll is indeed way too much, even for the Xbox360. If I remember right, the Xbox360 can handle the ragdoll demo's 8 ragdolls at a decent update rate. The ragdolls in that demo are quite a bit more complicated than the ActionFigureDemo's ragdoll, too.

> I tried removing the CollisionGroup that stops the character controller and ragdoll colliding (just in case it was causing weirdness) and when I walked into the ragdoll I managed to get the update to take 139ms!

Unless there was some oddity causing there to be multiple thousands of collisions or constraints active, 139ms is way beyond reasonability and is definitely not expected. You can take comfort in the fact that there's some shenanigans going on, at least.

> I couldn't find any mention of PROFILE in the Bepu code, so I'm not sure where these timers are?

I forgot to mention that the PROFILE compilation symbol support was added in the development version.

There's also one more handy approach to finding the source of things like this. In addition to trying to cut back the existing simulation, it can help to try to build a simplified representation of the simulation in a modified BEPUphysicsDemos demo. If you can reproduce the performance issue in isolation like that, determining the source becomes much, much easier.

Post by **ruggy** » Wed Sep 12, 2012 10:07 am

ok, so I got the latest dev version and added those profile stats to my debug drawing. Here's a couple of pics. Was quite awkward to get screenshots of the xbox at the worst cases so I think in reality it's often a bit worse than these show. This was compiled in Release and I am using threads 1,3,4,5.

Just wondering if you see anything particularly wrong in these figures?

Out of the 18 active objects, 6 have no broadphase, 2 are the character controller cylinder and the other 10 form the simple ragdoll.

Here's one with a few more ragdolls in the map.

Post by **Norbo** » Wed Sep 12, 2012 4:01 pm

The larger simulation actually has a shorter solver time. That is not expected if everything is active. If things are deactivated, then it's hard to say what is reasonable. For the purposes of narrowing down the problem, you may want to force everything to be active (entity.ActivityInformation.IsAlwaysActive = true) so that times are more consistent with the size of the simulation.

The BeforeSolverUpdateables stage is taking a lot longer in the second simulation. This would suggest there are more characters or that the characters are in more complex states. There aren't many states where a single character can take up 2+ milliseconds by itself though. If there's only one character, there might be something happening with where it's standing.

The FPS seems to be primarily driven by non-physics systems. 10 fps with a 14ms physics update implies the rest of the systems took longer than the physics by a factor of 6. Of course, this could just be the screenshot process interfering.
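
The factor of 6 comes from simple frame budget arithmetic: 10 fps is a 100 ms frame, so a 14 ms physics update leaves 86 ms for everything else, and 86 / 14 is a little over 6. A small sketch of that calculation, with `FrameBudget` and its methods being hypothetical names for illustration:

```csharp
using System;

// Hypothetical helper for frame budget arithmetic.
public static class FrameBudget
{
    // Total time available per frame at a given frame rate.
    public static double FrameMilliseconds(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }

    // How many times longer the non-physics work took than the physics update.
    public static double NonPhysicsToPhysicsRatio(double framesPerSecond, double physicsMilliseconds)
    {
        double other = FrameMilliseconds(framesPerSecond) - physicsMilliseconds;
        return other / physicsMilliseconds;
    }
}
```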

Post by **ruggy** » Thu Sep 13, 2012 8:41 am

Yeah, the framerate took a nose dive for some reason. I'm optimising other parts of the game now, which may give more time for physics. One question though: I display Active Objects using code from the demo, and since about 15 of those objects are kinematic spheres with no broadphase, 60 seems too high. I wouldn't expect them to show as active, but they seem to. The code to work out that stat was this:

            int countActive = 0;
            for (int i = 0; i < m_Space.Entities.Count; i++)
            {
                if (m_Space.Entities[i].ActivityInformation.IsActive)
                    countActive++;
            }

Can I force them inactive? If I manually update their position, does that make them active? (That is what I'm doing.) This is how I create them:

EDIT: I now see that it is setting Position that makes the hitzones active so that code is probably fine.

            // No mass parameter means the sphere is kinematic by default.
            m_HitZone_Torso = new Sphere(new Vector3(0, 5, 0), 0.8f);
            m_HitZone_Torso.CollisionInformation.CollisionRules.Personal = CollisionRule.NoBroadPhase;
            m_HitZone_Torso.Tag = new HitTag(m_Id, Player.HITZONE_TORSO);
            m_Space.Add(m_HitZone_Torso);

The good news is I've reduced the ragdoll by one entity, and I'm using the 'super speedy' configuration plus setting DefaultMinImpulse to 0.1f. Although I did get a massive slowdown once, it does seem quite a bit better than it was overall. So thanks for all that help, much appreciated.
I did an NProf on the PC and this showed up as quite high: public Matrix WorldTransform (EntityBase.cs) (measured as part of my player's update, not Bepu's). I 'get' it for 9 entities every frame while they're in ragdoll and apply it to my character's bones. Seemed a bit odd that it showed up. Any idea on optimising that?

Post by **Norbo** » Thu Sep 13, 2012 4:04 pm

> If I manually update their position does that make them active? (which is what I'm doing). This is how I create them
>
> EDIT: I now see that it is setting Position that makes the hitzones active so that code is probably fine.

That's right; setting one of the entity properties like position, orientation, or velocity will activate the entity even if it's kinematic. For a kinematic entity, the next frame will check the velocity and see if it's zero. If it is, it will go inactive, unless it has been forced awake again by setting the position etc., which prevents it from deactivating for a frame.

> I did an NProf on the PC and this showed up as quite high: public Matrix WorldTransform (EntityBase.cs) (measured as part of my player's update not bepu's). I 'get' it for 9 entities every frame while they're in ragdoll and apply it to my characters bones. Seemed a bit odd that it showed up. Any idea on optimising that?

That sounds like a red herring. The WorldTransform getter combines the Orientation quaternion and Position into a world transform. It takes a little math, but it's on the scale of nanoseconds. If it's in a really tight loop running thousands of times, it could be worth caching it outside the tight loop. But 9 gets is several orders of magnitude too few for it to be a problem.

I'd recommend trying with some other profilers like SlimTune, ANTS, EQATEC, and DotTrace.

Post by **ruggy** » Sat Sep 15, 2012 10:35 am

Ok, so the good news is that I transplanted my modified action figure code and my static mesh into the Bepu demo framework, and on the Xbox (even in debug) with 3 action figures it never goes above 5ms and I never see any spikes. Whereas in my game, Space.Update goes to around 10ms as soon as an action figure is created (active?), and (quite rarely) it will slow down by 100ms (that could be the character controllers or anything, I've no idea). It was good to confirm that it's not my static mesh that's deformed or causing slowdowns; it rendered cleanly in the demo framework even though it wasn't specifically built as convex hulls or anything.
Not sure where to look next.

EDIT: I managed to get a couple of screenshots that illustrate the problem. I had to run it in debug to get these but you can see the collision pairs are similar in number but the Space.Update time is hugely different. The screenshots show these to be the min and max, often the time was between 10ms and 18ms.

Must be something I'm doing. When a character respawns I remove the 'action figure' entities from the Space using Remove() and then they are re-added when they next die. I just can't see what triggers these slow downs.

Post by **Norbo** » Sat Sep 15, 2012 4:40 pm

There's always the option of continuing to add bits and pieces of the core simulation featureset to the isolated demo until the problem starts happening again. If it doesn't start happening again, then either the new implementation magically fixed it or the problem was caused by something else in the code. If it does start happening again, then you can more easily determine what particular feature is at fault. Going the other direction by cutting pieces out of the simulation could also work.
