InstancedModelDrawer eats tons of RAM

Post by **diabloqfdb** » Tue Feb 12, 2013 2:04 pm

My game is currently using too much RAM in general, and sometimes more than is available. I am tracking down these issues, which are probably mostly my fault, but I also noticed that InstancedModelDrawer eats up tons of RAM.

Loading just the physics for a 2048 heightmap (split into 16x16 chunks) eats up about 80 MiB of RAM, which is very low. Adding the same objects to an InstancedModelDrawer instance bumps the RAM usage up to 1247 MiB.

I understand that this is largely a debugging feature and need not be efficient, but almost 1200 MiB is a tad excessive.

Post by **Norbo** » Tue Feb 12, 2013 8:35 pm

At a glance, the final GPU memory load should be around 190-200 megabytes for that setup. This may show up as a (possibly temporary) duplicate cost in the process as the driver handles it. On top of that, there's all the vertex data used to create the final GPU data to begin with; it's not all thrown out. 1200 certainly still seems high, though. If this is measured from the process explorer, that might just be Windows expanding the process's allocation liberally despite the process not continuously using that much.

I'm hesitant to spend much time investigating or making incremental improvements to the current ancient drawer, though. It's going to be rewritten soon when the main fork moves off of XNA anyway.

Post by **diabloqfdb** » Wed Feb 13, 2013 3:40 pm

OK, I can understand that.

What is the main fork moving to? I know that XNA is dead and I am even tempted to move away from C# to straight DirectX with C++, but I am kind of attached to BEPU and the way physics behaves here.

Another question: how can you handle tens of thousands of physics-enabled objects in a large sandbox game? I am making incremental changes and optimizations, increasing the world size every time. To handle the memory requirements, I started using InstancedMesh instead of StaticMesh. What are the performance implications of this? I know for a fact that RAM use is drastically reduced, but what about performance?

I have the world represented as rectangular chunks and I need some system to enable/disable physics for distant chunks, but in a robust way. I am afraid of having objects resting one on top of another and, once you move far away, physics gets disabled; then you come back and find things in different positions or even clipped into each other. And what about moving objects that have yet to settle at the point you cross the distance threshold and shut down the simulation for that chunk? Populating about 1/4 of the world with objects causes physics to take quite a bit of time. I am not even rendering the large majority of them: I switch to a low LOD mesh at a distance and fade out at a greater distance.

Post by **Norbo** » Wed Feb 13, 2013 10:47 pm

> What is the main fork moving to? I know that XNA is dead and I am even tempted to move away from C# to straight DirectX with C++, but I am kind of attached to BEPU and the way physics behaves here.

I'll be pulling the dependency-free fork into the main branch. XNA support will stick around, but as a separate fork like SlimDX and SharpDX. All of the forks will switch to using a new renderer. Currently, the plan is to base the new demos renderer on either SharpDX or, for more cross-platform support, MonoGame.

More information can be found here.

> To handle the memory requirements, I started using InstancedMesh instead of StaticMesh. What are the performance implications of this? I know for a fact that RAM use is drastically reduced, but what about performance?

InstancedMeshes are a little slower because their acceleration structure is in local space. In order to find which triangles are near a potentially colliding object, that object must be pulled into the local space of the mesh and tested against the hierarchy rather than just directly using a world space hierarchy like the StaticMesh.

The cost of that transformation isn't zero, but it's fairly low compared to hierarchy traversal, contact generation, and boundary analysis.
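To make the local-space idea concrete, here is a minimal plain-Python sketch; it is not BEPUphysics code. `Instance`, `world_to_local`, and `query_instance` are hypothetical names, the transform is limited to a yaw rotation plus translation, and a brute-force scan over local-space boxes stands in for the real hierarchy. All instances share one set of local-space boxes, and each query is pulled into local space before testing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """World transform of one instance: rotate by yaw about Y, then translate."""
    yaw: float
    tx: float
    ty: float
    tz: float


def local_to_world(inst, p):
    """Forward transform: rotation about the Y axis followed by translation."""
    c, s = math.cos(inst.yaw), math.sin(inst.yaw)
    x, y, z = p
    return (c * x + s * z + inst.tx, y + inst.ty, -s * x + c * z + inst.tz)


def world_to_local(inst, p):
    """Inverse transform: undo the translation, then rotate by -yaw."""
    x, y, z = p[0] - inst.tx, p[1] - inst.ty, p[2] - inst.tz
    c, s = math.cos(-inst.yaw), math.sin(-inst.yaw)
    return (c * x + s * z, y, -s * x + c * z)


def sphere_overlaps_box(center, radius, box):
    """Sphere vs axis-aligned box, where box = (min_xyz, max_xyz)."""
    lo, hi = box
    d2 = 0.0
    for c, a, b in zip(center, lo, hi):
        nearest = min(max(c, a), b)  # closest point on the box along this axis
        d2 += (c - nearest) ** 2
    return d2 <= radius * radius


def query_instance(inst, local_boxes, world_center, radius):
    """Pull the query sphere into local space, then test the shared boxes.

    A rigid transform preserves distances, so the radius is unchanged.
    """
    local_center = world_to_local(inst, world_center)
    return [i for i, box in enumerate(local_boxes)
            if sphere_overlaps_box(local_center, radius, box)]


# One shared set of local-space boxes, reused by every instance.
shared_boxes = [((0, 0, 0), (1, 1, 1)), ((5, 0, 0), (6, 1, 1))]
a = Instance(0.0, 0.0, 0.0, 0.0)
b = Instance(math.pi / 2, 100.0, 0.0, 0.0)
probe = local_to_world(b, (5.5, 0.5, 0.5))
hits_b = query_instance(b, shared_boxes, probe, 0.1)
hits_a = query_instance(a, shared_boxes, probe, 0.1)
```

The only per-query overhead relative to a world-space structure is the `world_to_local` call, which is why the memory saving comes at a small rather than a large cost.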

There is one caveat to remember though: while it's tempting to throw instanced meshes every which way as they're nearly free in terms of memory, thousands of entries in the broad phase will slow things down a little bit eventually. Bundling them in StaticGroups can address this; instead of having 1000 entries in the broad phase, it would be just a single entry in the broad phase and then 1000 instanced meshes in a static group. This helps because the acceleration structures used by these shapes have more guarantees about functionality than the broad phase and can just skip a bunch of work.

Of course, it's always important to actually measure and test the performance to see what kind of boost you need and what kind of boost you get; the above are just high level rules of thumb.

> how can you handle tens of thousands of physics-enabled objects in a large sandbox game?
> ...
> I have the world represented as rectangular chunks and I need some system to enable/disable physics for distant chunks, but in a robust way. I am afraid of having objects resting one on top of another and, once you move far away, physics gets disabled; then you come back and find things in different positions or even clipped into each other. And what about moving objects that have yet to settle at the point you cross the distance threshold and shut down the simulation for that chunk? Populating about 1/4 of the world with objects causes physics to take quite a bit of time.

This is a tricky problem because it generally involves implementing a 'meta' world on top of the physics world due to the efficiency constraints, and that's not a natural operation. It tends to involve discontinuities of some sort no matter how it's managed; it's just a matter of hiding those discontinuities most effectively. There's usually a nice big dollop of tedium involved.

Here's one possible model that could work well for some single player games:
1) Keep all static objects within the space so long as there exist dynamic objects near them.
2) Keep all active (i.e. not sleeping) dynamic objects in the space.
3) Keep all objects (static and dynamic) that are near the player in the space, even if sleeping.

After enforcing the above rules, you're left with a guarantee that whatever the player is interacting with is actually there, nothing will freeze mid-fall as the player moves away, and nothing will fall through the ground.
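The three rules can be written down as a small selection function. This is a plain-Python sketch under stated assumptions, not BEPUphysics code: `Obj`, `select_kept`, and the two radii (`player_range`, `support_range`) are hypothetical, positions are 2D for brevity, and "near" is a simple distance check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    x: float
    z: float
    is_static: bool
    is_active: bool = False  # only meaningful for dynamic objects


def near(ax, az, bx, bz, radius):
    """True if the two points are within `radius` of each other."""
    return (ax - bx) ** 2 + (az - bz) ** 2 <= radius * radius


def select_kept(objects, player, player_range, support_range):
    """Return the names of the objects that must stay in the space."""
    px, pz = player
    # Rules 2 and 3 for dynamics: keep them if awake or near the player.
    kept_dynamics = [o for o in objects
                     if not o.is_static
                     and (o.is_active or near(o.x, o.z, px, pz, player_range))]
    kept = {o.name for o in kept_dynamics}
    for o in objects:
        if not o.is_static:
            continue
        close_to_player = near(o.x, o.z, px, pz, player_range)  # rule 3
        supports_dynamic = any(near(o.x, o.z, d.x, d.z, support_range)
                               for d in kept_dynamics)           # rule 1
        if close_to_player or supports_dynamic:
            kept.add(o.name)
    return kept


world = [
    Obj("ground_near", 0, 0, True),
    Obj("crate_near", 1, 0, False),
    Obj("ground_far", 100, 0, True),
    Obj("boulder_rolling", 101, 0, False, True),
    Obj("ground_empty", 200, 0, True),
    Obj("crate_asleep", 200, 0, False),
]
kept = select_kept(world, player=(0, 0), player_range=10, support_range=5)
```

In this toy world the sleeping crate far from the player is dropped, and so is the ground under it, while the ground under the still-rolling boulder stays.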

The next task is to figure out some heuristic that minimizes the cost of the simulation while guaranteeing the above 3 requirements. Trivially, the heuristic 'never remove anything from the space' satisfies the above, but is not particularly helpful for performance.

A better heuristic would be to examine dynamic objects outside of the interactive range of the player and check if they have gone to sleep yet (entity.ActivityInformation.IsActive). If they have, they can be removed.

However, entities do not go inactive in isolation. Their activity is determined by islands. An island is an interacting group of dynamic entities; the interactions can be contact points or constraints. If any one entity in the interaction island is active, the whole island is active.

So, you've got two choices if an inactive entity is found: remove it in isolation, or search out the whole island with a breadth-first search through constraints or an enumeration of all entities.
Removing the entity in isolation is certainly a lot simpler, but removing an entity wakes up the island that the entity belongs to. That's not good for a sleeping stack of objects: removing the bottom object would make the upper objects fall. You can fight this by manually setting the activity of the entity's former island to false after removing the entity from the space. An entity's island is accessible through the entity.ActivityInformation.SimulationIsland property; it will go null once the entity is removed, though. Cache the island before removing the entity and, if and only if the MemberCount (requires development fork) of the island is greater than 1 prior to removal, immediately set the island's IsActive property to false after removing the entity.
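For the "search out the whole island" option, the breadth-first search looks roughly like the following. This is a plain-Python model with hypothetical names (`find_island`, `sleeping_island`); the interaction graph is a simple adjacency dictionary standing in for whatever contact and constraint connections the engine exposes.

```python
from collections import deque


def find_island(start, neighbors):
    """Breadth-first search over the interaction graph.

    `neighbors` maps an entity to the entities it touches or is
    constrained to. Returns every entity reachable from `start`.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        for other in neighbors.get(entity, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def sleeping_island(start, neighbors, is_active):
    """Return the whole island if every member is asleep, else None."""
    island = find_island(start, neighbors)
    if any(is_active[e] for e in island):
        return None
    return island


# A three-box stack (a on b on c) and an unrelated lone box d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
active = {"a": False, "b": False, "c": False, "d": True}
stack = sleeping_island("a", graph, active)
lone = sleeping_island("d", graph, active)
```

Removing the returned set in one go sidesteps the wake-up problem, because no member of the island is left behind to fall.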

Because the above analysis doesn't need to be performed instantly, you can spread it over multiple frames. Just checking a few hundred objects per frame would keep costs down. Using the 'remove whole island at once' approach would make things a little more expensive in the worst case, as you would have to finish finding the island.
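Spreading the work over frames can be as simple as a cursor that walks the candidate list a fixed number of entries per frame. The sketch below is plain Python with a hypothetical `IncrementalSweeper` class; `should_remove` stands in for whatever test is used (for example, "asleep and out of range").

```python
class IncrementalSweeper:
    """Checks at most `per_frame` candidates on each call to step()."""

    def __init__(self, items, per_frame):
        self.items = list(items)
        self.per_frame = per_frame
        self.cursor = 0

    def step(self, should_remove):
        """Run one frame's worth of checks and return the removed items."""
        removed = []
        checked = 0
        while checked < self.per_frame and self.items:
            if self.cursor >= len(self.items):
                self.cursor = 0  # wrap around and start another pass
            item = self.items[self.cursor]
            if should_remove(item):
                removed.append(item)
                # Swap-remove: overwrite with the last item, then shrink.
                # The cursor stays put so the swapped-in item is checked next.
                self.items[self.cursor] = self.items[-1]
                self.items.pop()
            else:
                self.cursor += 1
            checked += 1
        return removed


sweeper = IncrementalSweeper(range(10), per_frame=4)
all_removed = []
for _frame in range(10):
    batch = sweeper.step(lambda n: n % 2 == 0)
    assert len(batch) <= 4  # never more work than the per-frame budget
    all_removed.extend(batch)
```

The budget bounds the per-frame cost regardless of world size; the trade-off is that an object may linger for a few frames after it becomes removable.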

A similar pass can be performed for static objects. Enumerating through static objects and checking if anything depends on them can be done by analyzing collision pairs or by performing broad phase queries to search for nearby entities. If there is nothing nearby, they can be safely removed.

As the player moves around, it will require that previously removed objects are added back to the simulation. In this case, reverse the process: plop all needed static objects in first, and then plop all dynamic objects back in. To avoid a sudden performance hit due to a bunch of new active simulation elements, it would be a good idea to immediately force all the newly added objects (which were previously removed due to inactivity) to inactive.
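The add-back order can be sketched as follows. This is a toy model in plain Python: `ToySpace`, `Body`, and `stream_in` are hypothetical stand-ins, and the assumption that adding a body wakes it is a property of the model, not a statement about the real engine.

```python
class Body:
    def __init__(self, name, is_static):
        self.name = name
        self.is_static = is_static
        self.is_active = False


class ToySpace:
    """Minimal stand-in for a physics space; adding a dynamic body wakes it."""

    def __init__(self):
        self.members = []

    def add(self, body):
        self.members.append(body)
        if not body.is_static:
            body.is_active = True


def stream_in(space, statics, dynamics):
    """Re-add a chunk: supports first, then the things resting on them."""
    for s in statics:
        space.add(s)
    for d in dynamics:
        space.add(d)
        # These were removed while asleep, so put them straight back to sleep
        # instead of paying for a burst of newly active bodies.
        d.is_active = False


space = ToySpace()
ground = Body("ground", True)
crates = [Body("crate0", False), Body("crate1", False)]
stream_in(space, [ground], crates)
```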

My preferred solution tends to be 'just make BEPUphysics faster.' A big part of one of our own projects is a fairly immense multiplayer simulation which is going to require some significant work in this direction, as detailed on the version roadmap. There are limits to this approach, though.

Post by **diabloqfdb** » Thu Feb 14, 2013 11:55 am

Thanks for the info! A lot to consider and design! And as a parallel low priority task I started looking over SharpDX. And researching differences to SlimDX.

I ran a test with 17000 objects, using StaticGroup and grouping the items in a 16x16 grid, where each grid cell has one StaticGroup.

The only problem is creation time. It is a requirement of the engine to start "instantly" with the bare minimum and quickly stream everything in. Without StaticGroup, loading a 4 square kilometer map was very fast, with terrain physics creation plus 17000 boulders done almost instantly and then level geometry and textures streamed in within a few seconds. Now, with StaticGroup, it takes minutes! And it eats up enough RAM to crash the application. Without StaticGroup, RAM usage was around 500 MiB, with 200 MiB of that being textures.

Also, when using StaticGroup, even with only a few hundred items, making a mobile entity collide with something from the group causes massive performance spikes, with up to 200 ms out of a second being spent on physics. This does not happen without StaticGroup.

I think I am trying things that are too ambitious.

Post by **Norbo** » Thu Feb 14, 2013 9:37 pm

Those performance numbers in relation to StaticGroups do not sound right. Check out the StaticGroupDemo for an example of its usage. For reference, on my 3770K, it takes 2-3 seconds to create 157500 boxes and 180000 instanced meshes and create a StaticGroup out of them. (Note: I disabled the debug graphics creation. Creating those is much slower and uses much more RAM than the physics.) Further, dropping 1000 dynamic boxes on the StaticGroup averages around 4-6 ms per time step (with a brief spike to 7-12 ms when the boxes pile up on the meshes at the start).

The memory allocated for the entire process by Windows (the private working set, anyway) is still below 400 megabytes. I should stress, however, that this number isn't necessarily how much RAM the program is currently using, but it is an upper bound on all private allocations.

In addition, acceleration structure build times are O(n log n). By splitting it into a grid of separate static groups, performance should be improved due to the nonlinear time. In other words, a mere 17000 objects split into a grid of StaticGroups should be constructed more than 20 times faster than the StaticGroup I created above, somewhere around 0.1 to 0.15 seconds. On top of that, the independent nature of the constructions would let you perform the hard work in parallel, trivially boosting it by another factor of the available thread count. If you actually needed the performance, getting below 40 ms on a 3770K for generating all the StaticGroups should be quite doable.
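As a rough check of that estimate, the n log n cost model can be evaluated directly. This sketch assumes the constant factor is the same for every build, that the 17000 objects are spread evenly over the 16x16 grid, and that nothing else (allocation, cache effects) matters; `build_cost` is a hypothetical helper, and real timings would need to be measured.

```python
import math


def build_cost(n):
    """Relative cost of one O(n log n) build, constant factor omitted."""
    return n * math.log2(n) if n > 1 else float(n)


demo_objects = 157500 + 180000           # boxes + instanced meshes in the demo
demo_cost = build_cost(demo_objects)

one_group_cost = build_cost(17000)       # all 17000 objects in a single group

cells = 16 * 16
per_cell = 17000 / cells                 # even spread assumed
split_cost = cells * build_cost(per_cell)

print(f"demo build vs one 17000-object group: {demo_cost / one_group_cost:.1f}x")
print(f"demo build vs 256 small groups:       {demo_cost / split_cost:.1f}x")
```

Under this model even a single 17000-object group comes out more than 20 times cheaper than the demo build, and splitting it across the grid lowers the total further before any parallelism is applied.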

So..

> I think I am trying things that are too ambitious.

I think you're well within feasible limits.

Post by **diabloqfdb** » Fri Feb 15, 2013 11:36 am

Finally solved the issues! It was my fault. I had to fix bugs all over the engine! Thank you very much for your support!

Using StaticGroup reduced physics time considerably, and there is currently no longer a reason to implement that complicated scheme of adding and removing objects from the space. BEPUphysics is an absolute beast and, if you use it correctly, it can handle ridiculous amounts of work. It laughs at the 17000 boulders spread over 4 km when using StaticGroup. It even handles them quite well without it, just not quite well enough for interactive framerates. I'm actually very impressed, since it is in C#, which carries some overhead, especially when working with arrays.

I'm curious how large the map would have to be for issues to start popping up again.

Post by **Norbo** » Fri Feb 15, 2013 9:57 pm

Glad to hear it's working well!

If you want it to be even faster, consider throwing a vote over on Connect for SIMD support in .NET:

https://connect.microsoft.com/VisualStu ... ort-for-ne
