Static Mesh construction speed

mcmonkey · Post by **mcmonkey** » Mon Jun 22, 2015 2:00 am

So I need to generate a ton of static meshes on the fly.

My current meshes have roughly 5400 vertices... (30 * 30 * 6 * 6) but can end up with way more than that.

I need some efficient way to generate StaticMesh's with vertex counts in this range. I know it's possible - all calculations that generate the mesh in the first place take milliseconds, and I could do this without a physics engine in milliseconds as well)

The problem is, it takes ~200 MS to generate each and every static mesh instance. I need at least below 50 ms. Ideally, below 16 ms.

If it helps, I don't need any transformations - the vertex positions are final world positions. Everything about it is very static.

What exactly happens when creating a StaticMesh?

Perhaps, whatever data it generates, I can generate with my own code rather than generating a vertex list and get some better efficiency that way?

...

Worst case scenario, is there a tutorial on writing my own static object with its own collision tracing code? I can handle a simple ray->my object sweep as well as an AABB -> my object sweep. No idea how I'd do the rest though...

Post by **Norbo** » Mon Jun 22, 2015 3:11 am

My current meshes have roughly 5400 vertices... (30 * 30 * 6 * 6) but can end up with way more than that.
...
The problem is, it takes ~200 MS to generate each and every static mesh instance. I need at least below 50 ms. Ideally, below 16 ms.

Something sounds odd here. The playground model from the BEPUphysicsDemos is 20730 triangles and only takes ~25 milliseconds to build on my desktop.

If it's running on a phone, 200ms might be more reasonable. If you see that kind of time on the desktop, something might be wrong. Does the content contain a bunch of degenerate triangles, or anything else strange? Maybe a ton of triangles all in one spot overlapping?

What exactly happens when creating a StaticMesh?

Perhaps, whatever data it generates, I can generate with my own code rather than generating a vertex list and get some better efficiency that way?

The cost associated with constructing a StaticMesh is in almost entirely in creating the acceleration hierarchy. If you are able to pull in already-built acceleration hierarchies (if these chunks are known ahead of time), that would speed it up a huge amount. Otherwise, you'd just have to build a faster acceleration hierarchy builder- which is very possible, but not easy. (Incidentally, BEPUphysics v2 will have that.)

Worst case scenario, is there a tutorial on writing my own static object with its own collision tracing code? I can handle a simple ray->my object sweep as well as an AABB -> my object sweep. No idea how I'd do the rest though...

That would be very difficult. While you could reuse the most complicated parts (e.g. actual low level narrow phase tests and triangle bounding smoothing) from other pair handlers, you would still need to create your own acceleration structures and hook in the custom pair handlers. Avoid this if you can.

I should also mention the standard solution for this kind of problem: just toss it on some background thread and let it work without blocking the game thread. It slightly complicates the setup, but it's usually worth it to avoid frame hitches. (Of course, it won't work if your computer simply doesn't have enough throughput for the number of meshes being built.)

mcmonkey · Post by **mcmonkey** » Mon Jun 22, 2015 4:02 am

Maybe a ton of triangles all in one spot overlapping?

... I did not /intend/ to have such a thing, no >.>
I, uh... /did/ have that, however. I forgot to add a variable to the vertex locations, so it was spawning tons of triangles in roughly the same spot.

Fixing that ends the ridiculous times, thanks... but now I'm curious /why/ that caused lag.

I'd like to learn more about how things work, if you're willing to spend some time explaining:

acceleration hierarchy

What exactly is that? What's it do? How is it built? How would I input a custom one? (I have a /ton/ of information pre-available about the object I'm building - EG it's general AABB limits and so on... perhaps I could generate this hierarchy more quickly than manually reading through the full vertex list could?)

I should also mention the standard solution for this kind of problem: just toss it on some background thread and let it work without blocking the game thread. It slightly complicates the setup, but it's usually worth it to avoid frame hitches. (Of course, it won't work if your computer simply doesn't have enough throughput for the number of meshes being built.)

I'm still unclear on BEPU's threading model.
- If I wanted to spawn an object in a space, would I need to lock on anything, or could I safely do that from any given thread? What about building the object (new StaticMesh() type lines)?
- viewtopic.php?t=1028 <-- this post implies that there is no multithreading by default, and has to be enabled manually with a small bit of code (written out in the link) - is this correct?
- Does the threading code merely do something along the lines of / similar to "Parallel.ForEach" inside the update method - IE, running all the updates in parallel instead of one-by-one, or is there more to it than that? (Does it accelerate ConvexCast? ...? )

Post by **Norbo** » Tue Jun 23, 2015 2:37 am

What exactly is that? What's it do? How is it built?

It's a structure designed to accelerate queries by pruning out all of the irrelevant objects.

Consider a mesh with 100,000 triangles. Without any acceleration structure, you'd have to test each of the 100,000 triangles in sequence. That would be very, very slow.

Instead, those 100,000 triangles could be put into a bounding volume hierarchy. If a ray hits the root node, test the ray against the root's children. If it hits root.ChildA but not root.ChildB, intersect root.ChildA and ignore root.ChildB. Now, do it again for the children of root.ChildA, and so on. Each step has a high chance of eliminating a huge number of potential intersections. So, instead of doing 100,000 tests, it only takes around log(n) ray-bounding box tests (i.e. probably less than 100, and as low as 15-20).

That sort of tree is what the StaticMesh uses. The construction process is extremely simple at the moment- it just takes each object and tries to greedily insert it into the tree. That insertion process traverses the tree, taking the path down to the leaf nodes which fits the new object the best. This sort of construction, being simple and O(nlog(n)) in non-pathological cases, tends to be very quick relative to heavier/'smarter' constructions like SAH optimization.

Binary bounding box hierarchies are certainly not the only kind of acceleration structure, but they're pretty common. Other options include sort and sweep/sweep and prune, uniform grids, quad/octrees, k-d trees, simd-friendly N-ary trees, and a whole lot more. Each has its own strengths, weaknesses, and peculiarities- for example, sweep and prune is often used by physics engines for the broadphase thanks to good collision pair detection performance, but it doesn't provide great ray query acceleration and is vulnerable to a few corner cases.

How would I input a custom one? (I have a /ton/ of information pre-available about the object I'm building - EG it's general AABB limits and so on... perhaps I could generate this hierarchy more quickly than manually reading through the full vertex list could?)

If you wanted to make your own acceleration hierarchy from scratch, you would basically be making your own version of the StaticMesh. It's possible, but difficult as mentioned before. It requires a lot of familiarity with the internals of the engine. If you're curious, you can look up a few of the data structures I mentioned up above.

I'm curious /why/ that caused lag.

The heuristics used to build the acceleration tree have some weaknesses. Specifically, it's possible to make the tree highly unbalanced. Rather than having O(log(n)) accesses to reach a leaf node, it's possible to end up with a tree that's more like a long line where it may take O(n) accesses. The existing tree construction is insertion-based, so the longer the path to the desired position is, the longer the construction takes. In the pathological case of one long line, that would take O(n^2) instead of O(nlog(n)). Most likely, this is what you encountered.

A better heuristic wouldn't have this sort of weird corner case. I'll probably fix it in the rewrite.

If I wanted to spawn an object in a space, would I need to lock on anything, or could I safely do that from any given thread? What about building the object (new StaticMesh() type lines)?

An object can be constructed on any thread at any time.

Only one thread at a time should make any changes to the Space (which includes adds/removes), and no changes should occur while the Space is updating. Further, be careful about reading from Space-handled data while the Space is updating. (This is technically more strict that it has to be, but it's a good rule of thumb.)

viewtopic.php?t=1028 <-- this post implies that there is no multithreading by default, and has to be enabled manually with a small bit of code (written out in the link) - is this correct?

That's a bit old. All you have to do now is pass the Space constructor an IParallelLooper that has a ThreadCount of 2 or more.

- Does the threading code merely do something along the lines of / similar to "Parallel.ForEach" inside the update method - IE, running all the updates in parallel instead of one-by-one, or is there more to it than that? (Does it accelerate ConvexCast? ...? )

The engine's Update is essentially a series of for loops, and parallelizing it is indeed running those for loops in parallel. Of course, there's a lot of trickiness involved to make that work efficiently.

The multithreading only affects the Space.Update's stages. So things like the BroadPhase, NarrowPhase, and Solver are all multithreaded, but an individual ray cast or convex cast is not. There's no real way to efficiently multithread something so tiny due to overhead. Of course, if there are a bunch of casts to perform, a parallel loop across all the casts would provide a great speedup.

BEPUphysics

Static Mesh construction speed

Static Mesh construction speed

Re: Static Mesh construction speed

Re: Static Mesh construction speed

Re: Static Mesh construction speed