StaticMesh Constructor Overhead
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
StaticMesh Constructor Overhead
I've noticed that my game has slown down considerably when the terrain (deformable; created from blocks) mesh is recreated. I've been experiencing drops of ~10-20 FPS whenever the game rebuilds the terrain (which is divided into 16x128x16 chunks).
Essentially, what I do is a simple occlusion test; I loop through all the blocks, and if a block has a neighbor to one side, I won't build that face. After running this algorithm, I end up with a list of indices, a list of vertices, and a list of the positions of those vertices (for BEPU). The average vertex count is approximately 3,000,000 to 4,000,000. Then, I remove the old mesh if it exists, and I replace it with the new mesh.
However, I've pinpointed that the constructor, StaticMesh(Vector3[] vertices, int[] indices), is giving me some considerable overhead. Is there any way I can fix this, or am I just going to have to work around it?
Essentially, what I do is a simple occlusion test; I loop through all the blocks, and if a block has a neighbor to one side, I won't build that face. After running this algorithm, I end up with a list of indices, a list of vertices, and a list of the positions of those vertices (for BEPU). The average vertex count is approximately 3,000,000 to 4,000,000. Then, I remove the old mesh if it exists, and I replace it with the new mesh.
However, I've pinpointed that the constructor, StaticMesh(Vector3[] vertices, int[] indices), is giving me some considerable overhead. Is there any way I can fix this, or am I just going to have to work around it?
Re: StaticMesh Constructor Overhead
Meshes require acceleration structures for speedy collision detection. Creating a new StaticMesh requires that an acceleration structure be created. That's where the CPU time goes.
If the modification just amounts to scooting some vertices around without adding or removing any, the mesh's acceleration structure can just be refit instead. Calling refit on the mesh's hierarchy is many times faster than rebuilding the full acceleration structure. Refit will not produce valid results if the topology of the mesh has changed.
Mutable terrains, as commonly found in blockworld games, often require topology changes and thus hierarchy rebuilds. There's no way around this fact when using the hierarchy-based mesh types.
The ideal solution, disregarding development effort, would be a specialized collidable built for the mutable terrain that operates with more guarantees than the arbitrary mesh types have. This can reduce the overhead associated with changing to virtually nothing. Unfortunately, this also requires a relatively large amount of experience with the engine, the collision detection pipeline, and collision detection in general.
A more common 'hackier'/'practical' solution is to handle hierarchy rebuilds asynchronously, or in tiny deferred packets of work that can be managed within a single frame's budget.
If the modification just amounts to scooting some vertices around without adding or removing any, the mesh's acceleration structure can just be refit instead. Calling refit on the mesh's hierarchy is many times faster than rebuilding the full acceleration structure. Refit will not produce valid results if the topology of the mesh has changed.
Mutable terrains, as commonly found in blockworld games, often require topology changes and thus hierarchy rebuilds. There's no way around this fact when using the hierarchy-based mesh types.
The ideal solution, disregarding development effort, would be a specialized collidable built for the mutable terrain that operates with more guarantees than the arbitrary mesh types have. This can reduce the overhead associated with changing to virtually nothing. Unfortunately, this also requires a relatively large amount of experience with the engine, the collision detection pipeline, and collision detection in general.
A more common 'hackier'/'practical' solution is to handle hierarchy rebuilds asynchronously, or in tiny deferred packets of work that can be managed within a single frame's budget.
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
Re: StaticMesh Constructor Overhead
Thanks for the info. For now, I've decided to just make my chunk size smaller (8x128x8).
Do you think it would be possible or complicated to merge the mesh regeneration process into my chunk rebuild process so that there would be only one loop? (And hence, negligible overhead)
Looking at the code:
The only part that I wouldn't be able to "port" over would be the length of the indices, as that is only known after generation.
Do you think it would be possible or complicated to merge the mesh regeneration process into my chunk rebuild process so that there would be only one loop? (And hence, negligible overhead)
Looking at the code:
Code: Select all
/// <summary>
/// Reconstructs the tree based on the current data.
/// </summary>
public void Reconstruct()
{
root = null;
for (int i = 0; i < data.indices.Length; i += 3)
{
//Use a permuted version of the triangles instead of the actual triangle list.
//Permuting makes the input basically random, improving the quality of the tree.
Insert((int)(((982451653L * (i / 3)) % (data.indices.Length / 3)) * 3));
}
}
Re: StaticMesh Constructor Overhead
An index count is not fundamentally required; it is just used for permutation purposes. It isn't necessary for correctness to scramble the inputs, either. Coming up with some ordering that doesn't result in pathological trees given a 'malicious' dataset would be helpful in the worst case, though.
If it fits the structure of your update, extracting the incremental construction is a fine thing to do. It shouldn't introduce huge amounts of extra complexity as most of the system only cares about the end result, not how it is computed.
There is one possible issue down the road: there's a not-insignificant chance that a variety of the tree structures will be overhauled. Binding logic too tightly to the current internal implementation could cause problems if I end up replacing it all. Of course, this is likely months away, and there will likely be some way to salvage the old approach.
If it fits the structure of your update, extracting the incremental construction is a fine thing to do. It shouldn't introduce huge amounts of extra complexity as most of the system only cares about the end result, not how it is computed.
There is one possible issue down the road: there's a not-insignificant chance that a variety of the tree structures will be overhauled. Binding logic too tightly to the current internal implementation could cause problems if I end up replacing it all. Of course, this is likely months away, and there will likely be some way to salvage the old approach.
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
Re: StaticMesh Constructor Overhead
Thanks for the reply.
After doing some profiling, I have come to the conclusion that I would be able to sacrifice some Update time in exchange for less mesh construction time (Update() function takes anywhere from 0.005 ms to 0.01 ms and Draw() takes 1-2 ms, which leaves me with out 14-15 ms for longer collision detection times). Are the BEPU acceleration structures constructed bottom-up? If so, I would be willing to sacrifice some quality in the trees for a faster tree generation time.
Do you think it's possible to reduce the mesh generation time to < 20ms ( < 10ms would be best, but it sounds like it'll either require extremely extremely optimized algorithms or be impossible) by sacrificing some tree quality and moving the computational time to the space.update() function?
After doing some profiling, I have come to the conclusion that I would be able to sacrifice some Update time in exchange for less mesh construction time (Update() function takes anywhere from 0.005 ms to 0.01 ms and Draw() takes 1-2 ms, which leaves me with out 14-15 ms for longer collision detection times). Are the BEPU acceleration structures constructed bottom-up? If so, I would be willing to sacrifice some quality in the trees for a faster tree generation time.
Do you think it's possible to reduce the mesh generation time to < 20ms ( < 10ms would be best, but it sounds like it'll either require extremely extremely optimized algorithms or be impossible) by sacrificing some tree quality and moving the computational time to the space.update() function?
Re: StaticMesh Constructor Overhead
The StaticMesh hierarchy is constructed incrementally by insertion. There are no quality variables to tune; it will cost roughly a fixed amount provided that it is given a nonmalicious data set.
So, reducing the total amount of computation it takes to create a hierarchy would require a fundamental shift in algorithm (or a more highly optimized implementation of the same algorithm, though it is unlikely that this would give a huge boost).
However, provided that it's okay for the player to wait a few frames for the structure to update, the insertion-based nature of the algorithm could be exploited to spread the computations over multiple frames. For example, only insert total/10 elements each frame. This would help prevent any one frame from exceeding its time budget, even though the total computation time is the same. Of course, this is very similar to just offloading the processing to a separate thread and waiting for it to complete, but there's a bit more predictability.
[Incidentally, in some research on real time path tracing, I'm using a slightly goofy algorithm to dynamically rebuild hierarchies from scratch in only a technically constant amount of time more than a hierarchy refit. It's completely parallel and extremely fast, but it produces fairly nasty trees by itself. Part of the work involves bundling such algorithms with better SAH-guided approaches and has produced somewhat promising results. This work will probably trickle down into new structures for BEPUphysics if it pans out.]
So, reducing the total amount of computation it takes to create a hierarchy would require a fundamental shift in algorithm (or a more highly optimized implementation of the same algorithm, though it is unlikely that this would give a huge boost).
However, provided that it's okay for the player to wait a few frames for the structure to update, the insertion-based nature of the algorithm could be exploited to spread the computations over multiple frames. For example, only insert total/10 elements each frame. This would help prevent any one frame from exceeding its time budget, even though the total computation time is the same. Of course, this is very similar to just offloading the processing to a separate thread and waiting for it to complete, but there's a bit more predictability.
[Incidentally, in some research on real time path tracing, I'm using a slightly goofy algorithm to dynamically rebuild hierarchies from scratch in only a technically constant amount of time more than a hierarchy refit. It's completely parallel and extremely fast, but it produces fairly nasty trees by itself. Part of the work involves bundling such algorithms with better SAH-guided approaches and has produced somewhat promising results. This work will probably trickle down into new structures for BEPUphysics if it pans out.]
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
Re: StaticMesh Constructor Overhead
Is there any reason why it's constructed by insertion, and not, say, top-down?
Also, what can you explain what you mean by a "malicious" data set?
I'm actually currently using a thread (pseudocode):
In the build function, I simply generate the faces (removing occluded faces), which takes ~2 ms, and, as I mentioned before, creating the static mesh can take 40-50 ms.
Also, what can you explain what you mean by a "malicious" data set?
I'm actually currently using a thread (pseudocode):
Code: Select all
constructor()
{
thread t = new thread(buildchunks);
t.isbackgroundthread = true;
t.priority = low;
}
function buildchunks()
{
while(true)
{
// build all priority chunks
// priority chunks are neighboring chunks of the chunk that was modified.
// they are built first so that there aren't any gaps in the world created because of face occlusion.
Chunk c;
while((c = priorityQueue.dequeue()) != null)
{
c.Light();
c.Build();
}
// now scan all chunks within a radius of the player
for(int rx = -CHUNK_BUILD_RADIUS; rx <= CHUNK_BUILD_RADIUS; rx++)
{
// sweep from the center (the player's current chunk) and out
for(int rz = 0; rz <= CHUNK_BUILD_RADIUS; rz++)
{
Chunk c1 = ChunkManager.Get(player.currentchunk.chunkcoords + vector2(rx, rz));
Chunk c2 = ChunkManager.Get(player.currentchunk.chunkcoords + vector2(rx, -rz));
if(c1.Dirty)
{
c1.Light();
c1.Build();
}
if(c2.Dirty)
{
c2.Light();
c2.Build();
}
}
}
sleep(20); // sleep for 20 ms
}
}
Re: StaticMesh Constructor Overhead
Originally, it was top down. Construction was quite slow as a function of both algorithm and implementation. Later, I reworked the implementation and also created this insertion-based approach. The insertion based approach had significantly faster build times, with negligible loss in quality, so I just used it exclusively.Is there any reason why it's constructed by insertion, and not, say, top-down?
Of course, that is only a statement about those particular implementations, not about top-down vs insertion based methods in general. The next revision could end up being (at least partially) top down, just with a superior implementation.
There exist some algorithms such that, if you know how the algorithm works, you can design input that makes the algorithm perform exceptionally poorly. In this case, if a mesh had a very particular configuration and its triangles were added in a very particular order, the resulting tree could end up being horrible. Worse than brute force, even. Such a data set is 'malicious' because, for it to exist, it virtually requires exploitative intent.Also, what can you explain what you mean by a "malicious" data set?
Merely 'bad' insertion orders that happen by chance can emulate malicious data, though not quite to the same extreme.
Scrambling the input decouples insertion order from the final tree quality, vastly improving malicious or bad data sets that would have been terrible due to insertion order.
This is primarily a concern for the quality of the resulting tree, though it likely does affect the build time to some degree. I haven't fiddled with the insertion builder in quite a while, so I don't remember the corner case test results.
If it's already running on a separate thread and thus never blocking the main thread, it sounds like the main issue is reducing latency as opposed to needing to entirely eliminate it. In addition to any hierarchy construction speedups, up to 20 milliseconds of latency can be cut out by using synchronization primitives to signal the worker thread rather than a sleep-and-poll pattern.I'm actually currently using a thread (pseudocode):
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
Re: StaticMesh Constructor Overhead
Good point about the threading thing. I also noticed that I was locking the static mesh constructor too, so I moved the lock to only when the mesh was being removed/added, and the slow down is a lot less now; I only drop from 60 to 55 now. For the time being, I guess this is all right.
I have noticed something weird though: the first time the chunk rebuilds, it's quite laggy (fps drops to 40-50). Is this because of BEPU, or some sort of VS debugging thing?
I have noticed something weird though: the first time the chunk rebuilds, it's quite laggy (fps drops to 40-50). Is this because of BEPU, or some sort of VS debugging thing?
Re: StaticMesh Constructor Overhead
That could be a number of things, but the most common is JIT compiling, or perhaps expanding allocations. JIT should not cause a persistent slowdown- it's a one shot cost for a given chunk of compiled code. If a bunch of memory pools are filling up for the first time, that could last a little longer.I have noticed something weird though: the first time the chunk rebuilds, it's quite laggy (fps drops to 40-50). Is this because of BEPU, or some sort of VS debugging thing?
However, it's strange that you're encountering significant slowdowns in apparent FPS while using an asynchronous worker to handle the processing. Unless whatever the main thread is doing requires every bit of the CPU and will bottleneck without it, that isn't expected behavior. There may be something else going on related to updates. Profilers would be handy here.
-
- Posts: 32
- Joined: Sat Jan 19, 2013 8:20 pm
Re: StaticMesh Constructor Overhead
Just a follow up in case anybody else encounters a problem like this in the future:
What I did to solve this was to simply down-res the physics mesh from 25 vertices (16 squares; 32 triangles) to 4 vertices. Though this causes some discrepancies from the actual rendered terrain, I've found that someone would have to pay extremely close attention to notice the difference. Doing this reduced the physics mesh build time from 60-70 ms to 10ms.
What I did to solve this was to simply down-res the physics mesh from 25 vertices (16 squares; 32 triangles) to 4 vertices. Though this causes some discrepancies from the actual rendered terrain, I've found that someone would have to pay extremely close attention to notice the difference. Doing this reduced the physics mesh build time from 60-70 ms to 10ms.
