This took me a good bit of time to figure out, so I'm posting here. Not sure if it is an issue with Bepu, or user error.
I have a bunch of entities that are created dynamically in response to user input. I noticed that these entities were occasionally stalling my entire physics thread. The stall was very rare and hard to reproduce. When I was able to reproduce it, it always occurred in a particular area of my level. Strange.
It turns out that the stall occurred when the dynamically created entities fell onto geometry that wasn't tessellated enough. I opened up Maya, subdivided the problematic areas to create more triangles, re-exported, and the problem was solved.
Physics Stalling Issue
Re: Physics Stalling Issue
That's pretty strange; I'm not sure what the mesh would have to do with it. When adding an entity though, a lock is acquired that keeps entities from being added/removed in the middle of a space update. If a physics frame lasts more than a few milliseconds and an entity add request occurs in the middle of it, the entity adding thread can be blocked noticeably. A good solution to this problem is to enqueue entity additions to an intermediate queue that the physics thread will briefly lock and read from so that the main thread never has to wait for the duration of a space update. I'd like to have a solution for this internally; hopefully it'll make it into v0.10.0.
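To make the queue idea concrete, here is a minimal sketch of the pattern in Python. It is not Bepu code: `DeferredAdditions`, `Space`, and `physics_step` are hypothetical names used only for illustration. The point is that the adding thread holds the lock just long enough to append, and the physics thread holds it just long enough to swap the buffer out.

```python
import threading
from collections import deque


class DeferredAdditions:
    """Pending additions shared by a game thread and a physics thread (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()

    def enqueue(self, entity):
        # Called from the game thread. The lock is held only for the append,
        # never for the duration of a physics update.
        with self._lock:
            self._pending.append(entity)

    def drain(self):
        # Called from the physics thread between updates. The buffer is swapped
        # out under the lock so the batch can be processed without holding it.
        with self._lock:
            batch, self._pending = self._pending, deque()
        return list(batch)


class Space:
    """Hypothetical stand-in for a physics space; this is not Bepu's API."""

    def __init__(self):
        self.entities = []

    def add(self, entity):
        self.entities.append(entity)

    def update(self):
        # The (possibly slow) simulation step would run here.
        pass


def physics_step(space, additions):
    # Apply everything queued since the last step, then simulate.
    for entity in additions.drain():
        space.add(entity)
    space.update()
```

With this arrangement the game thread calls `enqueue` whenever it likes, and the physics thread calls `physics_step` once per frame. An entity queued during an update simply shows up in the space one step later.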
Do the stalls take place after all the entities are added to the space?
Re: Physics Stalling Issue
Yes. All the entities are added to the space at the beginning of the simulation. When an entity is requested, our pooling system returns an entity that is not currently being used. This entity then has its data updated via the thread-safe accessors.
Re: Physics Stalling Issue
How long were the stalls, and once you found the mesh issue, were you able to reproduce them reliably? Did you happen to try using the PersistentUniformGrid broadphase type instead of DynamicBinaryHierarchy (or vice versa)? Are there any other queries going on, specifically things like space.broadphase.getEntities or maybe something with the mesh?
I don't have any definite idea of what is happening, but my best guess is that one of these queries might be locked out due to the space update.
Re: Physics Stalling Issue
The stalls lasted anywhere from 3 seconds to indefinitely, and I was able to repro them reliably after spawning anywhere from 2 to 20 objects. Now that the mesh is fixed, I haven't been able to repro at all (thank goodness). I'm currently using the PersistentUniformGrid broadphase and didn't try with DBH. There were no other queries going on, other than position/velocity/momentum type queries.
I was able to get a look at the call stack after one of the stalls, and it was in Triangle.getExtremePoint(), at a call to Vector3.Transform.
My dynamic entities are boxes, so I originally thought I was reading/writing positional data in a non-threadsafe manner. But when I saw it was getting stuck in a Triangle method, that led me to the solution of increasing the mesh tessellation.
Edit: It has also occurred in a different location in the Bepu DLL, but I don't remember the exact location.
Re: Physics Stalling Issue
Well, hopefully you won't see it again! That stall duration pretty much destroys my thread-based theories, since any non-deadlock wait would last milliseconds and a deadlock would never end.
The only other thing I can think of right now is some sort of geometry that the system didn't like. With the PersistentUniformGrid, inserting a large (or invalid) triangle could force the system to manage a huge (or infinite) number of cells, possibly causing the stall. Subdividing the mesh may have fixed the mesh or allowed the system to process it correctly. This doesn't explain the call stack, though. A hang in Vector3.Transform is... weird.
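To put rough numbers on the grid theory, here is a small Python sketch that counts how many uniform-grid cells a triangle's bounding box overlaps. It only illustrates the reasoning and is not how PersistentUniformGrid is actually implemented; the function names and the cell size used below are assumptions.

```python
import math


def triangle_aabb(a, b, c):
    """Axis-aligned bounding box of a triangle given three (x, y, z) vertices."""
    for vertex in (a, b, c):
        if not all(math.isfinite(v) for v in vertex):
            # An invalid vertex would otherwise produce an unbounded box.
            raise ValueError("triangle has a non-finite vertex coordinate")
    mins = tuple(min(p[i] for p in (a, b, c)) for i in range(3))
    maxs = tuple(max(p[i] for p in (a, b, c)) for i in range(3))
    return mins, maxs


def cells_overlapped(aabb_min, aabb_max, cell_size):
    """Count the cells of a uniform grid that a bounding box overlaps."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    count = 1
    for lo, hi in zip(aabb_min, aabb_max):
        if not (math.isfinite(lo) and math.isfinite(hi)):
            raise ValueError("non-finite bounding box extent")
        # Index of the first and last cell touched along this axis.
        first = math.floor(lo / cell_size)
        last = math.floor(hi / cell_size)
        count *= last - first + 1
    return count
```

With an assumed cell size of 2 units, a flat triangle spanning 1 x 1 units fits in a single cell, while one spanning 1000 x 1000 units has a bounding box that overlaps 501 * 501 = 251001 cells. Subdividing the large triangle keeps each piece's cell count small, which would be consistent with the fix described above.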