Something doesn't seem right. While a capsule special case would be faster, the general convex case should not be that much slower than the sphere-sphere case. I have avoided adding the capsule special cases because the performance was good enough for most purposes.
I modified the TerrainDemo to test the performance:
//x and y, in terms of heightmaps, refer to their local x and y coordinates. In world space, they correspond to x and z.
//Setup the heights of the terrain.
int xLength = 256;
int zLength = 256;
float xSpacing = 8f;
float zSpacing = 8f;
var heights = new float[xLength, zLength];
for (int i = 0; i < xLength; i++)
{
    for (int j = 0; j < zLength; j++)
    {
        float x = i - xLength / 2;
        float z = j - zLength / 2;
        heights[i, j] = (float)(10 * (Math.Sin(x / 8) + Math.Sin(z / 8)));
    }
}
//Create the terrain.
var terrain = new Terrain(heights, new AffineTransform(
    new Vector3(xSpacing, 1, zSpacing),
    Quaternion.Identity,
    new Vector3(-xLength * xSpacing / 2, 0, -zLength * zSpacing / 2)));
Space.Add(terrain);
for (int i = 0; i < 20; i++)
{
    for (int j = 0; j < 20; j++)
    {
        CompoundBody cb = new CompoundBody(new CompoundShapeEntry[]
        {
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(0, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(2, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(4, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(6, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(8, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(10, 0, 0)),
            new CompoundShapeEntry(new CapsuleShape(2, 1), new Vector3(12, 0, 0)),
        }, 10);
        cb.Position = new Vector3(i * 15, 20, j * 4);
        //CompoundBody cb = new CompoundBody(new CompoundShapeEntry[]
        //{
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(0, 0, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(2, -2, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(4, 0, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(6, -2, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(8, 0, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(10, -2, 0)),
        //    new CompoundShapeEntry(new SphereShape(1), new Vector3(12, 0, 0)),
        //}, 10);
        //cb.Position = new Vector3(i * 15, 20, j * 4);
        cb.ActivityInformation.IsAlwaysActive = true;
        Space.Add(cb);
    }
}
The capsules are arranged so that, once they settle, they create the maximum number of contacts with the terrain (side by side and lying flat). The spheres are offset to prevent rolling.
For 400 such compounds on my (old) computer:
-Capsule version takes ~22 milliseconds per time step.
-Sphere version takes ~12 milliseconds.
(The objects are close enough that a significant number of compound-compound collisions also occur. In both cases, the total number of top-level collision pairs is similar (~750).)
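How the per-time-step figures were gathered isn't stated. One way to get comparable numbers is to let the bodies settle for a while and then average a Stopwatch measurement over many steps. This is only a rough sketch: StepTimer and its parameters are made-up names, and the workload in Main is a stand-in for the real simulation step.

```csharp
using System;
using System.Diagnostics;

static class StepTimer
{
    // Runs 'step' warmupSteps times without timing (so bodies can settle),
    // then returns the average wall-clock milliseconds over measuredSteps calls.
    public static double AverageMilliseconds(Action step, int warmupSteps, int measuredSteps)
    {
        for (int i = 0; i < warmupSteps; i++)
            step();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < measuredSteps; i++)
            step();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / measuredSteps;
    }

    static void Main()
    {
        // Stand-in workload so the sketch runs on its own.
        // In the demo, the delegate would wrap the simulation step instead,
        // e.g. () => Space.Update() (assuming the demo's Space is in scope).
        double sink = 0;
        Action fakeStep = () =>
        {
            for (int i = 0; i < 100000; i++)
                sink += Math.Sqrt(i);
        };

        double ms = AverageMilliseconds(fakeStep, 300, 100);
        Console.WriteLine("Average ms per step: " + ms);
    }
}
```

Averaging after a warm-up matters here because the contact count (and so the solver cost) keeps changing until the compounds have come to rest on the terrain.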
For 49 such compounds on my Samsung Focus:
-Capsule version takes ~100 milliseconds per time step.
-Sphere version takes ~43 milliseconds.
For 7 such compounds on my Samsung Focus:
-Capsule version takes ~13 milliseconds per time step.
-Sphere version takes ~6 milliseconds.
These results can be explained relatively easily by three factors, in no particular order:
-Sphere shapes can only generate one contact with an individual triangle within a mesh or with other spheres. Capsules generate more and need more. More contacts means more contact constraints, which means longer solving times.
-Sphere shapes have a special narrow phase case for both meshes and other spheres which is faster than the general case.
-The capsules are quite a bit larger (length 2, radius 1) than the spheres (radius 1), so they will be tested against more triangles.
To accentuate testing against many triangles, the triangle density is increased by 64x (1x1 triangles instead of 8x8 triangles). For 7 compounds on my Samsung Focus:
-Capsule version takes ~37 milliseconds.
-Sphere version takes ~9.2 milliseconds.
A sphere might be within the bounding box of 18 triangles at once with this density while a capsule may be within the bounding box of up to 50 at once (though 30-40 is more common in this test).
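The 18 and 50 figures can be sanity-checked with a back-of-the-envelope count. The sketch below assumes a regular grid with two triangles per cell, looks only at the shape's horizontal bounding box (no margins or velocity expansion), and assumes the capsule is lying flat at some yaw angle. The class and method names are invented for illustration and are not part of the engine.

```csharp
using System;

static class TriangleOverlapEstimate
{
    // Worst-case number of grid cells an interval of the given width can touch.
    // An interval of width w can contain at most ceil(w / spacing) grid lines,
    // and each contained grid line adds one more cell.
    public static int MaxCellsAlongAxis(double width, double spacing)
    {
        return (int)Math.Ceiling(width / spacing) + 1;
    }

    // Worst-case triangle count for a horizontal bounding box, two triangles per cell.
    public static int MaxTriangles(double widthX, double widthZ, double spacing)
    {
        return 2 * MaxCellsAlongAxis(widthX, spacing) * MaxCellsAlongAxis(widthZ, spacing);
    }

    // Bounding box of a capsule lying flat: the inner segment projected onto
    // each axis, plus the radius on both sides.
    public static int MaxTrianglesForFlatCapsule(double length, double radius, double yawRadians, double spacing)
    {
        double widthX = length * Math.Abs(Math.Cos(yawRadians)) + 2 * radius;
        double widthZ = length * Math.Abs(Math.Sin(yawRadians)) + 2 * radius;
        return MaxTriangles(widthX, widthZ, spacing);
    }

    static void Main()
    {
        // Sphere of radius 1 on 1x1 triangles: 2 * 3 * 3
        Console.WriteLine(MaxTriangles(2, 2, 1));                            // 18
        // Capsule (length 2, radius 1) aligned with a grid axis: 2 * 5 * 3
        Console.WriteLine(MaxTrianglesForFlatCapsule(2, 1, 0, 1));           // 30
        // Same capsule rotated 45 degrees: 2 * 5 * 5
        Console.WriteLine(MaxTrianglesForFlatCapsule(2, 1, Math.PI / 4, 1)); // 50
    }
}
```

These are worst cases over all placements on the grid, which fits with 30-40 being the more common count for capsules in practice.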
To isolate this effect, degenerate capsules (length 0, radius 1) are tested with the 1x1 triangles. For 7 compounds on my Samsung Focus:
-LongCapsule version takes ~37 milliseconds.
-SphereyCapsule version takes ~17 milliseconds.
-Sphere version takes ~9.2 milliseconds.
Quite an improvement! Now, back to 8x8 triangles again. For 7 compounds on my Samsung Focus:
-LongCapsule version takes ~13 milliseconds.
-SphereyCapsule version takes ~7.6 milliseconds.
-Sphere version takes ~6 milliseconds.
One final test, back on the PC. With 8x8 triangles and 400 compounds on my (old) computer:
-LongCapsule version takes ~22 milliseconds.
-SphereyCapsule version takes ~14 milliseconds.
-Sphere version takes ~12 milliseconds.
Comparing against the earlier long capsules on 8x8 triangles, it appears that the performance difference is largely (sometimes mostly) caused by overlapping more triangles as opposed to the narrow phase algorithms themselves.
So, from this, the general case collision detection system isn't significantly slower than the sphere special cases overall: at the high end, the difference is somewhere around a factor of 2 in overall frame time for comparable shapes. (This matches expectations from earlier tests: the box-box special case is not significantly faster than the general case. In fact, with some tuning in some situations, the general case can sometimes beat the box-box special case by a tiny amount. However, box-box is very robust, well-behaved in numerical extremes, and handles sliding contacts better, so I picked it anyway.)
Capsule special cases would be slower than sphere special cases, so the benefit of adding them would be smaller than the factor of 2 (or less) found above.
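As a quick check on the "factor of 2 at the high end" estimate, the SphereyCapsule and Sphere timings quoted above can simply be divided. The class and method names below are made up for illustration; only the millisecond figures come from the tests.

```csharp
using System;
using System.Globalization;

static class FrameTimeRatios
{
    // Ratio of general-case (capsule) frame time to special-case (sphere) frame time.
    public static double Ratio(double capsuleMs, double sphereMs)
    {
        return capsuleMs / sphereMs;
    }

    static void Main()
    {
        // SphereyCapsule vs Sphere timings, in milliseconds, as quoted in the tests.
        var tests = new[]
        {
            new { Name = "Phone, 1x1 triangles, 7 compounds", Capsule = 17.0, Sphere = 9.2 },
            new { Name = "Phone, 8x8 triangles, 7 compounds", Capsule = 7.6, Sphere = 6.0 },
            new { Name = "PC, 8x8 triangles, 400 compounds", Capsule = 14.0, Sphere = 12.0 },
        };

        foreach (var t in tests)
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0}: {1:F2}x", t.Name, Ratio(t.Capsule, t.Sphere)));
        }
    }
}
```

The ratios come out at roughly 1.85, 1.27, and 1.17, so the factor of 2 is an upper bound for same-sized shapes; the larger gaps for the long capsules include the cost of overlapping more triangles.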
My Xbox isn't currently easily accessible for testing, but it should produce consistent results.
Long story short: 40 ms for capsules versus 1 ms for spheres smells like shenanigans.
I am thinking maybe the performance cost is a byproduct of the compound bodies, but I need to do more testing to find out for sure.
If you can reproduce the performance loss in isolation (like in the BEPUphysicsDemos), I could take a look. I don't have any good theories at the moment, though.
