It sounds like there are a very large number of StaticMeshes in the simulation.
Removing objects from some systems can take linear time. If there are many thousands of objects and multiple removals happen each frame, the cost balloons.
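To see why removals can balloon, here is a minimal Python sketch. It is an illustration only, not BEPUphysics code, and `IndexedSet` is a hypothetical name. Removing from a plain list means searching for the element and shifting everything after it, which is linear. Tracking each object's index and swapping the last element into the gap makes removal constant time, at the cost of not preserving order.

```python
class IndexedSet:
    """Hypothetical container with O(1) add/remove via swap-with-last.

    Order is not preserved: removal moves the last element into the gap.
    """

    def __init__(self):
        self.items = []   # dense storage, cheap to iterate every frame
        self.index = {}   # object -> position in self.items

    def add(self, obj):
        if obj in self.index:
            return
        self.index[obj] = len(self.items)
        self.items.append(obj)

    def remove(self, obj):
        i = self.index.pop(obj)   # raises KeyError if obj is absent
        last = self.items.pop()   # O(1) pop from the end
        if i < len(self.items):
            # obj was not the last element: move the former last into its slot.
            self.items[i] = last
            self.index[last] = i

    def __len__(self):
        return len(self.items)

    def __contains__(self, obj):
        return obj in self.index


# A plain list removal scans for the element and shifts the tail: O(n).
plain = list(range(10_000))
plain.remove(5)

# The indexed version only touches the removed slot and the last slot: O(1).
fast = IndexedSet()
for n in range(10_000):
    fast.add(n)
fast.remove(5)   # 9999 moves into slot 5
```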
As you can see, for some reason the BroadPhase is taking up a huge amount of time, even once the level has finished loading and no more StaticMesh objects are being created!
The BroadPhase is a unified, dynamic, general-purpose acceleration structure. When there are thousands of BroadPhaseEntry objects in the BroadPhase, the acceleration structure is large. Because the BroadPhase has no guarantees about what each object is, it must traverse the structure to find overlaps, even when those overlaps turn out to be between two static objects that cannot generate a collision pair with each other.
Any suggestions for how I can improve performance here?
Try to avoid this broad phase pollution. A StaticMesh can contain hundreds of thousands of triangles, but the BroadPhase only sees a single StaticMesh. The StaticMesh's internal acceleration structure can make assumptions about its contents; for example, the triangles can never collide with each other, so there is no need to perform 'internal' tests. Similarly, a special-purpose structure could be created to hold the data that is currently spread across thousands of StaticMeshes.
The best approach to reduce broad phase pollution depends on the simulation.
Blockworld games typically benefit hugely from a custom 'voxel grid' collidable which can perform collision based directly on its 3D grid of cubes. The grid provides an implicit data structure that makes modifications possible in constant time and offers a very speedy query to identify which cubes are involved in a collision pair.
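A minimal Python sketch of the core of such a voxel grid, assuming unit-sized cubes with the grid origin at 0. The `VoxelGrid` and `AABB` names are hypothetical, not part of BEPUphysics, and contact generation is left out. The cube at (x, y, z) is located by arithmetic rather than by searching a tree, so adding or removing a cube is a single write, and finding the cubes near an object only visits the cells its bounding box covers.

```python
import math
from dataclasses import dataclass


@dataclass
class AABB:
    lower: tuple  # (x, y, z) minimum corner
    upper: tuple  # (x, y, z) maximum corner


class VoxelGrid:
    """Hypothetical voxel collidable core: a dense 3D grid of unit cubes.

    Cell (x, y, z) covers [x, x+1) * [y, y+1) * [z, z+1) in world units.
    """

    def __init__(self, size_x, size_y, size_z):
        self.size = (size_x, size_y, size_z)
        self.solid = bytearray(size_x * size_y * size_z)  # 1 = filled cube

    def _flat(self, x, y, z):
        # Implicit indexing: the storage slot is computed, never searched for.
        sx, sy, _ = self.size
        return x + sx * (y + sy * z)

    def set_cube(self, x, y, z, filled=True):
        # Constant-time modification: a single byte write.
        self.solid[self._flat(x, y, z)] = 1 if filled else 0

    def cubes_overlapping(self, box):
        """Return coordinates of solid cubes whose cells overlap the box."""
        lo = [max(0, math.floor(box.lower[i])) for i in range(3)]
        hi = [min(self.size[i] - 1, math.floor(box.upper[i])) for i in range(3)]
        hits = []
        for z in range(lo[2], hi[2] + 1):
            for y in range(lo[1], hi[1] + 1):
                for x in range(lo[0], hi[0] + 1):
                    if self.solid[self._flat(x, y, z)]:
                        hits.append((x, y, z))
        return hits


grid = VoxelGrid(16, 16, 16)
grid.set_cube(2, 0, 3)
grid.set_cube(5, 5, 5)
near = grid.cubes_overlapping(AABB((1.5, -0.5, 2.5), (2.5, 0.5, 3.5)))
```

Because `math.floor` is applied to the upper bound too, a box that exactly touches a cell boundary also picks up the neighboring cell. That is conservative, which is usually what a collision query wants.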
For a game with fewer guarantees about regular geometry, it may help to create what amounts to an intermediate acceleration structure between the broad phase and the static meshes. A simple hash grid would do the trick if modifications are common (especially if the geometry is aligned to a grid at any level).
The more assumptions you can build into the structure, the better.
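A minimal Python sketch of such an intermediate hash grid, assuming axis-aligned bounding boxes given as `(lower, upper)` coordinate tuples. `HashGrid` and its methods are hypothetical names for illustration, not BEPUphysics types. The broad phase would hold the grid as a single entry; inserting or removing an object only touches the cells its bounds cover, so that cost does not depend on the total number of objects stored.

```python
import math
from collections import defaultdict


class HashGrid:
    """Hypothetical intermediate structure holding many static objects.

    Objects are bucketed by every grid cell their bounding boxes touch.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (i, j, k) -> set of object ids
        self.bounds = {}               # object id -> (lower, upper)

    def _cell_range(self, lower, upper):
        lo = [math.floor(c / self.cell_size) for c in lower]
        hi = [math.floor(c / self.cell_size) for c in upper]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    yield (i, j, k)

    def insert(self, obj_id, lower, upper):
        self.bounds[obj_id] = (lower, upper)
        for cell in self._cell_range(lower, upper):
            self.cells[cell].add(obj_id)

    def remove(self, obj_id):
        # Work is proportional to the cells this object covers only.
        lower, upper = self.bounds.pop(obj_id)
        for cell in self._cell_range(lower, upper):
            bucket = self.cells[cell]
            bucket.discard(obj_id)
            if not bucket:
                del self.cells[cell]  # keep the table free of empty buckets

    def query(self, lower, upper):
        """Return ids whose bounding boxes overlap the query box."""
        found = set()
        for cell in self._cell_range(lower, upper):
            for obj_id in self.cells.get(cell, ()):  # .get avoids creating buckets
                olo, ohi = self.bounds[obj_id]
                if all(olo[a] <= upper[a] and lower[a] <= ohi[a] for a in range(3)):
                    found.add(obj_id)
        return found


g = HashGrid(cell_size=4.0)
g.insert("wall", (0, 0, 0), (1, 10, 1))
g.insert("rock", (20, 0, 20), (21, 1, 21))
```

Choosing `cell_size` near the typical object size keeps each object in a handful of cells; much smaller cells multiply insert and remove work, while much larger cells make each query test more candidates.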
Implementing such a custom structure would amount to creating a new child of the Collidable class, alongside Terrain, StaticMesh, and InstancedMesh. This class would contain the acceleration structures and query support needed to check for colliding geometry and to handle raycasts. Then, create CollidablePairHandler types that handle collisions between the collidable and other objects, and give those pair handler types to the NarrowPhaseHelper.
A more in-depth description of this process can be found in my last post on this thread about a blockworld game:
viewtopic.php?f=4&t=1383
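The shape of that design can be sketched in Python. Everything here is hypothetical: `Collidable`, `ConvexCollidable`, `VoxelGridCollidable`, `VoxelConvexPairHandler`, and `PairHandlerRegistry` are illustrative stand-ins, and the registry only mimics the idea of handing pair handler types to the NarrowPhaseHelper; it is not BEPUphysics's actual API. The key point is that the narrow phase looks up a handler factory by the types of the two collidables in a pair.

```python
class Collidable:
    """Stand-in for an engine's collidable base class (hypothetical)."""


class ConvexCollidable(Collidable):
    def __init__(self, name):
        self.name = name


class VoxelGridCollidable(Collidable):
    """The custom collidable: would own the grid and its queries."""

    def __init__(self, grid):
        self.grid = grid


class VoxelConvexPairHandler:
    """Handles one (voxel grid, convex) pair; contact generation omitted."""

    def __init__(self, grid_collidable, convex):
        self.grid_collidable = grid_collidable
        self.convex = convex


class PairHandlerRegistry:
    """Hypothetical analogue of registering pair handlers with a narrow phase."""

    def __init__(self):
        self._factories = {}

    def register(self, type_a, type_b, factory):
        # Store both orders so lookups do not depend on how the broad phase
        # happened to order the two entries in a pair.
        self._factories[(type_a, type_b)] = factory
        self._factories[(type_b, type_a)] = lambda a, b: factory(b, a)

    def create(self, a, b):
        # Exact type match; returns None for unsupported pairs.
        factory = self._factories.get((type(a), type(b)))
        return factory(a, b) if factory else None


registry = PairHandlerRegistry()
registry.register(VoxelGridCollidable, ConvexCollidable, VoxelConvexPairHandler)
```

This sketch matches exact types only; a real engine may also need to fall back to base types or reject unsupported pairs explicitly.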
If a "Group of actual StaticMeshes" collidable were used, it would probably end up creating subpairs. The GroupPairHandler (inherited by compound-related pairs) and its related classes show an example of creating subpairs using the NarrowPhaseHelper.
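A minimal Python sketch of the subpair bookkeeping such a group collidable would need, assuming each child has an axis-aligned bounding box. `GroupPair` and `make_subpair` are hypothetical names, and this is not the GroupPairHandler source. Each update finds the children overlapping the other object, creates subpairs for new overlaps, keeps existing ones, and drops the rest.

```python
class GroupPair:
    """Hypothetical group-vs-object pair that maintains per-child subpairs.

    children: dict child_id -> (lower, upper) bounding box.
    make_subpair: factory called as make_subpair(child_id) for new overlaps.
    """

    def __init__(self, children, make_subpair):
        self.children = children
        self.make_subpair = make_subpair
        self.subpairs = {}  # child_id -> subpair object

    @staticmethod
    def _overlaps(a_lo, a_hi, b_lo, b_hi):
        return all(a_lo[i] <= b_hi[i] and b_lo[i] <= a_hi[i] for i in range(3))

    def update(self, other_lower, other_upper):
        # Brute-force scan for clarity; a real group would query its own
        # acceleration structure (for example a grid or tree) here instead.
        overlapping = {
            cid for cid, (lo, hi) in self.children.items()
            if self._overlaps(lo, hi, other_lower, other_upper)
        }
        # Drop subpairs whose child no longer overlaps.
        for cid in list(self.subpairs):
            if cid not in overlapping:
                del self.subpairs[cid]
        # Create subpairs only for newly overlapping children.
        for cid in overlapping:
            if cid not in self.subpairs:
                self.subpairs[cid] = self.make_subpair(cid)
        return self.subpairs


created = []


def make(cid):
    created.append(cid)
    return ("subpair", cid)


pair = GroupPair({"a": ((0, 0, 0), (1, 1, 1)), "b": ((5, 0, 0), (6, 1, 1))}, make)
```

Keeping existing subpairs across updates, rather than rebuilding them every frame, lets each subpair preserve per-pair state such as cached contacts.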