BufferPool problem

wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

BufferPool problem

Post by wanghongliang »

Hi Norbo, thanks for your tremendous effort on the Bepu engine, which I use in my game project that involves a lot of physical entities. Recently I upgraded the project (based on Bepu 2.0 and .NET Core 3.0) to Bepu 2.1 and .NET Core 3.1, and afterwards a System.ExecutionEngineException was thrown whenever the Timestep method of the simulation instance was called. After exhaustive debugging, I tried setting the MinimumBlockAllocationSize parameter of the BufferPool constructor to 131072 * 16, and everything works fine now. I think this is weird because BufferPool should automatically resize when needed. Could you please tell me why?
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

That's pretty odd. ExecutionEngineException means 'something has gone terribly wrong', and shouldn't happen under anything close to normal circumstances. Widespread memory corruption could do it.

I'd be concerned that bumping up the MinimumBlockAllocationSize is just hiding the true bug somehow. Can you reproduce the bug in the demos?
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

Thanks for the quick reply. All demos work fine. My application involves more than 20,000 physical entities and my computer has sufficient memory (16 GB RAM) to run it. I implemented partially manual memory management in my C# app; maybe it conflicts with the Bepu memory management. I don't have a clue yet.
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

I've tried widely varying block allocation sizes on simulations from ~15000 to >300000 bodies, no luck reproducing so far.

Other forms of manual memory management shouldn't interfere with the BufferPools- they don't do anything particularly fancy, they just allocate blocks from the GC. The only concern would be things like multiple threads pulling from/returning to the same BufferPool at the same time. That'd definitely break stuff.
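
To illustrate the threading concern, here is a minimal sketch of giving each thread its own pool so that take and return calls never race. ScratchPool and PerThreadPools are hypothetical stand-ins written for this example; they are not the Bepu BufferPool and do not reflect how Bepu itself distributes pools across worker threads.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a pool with no internal locking (NOT the Bepu BufferPool).
public sealed class ScratchPool
{
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly int blockSize;

    public ScratchPool(int blockSize) { this.blockSize = blockSize; }

    // Safe only while a single thread ever touches this instance.
    public byte[] Take() => free.Count > 0 ? free.Pop() : new byte[blockSize];
    public void Return(byte[] block) => free.Push(block);
}

public static class PerThreadPools
{
    // Each thread lazily creates its own pool on first access.
    private static readonly ThreadLocal<ScratchPool> pools =
        new ThreadLocal<ScratchPool>(() => new ScratchPool(4096));

    public static long SumInParallel(int iterations)
    {
        long total = 0;
        Parallel.For(0, iterations, i =>
        {
            var pool = pools.Value;               // this thread's private pool
            var block = pool.Take();
            block[0] = 1;
            Interlocked.Add(ref total, block[0]); // the shared counter is the only shared state
            pool.Return(block);                   // returned to the pool it was taken from
        });
        return total;
    }
}
```

Sharing a single ScratchPool across the Parallel.For body instead would let two threads mutate the Stack at once, which is the kind of misuse described above.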
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

I tried using the PyramidDemo to locate the bug in my game app, with MinimumBlockAllocationSize set to its default value. With the pyramid count at 40 (the default), my app works fine. If I increase it to 400, something weird happens: when the demo is created and updated inside my game loop, it crashes; but when it's created and updated outside of the game loop (in an isolated context), it runs smoothly. So I guess there must be something wrong in my game logic, especially in the manual memory management. Any help would be greatly appreciated.
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

I'm afraid I can't be too much help- by the sounds of it, there is indeed some memory corruption going on elsewhere in the program. Best guess is that the simulation is just revealing corruption through stress testing, but the corruption is there either way. Likely some buffer getting overrun and stomping memory.
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

It seems I have solved the problem. I let Bepu use my DarkAllocator (which can allocate managed arrays of blittable types in unmanaged memory) for its memory management. I have added the following to Bepu:

public unsafe interface INativeAllocator
{
    byte[] Allocate(int bytesCount, out byte* pointer);
    void Free(ref byte[] array);
}

public static class NativeMemory
{
    private static INativeAllocator allocator;

    public static void Register(INativeAllocator allocator)
    {
        NativeMemory.allocator = allocator;
    }

    public unsafe static byte[] Allocate(int bytesCount, out byte* pointer)
    {
        return allocator.Allocate(bytesCount, out pointer);
    }

    public static void Free(ref byte[] array)
    {
        allocator.Free(ref array);
        array = null;
    }
}


and the following to my game app:

public sealed class NativeAllocator : INativeAllocator
{
    public unsafe byte[] Allocate(int bytesCount, out byte* pointer)
    {
        var array = DarkAllocator<byte>.Alloc(bytesCount, zeroMemory:true);
        pointer = (byte*)Unsafe.AsPointer(ref array[0]);
        return array;
    }

    public void Free(ref byte[] array)
    {
        DarkAllocator.Free(ref array);
    }
}

Now everything works like a charm. No more crashes, and it passed stress testing with 300,000 bodies. Besides, Bepu no longer needs to pin and unpin memory and can run in a more deterministic style. I know this is risky, but I still hope Bepu will support custom memory management one day. Thanks again!
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

I do intend to eventually open it up, expanding usage of the IUnmanagedMemoryPool:
https://github.com/bepu/bepuphysics2/bl ... oryPool.cs

It's already used for Quick* collections, and I intend to eventually move many ephemeral allocations to more purpose-built allocators.

However... I remain concerned. There's nothing about the interaction of two different allocators that should cause an ExecutionEngineException.

You mentioned allocating managed arrays of blittable types to unmanaged memory- do you mean there's actually a managed array type being directly allocated in unmanaged memory? That may very well cause an ExecutionEngineException under some conditions; it's happened when I've accidentally done similar.
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

Yes, my DarkAllocator does directly allocate a managed array type (and other class types that contain only blittable structs) in unmanaged memory to avoid garbage collection. I have extended SharpDX to implement a custom heap allocator (SharpDX.Native) that takes charge of most memory allocations in my app. Here is the DarkStructArrayAllocator (partially borrowed from other implementations):

[StructLayout(LayoutKind.Explicit)]
internal struct Array32
{
    public const int SyncBlocSize = 4;
    public const int HeaderSize = 12;

    [FieldOffset(0)]
    public IntPtr SyncBlockIndex;

    [FieldOffset(4)]
    public IntPtr MethodTablePointer;

    [FieldOffset(8)]
    public int Length;
}

[StructLayout(LayoutKind.Explicit)]
internal struct Array64
{
    public const int SyncBlocSize = 8;
    public const int HeaderSize = 24;

    [FieldOffset(0)]
    public IntPtr SyncBlockIndex;

    [FieldOffset(8)]
    public IntPtr MethodTablePointer;

    [FieldOffset(16)]
    public long Length;
}



internal static class ArrayHeaderHelper<TElement>
{
    private static readonly Func<IObjectHeader> _headerExtractor = GenerateHeaderExtractor();

    public static IObjectHeader CaptureHeader()
    {
        return _headerExtractor.Invoke();
    }

    private static Func<IObjectHeader> GenerateHeaderExtractor()
    {
        var headerType = Environment.Is64BitProcess ? typeof (ObjectHeader64) : typeof (ObjectHeader32);
        var arrayHeaderSize = Environment.Is64BitProcess ? Array64.HeaderSize : Array32.HeaderSize;

        var dynamicMethod = CreateDynamicMethod();
        var ilGenerator = dynamicMethod.GetILGenerator();

        var pinnedElement = ilGenerator.DeclareLocal(typeof (byte*), true);

        ilGenerator.Emit(OpCodes.Ldc_I4_1);
        ilGenerator.Emit(OpCodes.Newarr, typeof (TElement));
        ilGenerator.Emit(OpCodes.Ldc_I4_0);
        ilGenerator.Emit(OpCodes.Ldelema, typeof (TElement));
        ilGenerator.Emit(OpCodes.Stloc, pinnedElement);

        ilGenerator.Emit(OpCodes.Ldloc, pinnedElement);
        ilGenerator.Emit(OpCodes.Ldc_I4, arrayHeaderSize);
        ilGenerator.Emit(OpCodes.Sub);

        ilGenerator.Emit(OpCodes.Ldobj, headerType);
        ilGenerator.Emit(OpCodes.Box, headerType);
        ilGenerator.Emit(OpCodes.Ret);

        return (Func<IObjectHeader>)dynamicMethod.CreateDelegate(typeof(Func<IObjectHeader>));
    }

    private static DynamicMethod CreateDynamicMethod()
    {
        var returnType = typeof (IObjectHeader);
        var parameterTypes = Type.EmptyTypes;
        var ownerType = typeof (ArrayHeaderHelper<TElement>);

        return new DynamicMethod("CaptureArrayHeaderOf" + typeof (TElement[]).Name, MethodAttributes.Static | MethodAttributes.Public, CallingConventions.Standard, returnType, parameterTypes, ownerType, true);
    }
}


public static unsafe class DarkStructArrayAllocator<T> where T : unmanaged
{
    private static int sizeOfElement = sizeof(T);
    private static bool is64Bit = Environment.Is64BitProcess;
    private static int pointerSize = IntPtr.Size;

    private static readonly IObjectHeader _arrayHeader = CaptureArrayHeader();

    [MethodImpl(Performance.MaxOptimization)]
    public static T[] Alloc(int length, bool zeroMemory = true)
    {
        var nativeAllocatedArraySize = ComputeStackAllocationSize(length);

        void* nativeAllocatedArray;
        if (nativeAllocatedArraySize <= 4096)
            nativeAllocatedArray = SharpDX.Pool.Allocate(nativeAllocatedArraySize);
        else
            nativeAllocatedArray = SharpDX.Native.Allocate(nativeAllocatedArraySize, zeroMemory);

        var nativeAllocatedArrayObject = GetArrayObject(nativeAllocatedArray, length);

        return nativeAllocatedArrayObject;
    }

    [MethodImpl(Performance.MaxOptimization)]
    private static int ComputeStackAllocationSize(int arrayLength)
    {
        var headerSize = is64Bit ? Array64.HeaderSize : Array32.HeaderSize;
        return headerSize + (arrayLength * sizeOfElement);
    }

    [MethodImpl(Performance.MaxOptimization)]
    private static T[] GetArrayObject(void* nativeAllocatedArray, int arrayLength)
    {
        if (is64Bit)
        {
            var array64Pointer = (Array64*)nativeAllocatedArray;
            array64Pointer->SyncBlockIndex = _arrayHeader.SyncBlockIndex;
            array64Pointer->MethodTablePointer = _arrayHeader.MethodTablePointer;
            array64Pointer->Length = arrayLength;
            var array64ObjectPointer = ((byte*)array64Pointer) + Array64.SyncBlocSize;

            return SharpDX.GenericOperator.PtrToObj<T[]>(array64ObjectPointer);
        }
        else
        {
            var array32Pointer = (Array32*)nativeAllocatedArray;
            array32Pointer->SyncBlockIndex = _arrayHeader.SyncBlockIndex;
            array32Pointer->MethodTablePointer = _arrayHeader.MethodTablePointer;
            array32Pointer->Length = arrayLength;
            var array32ObjectPointer = ((byte*)array32Pointer) + Array32.SyncBlocSize;

            return SharpDX.GenericOperator.PtrToObj<T[]>(array32ObjectPointer);
        }
    }

    [MethodImpl(Performance.MaxOptimization)]
    private static IObjectHeader CaptureArrayHeader()
    {
        if (is64Bit)
            return ArrayHeaderHelper<T>.CaptureHeader();

        return ArrayHeaderHelper<T>.CaptureHeader();
    }

    public static void Free(ref T[] array)
    {
        byte* ptr = (byte*)SharpDX.GenericOperator.ObjToPtr(array);

        if (is64Bit)
            ptr -= Array64.SyncBlocSize;
        else
            ptr -= Array32.SyncBlocSize;

        SharpDX.Native.Free(ptr);

        array = null;
    }
}


The SharpDX.GenericOperator.ObjToPtr and PtrToObj methods are implemented using IL weaving:

private void CreatePtrToObj(MethodDefinition method)
{
    method.Body.Instructions.Clear();

    var IL = method.Body.GetILProcessor();

    IL.Emit(OpCodes.Ldarg_0);
    IL.Emit(OpCodes.Ret);
}

private void CreateObjToPtr(MethodDefinition method)
{
    method.Body.Instructions.Clear();

    var IL = method.Body.GetILProcessor();

    IL.Emit(OpCodes.Ldarg_0);
    IL.Emit(OpCodes.Ret);
}
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

Well, that's certainly an approach :D

Putting aside the implementation dependencies (specific offsets and whatnot), it's hard for me to say anything conclusive. Storing a managed reference to an instance sneakily allocated in unmanaged memory definitely does trigger ExecutionEngineExceptions at least under some conditions (it's easy to intentionally trigger), so all I can say with certainty is "here there be dragons".
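
For comparison, a lower-risk way to keep data out of the GC heap is to allocate native memory and view it through Span<T>, without forging an object header or handing the runtime a managed reference that points outside its heap. The following is only a sketch under assumptions: NativeBlock<T> is a hypothetical name, not part of Bepu or SharpDX, and it needs unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper for illustration; not part of Bepu or SharpDX.
public sealed unsafe class NativeBlock<T> : IDisposable where T : unmanaged
{
    private IntPtr memory;
    public int Length { get; }

    public NativeBlock(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        memory = Marshal.AllocHGlobal(checked(length * sizeof(T)));
        AsSpan().Clear(); // AllocHGlobal does not zero the memory it returns
    }

    // A bounds-checked view over the native memory. The GC never sees it as an object.
    public Span<T> AsSpan()
    {
        if (memory == IntPtr.Zero) throw new ObjectDisposedException(nameof(NativeBlock<T>));
        return new Span<T>(memory.ToPointer(), Length);
    }

    public void Dispose()
    {
        if (memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(memory);
            memory = IntPtr.Zero;
        }
    }
}
```

The trade-off is that callers receive a Span<T> rather than a T[], so APIs that insist on real arrays cannot consume it. A span obtained before Dispose must not be used afterwards.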
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

Yes, what I have done is evil and dangerous. But I think Bepu could be based mostly on unmanaged memory management while keeping a small number of managed objects. In doing so, Bepu should gain some performance improvements and be more deterministic. In fact, Unity Engine is encouraging developers to use NativeArray<T> as much as possible. 8)
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

Unfortunately, it would not make it any faster, nor make it more deterministic, because bepuphysics2 already uses a custom allocator, uses very few managed instances, and does not generate garbage :)

The underlying pinned block allocations come from the GC large object heap as an implementation detail, but it does not trigger garbage collections on its own and doesn't carry any runtime overhead. The overall heap complexity is extremely low and does not scale with the number of bodies, so collections triggered by the user's application systems remain quick.
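
As a rough illustration of the idea of a pinned block backed by a GC array, the sketch below allocates a byte array large enough to land on the large object heap (arrays of 85,000 bytes or more), pins it with a GCHandle, and exposes a raw pointer. PinnedBlock is a hypothetical type written for this example; it is not the actual BufferPool implementation, and it needs unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical illustration of a pinned, GC-backed block (NOT Bepu's BufferPool code).
public sealed unsafe class PinnedBlock : IDisposable
{
    private GCHandle handle;

    public byte[] Backing { get; }
    public byte* Pointer { get; }

    public PinnedBlock(int sizeInBytes)
    {
        Backing = new byte[sizeInBytes];
        // Pinning keeps the array at a fixed address until the handle is freed.
        handle = GCHandle.Alloc(Backing, GCHandleType.Pinned);
        Pointer = (byte*)handle.AddrOfPinnedObject().ToPointer();
    }

    public void Dispose()
    {
        // After this, Pointer must no longer be used: the array may move or be collected.
        if (handle.IsAllocated) handle.Free();
    }
}
```

One large pinned array is a single object from the GC's point of view, regardless of how many sub-buffers are carved out of it, which fits the point above that heap complexity does not scale with the number of bodies.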
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

Thank you very much for the explanation. You deserve a Microsoft MVP! :)
wanghongliang
Posts: 18
Joined: Sun May 05, 2019 2:01 pm

Re: BufferPool problem

Post by wanghongliang »

I have upgraded to Bepu 2.2. Still using DarkAllocator. Everything works fine. I think it would be convenient to implement implicit casting between int and BodyHandle (or StaticHandle). Thanks for your assiduous work.
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: BufferPool problem

Post by Norbo »

I've gone back and forth on that a bit. The value of implicit conversions is silently doing things that should work, but converting a handle to an integer loses the type information that is the entire point of the type. When you intend to do that, it's perfectly fine, but implicit conversions don't do a good job of signaling that kind of intent. An explicit cast would be closer, though the equivalent functionality already exists through other means (they're just structs with an exposed int). For now I'm just letting it go for a while and seeing how users interact with it.
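
To make the trade-off concrete, here is a sketch of what an explicit conversion could look like on a handle-like struct. DemoBodyHandle is a hypothetical type invented for this example; it is not the Bepu BodyHandle definition, and the only detail taken from the discussion is that the handle is a struct with an exposed int.

```csharp
using System;

// Hypothetical handle type for illustration; not the Bepu BodyHandle definition.
public readonly struct DemoBodyHandle : IEquatable<DemoBodyHandle>
{
    public readonly int Value;

    public DemoBodyHandle(int value) { Value = value; }

    // Explicit operators force a visible cast at the call site, which signals intent.
    public static explicit operator int(DemoBodyHandle handle) => handle.Value;
    public static explicit operator DemoBodyHandle(int value) => new DemoBodyHandle(value);

    public bool Equals(DemoBodyHandle other) => Value == other.Value;
    public override bool Equals(object obj) => obj is DemoBodyHandle other && Equals(other);
    public override int GetHashCode() => Value;
}
```

With this shape, `int raw = (int)handle;` compiles but `int raw = handle;` does not, so dropping the type information always shows up in the code. Declaring the operators as implicit instead would let a handle flow silently into any int parameter.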