Release only crash

lidgren
Posts: 21
Joined: Mon May 27, 2019 6:28 pm

Release only crash

Post by lidgren »

Hi! First; thanks for creating this library; it's awesome!

However, I'm getting a pretty weird crash, in release only; debug works fine. The app throws Access violation reading location 0x0000000000000F21 and this is the callstack I'm seeing:
clrjit.dll!Compiler::fgAssignSetVarDef(GenTree * tree) Line 8983 C++
clrjit.dll!Compiler::fgMorphSmpOp(GenTree * tree, Compiler::MorphAddrContext * mac) Line 12387 C++
clrjit.dll!Compiler::fgMorphTree(GenTree * tree, Compiler::MorphAddrContext * mac) Line 15086 C++
clrjit.dll!Compiler::fgMorphStmts(BasicBlock * block, bool * lnot, bool * loadw) Line 15817 C++
clrjit.dll!Compiler::fgMorphBlocks() Line 16071 C++
clrjit.dll!Compiler::fgMorph() Line 17096 C++
clrjit.dll!Compiler::compCompile(void * * methodCodePtr, unsigned long * methodCodeSize, JitFlags * compileFlags) Line 4476 C++
clrjit.dll!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned long * methodCodeSize, JitFlags * compileFlags, CorInfoInstantiationVerification) Line 6015 C++
clrjit.dll!Compiler::compCompile(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned long * methodCodeSize, JitFlags * compileFlags) Line 5360 C++
clrjit.dll!jitNativeCode(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned long * methodCodeSize, JitFlags * compileFlags, void * inlineInfoPtr) Line 6643 C++
clrjit.dll!CILJit::compileMethod(ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned long * nativeSizeOfCode) Line 332 C++
[Managed to Native Transition]
> BepuPhysics.dll!BepuPhysics.ConstraintBatch.CreateNewTypeBatch(int typeId, BepuPhysics.Constraints.TypeProcessor typeProcessor, int initialCapacity, BepuUtilities.Memory.BufferPool pool) Line 107 C#
BepuPhysics.dll!BepuPhysics.ConstraintBatch.Allocate(int handle, ref int constraintBodyHandles, int bodyCount, BepuPhysics.Bodies bodies, int typeId, BepuPhysics.Constraints.TypeProcessor typeProcessor, int initialCapacity, BepuUtilities.Memory.BufferPool pool, out BepuPhysics.ConstraintReference reference) Line 139 C#
BepuPhysics.dll!BepuPhysics.Solver.AllocateInBatch(int targetBatchIndex, int constraintHandle, ref int bodyHandles, int bodyCount, int typeId, out BepuPhysics.ConstraintReference reference) Line 460 C#
BepuPhysics.dll!BepuPhysics.Solver.TryAllocateInBatch(int typeId, int targetBatchIndex, ref int bodyHandles, int bodyCount, out int constraintHandle, out BepuPhysics.ConstraintReference reference) Line 500 C#
BepuPhysics.dll!BepuPhysics.CollisionDetection.ContactConstraintAccessor<BepuPhysics.Constraints.Contact.Contact1, BepuPhysics.CollisionDetection.TwoBodyHandles, BepuPhysics.Constraints.Contact.Contact1AccumulatedImpulses, BepuPhysics.CollisionDetection.ContactImpulses1, BepuPhysics.CollisionDetection.ConstraintCache1>.FlushWithSpeculativeBatches<Lidgren.Physics.NarrowPhaseCallbacks>(ref BepuPhysics.CollisionDetection.UntypedList list, int narrowPhaseConstraintTypeId, ref BepuUtilities.Memory.Buffer<BepuUtilities.Memory.Buffer<ushort>> speculativeBatchIndices, BepuPhysics.Simulation simulation, BepuPhysics.CollisionDetection.PairCache pairCache) Line 163 C#
BepuPhysics.dll!BepuPhysics.CollisionDetection.NarrowPhase<Lidgren.Physics.NarrowPhaseCallbacks>.PendingConstraintAddCache.FlushWithSpeculativeBatches(BepuPhysics.Simulation simulation, ref BepuPhysics.CollisionDetection.PairCache pairCache) Line 170 C#
BepuPhysics.dll!BepuPhysics.CollisionDetection.NarrowPhase<Lidgren.Physics.NarrowPhaseCallbacks>.ExecutePreflushJob(int workerIndex, ref BepuPhysics.CollisionDetection.PreflushJob job) Line 183 C#
BepuPhysics.dll!BepuPhysics.CollisionDetection.NarrowPhase<Lidgren.Physics.NarrowPhaseCallbacks>.PreflushWorkerLoop(int workerIndex) Line 117 C#
Lidgren.Physics.Bepu2.dll!Lidgren.Physics.SimpleThreadDispatcher.DispatchThread(int workerIndex) Line 51 C#
I've stepped through CreateNewTypeBatch() and the problem seems to be that whatever TypeBatches.AllocateUnsafely() returns ends up at a bad location. Adding the ValidateUnsafeAdd() check, which is normally performed only in debug, does not help or trigger; a breakpoint there shows that Span does indeed have room for Count (in AllocateUnsafely)... yet once it returns, even trying to look at the value in the debugger causes an access violation. OTOH the callstack seems to point to the jitter sooo... I'm a bit stumped.

It's a 100% repro on multiple computers. Any hints on what could be wrong, or actions I can take to work around this? Debug works decently well (which is a huge boon!) but Release needs to work eventually.

--michael
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

"Hi! First; thanks for creating this library; it's awesome!"
Glad you like it :)
"It's a 100% repro on multiple computers. Any hints on what could be wrong, or actions I can take to work around this. Debug works decently well (which is a huge boon!) but Release needs to work eventually."
Historically, this kind of error has been associated with one of three things:
1) A regular ol' access violation that stomped the runtime's memory.
2) A bug in the JIT.
3) An abuse of undefined behavior that made the JIT barf.

The fact that debug mode doesn't catch any bad access and the error is highly consistent shifts the probability more towards #2 or #3. If it doesn't happen in release mode when optimizations are explicitly disabled, that would push it even further.

Also, there are a few different release profiles: does it occur on both Release and ReleaseStrip, or only on ReleaseStrip? ReleaseStrip removes local variable zeroing for some extra performance. That can lead to use-before-init bugs, though I'm not aware of any remaining and they typically didn't manifest as JIT barfs.

It's worth noting that I encountered more than a few JIT bugs during development. They're all fixed now, but if you happen to be targeting an older runtime, it may be that it's just one of those old bugs.

If you can reproduce it in the demos or in a minimal console application, I could take a closer look.
lidgren
Posts: 21
Joined: Mon May 27, 2019 6:28 pm

Re: Release only crash

Post by lidgren »

Thanks for replying! The issue occurs in Release as well as ReleaseStrip. I'm using .NET Core 3.0 preview 5, so it might very well be a bug in the preview.

Setting the projects to .NET Standard 2.1 and using System.Numerics from there did not solve the issue.

The issue arises only if BOTH the BepuPhysics and BepuUtilities active configurations have "Optimize code" checked; if either is unchecked, there is no crash. Compile constants do not affect the issue.
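
For anyone reading along who edits project files by hand: the "Optimize code" checkbox corresponds to the MSBuild `Optimize` property. A minimal sketch of switching it off for one project's Release configuration follows; the condition assumes the configuration is literally named Release, so adjust it for ReleaseStrip or other profiles.

```xml
<!-- Sketch only: goes inside the <Project> element of the .csproj. -->
<!-- Assumes the active configuration is named "Release". -->
<PropertyGroup Condition="'$(Configuration)' == 'Release'">
  <!-- Same effect as unchecking "Optimize code" in the project properties. -->
  <Optimize>false</Optimize>
</PropertyGroup>
```

This turns off compiler optimizations for the whole assembly, so it is better suited to narrowing down where the failure comes from than to shipping.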

The problem arose after my upgrading to the latest commit of Bepu2; unfortunately I don't know exactly when the previous drop was taken. I might sync back to earlier changelists (if my git skills do not fail me) and see if I can pinpoint where it broke for me.

--michael
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

Sortagoodnews, I've reproduced the same failure on 3.0 preview 5 locally. Time for some fun :P
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

Good news, this specific bug, or at least this manifestation of it, was apparently maybe fixed in the latest daily! Bad news:
3.0 Preview 6 daily (3.0.0-preview6-27727-02): fails silently after a few seconds
3.0 Preview 5: fails with access violation at CreateNewTypeBatch
3.0 Preview 4: fails with access violation at CreateNewTypeBatch
3.0 Preview 3: fails silently after a few seconds (plus one instance of "Common Language Runtime detected an invalid program")
3.0 Preview 1: fails silently after a few seconds
2.2.3: works
2.1.11: works

So it seems likely that 3.0 has introduced a bug, or there is some undefined behavior abuse that is making newer versions barf. If you do happen to find a version of bepuphysics that wasn't triggering this behavior, that would be very useful. I'm going to start looking for minimal repro candidates around CreateNewTypeBatch and friends under preview 5 in hopes that the issues are related.

I'm glad you reported this when you did; I was about to start using v2 with 3.0 in a new project, and these sorts of failures could easily have made me spend a dozen hours fruitlessly debugging the new project's parts :D
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

As a short-term workaround, it looks like the preview 5 failure can be avoided by tagging TypeProcessor<TBodyReferences, TPrestepData, TProjection, TAccumulatedImpulse>.InternalResize with [MethodImpl(MethodImplOptions.NoOptimization)]. Fortunately the problem seems pretty localized.

Edit:
Even better, remove the [MethodImpl(MethodImplOptions.AggressiveInlining)] on BufferPool.ResizeToAtLeast(ref RawBuffer ...)
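
For reference, here is a rough sketch of what those two attribute changes look like in isolation. The class and method names below are hypothetical stand-ins, not the actual BepuPhysics or BepuUtilities code; only MethodImplAttribute and MethodImplOptions are real, and both live in System.Runtime.CompilerServices.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class JitWorkaroundSketch
{
    // Hypothetical stand-in for the helper that used to be force-inlined.
    // There is no [MethodImpl(MethodImplOptions.AggressiveInlining)] here,
    // so the JIT decides for itself whether to inline it.
    public static int NextCapacity(int current, int required)
    {
        int capacity = Math.Max(current, 1);
        while (capacity < required)
            capacity *= 2;
        return capacity;
    }

    // Hypothetical stand-in for the caller. NoOptimization asks the JIT to
    // compile this one method without optimizations, even in Release builds.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static int[] Resize(int[] buffer, int required)
    {
        var resized = new int[NextCapacity(buffer.Length, required)];
        Array.Copy(buffer, resized, buffer.Length);
        return resized;
    }

    public static void Main()
    {
        int[] resized = Resize(new[] { 1, 2, 3 }, 9);
        Console.WriteLine(resized.Length);

        // Check via reflection that the NoOptimization flag is set on Resize.
        MethodInfo method = typeof(JitWorkaroundSketch).GetMethod(nameof(Resize));
        bool unoptimized =
            (method.MethodImplementationFlags & MethodImplAttributes.NoOptimization) != 0;
        Console.WriteLine(unoptimized);
    }
}
```

Removing the inlining hint is the less invasive of the two options: the caller still gets optimized, whereas NoOptimization gives up optimization for the entire tagged method.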
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

It appears Unsafe.CopyBlockUnaligned is somehow involved. The conditions in which it fails are still unclear, but this should address the preview 5 crash at least.
lidgren
Posts: 21
Joined: Mon May 27, 2019 6:28 pm

Re: Release only crash

Post by lidgren »

Woot! I can confirm that the change in the commit you mentioned fixes the crash in my application as well. 5/5 - top notch support; would post again! :D
Norbo
Site Admin
Posts: 4929
Joined: Tue Jul 04, 2006 4:45 am

Re: Release only crash

Post by Norbo »

Repro was blessedly easy; issue's now up on the coreclr github: https://github.com/dotnet/coreclr/issues/24846
"5/5 - top notch support; would post again!"
:P