First of all, I have to scale down my simulation: instead of 100 objects, the simulation should have at most 10 or 20 objects to simulate.
That would help, yes. However, you can be a little clever about it. Rarely does every object need to be fully synchronized as frequently as possible. Many objects that could be interacted with, but currently are not, can be given a lower priority even if they are still moving around.
While you should assume the physics is nondeterministic in the long term, it will generate reasonably consistent results over relatively small intervals. For objects not undergoing direct interaction, their client-simulated path will likely be very similar (within some error boundary) to the server's results. Missing a few hundred milliseconds between synchronizations for low priority objects won't have a very large impact on the overall simulation.
The end effect of this would be that you can have a large simulation with a network overhead primarily dictated by the small currently interacting set of objects as opposed to the much larger complete set of objects.
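As a rough sketch of that prioritization (in Python for brevity, not BEPUphysics code): each object remembers when it was last sent, and objects under direct interaction get a much shorter send interval than the rest. The names `SyncEntry`, `interacting`, and both interval values are assumptions made up for illustration; real values would need tuning.

```python
from dataclasses import dataclass

# Assumed send intervals in seconds; tune for the actual game.
HIGH_PRIORITY_INTERVAL = 0.05   # objects a player is interacting with
LOW_PRIORITY_INTERVAL = 0.5     # everything else that is still synchronized


@dataclass
class SyncEntry:
    object_id: int
    interacting: bool        # hypothetical flag set by the game logic
    last_sent: float = 0.0   # time of the last update sent, in seconds


def objects_to_send(entries, now):
    """Return the ids of objects whose send interval has elapsed."""
    due = []
    for entry in entries:
        interval = (HIGH_PRIORITY_INTERVAL if entry.interacting
                    else LOW_PRIORITY_INTERVAL)
        if now - entry.last_sent >= interval:
            due.append(entry.object_id)
            entry.last_sent = now
    return due
```

With this, the per-tick packet size tracks the number of interacting objects, while the low priority set only costs bandwidth a couple of times per second.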
This is an application of 'scoping', where only the objects relevant to a given player are considered. It is discussed a bit more (amongst other things) in this:
http://physics.hardwire.cz/mirror/Netwo ... DC2008.pdf
Some physical elements are not interesting to gameplay and need not be synchronized
This is a very useful method for any objects that cannot interact with the player, which are generally the same objects which are uninteresting. Good candidates for this sort of optimization are pop cans (commonly strewn about many FPS levels) and tiny debris. If these objects cannot affect any other objects, you won't run into any weird situations.
If a client starts shooting the server MUST check for hits (cheating issues are too obvious here).
The "lag compensation" brought up in the Valve article would probably be useful here for hitscan-style shooting to reduce the perceived latency. Other than that, yes, the server would need to verify it.
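A minimal sketch of the lag compensation idea, assuming the server keeps a short history of each target's position and the client reports the time at which it saw the target when it fired. Everything here (`PositionHistory`, `server_verify_hit`, the point-within-radius test standing in for a real ray cast) is a hypothetical simplification, not the Valve implementation or a BEPUphysics API.

```python
import bisect
import math


class PositionHistory:
    """Recent (time, position) samples for one target, oldest first."""

    def __init__(self):
        self.times = []
        self.positions = []

    def record(self, t, position):
        # Samples must be recorded in increasing time order.
        self.times.append(t)
        self.positions.append(position)

    def at(self, t):
        """Position at time t, linearly interpolated between samples.

        Assumes at least one sample has been recorded.
        """
        i = bisect.bisect_right(self.times, t)
        if i == 0:
            return self.positions[0]       # older than the history: clamp
        if i == len(self.times):
            return self.positions[-1]      # newer than the history: clamp
        t0, t1 = self.times[i - 1], self.times[i]
        a = (t - t0) / (t1 - t0)
        p0, p1 = self.positions[i - 1], self.positions[i]
        return tuple(x0 + a * (x1 - x0) for x0, x1 in zip(p0, p1))


def server_verify_hit(history, shot_time, aim_point, target_radius):
    """Rewind the target to where the shooter saw it, then test the hit."""
    rewound = history.at(shot_time)
    return math.dist(rewound, aim_point) <= target_radius
```

The server still makes the final decision; it just evaluates the shot against the world as the shooter saw it rather than against the current state.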
There is one major issue here: if a client has authority over an object but runs at a VERY low framerate, the entire simulation (of the client's authorized objects) will run at a low framerate. And what if the client that has authority crashes?
If a client is having horrible FPS problems, they probably won't notice a little more latency. In these cases, the server can take into account what little information it is receiving from the client about its state and integrate that into its own simulation. The server can pick up the slack, sending out updates to other clients based on its own simulation even if it might not be exactly what the authoritative client is doing. The rest of the work would be in ensuring that the correction mechanism used is robust so that the likely corrections that take place would not be overly jarring to observers.
If the client crashes completely but the server doesn't yet know that it is essentially disconnected, it is as if the client is just undergoing a very severe version of the above case. That is, the server can step in and do the simulation on behalf of the client smoothly.
BepuPhysics handles extrapolation and/or interpolation internally between timesteps, right? Is it possible to do this manually?
If you have internal time stepping enabled (space.simulationSettings.timeStep.useInternalTimeStepping = true), yes, it will do interpolation. It won't do any extrapolation, though. You can turn off the interpolation per entity with the allowInterpolation field. If allowed, the interpolation is baked into the buffered properties of entities, like centerPosition and orientationMatrix. None of the internal-prefixed entity properties are ever interpolated.
Interpolation/extrapolation can be done externally, yes. You'd need to compute the interpolation/extrapolation component and then manually take it into consideration for any system needing a modified property.
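For reference, the computation itself is small. This is a generic sketch in Python with hypothetical helper names, not the library's internals: interpolation blends between the previous and current simulated positions, while extrapolation projects forward from the latest position using its velocity.

```python
def interpolate(prev_pos, curr_pos, alpha):
    """Blend between two simulated states; alpha in [0, 1] is the
    fraction of the timestep elapsed since prev_pos."""
    return tuple(p + (c - p) * alpha for p, c in zip(prev_pos, curr_pos))


def extrapolate(pos, velocity, dt):
    """Project a position dt seconds past the last known state,
    assuming the velocity stays constant over that interval."""
    return tuple(p + v * dt for p, v in zip(pos, velocity))
```

For example, `interpolate((0, 0, 0), (2, 4, 6), 0.5)` gives `(1.0, 2.0, 3.0)`. The result would then be fed to whatever needs the modified value (rendering, for instance) rather than written back into the simulation state.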
Just keep in mind you have to set velocities, not positions, every frame for physics to work correctly.
To clarify, in order to maintain consistency, you should correct both velocities and positions. Velocities can generally be 'immediately' corrected when a packet is received since it's harder to see the rapid change. Positions are generally better corrected incrementally.
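One way that split could look, sketched in Python with made-up names (`BodyState`, `POSITION_BLEND`) and an assumed blend factor: the velocity is overwritten as soon as a packet arrives, while the position closes only a fraction of the remaining error each frame.

```python
from dataclasses import dataclass

# Assumed fraction of the remaining position error removed per frame.
POSITION_BLEND = 0.1


@dataclass
class BodyState:
    position: tuple
    velocity: tuple


def apply_server_velocity(body, server_velocity):
    """Snap the velocity to the server's value when a packet arrives."""
    body.velocity = server_velocity


def correct_position(body, server_position):
    """Call once per frame: move part of the way toward the server's
    position so observers see a smooth drift instead of a jump."""
    body.position = tuple(
        p + (s - p) * POSITION_BLEND
        for p, s in zip(body.position, server_position)
    )
```

A larger `POSITION_BLEND` converges faster but makes the correction more visible; in practice the target position would also need to be advanced over time as the server's object keeps moving.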