Ideas For a High-Performance System Design

Hey there,

I’ve recently done a bit of work on physics-related tasks and, in the process, spent some time thinking about how Farseer’s performance could be improved.

Performance Observations

First, some assumptions on certain performance aspects that hopefully are uncontroversial enough to agree on as a precondition:

  • The fastest way to process data is to iterate over an array of structs.
  • It tends to be faster to bulk-process batches of data in a tight loop than to process one item at a time.
  • Parallelization has a synchronization overhead, except where no synchronization is necessary.
  • Executing user code has a guard overhead, except where no context-specific guard is necessary.
  • There is a non-zero overhead to abstractions like List<T> vs. T[] when it comes to bulk-processing large amounts of data.
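
To make the first two assumptions concrete, here is a minimal C# sketch of bulk-processing an array of structs in one tight loop. `BodyData` and `BulkIntegrator` are hypothetical names made up for illustration, not Farseer or Velcro types, and the integration step is deliberately simplistic.

```csharp
using System;

// Hypothetical body data, for illustration only; not a Farseer or Velcro type.
public struct BodyData
{
    public float X, Y;    // position
    public float VX, VY;  // linear velocity
}

public static class BulkIntegrator
{
    // Advances the first 'count' bodies in one tight loop over a contiguous array.
    public static void Integrate(BodyData[] bodies, int count, float dt)
    {
        for (int i = 0; i < count; i++)
        {
            // 'ref' works on the array element in place instead of on a copy.
            ref BodyData b = ref bodies[i];
            b.X += b.VX * dt;
            b.Y += b.VY * dt;
        }
    }
}
```

Because the structs sit next to each other in memory, the loop touches one contiguous block rather than chasing a reference per object, which is the effect the list above is getting at.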

Potential Performance Sinks

Next, here is a list of Farseer system design decisions that potentially drag performance below the theoretical maximum:

  • Shape, Joint and Body are all classes stored in lists or similar data structures.
  • Users can subscribe to events that are invoked from the middle of the physics simulation.
  • User events are invoked callback style, item-by-item.
  • Parallelizing physics simulation internally (i.e. not by putting Farseer into its own thread) is complicated because every object may alter the state of every other.

Design Decision Draft for Maximizing Efficiency

Finally, I’ll just throw in some rather radical ideas on how to restructure the overall system to address the above issues:

  • Make Body a struct and store all bodies in a World-global (potentially sparse) array.
  • Make Joint a struct (generalized) and store all joints in a World-global (potentially sparse) array.
  • Make Shape a struct (discriminated union) and store all shapes in a Body-local (dense) array.
  • Double-buffer all of the above data within World, so a physics step / update strictly reads from buffer A (current frame) and writes only to buffer B (next frame). This enables internal parallelization without synchronization using Parallel.For and similar.
  • Grant users direct access to all of the above World-global arrays so they can perform efficient bulk operations.
  • References between Body, Joint and Shape take the form of int-based handles that represent the access index in their world.
  • Instead of event callbacks, gather all occurrences (structs!) of “subscribed” events in an array over an update step and deliver them in bulk once the update step is finished. This enables users to do efficient batch-processing and frees Velcro from the need for state guards and their overhead. See issue #8.
  • User code that absolutely has to run within the physics frame (such as collision filters, but be really strict here) must be a pure function that receives all required data read-only via parameters and returns a value.
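
Several of these ideas can be combined in a small C# sketch: struct bodies in a World-global array, int handles, a double-buffered step using `Parallel.For`, and events gathered into an array for bulk delivery. Every type and member below (`Body`, `FloorHitEvent`, `World`, `AddBody`, `Step`) is a hypothetical illustration under simplified assumptions, not an existing Farseer or Velcro API.

```csharp
using System;
using System.Threading.Tasks;

// All types here are hypothetical illustrations, not Farseer or Velcro APIs.
public struct Body
{
    public float X, Y;    // position
    public float VX, VY;  // linear velocity
}

// Example occurrence recorded during a step and read in bulk afterwards.
public struct FloorHitEvent
{
    public int Body;  // int handle: index of the body that crossed y = 0
}

public sealed class World
{
    private Body[] _current;  // buffer A: only read during Step
    private Body[] _next;     // buffer B: only written during Step
    private readonly FloorHitEvent[] _events;

    public int BodyCount { get; private set; }
    public int EventCount { get; private set; }

    // Direct access to the underlying arrays for user-side bulk operations.
    public Body[] Bodies => _current;
    public FloorHitEvent[] Events => _events;

    public World(int capacity)
    {
        _current = new Body[capacity];
        _next = new Body[capacity];
        _events = new FloorHitEvent[capacity];
    }

    // Returns the handle (array index) of the newly added body.
    public int AddBody(Body body)
    {
        _current[BodyCount] = body;
        return BodyCount++;
    }

    public void Step(float dt)
    {
        Body[] src = _current, dst = _next;

        // Each iteration reads src[i] and writes only dst[i], so no locks are needed.
        Parallel.For(0, BodyCount, i =>
        {
            Body b = src[i];
            b.X += b.VX * dt;
            b.Y += b.VY * dt;
            dst[i] = b;
        });

        // Gather events serially after the parallel pass; users read them after Step.
        EventCount = 0;
        for (int i = 0; i < BodyCount; i++)
        {
            if (src[i].Y >= 0f && dst[i].Y < 0f)
                _events[EventCount++] = new FloorHitEvent { Body = i };
        }

        // Swap buffers: the freshly written frame becomes the current one.
        _current = dst;
        _next = src;
    }
}
```

After `world.Step(dt)`, a caller would loop over `world.Events` up to `world.EventCount` and index into `world.Bodies` with the stored handles. A real solver has cross-body dependencies (contacts, joints) that this per-body integration sidesteps, so the sketch shows the data layout rather than a full parallel solver.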

As a side effect to improved efficiency, less overhead and internal parallel execution, the above changes could also make serialization (see issue #6) a lot easier. It would also make pooling (see issue #5) obsolete.

On the downside, this would require users to adopt a new API and potentially be more careful, since fewer safety options would be around. There will be a bigger need for good documentation so users are primed for certain aspects of the system. This might also generate a large number of issues that need to be resolved in order to adapt to the new design, but if any of these changes should make it into Velcro, now would likely be a better opportunity to tackle them than later.

I’m totally aware that the above are quite radical suggestions and that I don’t have any idea about most of the internal structures and design decisions of Farseer, so please read it as a collection of ideas by an interested observer 😃

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 26 (10 by maintainers)

Top GitHub Comments

2 reactions
Genbox commented, Sep 14, 2017

I’ve had the multi-threading discussion more times than I care to count 😃

First of all, it is certainly possible to multi-thread the engine in specific scenarios, but years of experiments have taught us that a general solution hurts the common case. The things that have made a difference were cache coherency and modern CPU features such as prefetch and pipelining. Keep small structures in CPU cache and you are suddenly running at 2x speed. If I add just a bit of complexity, such as keeping track of locks and having state objects for threads, then performance drops.

The most common case for this engine is not large games with incredibly advanced physics - it is small to medium simulations with 10-500 objects, and if I add multithreading to speed up scenarios where you have 5000 objects, then we slow down the 10-500 objects case by 2x. Not a great trade-off.

In the 5000 objects case, you can certainly find a hotspot like ray-casts and multi-thread it in order to juggle more. We just have to remember that more threads do not mean a faster simulation; they mean more simultaneous objects. In my tests, I had a ~20% increase in throughput by multi-threading with 4 cores; a terrible waste of resources gone to locking, state copying etc.

However, if you create a world and put that into a thread by itself, you suddenly have a 100% increase in throughput, and that scales linearly! So in larger games that actually need threading, it is much easier to copy state manually between the worlds on the macro level, than trying to squeeze anything out on the micro level.
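
As a rough illustration of the macro-level approach described above, the following C# sketch steps several fully independent worlds in parallel. `MiniWorld` and `WorldRunner` are hypothetical stand-ins invented for this example, not Farseer or Velcro types, and the manual state copying between worlds is left out.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-in for an independent physics world; not a Farseer or Velcro type.
public sealed class MiniWorld
{
    public readonly float[] Positions;
    public readonly float[] Velocities;

    public MiniWorld(int bodyCount)
    {
        Positions = new float[bodyCount];
        Velocities = new float[bodyCount];
    }

    // Single-threaded step: no locks, because nothing else touches this world.
    public void Step(float dt)
    {
        for (int i = 0; i < Positions.Length; i++)
            Positions[i] += Velocities[i] * dt;
    }
}

public static class WorldRunner
{
    // Steps the worlds in parallel; each world is only ever touched by one worker.
    public static void StepAll(MiniWorld[] worlds, float dt, int steps)
    {
        Parallel.ForEach(worlds, world =>
        {
            for (int s = 0; s < steps; s++)
                world.Step(dt);
        });
    }
}
```

Each world stays single-threaded internally, so the common small-simulation case pays no synchronization cost; only the code that moves state between worlds needs to care about threading.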

With regards to islands - I did try to multithread (MT) them but keeping track of state across islands again degraded performance for the common case, while increasing performance for large simulations. I ended up with a version with A LOT of compiler conditions in order to get around it, and I was about to make a “MT edition” and a single thread edition, but I figured most people would choose the MT and then disregard the engine due to its bad performance in PoCs.

That being said, I’m very interested in what people can come up with. I’d love a multi-threaded version of the engine 😃

2 reactions
ilexp commented, May 15, 2017

In an unrelated project, I’ve done some research into RyuJIT64 as well as the CLR runtime and optimizations. Let’s just say that I’m deeply troubled by my findings, as it seems even trivial optimizations are not performed. […] I tried searching both corefx and coreclr on Github to see if anybody mentioned these issues, but it seems nobody cares.

There’s the tiered JIT issue in coreclr. To me, it seems like the biggest blocker for more JIT optimizations is the requirement to be fast at compiling vs. fast at executing. A tiered JIT has the potential to work around this in a way. But in any case, that’s kind of future / hypothetical stuff. The other part is extending C# to allow writing more efficient code in the first place.

Anyway - really glad you think some of the above ideas are worth pursuing. I’m currently a bit short on time, but I’ll take an occasional look at this issue (and project) to see how things develop, or whether there’s opportunity for me to chime in with an occasional comment. And from my side, as a long-term Farseer user, feel free to break as much as you want with Velcro as long as it’s for the better. 👍
