Potential Performance Issue Tracking
Current potential performance issues in Nu, in no particular order -
Issue 1 - Event handlers in a dictionary are slower than handlers on the subscribed object a la C#. This means a look-up for every publish. However, this is an artifact of a publisher-neutral event system rather than anything related to FP.
Possible Solution - A lot of optimization is already done to avoid publish calls that won’t have a useful effect. Beyond these, I have yet to think of further solutions.
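To make the trade-off in Issue 1 concrete, here is a minimal sketch of a publisher-neutral event system in Python. It is not Nu's implementation: the `EventSystem` class, the `set_position` helper, and the address strings are all hypothetical names chosen for illustration. It shows the one dictionary look-up paid per publish, plus one kind of caller-side guard that skips a publish that could have no useful effect.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]

class EventSystem:
    """Handlers are keyed by event address, not stored on the publisher."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Handler]] = {}
        self.lookups = 0  # counts dictionary look-ups, for illustration only

    def subscribe(self, address: str, handler: Handler) -> None:
        self._subscriptions.setdefault(address, []).append(handler)

    def publish(self, address: str, data: Any) -> int:
        """Publish an event and return how many handlers ran."""
        self.lookups += 1  # the per-publish cost described in Issue 1
        handlers = self._subscriptions.get(address)
        if not handlers:
            return 0
        for handler in list(handlers):  # copy so handlers may unsubscribe safely
            handler(data)
        return len(handlers)

def set_position(events: EventSystem, entity: dict, new_position: tuple) -> None:
    """Hypothetical setter that avoids publishing when nothing changed."""
    if entity["position"] == new_position:
        return  # no change: skip the publish and its look-up entirely
    entity["position"] = new_position
    events.publish("Change/Position/" + entity["name"], new_position)
```

The guard in `set_position` is only one example of avoiding a useless publish; whether Nu's existing optimizations take this form is not stated above.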
Issue 2 - SDL2 rendering finally now batches sprites, yielding a 30% rendering improvement, but SDL rendering is still the clear bottleneck in Nu. FIXED.
Possible Solution - We need to write an OpenGL renderer from scratch whose context is instantiated on a separate thread. Hopefully the existing SDL text renderer can still be used from OpenGL since it renders to a texture rather than to an SDL back buffer.
Issue 3 - Farseer Physics Engine doesn’t scale to 1000s of interacting bodies - https://github.com/VelcroPhysics/VelcroPhysics/issues/29
Possible Solution - Presumably Farseer could be replaced with a much faster 2D physics lib, perhaps one written in C or C++. Of course, the question then becomes about the overhead of the required marshalling.
Issue 4 - Scripting language is purely interpreted, slowing it down by a couple orders of magnitude. MITIGATED: Scripting is no longer a primary way of using Nu - now you pretty much use Elm-style / MVU for everything.
Possible Solution - Partial dynamic compilation down to byte code would make the scripting language at least an order of magnitude faster. This is a lot of work, tho. Another alternative solution is the pseudo-compilation as described below.
Issue 5 - Much reflection used when creating / loading entities.
Possible Solution - Some of the reflection features during entity instantiation can be elided with the use of the NoOverlay case, but I think this might be taken further.
Issue 6 - Asset loading is still blocking.
Possible Solution - Put asset loading concurrent to the renderer.
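One possible shape for non-blocking asset loading, sketched in Python with the standard `concurrent.futures` module. The `AssetLoader` class and its method names are assumptions made for this example, not part of Nu; the point is only that the main loop requests an asset and polls for it instead of waiting on disk I/O.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional

class AssetLoader:
    """Loads assets on worker threads so the caller never blocks."""

    def __init__(self, load_fn: Callable[[str], bytes], workers: int = 2) -> None:
        self._load_fn = load_fn
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[str, Future] = {}

    def request(self, path: str) -> None:
        """Queue a load unless one is already in flight or finished."""
        if path not in self._pending:
            self._pending[path] = self._pool.submit(self._load_fn, path)

    def try_get(self, path: str) -> Optional[bytes]:
        """Non-blocking: return the asset if it has loaded, else None."""
        future = self._pending.get(path)
        if future is not None and future.done():
            return future.result()  # re-raises if the load itself failed
        return None

    def shutdown(self) -> None:
        """Wait for outstanding loads and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

A frame loop would call `request` when an asset is first needed and draw a placeholder until `try_get` returns something other than `None`.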
Issue 7 - Stateful event streams like fold store all state in a dictionary, requiring look-ups. MITIGATED: Event streams are mostly replaced by the new Elmish programming model, so the impact of this should be reduced to the margins.
Possible Solution - Perhaps the event stream data could be cached in the key itself provided we can conceive of a working invalidation scheme.
Issue 8 - Engine is not yet compiling in .NET Native or Mono AOT. OBVIATED: It’s several months into 2020 and it looks like .NET native isn’t going to be a widely applicable solution for the general .NET ecosystem. Mono is also in many ways going away. We have to look to .NET Core and its capabilities for the future.
Possible Solution - I’m not sure if .NET native is ready for this type of project yet, but to solve this issue, we probably just need to put some time into it. This is just a matter of effort.
Issue 9 - Unfortunately, entire bitmaps are loaded just to get their metadata. This just needs to be fixed. FIXED (but not verified)
Possible Solution - Need to find a .NET library that lets us inspect image metadata without instantiating the whole image buffer into memory.
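As an illustration of why the whole image buffer is unnecessary for metadata, the sketch below reads a PNG's dimensions from its first 24 bytes (the 8-byte signature followed by the IHDR chunk header and its width and height fields). It is written in Python rather than .NET and covers only PNG; the function name `read_png_size` is hypothetical.

```python
import struct
from typing import BinaryIO, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_size(stream: BinaryIO) -> Tuple[int, int]:
    """Return (width, height) by reading only the first 24 bytes of a PNG."""
    header = stream.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the IHDR payload, which starts with width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    width, height = struct.unpack(">II", header[16:24])  # big-endian uint32 pair
    return width, height
```

Other formats keep their dimensions elsewhere, so a real solution would still need a per-format reader or a library that exposes one.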
Issue 10 - The string hashing required for each Xtension property look-up is suboptimal.
Possible Solution - Not many practical ones. This issue wouldn’t exist if C# lazy-cached hashes in strings, but there’s no reason to believe it ever will. At one point I used an alternative type to string called ‘Lun’ (later called ‘Name’) which contained a string and its eagerly-computed hash, but it wasn’t very friendly to use. I decided to get rid of it in favor of C# strings to simplify Nu’s API. I’m pretty sure this was the right decision, but I can’t prove it one way or another without making large speculative changes to the engine.
Update - Now that F# finally has implicit ctors, reintroducing the Name type shouldn’t cause as many changes as it previously would have. This might now be a practical experiment to run.
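For reference, the general shape of a name type that carries an eagerly-computed hash looks roughly like the Python sketch below. This is an assumption about the idea, not the original 'Lun' / 'Name' code. Note also that CPython already caches string hashes, so in Python this shows only the structure of the technique, not a speed-up.

```python
class Name:
    """A string paired with a hash computed once, at construction."""

    __slots__ = ("text", "_hash")

    def __init__(self, text: str) -> None:
        self.text = text
        self._hash = hash(text)  # paid once here, never again on look-up

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        # Compare the cheap cached hashes first, then the text itself.
        return (
            isinstance(other, Name)
            and self._hash == other._hash
            and self.text == other.text
        )

    def __repr__(self) -> str:
        return "Name(%r)" % self.text
```

The usability cost mentioned above shows up at every call site: a dictionary keyed by `Name` has to be indexed with `Name("Position")` rather than a bare string unless the language offers an implicit conversion.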
Issue 11 - ECS is relatively space-inefficient. The amount of memory for book-keeping required to store 3M correlated Vector2 components is almost an order of magnitude greater than the memory for the components themselves. Admittedly, a Vector2 is a very small component, but it would be nice for the ratio to not be so skewed. FIXED: with new ECS
Possible Solution - The primary issue with memory overhead is caused by nearly all component associations needing to be manually indexed. With Flecs, for example, this is unnecessary because of the way component associations are implicit in the archetype container and dynamically graphed for traversal. I don’t think this is possible in .NET without Unsafe mode code and / or code generation, neither of which is desirable in Nu.
Issue 12 - LOH threshold is perhaps too small.
Possible Solution - Upgrading to >= .NET 4.8 will allow us to configure it via GCLOHThreshold - https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gclohthreshold-element
Issue 13 - Potentially a lot of events when a subscribed entity transforms - https://github.com/bryanedds/Nu/blob/e125ffac9a59bec3c98cabcfcf7b49d446641d41/Nu/Nu/WorldModuleEntity.fs#L305-L332
Possible Solution - Probably nothing great. Could selectively disable a chunk of transform events depending on the application. Not real sure what to do here other than assess that this is part of the cost of doing business declaratively.
Issue 14 - Synchronizing entity properties via World.setEntityPropertyFast requires a small and likely cache-local dictionary look-up via WorldModuleEntity.EntitySetters, which is surprisingly fast.
Possible Solution - A faster alternative might be hard-coding a duplicate of the EntitySetters table in a match expression or using a loftier technique such as code generation in the MVU implementation.
Issue 15 - Nu Text rendering might be quite inefficient due to not caching target render buffers. IIRC, render buffers used for text are allocated and deallocated on a one-off basis. I do not see how that could possibly scale well.
Possible Solution - Code it properly. 😃
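A rough sketch of what caching render buffers could look like, in Python: a pool that hands back a previously released buffer of the same dimensions instead of allocating a fresh one per draw. The `BufferPool` class, the 4-bytes-per-pixel assumption, and the `allocations` counter are all hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, List, Tuple

class BufferPool:
    """Reuses pixel buffers by (width, height) instead of allocating per draw."""

    def __init__(self) -> None:
        self._free: Dict[Tuple[int, int], List[bytearray]] = {}
        self.allocations = 0  # counts real allocations, for illustration only

    def acquire(self, width: int, height: int) -> bytearray:
        """Return a free buffer of this size, allocating only if none exists."""
        free = self._free.get((width, height))
        if free:
            return free.pop()
        self.allocations += 1
        return bytearray(width * height * 4)  # assumes 4 bytes per pixel (RGBA)

    def release(self, width: int, height: int, buffer: bytearray) -> None:
        """Hand a buffer back so a later acquire of the same size can reuse it."""
        self._free.setdefault((width, height), []).append(buffer)
```

A reused buffer still holds its previous contents, so the caller has to clear or fully overwrite it before drawing text into it.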
Issue Analytics
- Created 7 years ago
- Comments:11 (11 by maintainers)
Top GitHub Comments
I’m currently working on putting the main subsystem processing on separate threads. If this works well, it should at least double performance.
I was unable to utilize SDL_gpu due to this issue - https://github.com/grimfang4/sdl-gpu/issues/15#issuecomment-851590757
I don’t know if the maintainer, @grimfang4, is aware of the issue tho?