v8.0.0 Roadmap

v8.0.0 .NET Client Roadmap

On August 10th, we released Elasticsearch 8.0.0-alpha1, and we wanted to provide this high-level roadmap of what we are planning for the v8.0.0 client, laying the groundwork for its future. We’ve been working towards the next client release for several months, with some elements well underway. We’re excited to begin sharing our vision in this issue. A major release is an excellent opportunity to reflect on any existing design limitations and introduce improvements that may be difficult or impossible without breaking changes. This document highlights important details about how we plan to maintain the client in the future. These drive some of our decisions for this release.

NOTE: This roadmap outlines ideas, concepts and features that we hope to include in the v8.0.0 client release. Some items may change as we investigate them, and some may be deferred to future releases as appropriate.

Themes

Let’s begin with some of the main themes we’ve identified as key objectives for this release.

User-friendly - The client should be approachable for consumers of all experience levels. The API surface should be reviewed to identify areas for improvement, particularly for common scenarios. Public types and their members should include helpful XML comments which appear in IDE tooling to guide their use. Additional helpers should be introduced, for example, helpers for using Point In Time for optimised and simplified data egress.
Performance - .NET Core and .NET 5+ introduced huge performance improvements within the runtime and Base Class Libraries (BCL). This includes methods and types geared towards reducing allocations, such as Span<T>. The client should leverage these to reduce allocations on hot paths. The client should introduce overloads accepting these types where it can further offer a benefit for improved performance or convenience. Development of the client should continuously seek further performance improvements which consumers can benefit from, simply by upgrading to the latest version.
Best Practices - The client should continue to apply Microsoft best practices around API design. This includes ensuring that its design aligns well with the latest API design standards used by Microsoft themselves. The client should also guide consumers to leverage the latest best practices in Elasticsearch by preferring more optimal APIs where applicable. For example, favouring Point In Time APIs over the deprecated Scroll APIs.
Efficient to maintain - This entry may be surprising as it appears less user-focused at first glance, but bear with us. Elasticsearch introduces many great new features in each minor release. These features add new endpoints and expand requests and responses for existing APIs. For the low-level client, we automatically generate code to support these APIs on day one. For NEST, the high-level client, we must manually maintain many types to implement the strongly-typed support. This requires significant engineering time. With the next release, we are aiming to reduce this overhead (see the “Code Generation” section below for further details). Removing this overhead creates more time to work on value-add features such as helpers, performance, and documentation.
Diagnostics - The client should make diagnosing issues as simple as possible. We already have excellent diagnostics in the form of audit trails, debug information and DiagnosticSource events. These can be configured to understand the causes of any problems. The next client version should build on this foundation to further improve the diagnostics story.
Documentation - The documentation should be clear and guide consumers toward the path of success when using the client. It should include more detail for common scenarios and include recommendations for best practice usage. Where we have frequently asked questions, the documentation should be expanded to address those more clearly. We want consumers of the client library and Elasticsearch to be productive with as little friction as possible.
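
As a rough illustration of the Performance theme, the sketch below shows how span-based parsing avoids allocating a substring for every value. It is not taken from the client; `SpanDemo` and `SumCsv` are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical helper, not part of the client: sums comma-separated integers
// without allocating a substring per value.
Console.WriteLine(SpanDemo.SumCsv("10,20,30")); // prints 60

public static class SpanDemo
{
    public static int SumCsv(ReadOnlySpan<char> input)
    {
        var total = 0;
        while (!input.IsEmpty)
        {
            var comma = input.IndexOf(',');
            // Slice is a view over the original memory; no new string is created.
            var token = comma < 0 ? input : input.Slice(0, comma);
            total += int.Parse(token);
            input = comma < 0 ? ReadOnlySpan<char>.Empty : input.Slice(comma + 1);
        }
        return total;
    }
}
```

An overload accepting `ReadOnlySpan<char>` still works for callers holding a `string`, because the conversion is implicit.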

Primary Changes

Before we get to the roadmap items, it’s worth calling out a few core changes we are planning, which influence design decisions and some of the items which appear on the roadmap.

Code Generation

As indicated in the “Efficient to maintain” theme above, manually maintaining request/response types requires a great deal of engineering time. It can also delay the implementation of some APIs in the high-level NEST client. Because the process is predominantly manual, things can be missed.

The Elastic language clients team are excited to be working on a type specification internally, which will provide a fantastic resource to document the endpoints of Elasticsearch. This includes defining representations of the requests and responses, and all subtypes needed to (de)serialise API request and response bodies. We have a fantastic opportunity to leverage this specification by building advanced code-generators for our clients.

This work is already underway for .NET. We have a generator prototype that uses the Roslyn APIs to produce far more of the code required within the high-level client. We intend to continue with this work to code-generate all Elasticsearch endpoints and their corresponding types within the client. Once this work is complete, new server features will be added via automated PRs using GitHub Actions. This is extremely exciting as it ensures timely inclusion and support of new endpoints in Elasticsearch. Code generation also helps ensure all fields on requests and responses are supported and represented accurately. Automation for the win!

A side-effect of code generation is that it may require some type names and namespaces to change. When manually crafting the types, engineers have carefully avoided naming conflicts. The code generator needs to be more generic in its approach and will leverage namespaces to distinguish types from one another. The intent is to try to limit the breaking changes this introduces. As we progress with code generator work, we will have a complete understanding of what this may involve for the consumption of the library and the upgrade process.
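
To make the namespace point concrete, here is a minimal sketch of two types that share a simple name and are told apart only by namespace. The `Example.Snapshot` and `Example.Tasks` namespaces and both `Status` types are invented for illustration and are not the names the generator will produce.

```csharp
using System;

// Two types share the simple name "Status"; the namespace tells them apart.
var snapshot = new Example.Snapshot.Status { State = "SUCCESS" };
var task = new Example.Tasks.Status { Completed = true };
Console.WriteLine($"{snapshot.State} {task.Completed}");

namespace Example.Snapshot
{
    // Hypothetical generated type.
    public sealed class Status { public string State { get; set; } = ""; }
}

namespace Example.Tasks
{
    // Hypothetical generated type with the same simple name.
    public sealed class Status { public bool Completed { get; set; } }
}
```

Consumers who import both namespaces would need to qualify the name or add a `using` alias, which is one way a rename of this kind can surface during an upgrade.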

System.Text.Json

Currently, the high-level client uses an internalised and modified version of Utf8Json for request and response (de)serialisation. This was introduced for its performance improvements over Json.NET, the more common JSON framework at the time.

While Utf8Json provides good value, we have identified minor bugs and performance issues that have required maintenance over time. Some of these are hard to change without more significant effort. This library is no longer maintained, and any such changes cannot easily be contributed back to the original project.

With .NET Core 3.0, Microsoft shipped new JSON APIs that are part of .NET. Initially, the feature set was quite limited, but each subsequent release of .NET has filled more of the functionality gaps. For v8.0.0, we plan to adopt the System.Text.Json (STJ) APIs for all (de)serialisation. Consumers will still be able to plug in their own serialisation for their document types.

By adopting a Microsoft-supported library, we can better depend on and contribute to its maintenance. STJ is designed from the ground up to support the latest performance optimisations in .NET and, as a result, offers fast, low-allocation (de)serialisation. Further work is included for .NET 6, which will continue to optimise serialisation through source generators, and we can leverage these to gain further performance boosts in our .NET client.
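
For readers unfamiliar with STJ, the following sketch round-trips a document through UTF-8 bytes using only the public `System.Text.Json` API. The `Product` type is a hypothetical document type, and nothing here reflects how the client wires up serialisation internally.

```csharp
using System;
using System.Text.Json;

// Hypothetical document type used only for this illustration.
var doc = new Product { Name = "Keyboard", Price = 49.5m };

var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

// Serialise straight to UTF-8 bytes, skipping an intermediate UTF-16 string.
byte[] utf8 = JsonSerializer.SerializeToUtf8Bytes(doc, options);
var roundTripped = JsonSerializer.Deserialize<Product>(utf8, options);

Console.WriteLine(roundTripped?.Name);

public sealed class Product
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}
```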

This is a significant piece of work as we require many custom converters for more complex types and JSON structures. Requests and responses (search, for example) include polymorphic properties. We are prototyping these changes in the code-generated client with good success so far.
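
The kind of converter this involves can be sketched with a value that may arrive as either a JSON number or a JSON string. `NumberOrText` and `NumberOrTextConverter` are hypothetical types written for this example; the converters in the client are likely to be considerably more involved.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var options = new JsonSerializerOptions();
options.Converters.Add(new NumberOrTextConverter());

var a = JsonSerializer.Deserialize<NumberOrText>("2", options);
var b = JsonSerializer.Deserialize<NumberOrText>("\"75%\"", options);
Console.WriteLine($"{a} | {b}");

// Hypothetical union type: holds either an integer or a string.
public sealed class NumberOrText
{
    public NumberOrText(int number) => Number = number;
    public NumberOrText(string text) => Text = text;
    public int? Number { get; }
    public string? Text { get; }
    public override string ToString() => Number?.ToString() ?? Text ?? "";
}

// Chooses how to read the value by inspecting the current JSON token.
public sealed class NumberOrTextConverter : JsonConverter<NumberOrText>
{
    public override NumberOrText Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        reader.TokenType switch
        {
            JsonTokenType.Number => new NumberOrText(reader.GetInt32()),
            JsonTokenType.String => new NumberOrText(reader.GetString() ?? ""),
            _ => throw new JsonException($"Unexpected token {reader.TokenType}.")
        };

    public override void Write(Utf8JsonWriter writer, NumberOrText value, JsonSerializerOptions options)
    {
        if (value.Number is int number) writer.WriteNumberValue(number);
        else writer.WriteStringValue(value.Text);
    }
}
```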

Transport

The .NET client includes a transport layer responsible for abstracting HTTP concepts and providing functionality such as our request pipeline. This supports round-robin load-balancing of requests to nodes, pinging failed nodes and sniffing the cluster for node roles.
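
As a simplified picture of round-robin selection with failed nodes skipped, consider the sketch below. `RoundRobinPool` is a hypothetical type invented for this example; it does not model pinging or sniffing, and the real transport layer is more involved.

```csharp
using System;
using System.Collections.Generic;

var pool = new RoundRobinPool(new[]
{
    new Uri("http://node1:9200"), new Uri("http://node2:9200"), new Uri("http://node3:9200")
});
pool.MarkDead(new Uri("http://node2:9200"));
for (var i = 0; i < 4; i++) Console.WriteLine(pool.Next().Host); // alternates node1, node3

// Hypothetical type for illustration only.
public sealed class RoundRobinPool
{
    private readonly Uri[] _nodes;
    private readonly HashSet<Uri> _dead = new HashSet<Uri>();
    private readonly object _gate = new object();
    private int _cursor = -1;

    public RoundRobinPool(IEnumerable<Uri> nodes) => _nodes = new List<Uri>(nodes).ToArray();

    public void MarkDead(Uri node) { lock (_gate) _dead.Add(node); }
    public void MarkAlive(Uri node) { lock (_gate) _dead.Remove(node); }

    // Returns the next live node, skipping any marked dead.
    public Uri Next()
    {
        lock (_gate)
        {
            for (var attempts = 0; attempts < _nodes.Length; attempts++)
            {
                _cursor = (_cursor + 1) % _nodes.Length;
                var candidate = _nodes[_cursor];
                if (!_dead.Contains(candidate)) return candidate;
            }
            throw new InvalidOperationException("No live nodes available.");
        }
    }
}
```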

As part of v8.0.0, we are moving this transport layer out into its own dedicated package and repository. This supports reuse across future clients and allows consumers with extreme high-performance requirements to build upon this foundation. We already have the master branch of the existing client repository migrated to this new Transport package.

Before release, we are investigating further enhancements to support other scenarios and optimise performance. We also plan to ensure that we can implement future HTTP improvements from Microsoft, including a proposed set of lower-level APIs (LLHTTP) for further allocation reductions.

High-level Roadmap

Below you will find some of the core units of work we are undertaking for the next client version. These are roughly broken into stages representing the priorities and dependencies of these items.

Stage 1

- Code generation of a majority of the client from the new Elasticsearch specification.
  - Generate request/response types.
  - Generate client methods to invoke requests to server endpoints.
- Switch request/response serialisation to System.Text.Json for high-level (NEST) generated client.
- Support nullable reference types as far as compatible with the supported target frameworks.
- Add IAsyncDisposable support where applicable.
- Support compatibility headers in the new Transport package.
- Support new Elasticsearch security enhancements and defaults.
- Perform profiling and guided reduction of allocations on hot paths for common scenarios.
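
Two of the Stage 1 items, nullable reference types and IAsyncDisposable, can be pictured with the sketch below. `BulkSession` is a hypothetical type, not a client API, and `Task.Yield()` merely stands in for an asynchronous flush.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// "await using" calls DisposeAsync when the block exits.
await using (var session = new BulkSession())
{
    session.Add("doc-1");
    Console.WriteLine(session.LastError ?? "no error");
}

// Hypothetical type: buffers work and flushes it asynchronously on disposal.
public sealed class BulkSession : IAsyncDisposable
{
    private int _pending;

    // Nullable annotation: the compiler warns callers who dereference this unchecked.
    public string? LastError { get; private set; }
    public bool Flushed { get; private set; }

    public void Add(string id)
    {
        if (id is null) throw new ArgumentNullException(nameof(id));
        _pending++;
    }

    public async ValueTask DisposeAsync()
    {
        if (_pending > 0)
        {
            await Task.Yield(); // stands in for an asynchronous flush to the server
            _pending = 0;
        }
        Flushed = true;
    }
}
```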

Stage 2

- Review and make it easier to use custom JSON serializer settings.
- Introduce IAsyncEnumerable support on appropriate APIs and helpers.
- Introduce ValueTask return types where appropriate for performance gains.
- Design and introduce additional helpers:
  - Point In Time
  - Snapshots
  - Tasks
- Reassess how field inference causes virality of generic args.
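
One way a paging helper could expose results through IAsyncEnumerable is sketched below. It is only a guess at the shape: `Paging.ReadAllAsync` and the `FetchPage` delegate are hypothetical, and the in-memory array stands in for a server that returns pages of results following a given value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

var data = Enumerable.Range(1, 5).ToArray();

// Stand-in for a server call: returns up to `size` items that follow `after`.
Task<IReadOnlyList<int>> FetchPage(int? after, int size, CancellationToken ct) =>
    Task.FromResult<IReadOnlyList<int>>(data.Where(x => after is null || x > after).Take(size).ToList());

// The caller sees one continuous stream; page boundaries are hidden.
await foreach (var item in Paging.ReadAllAsync(FetchPage, pageSize: 2))
    Console.WriteLine(item);

public static class Paging
{
    // Hypothetical helper: keeps requesting pages until an empty one comes back.
    public static async IAsyncEnumerable<int> ReadAllAsync(
        Func<int?, int, CancellationToken, Task<IReadOnlyList<int>>> fetchPage,
        int pageSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        int? after = null;
        while (true)
        {
            var page = await fetchPage(after, pageSize, cancellationToken);
            if (page.Count == 0) yield break;
            foreach (var item in page) yield return item;
            after = page[page.Count - 1]; // the last item seen becomes the next cursor
        }
    }
}
```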

Stage 3

- Remove conditionless queries (by default).
- Drop generic type constraint Query<T>.
- Review story around updating documents with nulls.
- Rename methods and types where appropriate to improve clarity and usability.
- Investigate the use of structs for basic types (Ids).
- Consider accepting a Microsoft.Extensions.Logging (M.E.L) ILogger/ILoggerFactory in the libraries (inc. Transport) to provide a richer diagnostics story.
- Investigate support for client registrations for (de)serialising custom token filters, queries, aggregations, plug-in features and filter paths.
- Introduce support for OpenTelemetry metrics in transport.
- Investigate F# interop.
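
To give a sense of what a struct-based Id might look like, here is a minimal sketch of a readonly struct wrapping a string with value equality. `DocumentId` is a hypothetical name; the type the client ends up with may look quite different.

```csharp
#nullable enable
using System;

var a = new DocumentId("42");
DocumentId b = "42"; // implicit conversion from string
Console.WriteLine(a == b); // prints True

// Hypothetical type: a small value wrapper, so no extra heap object is created for the Id itself.
public readonly struct DocumentId : IEquatable<DocumentId>
{
    private readonly string _value;

    public DocumentId(string value) => _value = value ?? throw new ArgumentNullException(nameof(value));

    public bool Equals(DocumentId other) => string.Equals(_value, other._value, StringComparison.Ordinal);
    public override bool Equals(object? obj) => obj is DocumentId other && Equals(other);

    // default(DocumentId) leaves _value null, so guard against it here.
    public override int GetHashCode() => _value is null ? 0 : StringComparer.Ordinal.GetHashCode(_value);
    public override string ToString() => _value ?? string.Empty;

    public static bool operator ==(DocumentId left, DocumentId right) => left.Equals(right);
    public static bool operator !=(DocumentId left, DocumentId right) => !left.Equals(right);
    public static implicit operator DocumentId(string value) => new DocumentId(value);
}
```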

Stage 4

- Investigate C# record support.
- Retire overloads with deprecated .NET types.
- Low Level HTTP Client support in transport layer (LLHTTP).
- Detailed benchmarking of core functions with a view to CI integration.
- Documentation and code examples.
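
On the record item, a consumer's document type could be as small as the sketch below, which round-trips a positional record through System.Text.Json on .NET 5 or later. `Article` is a hypothetical type, and this says nothing about what record support in the client will involve.

```csharp
using System;
using System.Text.Json;

var original = new Article("Roadmap", 2021);
var json = JsonSerializer.Serialize(original);
var copy = JsonSerializer.Deserialize<Article>(json);

Console.WriteLine(copy == original); // records compare by value
Console.WriteLine((original with { Year = 2022 }).Year);

// Hypothetical document type declared as a positional record.
public sealed record Article(string Title, int Year);
```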

Summary

We’re incredibly excited about the work we have begun towards the next version of the .NET client for Elasticsearch. We have a lot of work ahead and will share more as we get nearer to a final product. We welcome your feedback and ideas which can help shape the future of the .NET client.

Issue Analytics

Created 2 years ago
Reactions: 18
Comments: 16 (8 by maintainers)

Top GitHub Comments

3 reactions

ejsmith commented, Oct 20, 2022

Hi, @michael-budnik.

There are no plans to reintroduce the removed interfaces in the v8 design.

Ultimately I don’t believe these interfaces serve a purpose and they generally get misused for convenient testing. As we only ever expect there to be a single concrete implementation of the client, I personally feel an interface is an unnecessary abstraction. It also technically means that each addition to the client is breaking (should anyone be implementing the interface) without falling back to default interface implementations. I took the decision to remove the ambiguity they introduce given the extensive changes and code-gen work for this new client. In the redesign, we avoided the internal requirement that some of the interfaces were helping solve, further making them redundant.

I don’t feel it should be the responsibility of the client to provide abstractions for the purpose of testing consumer code. Mocking frameworks tend to make this super convenient but it bloats the assembly and potentially catches people out if the interface has to change to match the evolving implementation.

What I would advocate for in your scenario is introducing your own abstraction in the form of an interface and an implementation which forwards on to the ElasticsearchClient. Your interface can act as a simplified facade of the APIs your application actually uses on the client. At this point your code can depend on your abstraction and not be coupled directly to our library. You can mock your abstraction as necessary for testing.

using System;
using System.Threading;
using System.Threading.Tasks;
using Elastic.Clients.Elasticsearch;

// Abstraction limited to the methods actually in use by dependants.
public interface ISearchClient
{
    Task<SearchResponse<TDocument>> SearchAsync<TDocument>(Action<SearchRequestDescriptor<TDocument>> configureRequest, CancellationToken cancellationToken = default);
}

// Basic wrapper implementation that forwards each call to the ElasticsearchClient.
public class ElasticsearchSearchClient : ISearchClient
{
    public ElasticsearchSearchClient(ElasticsearchClient client) => Client = client;

    public ElasticsearchClient Client { get; }

    public Task<SearchResponse<TDocument>> SearchAsync<TDocument>(Action<SearchRequestDescriptor<TDocument>> configureRequest, CancellationToken cancellationToken = default) =>
        Client.SearchAsync(configureRequest, cancellationToken);
}

You may even want to avoid using our request (or descriptor) and response types in your abstraction and map those in your implementation. That further decouples you from changes by encapsulating our client and your implementation can also handle more complex application-level decisions like exception handling, retries etc.
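
A minimal sketch of that more decoupled variant follows, assuming an application that only needs to find products by name prefix. Every type here (`Product`, `IProductSearch`, `FakeProductSearch`, `ProductService`) is hypothetical application code and nothing comes from the client library. A production implementation of `IProductSearch` would wrap the ElasticsearchClient and map its responses to `Product`; that part is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Unit-test style usage: the service under test only sees the abstraction.
var fake = new FakeProductSearch(new[] { new Product("Keyboard"), new Product("Key ring"), new Product("Mouse") });
var service = new ProductService(fake);
Console.WriteLine(await service.CountMatchesAsync("key")); // prints 2

// Hypothetical application types; none of these come from the client library.
public sealed record Product(string Name);

public interface IProductSearch
{
    Task<IReadOnlyList<Product>> FindByNameAsync(string prefix, CancellationToken cancellationToken = default);
}

// Test double: answers from memory, so no cluster or transport is involved.
public sealed class FakeProductSearch : IProductSearch
{
    private readonly IReadOnlyList<Product> _products;

    public FakeProductSearch(IEnumerable<Product> products) => _products = products.ToList();

    public Task<IReadOnlyList<Product>> FindByNameAsync(string prefix, CancellationToken cancellationToken = default) =>
        Task.FromResult<IReadOnlyList<Product>>(
            _products.Where(p => p.Name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)).ToList());
}

public sealed class ProductService
{
    private readonly IProductSearch _search;

    public ProductService(IProductSearch search) => _search = search;

    public async Task<int> CountMatchesAsync(string prefix) =>
        (await _search.FindByNameAsync(prefix)).Count;
}
```

Because the interface speaks only in application types, changes to the client's request and response types stay confined to the one production implementation.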

For pure unit testing, the above should avoid the need to get as low as the transport interfaces and use of InMemoryConnection etc. In the cases where you must call the client directly and have no layer of indirection in between, depending on an ElasticsearchClient with an InMemoryConnection is a viable alternative for unit testing.

For other advanced scenarios, full integration testing is a further consideration.

If there are specific examples of where unit testing is extremely difficult, even with the above approaches in mind, I’m happy to review those on a case-by-case basis. I also plan to document some example scenarios in more detail once the v8 client work is completed.

This is just a tiny sample of how you’ve not made our lives as consumers of this library simpler or better, but much harder because of your perceived gains in protecting yourself from support issues. Your massive breaking changes and barriers are costing people who have invested heavily in this library an incredible amount of cost, effort and pain if we don’t want to be stuck using the legacy client forever.

I know you wanted to modernize this library and have all of the models and methods code generated, but IMO, you’ve taken WAY too much liberty in breaking everything and I don’t think you realized the cost you would be inflicting on your users.

3 reactions

ejsmith commented, Oct 19, 2022

Please don’t assume you can control everything and that you know what all the use case scenarios are by sealing every class and making all the properties read only.