Fundamental API design flaw -- with respect to encoding and serialization

Expected behavior

The library fulfills its stated purpose and behaves as documented and expected, i.e. users should not need to be aware of the implementation and it should “just work”. One stated goal/priority for Redisson is that users can use existing or new Java classes ‘as is’ and simply use Redisson as a transparent native object database. This is a very compelling feature (one that has been attempted many times over the last 30+ years). Considering the maturity and feature-rich nature of the product, I naively assumed that this would ‘just work’ as intended (accepting, of course, some problematic edge cases).

In particular, given the variety of codecs, including Jackson (which is, if not the single most used JSON data-mapping/serialization library, at least in the top 3), I expected that if a class could round-trip through Jackson, that class would be equally usable with Redisson; this is explicitly stated (with some caveats). Since Redisson leverages the work of existing encoders (like Jackson), it is natural to assume that this is not magic: it would work ‘as well as’ whatever encoder was configured. So if one has domain knowledge of that encoder and has created classes which behave nicely with it, Redisson would indeed be a transparent ‘drop in’, with all of its other really awesome features available.

Actual behavior

It simply does not work this way, and cannot work as stated with the existing APIs. I believe this is due to a fundamental difference in how various encoders/codecs work, and to not fully understanding the subtleties. Some encoders always store type information with the data, so that deserialization is possible from the data alone, with no external context or special per-instance configuration. Other encoders (such as Jackson) prioritize round-tripping to user-controllable serialization formats. The idea is that the codec can be configured so that the output (JSON in this case) is produced in a form that either pre-exists (and must be matched exactly) or is ‘user friendly’ (intentionally undefined, but again assumed to exist and to be matched).

To achieve this, the default mode, and in fact almost all features of Jackson, are designed NOT to embed type information in the JSON data itself, but rather to use the serialization or deserialization context (and other out-of-band configuration) to parse ‘arbitrary’ JSON ‘as is’ into the expected type. This is a fundamentally different approach, with different goals, than codecs which produce ‘black box’ output and focus on needing minimal or no out-of-band context to round-trip Java objects.

The side effect is that for Jackson to function properly (i.e. in the use cases it was primarily designed for), type information MUST be provided out-of-band in order to deserialize. The exception, which is not commonly used and which has recently been shown to be a severe security problem, is the special mode (polymorphic “default typing”) that embeds type information with each JSON element. These modes are not interchangeable, have incompatible requirements, and are generally ‘all or nothing’: you must code for one mode or the other, explicitly and consistently.

The current Redisson APIs cannot handle either mode correctly in many cases. For example, without extreme hacks, RBucket cannot round-trip most non-primitive objects, even a basic JavaBean or POJO. This is NOT solvable by simply overriding the codec and tweaking it, because internally, in intentionally non-exposed implementation code, the core Jackson APIs are used incorrectly and inconsistently in at least a few places. The default Jackson codec uses a non-standard ObjectMapper configuration which cannot round-trip consistently given how the deserializer is implemented (in many places it passes a literal Object.class as the target type).
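
The difference between the two modes, and why hard-wiring Object.class as the target type breaks even a plain POJO, can be sketched without Jackson at all. This is a toy illustration under stated assumptions: `ToyCodec`, `Point`, and the `name=value` text format are hypothetical and invented here; the codec only mimics the relevant behavior. Without a target class, a decoder can return nothing better than a generic map, and casting that map to the POJO fails with a ClassCastException.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Toy stand-in for a JSON mapper, for illustration only: it supports public int
// fields and a "name=value;name=value" format. It is NOT Jackson or a Redisson codec.
final class ToyCodec {
    private ToyCodec() {}

    // Out-of-band mode: the payload holds field values only, no type information.
    static String encode(Object value) {
        StringJoiner out = new StringJoiner(";");
        try {
            for (Field f : value.getClass().getFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    out.add(f.getName() + "=" + f.getInt(value));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalArgumentException(e);
        }
        return out.toString();
    }

    // The caller names the type to build. Given only Object.class, the decoder can do
    // no better than a generic map (Jackson likewise returns a LinkedHashMap for JSON objects).
    static <T> T decode(String data, Class<T> type) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        for (String pair : data.split(";")) {
            if (pair.isEmpty()) continue;
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], Integer.parseInt(kv[1]));
        }
        if (type == Object.class) {
            return type.cast(fields);
        }
        try {
            T result = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Integer> e : fields.entrySet()) {
                type.getField(e.getKey()).setInt(result, e.getValue());
            }
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot build " + type.getName(), e);
        }
    }

    // Embedded mode: the class name travels with the data, so no caller context is needed,
    // but loading whatever class name is found in stored data is the security risk noted above.
    static String encodeWithType(Object value) {
        return value.getClass().getName() + "|" + encode(value);
    }

    static Object decodeWithType(String data) {
        int bar = data.indexOf('|');
        try {
            return decode(data.substring(bar + 1), Class.forName(data.substring(0, bar)));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }
}

// A plain POJO: public no-arg constructor, public fields, no annotations.
class Point {
    public int x;
    public int y;
}

class ToyCodecDemo {
    public static void main(String[] args) {
        Point p = new Point();
        p.x = 1;
        p.y = 2;
        String data = ToyCodec.encode(p);                 // stored without type information

        Point typed = ToyCodec.decode(data, Point.class); // type supplied by the call site
        Object untyped = ToyCodec.decode(data, Object.class);
        System.out.println(typed.x + "," + typed.y + " vs " + untyped.getClass().getSimpleName());
        try {
            Point cast = (Point) untyped;                 // what a get() hard-wired to Object.class leads to
            System.out.println(cast.x);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }

        Point embedded = (Point) ToyCodec.decodeWithType(ToyCodec.encodeWithType(p));
        System.out.println(embedded.x + "," + embedded.y);
    }
}
```

Both modes round-trip `Point`; only the untyped read fails, which is the situation described above for a codec that is always handed Object.class.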

I have found that it is possible to hack around this case by case by ‘creatively’ (i.e. probably breaking many other things) mangling the codecs on a per-use basis. It still requires every API call to use a distinct codec tailored for the type, and I have no idea what will happen with nested RObjects or other internal code. What this demonstrates, in my opinion, is that the API itself needs to be enhanced slightly to accommodate codecs which need out-of-band type information, rather than making this a burden on users, particularly since the workaround relies on understanding many details of Redisson’s private implementation, with no assurance that one’s understanding is fully correct or that the undocumented internals will not change. This essentially makes Redisson unsuitable for large classes of its intended use cases. That is sad, as I was looking forward to making use of the AWESOME features it does provide, only to be blocked by a few fundamental API flaws.

Recommendation: solving this problem is not complex, but it cannot be done cleanly by end users; it needs to be implemented internally and exposed through the APIs.
There are various other products which do this exact thing and can serve as a model for what is needed. Essentially, every API which deserializes a user-defined object MUST have some mechanism for providing the target type explicitly. For example, Jackson’s own readValue methods all take a target type (a Class, a TypeReference, or a JavaType). This is not just an ‘advanced’ feature; it’s necessary for all but a few special cases.

cache2k (https://cache2k.org/) solves the problem using a little-known, subtle feature of Java which makes generic type arguments recoverable despite type erasure. E.g.

Cache<String, String> cache = new Cache2kBuilder<String, String>() {}.build();

This subtle ‘trick’ of explicitly creating an anonymous subclass (note the “{}” right after the constructor call; it is critical) allows the cache to access the generic type parameters and use them later as needed for deserialization.
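
A minimal sketch of the same technique in plain Java, assuming nothing from cache2k itself (`TypeCapture` is a hypothetical name, not a cache2k class). Because the anonymous subclass is a real class whose superclass is `TypeCapture<String, Integer>`, those type arguments are recorded in its class file and can be read back through reflection.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical "super type token": the abstract class forces callers to write
// `new TypeCapture<K, V>() {}`, and the anonymous subclass keeps K and V.
abstract class TypeCapture<K, V> {
    final Type keyType;
    final Type valueType;

    protected TypeCapture() {
        // For `new TypeCapture<String, Integer>() {}` this is TypeCapture<String, Integer>.
        Type superType = getClass().getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalStateException("Create as new TypeCapture<K, V>() {} with concrete types");
        }
        Type[] args = ((ParameterizedType) superType).getActualTypeArguments();
        this.keyType = args[0];   // a Class for simple types
        this.valueType = args[1]; // a ParameterizedType for e.g. List<String>
    }
}

class TypeCaptureDemo {
    public static void main(String[] args) {
        TypeCapture<String, Integer> capture = new TypeCapture<String, Integer>() {};
        System.out.println(capture.keyType);   // prints class java.lang.String
        System.out.println(capture.valueType); // prints class java.lang.Integer
    }
}
```

Jackson’s `TypeReference` (written as `new TypeReference<List<MyClass>>() {}`) relies on the same mechanism, which is why it also needs the trailing `{}`.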

Using the first suggestion, I believe all it would take to make RBucket<Type>.get() work properly is to add a class argument to get (RBucket<Type>.get(Type.class)). Yes, this is ugly, and yes, it breaks some of the simplicity of implementing interfaces that do not have this signature, but at least it works reliably and doesn’t require the end user to have deep domain knowledge of both the codecs and Redisson to handle basic, simple Java classes as the framework is designed to do. The same would have to be done for all the methods which deserialize top-level objects. Once the top-level class is properly deserialized, all its internal references will work fine (if you let Jackson do it for you): Jackson uses the declared type of the parent’s reference variable to deserialize the children. All the subtleties of getting this to work are then deferred to Jackson, and whatever customization one needs to do via annotations and configuration is a Jackson problem.

That way you could truly state: “If a class round-trips through Jackson using the provided ObjectMapper, then Redisson will work correctly for that class without changes or custom configuration.”

I do not know the extent to which this affects internal code, which is why I am not attempting to implement this as a PR. I suspect I would need to fully understand all the intricacies of Redisson to make sure I got all the cases right, and I believe the author is the best source of this domain knowledge. By the time I understood the code well enough to make this change, I would probably end up writing my own version, which is exactly what I did NOT want to do when I discovered Redisson.

Note that these suggested approaches are NOT simply the user customizing the codec. The codec is not the right place to store per-type, per-use information; that needs to come from the call site directly, either when the R<object> is created or in its methods. One could still store the necessary class information in the codec, as is done in JsonJacksonMapCodec, but ‘hide’ this from the user; that is an implementation choice.
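
Both suggestions (a `get(Type.class)` argument, or class information captured when the object is created and hidden from the user) can be sketched with a hypothetical API. None of these names exist in Redisson: `TypedCodec`, `TextCodec`, and `TypedBucket` are invented for illustration, and an in-memory map stands in for Redis. The toy codec stores `toString()` output with no type marker, so the stored bytes `2022-01-28` are just text unless the call site says they are a `LocalDate`.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical codec contract: the call site supplies the target class, much like
// Jackson's ObjectMapper.readValue(content, valueType).
interface TypedCodec {
    byte[] encode(Object value);

    <T> T decode(byte[] data, Class<T> type);
}

// Toy codec: writes toString() output with no type marker, so the bytes can only be
// turned back into the right object when the caller names the type.
final class TextCodec implements TypedCodec {
    private final Map<Class<?>, Function<String, ?>> parsers = new HashMap<>();

    TextCodec() {
        parsers.put(Object.class, s -> s); // no type information: plain text is all we can return
        parsers.put(String.class, s -> s);
        parsers.put(Integer.class, Integer::valueOf);
        parsers.put(LocalDate.class, LocalDate::parse);
    }

    @Override
    public byte[] encode(Object value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public <T> T decode(byte[] data, Class<T> type) {
        Function<String, ?> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + type.getName());
        }
        return type.cast(parser.apply(new String(data, StandardCharsets.UTF_8)));
    }
}

// Hypothetical bucket: the value class is captured once at creation, so get() passes it
// to the codec instead of Object.class; get(Class) is the per-call variant.
final class TypedBucket<V> {
    private final Map<String, byte[]> store; // stands in for Redis
    private final String name;
    private final Class<V> type;
    private final TypedCodec codec;

    TypedBucket(Map<String, byte[]> store, String name, Class<V> type, TypedCodec codec) {
        this.store = store;
        this.name = name;
        this.type = type;
        this.codec = codec;
    }

    void set(V value) {
        store.put(name, codec.encode(value));
    }

    V get() {
        return get(type);
    }

    <R> R get(Class<R> target) {
        byte[] data = store.get(name);
        return data == null ? null : codec.decode(data, target);
    }
}

class TypedBucketDemo {
    public static void main(String[] args) {
        Map<String, byte[]> fakeRedis = new HashMap<>();
        TypedBucket<LocalDate> bucket =
                new TypedBucket<>(fakeRedis, "release", LocalDate.class, new TextCodec());
        bucket.set(LocalDate.of(2022, 1, 28));

        LocalDate date = bucket.get();         // decoded as LocalDate.class
        Object raw = bucket.get(Object.class); // same bytes, no type: the String "2022-01-28"
        System.out.println(date + " " + raw.getClass().getSimpleName()); // prints "2022-01-28 String"
    }
}
```

Capturing the class in the constructor keeps the call site as simple as today’s `get()`; the cost is one extra argument when the bucket is created, which is roughly where cache2k puts its type information too.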

For JVM languages which handle generics ‘better’, such as Kotlin, one could write a wrapper around Redisson that provides the simpler APIs and manages the explicit class arguments. This is what the Jackson Kotlin module does:

https://github.com/FasterXML/jackson-module-kotlin

The relevant part of this module is a single source file of ‘extension functions’ which convert the generic (reified) type parameters into explicit types to pass to Jackson. E.g.

inline fun <reified T> ObjectMapper.readValue(src: File): T = readValue(src, jacksonTypeRef<T>())

This extension function exposes the intended user experience of

val myclass = mapper.readValue<MyClass>(File("file.json"))

and under the hood converts this to roughly mapper.readValue(new File("file.json"), MyClass.class).

This is not possible in pure Java, which is why cache2k uses its subtle ‘trick’ at cache creation to extract the generic type arguments.

Note that the problem exists in Redisson even when user-supplied classes are not generic: it is an issue even for basic POJO classes with no inheritance, generics, or other ‘advanced’ features. Redisson simply does not supply the APIs needed to deserialize these properly using codecs like Jackson, which require out-of-band (or context-provided) type information.

You just can’t get there from here…

Note: I was very surprised that such a simple use case failed, until I checked the test cases in the examples: all of those which would have triggered these problems use basic types (String, Integer, Long), not custom classes. Those types do not need any out-of-band type information, so they work just fine. Similarly, the Reference class handles this problem in its own way (and may in fact be unnecessary if the above suggestions are implemented; I’m not sure).

Steps to reproduce or test case

Redis version

Redisson version

Redisson configuration

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

4 reactions
MartinDevillers commented, Jan 28, 2022

I just wanted to echo what @momomo and @DALDEI have already said in this thread. @mrniko, you’re definitely a brilliant engineer to have created a library of this complexity, but there are also some gaps that need to be addressed. For my project, we’re using RLiveObjectService to handle a wide variety of Java-based entities. I think the general expectation of a service like this is that, as a developer, I can throw in any Java POJO and “it just works™”. Unfortunately, in reality there are a lot of “gotchas”:

  • The default codec, FST, doesn’t handle any kind of change to POJOs. So if you change your data model but your Redis database still holds data in the previous format, expect exceptions on read. Moreover, FST doesn’t seem to support Java 17 at the moment, which is blocking our Java 17 upgrade.
  • Codecs that rely on Jackson can handle model changes, but the default config provided by Redisson only works for POJOs that use basic Java primitives. If you want to use java.time types like LocalDateTime or ZonedDateTime (which IMO is a very common use case), you’ll quickly end up rolling your own JsonJacksonCodec, and you have to set up its internals carefully to round-trip data correctly. Adding JavaTimeModule is not enough; for instance, ZonedDateTime will deserialize but without the original time zone. Ouch. Moreover, using a type like this on the “root” object (the one annotated with REntity) will lead to a ClassCastException. This is the fundamental design flaw @DALDEI talks about, and it makes sense considering how Redisson uses the ObjectMapper in this case: https://github.com/redisson/redisson/blob/master/redisson/src/main/java/org/redisson/codec/JsonJacksonCodec.java#L99 tells Jackson to deserialize to Object without providing any other details, so there’s no way for Jackson to know it’s dealing with a ZonedDateTime.
  • Behavioral differences between the root entity and any nested classes within it. The “live” behavior only applies to the getters and setters on the root entity and doesn’t propagate down to nested classes, unless a nested value happens to be a List or Map that automagically gets turned into something else by Redisson, which comes with its own rules (e.g. properly handling cascading deletes when the root entity gets deleted).
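
The ZonedDateTime point can be illustrated with the JDK alone. This sketch does not use Jackson; it assumes the value effectively ends up stored as an instant and is read back in UTC, which is our understanding of what Jackson’s JavaTimeModule does with default settings (e.g. `DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE` enabled). `ZoneLoss` is a hypothetical helper name. The moment in time survives; the original zone does not.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Pure-JDK illustration (no Jackson): storing a ZonedDateTime as a bare instant and
// rebuilding it in UTC keeps the moment in time but drops the original zone.
final class ZoneLoss {
    private ZoneLoss() {}

    static ZonedDateTime roundTripViaInstant(ZonedDateTime original) {
        Instant instant = original.toInstant(); // what survives: seconds + nanos since the epoch
        return instant.atZone(ZoneOffset.UTC);  // rebuilt in UTC; "Europe/Paris" is gone
    }

    public static void main(String[] args) {
        ZonedDateTime paris = ZonedDateTime.parse("2022-01-28T10:00:00+01:00[Europe/Paris]");
        ZonedDateTime back = roundTripViaInstant(paris);
        System.out.println(back);                // prints 2022-01-28T09:00Z
        System.out.println(paris.isEqual(back)); // prints true  (same instant)
        System.out.println(paris.equals(back));  // prints false (different zone)
    }
}
```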

I think, rather than trying to support every (de)serialization mechanism out there, I would focus on providing one (or a few) that are preconfigured to work well with Redisson and support a wider set of use cases:

  • Support for Java primitives and commonly used JDK types, such as java.time. By support I mean developers should be able to use these types out of the box, without any issues.
  • Support for non-breaking changes to POJOs (e.g. additive changes).
  • An adequate tradeoff between storage footprint and performance. I know I can wrap a JsonJacksonCodec in a SnappyV2Codec, but why not provide one by default? Or make compression a configuration setting, optionally specifying a compression method.

Again, Redisson is a great library and you’ve done excellent work in creating this. I hope these suggestions help you to improve your project.

0 reactions
mrniko commented, Jan 29, 2022

@MartinDevillers

Thank you for your feedback.

FST doesn’t seem to support Java 17 at the moment, blocking our Java 17 upgrade.

FstCodec is deprecated because it’s too buggy. The default codec has been MarshallingCodec since version 3.13.0. You can also try Kryo5Codec. In any case, you can always implement your own codec.

Moreover, using a type like this on the “root” object (the one annotated with REntity) will lead to a ClassCastException.

The “live” behavior only applies to the getters and setters on the root-entity, but doesn’t propagate down to nested classes.

Can you create issues for these tasks?

I know I can wrap a JsonJacksonCodec in a SnappyV2Codec, but why not provide one by default?

Some data can’t be compressed, and then compression only causes excessive CPU consumption. It depends on the use case, and only the developer can decide whether compression is really necessary. JsonJacksonCodec is not the only codec: there are 9 codecs, as well as a few compression codecs, which developers should be able to choose from.
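
The point that some data simply does not compress is easy to check. This sketch uses the JDK’s `java.util.zip.Deflater` as a stand-in; Redisson’s compression codecs use other algorithms (Snappy, LZ4, and so on), so exact numbers will differ, but the pattern should be the same: repetitive JSON shrinks a lot, while random bytes (a stand-in for already-compressed or encrypted payloads) do not shrink at all, so compressing them only costs CPU. `CompressionCheck` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Deflater stands in for Snappy/LZ4 here; only the size comparison matters.
final class CompressionCheck {
    private CompressionCheck() {}

    // Returns the DEFLATE-compressed size of the input at the default level.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[input.length + 64];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf); // output is discarded; only its length is counted
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < 200; i++) {
            json.append("{\"name\":\"user\",\"active\":true,\"id\":").append(i).append("},");
        }
        json.append("{}]");
        byte[] text = json.toString().getBytes(StandardCharsets.UTF_8);

        byte[] noise = new byte[text.length];
        new Random(42).nextBytes(noise); // stands in for already-compressed or encrypted data

        System.out.printf("JSON:  %d -> %d bytes%n", text.length, compressedSize(text));
        System.out.printf("noise: %d -> %d bytes%n", noise.length, compressedSize(noise));
    }
}
```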
