Consider using `String::intern` when deserializing strings (K/JVM)
What is your use-case and why do you need this feature?
Sometimes we have a lot of equal strings in incoming data, such as JSON, that we deserialize to Kotlin/JVM objects. However, the current kx.serialization implementation creates a separate object for each such string (which, for example, results in not-so-optimal RAM usage).
Examples:
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class Data(val m: Map<String, String>)

fun main() {
    val ser = Data.serializer()
    val s = "abcdef".repeat(1050)
    val original = Data(mapOf(s to s))
    println(original.m.toList().first().first === original.m.toList().first().second) // prints true! (same key and value in the original data?)
    val data = Json.Default.encodeToString(ser, original)
    val decoded = Json.Default.decodeFromString(ser, data)
    println(decoded.m.toList().first().first === original.m.toList().first().first) // prints false! (same keys in the original and the decoded data?)
    println(decoded.m.toList().first().first === decoded.m.toList().first().second) // prints false! (same key and value in the decoded data?)
    // other cases are equal strings in any other places, like in a list, in different lists, in different maps or objects and so on
}
Describe the solution you’d like
Internally in kx.serialization, before returning strings to the user code after deserialization, the `.intern()` method could be used. This would make the previous examples return true and reduce the amount of memory used (as well as improve comparison performance if those strings are used in any comparison). This can be checked by adding `.intern()` calls in the example code.
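As a minimal, standard-library-only sketch of the effect described here (no kx.serialization involved), the snippet below builds two equal strings at runtime and compares them by reference before and after `.intern()`. The helper name `freshCopy` is made up for this illustration.

```kotlin
// Hypothetical helper: returns a new String object with the same contents.
fun freshCopy(s: String): String = s.toCharArray().concatToString()

fun main() {
    val a = freshCopy("abcdef")
    val b = freshCopy("abcdef")

    println(a == b)                    // true: same contents
    println(a === b)                   // false: two distinct objects
    println(a.intern() === b.intern()) // true: both resolve to one canonical instance
}
```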
I guess the current solution for this on the user side is to create a custom String serializer that just calls `.intern()` before returning a string, but I see this as a useful change for all kx.serialization users, so I think it could be implemented on the framework side by default.
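The user-side workaround mentioned above might look roughly like the following sketch. The names `InterningStringSerializer` and `decodeInterned` are hypothetical and not part of kotlinx.serialization. It assumes the JSON format artifact is on the classpath, and it passes the serializer explicitly through `MapSerializer` rather than relying on annotations.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical serializer: encodes strings as usual, interns them on decode.
object InterningStringSerializer : KSerializer<String> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("InterningString", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: String) = encoder.encodeString(value)

    override fun deserialize(decoder: Decoder): String = decoder.decodeString().intern()
}

// Hypothetical helper: decodes a JSON object into a map whose keys and values are interned.
fun decodeInterned(json: String): Map<String, String> =
    Json.decodeFromString(
        MapSerializer(InterningStringSerializer, InterningStringSerializer),
        json
    )

fun main() {
    val decoded = decodeInterned("""{"abcdef":"abcdef"}""")
    val (key, value) = decoded.entries.first()
    println(key === value) // true: key and value share one interned instance
}
```

This only affects the places where the serializer is applied explicitly; every other string in the decoded data is still allocated separately.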
Top GitHub Comments
I can note that, in fact, this feature request was rather about string deduplication. `String.intern()` is not the correct way to do it, but we may consider others in the future.
Been there, done this: https://github.com/FasterXML/jackson-core/issues/332
TL;DR: `intern` was never intended to be a public-use primitive and tends to seriously bloat and degrade both runtime footprint and performance over time.
The standard approach for high-load applications that know what they are doing is their own string interning, typically a Guava/Caffeine-based cache with an eviction policy tailored to the end application's needs. But in such scenarios, `kotlinx.serialization` is not the answer: the logic should either be in the class's constructors or even in setters in a postprocessing phase after deserialization.
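To illustrate the application-level approach described in this comment, here is a rough sketch of a bounded deduplication pool applied in a class constructor. It deliberately uses only the standard library: a small LRU map stands in for the Guava/Caffeine cache the comment mentions. `StringPool`, `Record`, and the size limits are assumptions made up for the example.

```kotlin
// Hypothetical helper: a bounded, least-recently-used pool of canonical strings.
// A stand-in for a Guava/Caffeine cache with a real eviction policy.
class StringPool(private val maxSize: Int) {
    // accessOrder = true makes iteration order least-recently-used first.
    private val cache = object : LinkedHashMap<String, String>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, String>?): Boolean =
            size > maxSize // evict the eldest entry once the pool is over capacity
    }

    // Returns the pooled instance equal to s, registering s if none exists yet.
    @Synchronized
    fun dedup(s: String): String = cache.getOrPut(s) { s }
}

// Hypothetical domain class that deduplicates its string field on construction.
class Record(name: String, val value: Int) {
    val name: String = pool.dedup(name)

    private companion object {
        val pool = StringPool(maxSize = 1_000)
    }
}

fun main() {
    // Two distinct String objects with equal contents, as a decoder would produce.
    val a = Record("abc".toCharArray().concatToString(), 1)
    val b = Record("abc".toCharArray().concatToString(), 2)
    println(a.name === b.name) // true: both records share one pooled instance
}
```

Unlike `String.intern()`, the pool is owned by the application, so its size and eviction behaviour can be tuned or dropped entirely without touching the deserialization layer.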