Consider using `String::intern` when deserializing strings (K/JVM)

What is your use-case and why do you need this feature?

Incoming data, such as JSON documents that we deserialize into Kotlin/JVM objects, often contains many equal strings. However, the current kx.serialization implementation creates a separate String object for every occurrence, which, for example, leads to suboptimal memory usage.

Examples:

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class Data(val m: Map<String, String>)

fun main() {
    val ser = Data.serializer()
    val s = "abcdef".repeat(1050)
    val original = Data(mapOf(s to s))
    // Same key and value instance in the original data?
    println(original.m.toList().first().first === original.m.toList().first().second) // prints true
    val data = Json.Default.encodeToString(ser, original)
    val decoded = Json.Default.decodeFromString(ser, data)
    // Same key instance in the original and the decoded data?
    println(decoded.m.toList().first().first === original.m.toList().first().first) // prints false
    // Same key and value instance in the decoded data?
    println(decoded.m.toList().first().first === decoded.m.toList().first().second) // prints false

    // Other cases: equal strings anywhere else, e.g. in a list, in different lists, maps, or objects
}
```

Describe the solution you’d like

Internally, kx.serialization could call .intern() on strings after deserializing them and before returning them to user code. This would make the comparisons in the previous example print true (for the comparison against the original data, only if the original strings are interned as well), reduce memory usage, and improve comparison performance wherever these strings are compared. This can be checked by adding .intern() calls to the example code.
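A minimal sketch of that check, assuming kotlinx.serialization's JSON format on the JVM. To avoid needing the compiler plugin, it uses the built-in MapSerializer and String.serializer() in place of the Data class, and it calls .intern() by hand to simulate what the proposal would do inside the library.

```kotlin
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.Json

fun main() {
    val ser = MapSerializer(String.serializer(), String.serializer())
    // Intern the original too, so it becomes the canonical pooled instance
    val s = "abcdef".repeat(1050).intern()
    val original = mapOf(s to s)
    val decoded = Json.decodeFromString(ser, Json.encodeToString(ser, original))
    // Simulate the proposal: intern each decoded string before handing it out
    val (key, value) = decoded.entries.first().let { it.key.intern() to it.value.intern() }
    println(key === s)     // true: same instance as the original key
    println(key === value) // true: decoded key and value share one instance
}
```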

I guess the current user-side solution is a custom String serializer that simply calls .intern() before returning a string. However, I see this as a useful change for all kx.serialization users, so I think the framework could implement it by default.
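The user-side workaround could look roughly like the following sketch. InterningStringSerializer is a hypothetical name, not part of the library; it only uses the public KSerializer, Encoder, and Decoder APIs, and it is passed explicitly to MapSerializer so the example runs without the compiler plugin.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical user-side serializer: encodes strings unchanged and
// interns every string it decodes.
object InterningStringSerializer : KSerializer<String> {
    // A distinct serial name avoids clashing with the built-in "kotlin.String" descriptor
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("InterningString", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: String) = encoder.encodeString(value)

    override fun deserialize(decoder: Decoder): String = decoder.decodeString().intern()
}

fun main() {
    // Passing the serializer explicitly for both keys and values
    val ser = MapSerializer(InterningStringSerializer, InterningStringSerializer)
    val s = "abcdef".repeat(1050)
    val decoded = Json.decodeFromString(ser, Json.encodeToString(ser, mapOf(s to s)))
    val (key, value) = decoded.entries.first()
    println(key === value) // true: both were interned to the same instance
}
```

Note that this inherits every drawback of String.intern() discussed in the comments below, since all decoded strings go into the JVM-wide pool.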

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
sandwwraith commented, Sep 21, 2022

I should note that this feature request was really about string deduplication. String.intern() is not the correct way to achieve it, but we may consider other approaches in the future.
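One way to deduplicate without String.intern() is to keep a lookup table scoped to a single decoding call, so it is garbage-collected afterwards instead of growing a JVM-wide pool. The sketch below is hypothetical (DedupStringSerializer is not a library class and not necessarily what the maintainers have in mind); it assumes one fresh serializer instance per decode, since the table is not thread-safe.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical deduplicating serializer: equal strings decoded through the same
// instance share one String object. Not thread-safe: use one instance per decode.
class DedupStringSerializer : KSerializer<String> {
    private val seen = HashMap<String, String>()

    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("DedupString", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: String) = encoder.encodeString(value)

    override fun deserialize(decoder: Decoder): String {
        val s = decoder.decodeString()
        // Return the first instance seen with this content, registering s if it is new
        return seen.getOrPut(s) { s }
    }
}

fun main() {
    val ser = ListSerializer(DedupStringSerializer()) // fresh table for this call
    val list = Json.decodeFromString(ser, """["alpha","beta","alpha"]""")
    println(list[0] === list[2]) // true
}
```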

1 reaction
qwwdfsad commented, Sep 19, 2022

Been there, done this: https://github.com/FasterXML/jackson-core/issues/332

TL;DR: intern was never intended to be a public-use primitive, and over time it tends to seriously bloat the runtime footprint and degrade performance.

The standard approach for high-load applications that know what they are doing is their own string interning, typically a Guava/Caffeine-based cache with an eviction policy tailored to the application's needs. But in such scenarios, kotlinx.serialization is not the answer: the logic should live either in class constructors or even in setters during a post-processing phase after deserialization.
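A minimal sketch of application-level interning applied in a post-processing step. To stay self-contained it swaps the Guava/Caffeine cache for a stdlib LinkedHashMap in access order with least-recently-used eviction; LruInterner, dedup, and the size limit are hypothetical, and a real application would tune the eviction policy as the comment suggests.

```kotlin
// Hypothetical bounded interner: an access-ordered LinkedHashMap stands in for a
// Guava/Caffeine cache and evicts the least recently used entry past maxSize.
class LruInterner(private val maxSize: Int) {
    private val table = object : LinkedHashMap<String, String>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, String>?): Boolean =
            size > maxSize
    }

    // Returns the cached instance equal to s, or caches and returns s itself
    @Synchronized
    fun intern(s: String): String = table.getOrPut(s) { s }
}

// Post-processing after deserialization: deduplicate every key and value of a decoded map
fun LruInterner.dedup(m: Map<String, String>): Map<String, String> =
    m.entries.associate { (k, v) -> intern(k) to intern(v) }

fun main() {
    val interner = LruInterner(maxSize = 10_000)
    // Two distinct String objects with equal content
    val a = String(charArrayOf('x', 'y'))
    val b = String(charArrayOf('x', 'y'))
    val m = interner.dedup(mapOf(a to b))
    println(m.keys.first() === m.values.first()) // true
}
```

Unlike String.intern(), the table is bounded and owned by the application, so its memory cost is explicit and can be released by dropping the interner.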

Top Results From Across the Web

  • string Intern on serializer.Deserialize<T>() - Stack Overflow: If you know your 4 standard strings in advance, you can intern them with String.Intern() (or just declare them as string literals somewhere ......
  • get rid of String.intern · Issue #332 · FasterXML/jackson-core: So, app suffers of using string.intern implicitly via GC phases. There is no use cases of check the equality of strings using ==...
  • String interning - Wikipedia: In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning...
  • Java String intern(): Interesting Q & A - Oracle Communities: intern () is an interesting function in java.lang.String object. intern() function eliminates duplicate string objects from the application ...
  • Deserializing Millions Of Messages Per Second/Core - Medium: Therefore, interning strings (i.e. keeping Map[String, String] as tags and String as metric) will harm deserialization performance because:.