Issue with JsonTextReader as of version 10.0.1 (unchanged code was working fine on v.9.0.1)
Hi,
I have a very large file (≈ 20GB) that contains an array of JSON objects, like this:
{ "MyJsonObjects" : [{"Id":"1"},{"Id":"2"},{"Id":"3"}]}
The array contains a few million records of “MyJsonObject” and I need to parse the file very fast. In order to achieve that, I have a parsing application running on several different machines and each instance is starting to read the file from a specific byte (different for every instance), so each instance is parsing a specific chunk of the file. So, if we say we had 10 million records to begin with, the 1000 instances running on 1000 CPUs will only have to parse 10,000 records each, reducing the parsing time to a few seconds, instead of several minutes.
I’m using the JsonTextReader and the JsonSerializer to parse the file. Each instance is given a byte offset; I seek the stream to that offset and then deserialize 10,000 objects.
The tricky part is that, since I’m not parsing the whole JSON text but only some records of the root array, the JsonTextReader raises an exception each time I do a Read() between records, because it encounters the comma that separates them.
I guess this is expected, since I’m “hiding” the complete file from the JsonTextReader, so it thinks the stream contains invalid JSON text. This is why I wrap that call in a try-catch to suppress the error (at the end of the code I’m posting below).
This has been working fine for several months, but as of version 10.0.1 the exception causes the JsonTextReader to halt. As soon as the exception is raised, JsonTextReader.Read() returns false and the stream position no longer progresses.
The exact same code works fine as soon as I revert back to version 9.0.1.
Is this change intended? Could the reader keep working after the exception, as it did up to version 9.0.1?
If you don’t see this being fixed in the future, any advice would be more than welcome.
int position = 0;
using (StreamReader sr = new StreamReader(input))
using (JsonTextReader jr = new JsonTextReader(sr) { SupportMultipleContent = true })
{
    // Move the stream to the byte offset assigned to this instance
    sr.BaseStream.Seek(nextByteToRead, SeekOrigin.Begin);
    JsonSerializer js = new JsonSerializer();
    while (position < 10000 && jr.Read())
    {
        if (jr.TokenType == JsonToken.EndArray) break;
        MyJsonClass jsonObject = js.Deserialize<MyJsonClass>(jr);
        // At this point, we have the jsonObject, so we handle it.
        position++;
        // After having finished, we want to go to the next record.
        // I do a ReadAsString() to skip the comma separating the records.
        // As of Newtonsoft.Json v10.0, this makes the JsonTextReader stop working!
        // So, Newtonsoft.Json v9.0 should be used until further testing.
        try
        {
            jr.ReadAsString();
        }
        catch (Exception)
        {
            // Intentionally ignored: the reader throws on the separating comma.
        }
    }
}
I’ve added support for reading comma-separated data with SupportMultipleContent: https://github.com/JamesNK/Newtonsoft.Json/commit/6aed848eca0bb7ee8612495c203dc44a71e0f12a
That’s great news, thank you!!
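For anyone landing here later, here is a minimal sketch of how the read loop might look once the comma workaround is no longer needed. It assumes a Newtonsoft.Json release that includes the commit linked above, and it has not been verified against a specific version. ChunkReader and ReadRecords are hypothetical names used only for illustration, and MyJsonClass is assumed to have a single string Id property, matching the sample data in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed shape of the record type; the original post does not show it.
public class MyJsonClass
{
    public string Id { get; set; }
}

// Hypothetical helper, not part of Newtonsoft.Json.
public static class ChunkReader
{
    // Reads up to maxRecords comma-separated objects from the given text.
    // Assumes the text starts at the opening brace of a record.
    public static List<MyJsonClass> ReadRecords(TextReader text, int maxRecords)
    {
        var results = new List<MyJsonClass>();
        var serializer = new JsonSerializer();

        using (var reader = new JsonTextReader(text) { SupportMultipleContent = true })
        {
            while (results.Count < maxRecords && reader.Read())
            {
                // Stop on anything that is not the start of a record.
                if (reader.TokenType != JsonToken.StartObject) break;

                results.Add(serializer.Deserialize<MyJsonClass>(reader));
                // No ReadAsString()/try-catch here: with the linked change,
                // the next Read() is expected to step over the comma.
            }
        }

        return results;
    }
}
```

Usage would be along the lines of ChunkReader.ReadRecords(new StringReader("{\"Id\":\"1\"},{\"Id\":\"2\"},{\"Id\":\"3\"}"), 10000). The sketch stops by record count and does not handle the trailing ]} that the last chunk of the real file ends with, so that case would still need its own check.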