Spatial.Converters throws null reference exception when data serialized with System.Text.Json.JsonSerializer class
Description
When attempting to load data with spatial data from CosmosDB, the Microsoft.Azure.Cosmos.Spatial.Converters.GeometryJsonConverter throws a NullReferenceException in ReadJson when the spatial data was stored using System.Text.Json.JsonSerializer.SerializeAsync(Stream, object). This is because the Microsoft serializer writes spatial data in a different format than the one expected by the Newtonsoft deserializer used in the GeometryJsonConverter class.
This problem arose when attempting to follow the official sample code for bulk inserts provided at https://github.com/Azure-Samples/cosmos-dotnet-bulk-import-throughput-optimizer/blob/main/src/Program.cs
To Reproduce
- Start with the official sample code for bulk CosmosDB inserts linked above
- In your sample, add a field of type Microsoft.Azure.Cosmos.Spatial.Point called location
- Run your sample; your CosmosDB collection should have an object with a location value
- Attempt to read your data back using code similar to this:
var queryable = Container
    .GetItemLinqQueryable<MyType>(true)
    .Where(p => p.id == "my id");
var iterator = queryable.ToFeedIterator();
var models = new List<MyType>();
while (iterator.HasMoreResults)
{
    // Blocking on .Result surfaces any failure wrapped in an AggregateException.
    var response = iterator.ReadNextAsync()?.Result;
    if (response is null) break;
    models.AddRange(response);
}
Expected behavior
The query should execute and return your data.
Actual behavior
The method iterator.ReadNextAsync() throws an AggregateException containing a NullReferenceException terminating at Microsoft.Azure.Cosmos.Spatial.Converters.GeometryJsonConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
Environment summary
SDK Version: .NET 5.0.3, Microsoft.Azure.Cosmos package 3.16.0
OS Version: Windows 10
Additional context
The cause appears to be a mismatch between the way Newtonsoft and Microsoft serialize Point data. The example code for bulk inserts linked above uses System.Text.Json.JsonSerializer to serialize the data for insertion into CosmosDB, but the Microsoft.Azure.Cosmos.FeedIteratorCore<T> class uses Newtonsoft tools to deserialize.
Newtonsoft’s serialization looks like this:
"location": {
    "type": "Point",
    "coordinates": [
        -90.740237,
        39.950254
    ]
},
while Microsoft’s looks like this:
"location": {
    "Position": {
        "Coordinates": [
            -87.9066,
            41.9795
        ],
        "Longitude": -87.9066,
        "Latitude": 41.9795,
        "Altitude": null
    },
    "Crs": {
        "Type": 0
    },
    "Type": 0,
    "BoundingBox": null,
    "AdditionalProperties": {}
}
The source code at https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos/src/Spatial/Converters/GeometryJsonConverter.cs has this at lines 63-66:
JToken typeToken = token["type"];
if (typeToken.Type != JTokenType.String)
{
    throw new JsonSerializationException(RMResources.SpatialInvalidGeometryType);
}
My guess is that line 64 throws the NullReferenceException because token["type"] will be null in the Microsoft-serialized example.
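The guess above can be illustrated with a minimal sketch of a null-tolerant type check. This is not the SDK's code: GeometryTypeGuard, ReadGeometryType, and the exception message are hypothetical names chosen for illustration, and the block assumes the Newtonsoft.Json package is referenced. It relies on the fact that the JObject indexer returns null for a property that is not present and matches property names case-sensitively, so "Type" does not satisfy a lookup for "type".

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class GeometryTypeGuard
{
    // Hypothetical helper, not part of Microsoft.Azure.Cosmos.
    // Returns the GeoJSON "type" value, or throws a serialization
    // exception when the property is missing or is not a string.
    public static string ReadGeometryType(JToken token)
    {
        // The JObject indexer returns null when the property is absent,
        // so the null check must come before any member access.
        JToken typeToken = token["type"];
        if (typeToken == null || typeToken.Type != JTokenType.String)
        {
            throw new JsonSerializationException(
                "Missing or non-string \"type\" property in geometry JSON.");
        }

        return (string)typeToken;
    }
}
```

With a guard of this shape, the System.Text.Json-style document (which has "Type": 0 but no lowercase "type") would produce a JsonSerializationException rather than a NullReferenceException.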
I tried to fix this by replacing await System.Text.Json.JsonSerializer.SerializeAsync(stream, model) in the sample code with this bulkier Newtonsoft code:
var itemsToInsert = new Dictionary<string, Stream>(models.Count);
var jsonSerializer = new JsonSerializer() { NullValueHandling = NullValueHandling.Ignore };
foreach (var model in models)
{
    // The stream and its writers are left open; the stream is stored for the later insert.
    var stream = new MemoryStream();
    var streamWriter = new StreamWriter(stream);
    var jsonWriter = new JsonTextWriter(streamWriter);
    jsonSerializer.Serialize(jsonWriter, model);
    await streamWriter.FlushAsync();
    // Keyed by the item's ID rather than its partition key.
    itemsToInsert.Add(model.id, stream);
}
This worked, but I am seriously worried about all the open streams after it finishes. It looks like a potential memory leak to me.
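On the open-stream concern: a MemoryStream holds only a managed buffer, so an undisposed one is reclaimed by the garbage collector once nothing references it. Disposing the writers deterministically is still tidier. The following is a minimal sketch of one way to do that, assuming the Newtonsoft.Json package is referenced; StreamSerialization and ToJsonStream are hypothetical names, not part of the sample or the SDK.

```csharp
using System.IO;
using System.Text;
using Newtonsoft.Json;

public static class StreamSerialization
{
    // Hypothetical helper: serializes a model into a MemoryStream and
    // disposes the writers while leaving the stream itself open.
    public static MemoryStream ToJsonStream(object model, JsonSerializer serializer)
    {
        var stream = new MemoryStream();

        // leaveOpen: true keeps the MemoryStream usable after the writers
        // are disposed. UTF8Encoding(false) avoids emitting a byte order mark.
        using (var streamWriter = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        using (var jsonWriter = new JsonTextWriter(streamWriter))
        {
            serializer.Serialize(jsonWriter, model);
            jsonWriter.Flush();
        }

        // Rewind so a consumer reads from the start of the payload.
        stream.Position = 0;
        return stream;
    }
}
```

The caller then owns the returned stream and can dispose it once the insert that consumes it has completed. Whether the rewind is needed depends on how the consumer reads the stream; it is included here as a precaution.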
(Note also that the sample uses the model’s PartitionKey as the key in the itemsToInsert dictionary, which means the sample breaks when you attempt to add two items that are in the same partition. My fix to that problem was to use the ID as the dictionary key and get the item’s partition key
Top GitHub Comments
@ealsur, I believe the GeometryJsonConverter throws the NRE at line 64:

…because I believe token["type"] returns null in that case.

I’m still concerned that the mismatch between Newtonsoft and Microsoft on the Point class means existing data may cause the SDK to fail in future releases.

OK. I appreciate what you’re saying. But there’s still a bug here, even if it’s only a usability bug. Specifically, your statement “Spatial types, provided by the SDK, are Newtonsoft.Json compatible” is generally true but does not remain true when you use System.Text.Json.JsonSerializer.SerializeAsync(Stream, object) to serialize a Point object.

I filed this report because tracking down what was actually happening took a couple of hours. I think it would have been easier to track down if, instead of throwing a NullReferenceException, the SDK threw a SerializationException. And I filed it here because the SDK, not the sample code or the JSON libraries, threw the exception that took me a couple of hours of digging through the SDK source code to figure out. I believe I identified exactly why the SDK throws an NRE, and I believe that this is an inappropriate outcome in the general scenario I’ve described (serializing a Point with System.Text.Json and deserializing it with whatever the SDK is using).

Another possibility is for the team responsible for System.Text.Json to make their serialization of Point compatible with naïve serializations such as Newtonsoft’s. That would ensure that when the Azure SDK eventually switches to System.Text.Json, the switch doesn’t break what could be millions of CosmosDB documents that serialized geospatial data using Newtonsoft’s serializer.

Finally, if you provide feedback to the team responsible for the sample code I followed, please also mention that populating the dictionary itemsToInsert using the partition key as the dictionary key will fail if a logical partition contains more than one document.
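The dictionary-key failure described above can be reproduced with the base class library alone. The sketch below is illustrative only: PartitionKeyCollisionDemo and its tuple shape are hypothetical and do not come from the sample. It shows that Dictionary.Add throws an ArgumentException on a duplicate key, and that keying by ID avoids the collision, on the assumption that IDs are unique across the whole batch (Cosmos DB itself only requires an ID to be unique within a logical partition).

```csharp
using System;
using System.Collections.Generic;

public static class PartitionKeyCollisionDemo
{
    // Returns false when two items share a partition key, because
    // Dictionary.Add throws ArgumentException on a duplicate key.
    public static bool TryKeyByPartition(IEnumerable<(string Id, string PartitionKey)> items)
    {
        var byPartition = new Dictionary<string, string>();
        try
        {
            foreach (var item in items)
            {
                byPartition.Add(item.PartitionKey, item.Id);
            }
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // Keys by ID and keeps the partition key as the value, so several
    // items from one logical partition can coexist.
    // Assumption: IDs are unique across the whole batch.
    public static Dictionary<string, string> KeyById(IEnumerable<(string Id, string PartitionKey)> items)
    {
        var byId = new Dictionary<string, string>();
        foreach (var item in items)
        {
            byId.Add(item.Id, item.PartitionKey);
        }
        return byId;
    }
}
```

If IDs may repeat across partitions, a list of (ID, partition key, payload) entries avoids the need for a unique dictionary key altogether.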