Server: Explore implications: Don't deserialize uploaded objects?


Right now, we’re sending objects like this:

[{"id": "9b2cdb21da092dbd3558a4bc55b2cf7e", "speckle_type": "Base", "totalChildrenCount": 0, "numbers": [0.04667752874618203, 0.16370857295385177, 0.1008153029515465]}]

That is a batch of 1 base object that has a property with 3 random numbers.

The server has to deserialize the content of every uploaded object (from the POST parameter), extract some metadata (id, speckle_type), and then serialize each object again before inserting it into the DB.

We can reduce the server's CPU and RAM usage by not deserializing and re-serializing object data. To achieve this, we have to cleanly separate the object metadata from the object content and find a way to upload both, so the server can treat the object data as a blob without touching it.

This ticket is about exploring and discussing the implications, and deciding whether we want to go further in this direction.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

didimitrie commented, Jul 26, 2021

Okay @cristi8, some initial sample strings for objects implementing the discussed protocol, coming straight from Grasshopper:

  1. Object with no actual closure
v2	{"id":"716defb2e17dfcb93fcc42d6d840aedf","closure":null,"type":"Objects.Geometry.Point"}	{"x":0.0,"y":0.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Point","id":"716defb2e17dfcb93fcc42d6d840aedf"}
  2. With a closure
v2	{"id":"36ecb5b44d8a86503e5ce52baddbd828","closure":{"716defb2e17dfcb93fcc42d6d840aedf":2,"6de9157abccc365d93eb3679458564fb":1},"type":"Objects.Geometry.Line"}	{"@A":{"speckle_type":"reference","referencedId":"6de9157abccc365d93eb3679458564fb"},"domain":{"start":0.0,"end":28.635642126552703,"totalChildrenCount":0,"speckle_type":"Objects.Primitive.Interval","id":"e35d6dd4de5d51614ed4508b5c81ad8d"},"bbox":{"basePlane":{"origin":{"x":0.0,"y":0.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Point","id":"716defb2e17dfcb93fcc42d6d840aedf"},"normal":{"x":0.0,"y":0.0,"z":1.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Vector","id":"c8285634490d59f5d0febadf7e6f8184"},"xdir":{"x":1.0,"y":0.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Vector","id":"7ffdd8bd27b63cfd03c7d765aa08521b"},"ydir":{"x":0.0,"y":1.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Vector","id":"d85f68f829a55e1e456d7e46a71bed2c"},"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Plane","id":"00b5e9f9f29ebcd1249417950a0ebe31"},"xSize":{"start":-5.0,"end":1.0,"totalChildrenCount":0,"speckle_type":"Objects.Primitive.Interval","id":"b01bc283813436213bbd00cfeb20bef8"},"ySize":{"start":-19.0,"end":9.0,"totalChildrenCount":0,"speckle_type":"Objects.Primitive.Interval","id":"8db937bbbb9092581edf6d679ac5c6fe"},"zSize":{"start":0.0,"end":0.0,"totalChildrenCount":0,"speckle_type":"Objects.Primitive.Interval","id":"f74327267ec0f6a441b2825bd5244dca"},"area":336.0,"volume":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Box","id":"8ae222d315047d0a7e417c894194babc"},"area":0.0,"length":28.635642126552703,"start":{"x":1.0,"y":-19.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Point","id":"b77d1f8efe686b4774fe295fff73fef4"},"end":{"x":-5.0,"y":9.0,"z":0.0,"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Point","id":"923363e29c47231cf62acb5ff093e306"},"totalChildrenCount":0,"units":"mm","speckle_type":"Objects.Geometry.Line","__closure":{"716defb2e17dfcb93fcc42d6d840aedf":2,"6de9157abccc365d93eb3679458564fb":1},"id":"36ecb5b44d8a86503e5ce52baddbd828"}

The changes in .NET were quite minimal (so far), see this branch. It’s defo WIP, so we can still fiddle with the string formatting and what props we add in metadata (if we still need to discuss this).

Have been testing this with local transports only so far.
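
For anyone wanting to poke at the sample strings above, here is a rough Python sketch of how a reader could split one of them into its three tab-separated fields (version tag, metadata JSON, raw content). The function name parse_v2_line is made up for illustration and is not part of any Speckle SDK. Only the small metadata part is parsed; the content stays an untouched string.

```python
import json
from typing import Tuple


def parse_v2_line(line: str) -> Tuple[dict, str]:
    """Split 'v2<TAB>metadata<TAB>content' (hypothetical helper).

    Only the metadata JSON is deserialized; the content is returned
    as the original string, never parsed.
    """
    # maxsplit=2 so the content field is kept in one piece.
    version, meta_json, content = line.split("\t", 2)
    if version != "v2":
        raise ValueError("unexpected version tag: " + version)
    return json.loads(meta_json), content


# First sample string from the comment above (Point with no closure).
sample = (
    'v2\t{"id":"716defb2e17dfcb93fcc42d6d840aedf","closure":null,'
    '"type":"Objects.Geometry.Point"}\t{"x":0.0,"y":0.0,"z":0.0,'
    '"totalChildrenCount":0,"units":"mm",'
    '"speckle_type":"Objects.Geometry.Point",'
    '"id":"716defb2e17dfcb93fcc42d6d840aedf"}'
)

metadata, content = parse_v2_line(sample)
print(metadata["type"], len(content))
```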

cristi8 commented, Jul 15, 2021

Minimal changes

Note: the following describes the minimal changes needed to optimize the REST API upload endpoint while keeping everything else working (the GraphQL endpoints and download code should stay the same and keep their current performance).

Right now, this is an example batch of 1 object that is uploaded to the Speckle Server:

[{"id": "9b2cdb21da092dbd3558a4bc55b2cf7e", "speckle_type": "Base", "totalChildrenCount": 0, "numbers": [0.04667752874618203, 0.16370857295385177, 0.1008153029515465]}]

As this file can be very large, we should separate the object metadata from the object content. This would allow the server to treat the object content as one large string.

Metadata:

  • object id (currently optional; it can stay optional, since the server can compute it from the content blob)
  • __closure field
  • speckleType field (it is also stored in the objects table as a column)

Content:

  • can stay as it is today, still including the metadata fields (but not the __closure field, which the server code already removes before writing to the DB)
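
A minimal sketch of the metadata/content split described above, in Python. The helper name split_metadata and the metadata key names are assumptions made for illustration; the thread has not settled on exact names.

```python
from typing import Tuple


def split_metadata(obj: dict) -> Tuple[dict, dict]:
    """Return (metadata, content) for one object (hypothetical helper)."""
    # Metadata: the three items listed above. Key names are illustrative.
    metadata = {
        "id": obj.get("id"),
        "speckleType": obj.get("speckle_type"),
        "closure": obj.get("__closure"),
    }
    # Content: the object as it is today, minus the __closure field.
    content = {k: v for k, v in obj.items() if k != "__closure"}
    return metadata, content


example = {
    "id": "9b2cdb21da092dbd3558a4bc55b2cf7e",
    "speckle_type": "Base",
    "totalChildrenCount": 0,
    "__closure": {"716defb2e17dfcb93fcc42d6d840aedf": 1},
}
meta, content = split_metadata(example)
print(meta)
print(content)
```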

Upload protocol

The upload protocol is the main component that has to change to allow for the server optimizations. One proposed solution is to upload line-based files, like this:

- object #1 metadata (json) - would be deserialized on the server
- object #1 content (string) - entire line is the object content
- object #2 metadata (json)
- object #2 content (string)

or, to keep things more compact, to use a \t character to separate the metadata from the content:

- [object #1 metadata (json)]\t[object #1 content (string)]
- [object #2 metadata (json)]\t[object #2 content (string)]

This is OK because the compact JSON representation never contains raw tab or newline characters: inside JSON strings they must be escaped as \t and \n.
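
The compact tab-separated variant could be produced roughly like this in Python. This is only a sketch: encode_line and encode_batch are hypothetical names, and the metadata keys are placeholders. It relies on json.dumps escaping control characters inside strings, so each object ends up with exactly one raw tab and no raw newline.

```python
import json
from typing import Iterable, Tuple


def encode_line(metadata: dict, content: dict) -> str:
    """One object as '<metadata json>\\t<content json>' (hypothetical)."""
    # Compact separators: no spaces after ',' or ':'.
    meta_json = json.dumps(metadata, separators=(",", ":"))
    content_json = json.dumps(content, separators=(",", ":"))
    return meta_json + "\t" + content_json


def encode_batch(pairs: Iterable[Tuple[dict, dict]]) -> str:
    """Join the per-object lines with newlines to form an upload batch."""
    return "\n".join(encode_line(m, c) for m, c in pairs)


batch = encode_batch([
    ({"id": "a"}, {"id": "a", "note": "tab\there, newline\nhere"}),
    ({"id": "b"}, {"id": "b", "numbers": [1, 2, 3]}),
])
print(batch)
```

Tabs and newlines inside string values come out escaped, so the separators stay unambiguous.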

Server changes

The API doesn’t need to change (we can use the same upload methods), and we can even keep backwards compatibility for some time (old uploads can be recognized by the [ character at the start of the uploaded batch).

  • just adapt the createObjectsBatched method to have the object metadata separated from the content - easy change
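
To make the server-side idea concrete, here is a sketch in Python (kept language-neutral; it is not the actual createObjectsBatched code). The name parse_upload and the metadata keys are assumptions. Old batches are detected by the leading [ and go through the full parse; new batches only have their metadata parsed, and the content is passed through as a string.

```python
import json
from typing import List, Tuple


def parse_upload(body: str) -> List[Tuple[dict, str]]:
    """Return (metadata, content_string) pairs for a batch (hypothetical)."""
    if body.lstrip().startswith("["):
        # Old protocol: a JSON array, so every object is fully
        # deserialized and serialized again (today's cost).
        return [
            (
                {"id": o.get("id"), "speckleType": o.get("speckle_type")},
                json.dumps(o, separators=(",", ":")),
            )
            for o in json.loads(body)
        ]
    rows = []
    # New protocol: one object per line, metadata and content split by a tab.
    for line in body.split("\n"):
        if not line:
            continue
        meta_json, content = line.split("\t", 1)
        # Only the small metadata JSON is parsed; content stays a blob.
        rows.append((json.loads(meta_json), content))
    return rows


new_body = (
    '{"id":"a","speckleType":"Base"}\t{"id":"a","numbers":[1,2]}\n'
    '{"id":"b","speckleType":"Base"}\t{"id":"b"}'
)
for metadata, content in parse_upload(new_body):
    print(metadata["id"], content)
```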

Client changes (based on Py sdk)

AbstractTransport:

  • From: save_object(id, serialized_object)
  • To: save_object(id, serialized_metadata, serialized_object)

Serializer:

  • Call the transport’s new save_object, serializing the metadata separately and passing it as its own argument

ServerTransport:

  • Construct batches according to the new upload protocol

Other transports:

  • SQLite: objects had a __closure field, now they don’t. If not needed, we can drop the metadata. If needed, we can make a metadata column in the sqlite db
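
The client-side changes above could look something like this. It is a simplified sketch, not the real Py SDK classes: only the save_object signature comes from the proposal, while BatchingTransport and build_batch are invented stand-ins for whatever ServerTransport ends up doing.

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractTransport(ABC):
    """Simplified stand-in for the SDK's transport base class."""

    @abstractmethod
    def save_object(
        self, id: str, serialized_metadata: str, serialized_object: str
    ) -> None:
        """New signature: metadata travels separately from the content."""


class BatchingTransport(AbstractTransport):
    """Hypothetical ServerTransport-like class that builds upload batches."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def save_object(
        self, id: str, serialized_metadata: str, serialized_object: str
    ) -> None:
        # One line per object: metadata, a tab, then the content string.
        self._lines.append(serialized_metadata + "\t" + serialized_object)

    def build_batch(self) -> str:
        # Lines are joined with newlines, as in the proposed protocol.
        return "\n".join(self._lines)


transport = BatchingTransport()
transport.save_object("a", '{"id":"a"}', '{"id":"a","x":1.0}')
transport.save_object("b", '{"id":"b"}', '{"id":"b","x":2.0}')
print(transport.build_batch())
```

A transport that has no use for the metadata (for example a local one) could simply ignore the extra argument.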

Possible implementation plan

  1. Server REST API support for the new upload protocol
  2. Python implementation
  3. .NET implementation

Bonus: Downloading objects

Right now, when downloading, the DB query select ..., data ... from objects returns the data already deserialized, and we serialize it again when sending it to clients.

We can select data::text to get the text representation instead, which avoids a deserialize/serialize round trip for every downloaded object.
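
As a rough illustration of the download idea, in Python and without any DB access: if the rows already arrive as JSON text (as they would from a data::text query), a response can be assembled by plain string concatenation. The function name build_json_array is hypothetical, and the rows are hard-coded here instead of coming from a query.

```python
import json
from typing import Iterable


def build_json_array(rows: Iterable[str]) -> str:
    """Concatenate JSON text rows into one JSON array string.

    Each row is assumed to be valid JSON text already; nothing is
    parsed or re-serialized here.
    """
    return "[" + ",".join(rows) + "]"


# Stand-ins for values returned by a data::text query.
rows = ['{"id":"a","x":1.0}', '{"id":"b","x":2.0}']
payload = build_json_array(rows)
print(payload)

# Parsing is done here only to show that the result is valid JSON.
print(len(json.loads(payload)))
```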
