Server: Explore implications: Don't deserialize uploaded objects?
Right now, we’re sending objects like this:
[{"id": "9b2cdb21da092dbd3558a4bc55b2cf7e", "speckle_type": "Base", "totalChildrenCount": 0, "numbers": [0.04667752874618203, 0.16370857295385177, 0.1008153029515465]}]
That is a batch of one base object with a property containing three random numbers.
The server has to deserialize the content of all uploaded objects (from the POST parameter), extract some metadata (`id`, `speckle_type`), and then serialize each object again to be inserted into the DB.
We can improve the server’s CPU and RAM usage by not deserializing and re-serializing the object data. To achieve this, we have to cleanly separate the object metadata from the object content and find a way to upload both, so the server can treat the object data as an opaque blob, without touching it.
This ticket is about exploring and discussing the implications, and deciding whether we want to go further in this direction.
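For illustration, the current flow looks roughly like this (a Python sketch; the actual server is Node.js, and the function and column names here are made up):

```python
import json

def ingest_batch_current(raw_batch: str):
    """Sketch of the current server flow: parse the whole batch,
    pull out metadata, and re-serialize each object for the DB."""
    objects = json.loads(raw_batch)  # deserialize every uploaded object
    rows = []
    for obj in objects:
        rows.append({
            "id": obj["id"],
            "speckle_type": obj.get("speckle_type", "Base"),
            "data": json.dumps(obj),  # serialize again for insertion
        })
    return rows

batch = '[{"id": "9b2cdb21da092dbd3558a4bc55b2cf7e", "speckle_type": "Base", "totalChildrenCount": 0}]'
rows = ingest_batch_current(batch)
```

The per-object `json.loads`/`json.dumps` round-trip is exactly the CPU and RAM cost the proposal below tries to eliminate.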
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
Okay @cristi8, some initial sample strings for objects implementing the discussed protocol, coming straight from Grasshopper:
The changes in .NET were quite minimal (so far), see this branch. It’s defo WIP, so we can still fiddle with the string formatting and what props we add in metadata (if we still need to discuss this).
Have been testing this with local transports only so far.
Minimal changes
!!! The following describes the minimal changes needed to optimize the REST API upload endpoint, while keeping everything else working (GraphQL endpoints and download code should remain the same, with their current performance)
Right now, this is an example batch of 1 object that is uploaded to the Speckle Server:
As this file can be very large, we should separate the object metadata from the object content. This would allow the server to treat the object content as one large string.
Metadata:
- `id` (optional right now; it can be left optional, as it can be computed on the server from the content blob)
- `__closure` field
- `speckleType` field (it is also stored in the `objects` table as a column)

Content:
- everything else (without the `__closure` field, as it was removed in the server code before sending to the db)

Upload protocol
The upload protocol is the main component that has to change to allow for the server optimizations. One proposed solution is to upload line-based files, with one object per line. To keep things more compact, a `\t` character can separate the metadata from the content on each line. This works because the compact JSON representation doesn’t contain tab or newline characters.
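A sketch of that wire format (illustrative only; the field names and framing here are assumptions, not a final spec):

```python
import json

def encode_line(metadata: dict, content: dict) -> str:
    # Compact JSON contains no tab or newline characters,
    # so one tab-separated pair per line is unambiguous.
    return (json.dumps(metadata, separators=(",", ":"))
            + "\t"
            + json.dumps(content, separators=(",", ":")))

def decode_line(line: str):
    # The server only parses the (small) metadata part;
    # the content stays an untouched string blob.
    meta_str, _, content_blob = line.partition("\t")
    return json.loads(meta_str), content_blob

# A batch is one such line per object; a legacy batch instead
# starts with "[", which lets the server tell the formats apart.
line = encode_line({"id": "abc", "speckleType": "Base"}, {"numbers": [1, 2, 3]})
meta, blob = decode_line(line)
```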
Server changes
The API doesn’t need to change (we can use the same upload methods) and we can even keep backwards compatibility for some time (old uploads can be distinguished by the leading `[` character in the uploaded batch).
- Change the `createObjectsBatched` method to receive the object metadata separated from the content (easy change)

Client changes (based on Py sdk)
AbstractTransport:
- change `save_object(id, serialized_object)` to `save_object(id, serialized_metadata, serialized_object)`

Serializer:
- when calling `save_object` on the transport, serialize the metadata separately and pass it as a different argument

ServerTransport:
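A minimal sketch of how a transport could buffer such lines under the changed signature (class and method names here are hypothetical, not the actual specklepy API):

```python
class SketchServerTransport:
    """Hypothetical transport implementing the proposed two-argument
    save_object; not the real specklepy ServerTransport."""

    def __init__(self):
        self._batch = []

    # Old signature: save_object(self, id, serialized_object)
    def save_object(self, id: str, serialized_metadata: str, serialized_object: str):
        # Metadata and content travel together, tab-separated,
        # so the server never has to parse the content.
        self._batch.append(serialized_metadata + "\t" + serialized_object)

    def flush(self) -> str:
        # One object per line in the upload body.
        return "\n".join(self._batch)

t = SketchServerTransport()
t.save_object("abc", '{"id":"abc","speckleType":"Base"}', '{"numbers":[1,2,3]}')
payload = t.flush()
```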
Other transports:
- the stored objects used to contain the `__closure` field, now they don’t. If the metadata is not needed, we can drop it. If it is needed, we can add a `metadata` column in the sqlite db

Possible implementation plan
Bonus: Downloading objects
Right now, when downloading, the db query `select ..., data ... from objects` returns the data deserialized, and we serialize it again when sending to clients. We can `select data::text` instead to get the text representation, avoiding a deserialization/serialization round-trip for every downloaded object.
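To illustrate the difference (a Python stand-in; the real server code and driver behavior may differ):

```python
import json

# With `select data from objects`, a jsonb driver typically returns the
# column already parsed, so the server must re-serialize it per object:
parsed_row = {"id": "abc", "data": {"numbers": [1, 2, 3]}}
body = json.dumps(parsed_row["data"])  # CPU cost on every download

# With `select data::text`, the driver returns the raw text, which can be
# forwarded to the client as-is:
text_row = {"id": "abc", "data": '{"numbers": [1, 2, 3]}'}
body_as_is = text_row["data"]  # no parse, no re-serialize
```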