TS Bebop views are shared which leads to difficult to diagnose bugs
Describe the bug Okay, this might technically be a feature, but it was undocumented and so I spent a good bit of time assuming my code was wrong only to find out it had to do with serialization logic assumptions not holding up.
To Reproduce This test passes
it("correctly encodes and decodes datagrams", () => {
    const data = new Uint8Array([7, 0, 0, 0, 75, 86, 83, 116, 111, 114, 101]);
    const dgram1: IRpcDatagram = {
        discriminator: RpcResponseOk.discriminator,
        value: {
            header: {id: 1},
            data,
        }
    };
    // this is the current state at runtime
    // {"discriminator":2,"value":{"header":{"id":1},"data":{"0":7,"1":0,"2":0,"3":0,"4":75,"5":86,"6":83,"7":116,"8":111,"9":114,"10":101}}}
    // 17,0,0,0,2,1,0,11,0,0,0,7,0,0,0,75,86,83,116,111,114,101
    const raw = RpcDatagram.encode(dgram1);
    const dgram2 = RpcDatagram.decode(raw);
    expect(dgram2.discriminator).toBe(RpcResponseOk.discriminator);
    if (dgram2.discriminator == RpcResponseOk.discriminator)
        expect(dgram2.value.data).toEqual(data)
    expect(dgram2).toEqual(dgram1);
    expect(dgram2).not.toBe(dgram1);
})
And this test fails
it("correctly encodes and decodes datagrams and data", () => {
    const name1: I_HelloServiceNameReturn = {value: "KVStore"};
    // this line would fix it
    // const data = new Uint8Array(_HelloServiceNameReturn.encode(name1));
    const data = _HelloServiceNameReturn.encode(name1);
    expect(data).toEqual(new Uint8Array([7, 0, 0, 0, 75, 86, 83, 116, 111, 114, 101]));
    const dgram1: IRpcDatagram = {
        discriminator: RpcResponseOk.discriminator,
        value: {
            header: { id: 1 },
            data
        }
    };
    // this is the current state at runtime
    // {"discriminator":2,"value":{"header":{"id":1},"data":{"0":7,"1":0,"2":0,"3":0,"4":75,"5":86,"6":83,"7":116,"8":111,"9":114,"10":101}}}
    // 17,0,0,0,2,1,0,11,0,0,0,7,0,0,0,2,1,0,11,0,0,0
    // As you can see the buffer ^ is not the same as when we used our own buffer even though we confirmed the data was the same
    const raw = RpcDatagram.encode(dgram1);
    const dgram2 = RpcDatagram.decode(raw);
    expect(dgram2.discriminator).toBe(RpcResponseOk.discriminator);
    if (dgram2.discriminator == RpcResponseOk.discriminator) {
        expect(dgram2.value.data).toEqual(data); // <-- Fails on this line
        const name3 = _HelloServiceNameReturn.decode(dgram2.value.data);
        expect(name3).toEqual(name1);
    }
    expect(dgram2).toEqual(dgram1);
    expect(dgram2).not.toBe(dgram1);
})
Basically, use the serialized data from one type as a byte[] value in another type, serialize that second type, and weird things happen: the second encode call overwrites the buffer that the first call returned.
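The aliasing can be reproduced without Bebop at all. The following is a minimal sketch, assuming a serializer that writes every result into one reusable scratch buffer and returns a view over it. `scratch` and `encodeShared` are hypothetical names used only to model the behavior; they are not Bebop's implementation.

```typescript
// Hypothetical model of an encoder that reuses one scratch buffer.
// This is NOT Bebop's code, only an illustration of the reported behavior.
const scratch = new Uint8Array(256);

function encodeShared(bytes: number[]): Uint8Array {
  scratch.set(bytes, 0);
  // subarray() returns a view over the same memory, not a copy.
  return scratch.subarray(0, bytes.length);
}

const first = encodeShared([7, 0, 0, 0]);
// Copying into fresh memory, like the commented-out fix in the failing test.
const firstCopy = new Uint8Array(first);
// A second encode call overwrites the memory that `first` still points at.
const second = encodeShared([2, 1, 0, 11]);

const firstAfter = Array.from(first);    // now holds the second call's bytes
const copyAfter = Array.from(firstCopy); // still holds the original bytes
```

Copying the returned view before the next encode call, with `new Uint8Array(view)` or `view.slice()`, is what keeps the first payload intact.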
There is another case where this can and will pop up, which is if you serialize data, and then for ANY reason await before using the buffer. This could happen in networking or subtle code logic, like if you were to have
async function myFn() {
    const buf = MyType.encode(obj);
    await doThing(buf);
    // another encode call may run during either await and overwrite buf
    await doOtherThing(buf);
}
because it is very possible, depending on what triggers myFn, to have buf change before doOtherThing is called.
This is also an issue when you call, say, a database function with buf and, without you even knowing, it chooses to await something before it uses your provided buffer. At that point, unless you know about this behavior, it would be almost impossible to debug why the data in your database is not the data that was submitted to the function.
You will likely only run into this secondary case if you are writing network code that triggers serialization, since multiple concurrent operations may occur naturally there. What is worse is that it is effectively a race condition that is very subtle and almost impossible to reproduce and debug.
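The await-before-use case can be sketched the same way. This example assumes a hypothetical encoder that writes into one shared buffer, plus a hypothetical `storeLater` that stands in for a database or socket call which awaits before reading its argument. None of these names come from Bebop.

```typescript
// Hypothetical model of a shared-buffer encoder; not part of Bebop's API.
const sharedBuf = new Uint8Array(64);

function encodeView(bytes: number[]): Uint8Array {
  sharedBuf.set(bytes, 0);
  return sharedBuf.subarray(0, bytes.length); // view, not a copy
}

// Simulates a library call that awaits before it reads the buffer it was given.
async function storeLater(view: Uint8Array, sink: number[][]): Promise<void> {
  await Promise.resolve(); // yields, so other code can run first
  sink.push(Array.from(view));
}

async function demo(): Promise<{ unsafe: number[][]; safe: number[][] }> {
  const unsafe: number[][] = [];
  const safe: number[][] = [];

  // Two independent operations start before either one has read its buffer.
  const a = storeLater(encodeView([1, 1, 1]), unsafe);
  const b = storeLater(encodeView([2, 2, 2]), unsafe);

  // Defensive copies: slice() allocates separate memory for each payload.
  const c = storeLater(encodeView([1, 1, 1]).slice(), safe);
  const d = storeLater(encodeView([2, 2, 2]).slice(), safe);

  await Promise.all([a, b, c, d]);
  return { unsafe, safe };
}
```

In this model both entries stored through the shared view end up identical, because each was read only after the last encode call had run, while the copied payloads keep their own contents.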
Expected behavior When I get the serialized data, I expect it to continue living until I drop the last reference to it unless otherwise documented.
IMO, in a language like C++, it makes some sense to do this and expect the user to know and proceed with caution, but you would also probably use a mutex on the buffer. In TS there is NO mutex which is fine as long as you don’t run into that secondary case of await-before-use because JS is single-threaded. At minimum this needs to be documented, but JS is not a language focused on performance so I think it would be better to make a new BebopView for every time encode is called by default as it is the least likely to cause weird results. We should allow the user to bring their own view and that would allow more efficient serialization where they know what they are doing.
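One possible shape for that proposal is sketched below, assuming a copy-by-default encode plus an opt-in variant that writes into a caller-owned buffer. `encodeSafe`, `encodeIntoBuffer`, `writePayload`, and `internalScratch` are hypothetical names for illustration and do not exist in Bebop; the length-prefixed layout only mirrors the bytes shown in the tests above.

```typescript
// Hypothetical sketch of the proposed API shape; none of these names exist in Bebop.
const internalScratch = new Uint8Array(1024);

// Stand-in for generated serialization code: a little-endian length prefix
// followed by the payload bytes. Returns the number of bytes written.
function writePayload(target: Uint8Array, payload: Uint8Array): number {
  const header = new DataView(target.buffer, target.byteOffset, target.byteLength);
  header.setUint32(0, payload.length, true);
  target.set(payload, 4);
  return 4 + payload.length;
}

// Safe default: the caller receives memory that no later call will write to.
function encodeSafe(payload: Uint8Array): Uint8Array {
  const written = writePayload(internalScratch, payload);
  return internalScratch.slice(0, written); // copy out of the shared scratch space
}

// Opt-in fast path: the caller supplies and owns the destination buffer.
function encodeIntoBuffer(payload: Uint8Array, target: Uint8Array): Uint8Array {
  const written = writePayload(target, payload);
  return target.subarray(0, written); // view over caller-owned memory
}

const kvStore = new Uint8Array([75, 86, 83, 116, 111, 114, 101]); // "KVStore"
const safeResult = encodeSafe(kvStore);
const ownBuffer = new Uint8Array(32);
const fastResult = encodeIntoBuffer(kvStore, ownBuffer);
```

The trade-off is the one discussed in this issue: the default path pays for an allocation and a copy on every call, while the opt-in path avoids both but makes the caller responsible for not reusing the buffer too early.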
Bebop info:
- Version: 2.4.2-ish (rpc-ts branch)
- Runtime: TS
Desktop (please complete the following information):
- OS: Windows
- Version: 10
Additional context Working on RPC, and for this we have a datagram which contains a raw data field holding another Bebop-serialized type. This is how the behavior was discovered by me, @Eadword, a person who was not particularly familiar with the serialization implementation and made what I thought were reasonable assumptions about behavior.
Issue Analytics
- State:
- Created a year ago
- Comments:5 (5 by maintainers)
I would lean towards making the change out of the “principle of least surprise”, and then exposing the old fast-but-dangerous behavior as encodeUnsafe(). 👍
I implemented a fix for this in #225 but I’m holding off on merging. Copying memory, even on small data structures, reduces performance substantially:
The encode method creates the illusion and promise that the buffer returned to you is unique, safe to access at any time, and that its data will remain the same. This is, of course, not true, and it’s possible for Bebop to partially decode “corrupted data” (essentially the data of another type encoded before the buffer was accessed), which could lead to all sorts of issues. If the decode function simply blew up, I’d be less worried, but given the circumstances this requires deeper discussion.
Do we prioritize safety over speed, or more clearly document this runtime behavior to maintain speed over ergonomics and safety?
@lynn tagging you for comment since you wrote the original implementation and I’d appreciate your perspective