Extra bytes when decoding ByteField
Hi,
I’m encoding and storing string fields in my dataset under the ByteField with:
title = "denise's peanut chicken"
np.frombuffer(title.encode('utf-8'), dtype=np.uint8)
When I decode the same field in the loader, I’m getting a bunch of extra trailing bytes:
>>> titles[0].tobytes().decode('utf-8')
'denise\'s peanut chicken\x00\x08\x00\x00\x00\x00\x00\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00/\x00\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00-\x00\x00\x00\x00\x00\x00\x00H\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x00\x00\x00\x00\x00\x00;\x00\x00\x00\x00\x00\x00'
Where do these extra bytes come from?
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
Makes sense! As far as I know, ByteField currently makes no guarantees about padding, so the best approach for now is to store the length of the string as a separate field and slice the decoded bytes to that length.
Closing this now, feel free to re-open (or open another issue) if anything else comes up!
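The workaround suggested above (store the byte length alongside the data, then slice before decoding) can be sketched roughly as follows. This is only an illustration using plain NumPy: `encode_title`, `decode_title`, and `max_len` are hypothetical names, not part of the library's API, and the fixed-size buffer merely stands in for whatever the ByteField actually hands back. The length is counted in UTF-8 bytes rather than characters, since the two differ for non-ASCII text.

```python
import numpy as np


def encode_title(title: str, max_len: int):
    """Return (fixed-size uint8 buffer, true byte length) for a title.

    Hypothetical helper: the buffer stands in for what would be stored
    in the byte field, and the length would go in a separate integer field.
    """
    raw = np.frombuffer(title.encode("utf-8"), dtype=np.uint8)
    if raw.size > max_len:
        raise ValueError(f"title needs {raw.size} bytes, max_len is {max_len}")
    buf = np.zeros(max_len, dtype=np.uint8)
    buf[: raw.size] = raw
    return buf, int(raw.size)


def decode_title(buf: np.ndarray, length: int) -> str:
    """Decode only the first `length` bytes, ignoring any trailing bytes."""
    return buf[:length].tobytes().decode("utf-8")


title = "denise's peanut chicken"
buf, length = encode_title(title, max_len=64)

# Assumption for illustration: trailing bytes may hold arbitrary values,
# so overwrite them to mimic the garbage seen in the issue.
buf[length:] = 7

print(decode_title(buf, length))
```

Slicing by the stored length avoids relying on any particular padding value, which matters here because the trailing bytes in the issue are not all zero.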