Extra bytes when decoding ByteField

Hi,

I’m encoding and storing string fields in my dataset under the ByteField with:

import numpy as np

title = "denise's peanut chicken"
np.frombuffer(title.encode('utf-8'), dtype=np.uint8)

When I decode the same field in the loader, I’m getting a bunch of extra trailing bytes:

>>> titles[0].tobytes().decode('utf-8')
'denise\'s peanut chicken\x00\x08\x00\x00\x00\x00\x00\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00/\x00\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00-\x00\x00\x00\x00\x00\x00\x00H\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x00\x00\x00\x00\x00\x00;\x00\x00\x00\x00\x00\x00'

Where do these extra bytes come from?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

andrewilyas commented, Jan 20, 2022

Makes sense! As far as I know, ByteField currently makes no guarantees about padding, so the best option for now is to store the length of the string as a separate field and slice the decoded array to that length.
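
The store-the-length workaround can be sketched roughly as follows. This is a minimal illustration using NumPy only, not the library's own API: the helper names `encode_title` and `decode_title` are hypothetical, and the trailing bytes are appended by hand to imitate what the loader returned in the question. How the length is actually written to and read from the dataset depends on the field types you use.

```python
import numpy as np


def encode_title(title: str) -> tuple:
    """Return the UTF-8 bytes as a uint8 array plus the byte length to store alongside it.

    Hypothetical helper: the length would go into a separate integer field.
    """
    raw = np.frombuffer(title.encode('utf-8'), dtype=np.uint8)
    return raw, raw.size


def decode_title(stored: np.ndarray, length: int) -> str:
    """Slice off everything past the stored byte length before decoding."""
    return stored[:length].tobytes().decode('utf-8')


title = "denise's peanut chicken"
raw, length = encode_title(title)

# Imitate a loader that hands back a larger buffer with unrelated trailing bytes.
trailing = np.array([0, 8, 0, 0, 0, 0, 0, 0, 0, 28], dtype=np.uint8)
stored = np.concatenate([raw, trailing])

print(decode_title(stored, length))
```

Note that the stored value is the length in bytes, not in characters, so the slice stays correct for non-ASCII titles where one character encodes to several bytes.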

andrewilyas commented, Jan 21, 2022

Closing this now, feel free to re-open (or open another issue) if anything else comes up!
