[BUG] UnicodeDecodeError when accessing Tensor.meta

🐛🐛 Bug Report

⚗️ Current Behavior

I'm getting an error when reading an image file that seems to be fine.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte

It apparently happens when accessing Tensor.meta.

I don't think I can attach files here, but I'll happily share the image. It is image 0077 from the NWPU crowd dataset.

Full trace:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/var/folders/rm/y4mfyltx67d84r0n1h3ccgp80000gn/T/ipykernel_25645/1849073815.py in <module>
      5         pts = [[]]
      6 
----> 7     ds.images.append(hub.read(img_paths[i]))
      8     ds.points.append(np.array(pts, dtype=np.uint32))

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
     14                 type(x).__name__,
     15             )
---> 16         return callable(x, *args, **kwargs)
     17 
     18     return inner

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in append(self, sample)
    321             sample (InputSample): The data to append to the tensor. `Sample` is generated by `hub.read`. See the above examples.
    322         """
--> 323         self.extend([sample])
    324 
    325     def clear(self):

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
     14                 type(x).__name__,
     15             )
---> 16         return callable(x, *args, **kwargs)
     17 
     18     return inner

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in extend(self, samples)
    265         """
    266         self._write_initialization()
--> 267         self.chunk_engine.extend(
    268             samples, link_callback=self._append_to_links if self.meta.links else None
    269         )

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/chunk_engine.py in extend(self, samples, link_callback)
    640             if link_callback:
    641                 for sample in samples:
--> 642                     link_callback(sample, flat=None)
    643 
    644         self.cache.autoflush = initial_autoflush

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in _append_to_links(self, sample, flat)
    682         for k, v in self.meta.links.items():
    683             if flat is None or v["flatten_sequence"] == flat:
--> 684                 self.dataset[k].append(get_link_transform(v["append"])(sample))
    685 
    686     def _update_links(

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in __call__(self, *args, **kwargs)
     21             out_kwargs = {k: v for k, v in kwargs.items() if k in self.kwargs}
     22             return self.f(*args, **out_kwargs)
---> 23         return self.f(args[0])
     24 
     25     def __str__(self):

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in append_info(sample)
     48 def append_info(sample):
     49     if isinstance(sample, hub.core.sample.Sample):
---> 50         meta = sample.meta
     51         meta["modified"] = False
     52         return meta

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in meta(self)
    396         compression_type = get_compression_type(self.compression)
    397         if compression_type == IMAGE_COMPRESSION:
--> 398             meta["exif"] = self._getexif()
    399         if compression_type == VIDEO_COMPRESSION:
    400             meta.update(self._get_video_meta())

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in _getexif(self)
    386         else:
    387             img = Image.open(BytesIO(self.buffer))
--> 388         return {
    389             TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
    390             for k, v in img.getexif().items()

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in <dictcomp>(.0)
    387             img = Image.open(BytesIO(self.buffer))
    388         return {
--> 389             TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
    390             for k, v in img.getexif().items()
    391         }

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte
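
The last frames of the trace show `_getexif` calling `v.decode()` on every EXIF value of type bytes, which assumes those bytes are valid UTF-8. The following is a minimal sketch of that failure and of one tolerant alternative. It is not the fix the maintainers shipped: `decode_exif_value` and the sample bytes are hypothetical, chosen only so that the strict decode fails at position 1 as in the report.

```python
def decode_exif_value(value):
    """Return a printable string for an EXIF value (hypothetical helper).

    Bytes are decoded as UTF-8, with undecodable bytes replaced by U+FFFD
    instead of raising UnicodeDecodeError. Other values go through str().
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


# Hypothetical EXIF payload: 0xea starts a multi-byte UTF-8 sequence,
# but the byte after it is plain ASCII, so strict decoding fails.
raw = b"A\xeaA"

try:
    raw.decode()  # same strict call as in the traceback
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# The tolerant helper returns a string for the same input.
tolerant = decode_exif_value(raw)
print(strict_error)
print(repr(tolerant))
```

Replacing bad bytes loses the original value, so a real fix might instead keep the raw bytes or skip the tag; which of these suits the library is a design choice this sketch does not settle.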

Input Code

hub.read("0077.jpg").meta

Expected behavior/code

The image loads and is appended to the dataset as expected.

⚙️ Environment

Python version(s): 3.9.9
OS: M1 macOS
IDE: VS Code

Issue Analytics

State:
Created a year ago
Comments:14 (2 by maintainers)

Top GitHub Comments

1 reaction

farizrahman4u commented, Apr 19, 2022

@pietz The workaround above decompresses the image and recompresses it, which can slow things down. A better workaround is to pass create_sample_info_tensor=False to your create_tensor call so that EXIF and other metadata are ignored. Example:

ds.create_tensor("image", sample_compression="jpeg", create_sample_info_tensor=False)

1 reaction

pietz commented, Apr 19, 2022

I’ll let you know when I see your avatar among the 8359 people, haha.
