[BUG] UnicodeDecodeError when accessing Tensor.meta

🐛🐛 Bug Report

⚗️ Current Behavior

I'm getting an error when reading an image file that seems to be fine.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte

It apparently happens when accessing .meta on the sample returned by hub.read (the trace ends in Sample.meta, where the EXIF data is decoded).

I don’t think I can attach files. I’ll happily share it. The image is 0077 from the NWPU crowd dataset.

Full trace:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/var/folders/rm/y4mfyltx67d84r0n1h3ccgp80000gn/T/ipykernel_25645/1849073815.py in <module>
      5         pts = [[]]
      6 
----> 7     ds.images.append(hub.read(img_paths[i]))
      8     ds.points.append(np.array(pts, dtype=np.uint32))

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
     14                 type(x).__name__,
     15             )
---> 16         return callable(x, *args, **kwargs)
     17 
     18     return inner

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in append(self, sample)
    321             sample (InputSample): The data to append to the tensor. `Sample` is generated by `hub.read`. See the above examples.
    322         """
--> 323         self.extend([sample])
    324 
    325     def clear(self):

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
     14                 type(x).__name__,
     15             )
---> 16         return callable(x, *args, **kwargs)
     17 
     18     return inner

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in extend(self, samples)
    265         """
    266         self._write_initialization()
--> 267         self.chunk_engine.extend(
    268             samples, link_callback=self._append_to_links if self.meta.links else None
    269         )

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/chunk_engine.py in extend(self, samples, link_callback)
    640             if link_callback:
    641                 for sample in samples:
--> 642                     link_callback(sample, flat=None)
    643 
    644         self.cache.autoflush = initial_autoflush

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in _append_to_links(self, sample, flat)
    682         for k, v in self.meta.links.items():
    683             if flat is None or v["flatten_sequence"] == flat:
--> 684                 self.dataset[k].append(get_link_transform(v["append"])(sample))
    685 
    686     def _update_links(

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in __call__(self, *args, **kwargs)
     21             out_kwargs = {k: v for k, v in kwargs.items() if k in self.kwargs}
     22             return self.f(*args, **out_kwargs)
---> 23         return self.f(args[0])
     24 
     25     def __str__(self):

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in append_info(sample)
     48 def append_info(sample):
     49     if isinstance(sample, hub.core.sample.Sample):
---> 50         meta = sample.meta
     51         meta["modified"] = False
     52         return meta

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in meta(self)
    396         compression_type = get_compression_type(self.compression)
    397         if compression_type == IMAGE_COMPRESSION:
--> 398             meta["exif"] = self._getexif()
    399         if compression_type == VIDEO_COMPRESSION:
    400             meta.update(self._get_video_meta())

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in _getexif(self)
    386         else:
    387             img = Image.open(BytesIO(self.buffer))
--> 388         return {
    389             TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
    390             for k, v in img.getexif().items()

~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in <dictcomp>(.0)
    387             img = Image.open(BytesIO(self.buffer))
    388         return {
--> 389             TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
    390             for k, v in img.getexif().items()
    391         }

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte
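
The last frame of the trace calls v.decode() on every EXIF value of type bytes, which assumes UTF-8. The following is a minimal sketch of that failure mode using only the standard library, not the hub code itself. The byte string raw is a made-up payload chosen to trigger the same message, and exif_value_to_str is a hypothetical helper showing one tolerant alternative (errors="replace"); it is not hub's actual fix.

```python
def exif_value_to_str(value):
    """Render an EXIF value as text without raising on non-UTF-8 bytes.

    Hypothetical helper for illustration; not part of hub or Pillow.
    """
    if isinstance(value, bytes):
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising UnicodeDecodeError.
        return value.decode("utf-8", errors="replace")
    return str(value)


# Made-up payload: 0xea opens a multi-byte UTF-8 sequence, but the
# byte that follows it (0x42) is not a continuation byte.
raw = b"\x41\xea\x42"

try:
    raw.decode()  # strict UTF-8 decoding, as in the dict comprehension above
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

safe = exif_value_to_str(raw)

print(error_message)
print(repr(safe))
```

Replacing undecodable bytes loses information, so a real fix might prefer keeping the raw bytes or trying another encoding; this only shows that the crash comes from strict decoding rather than from a corrupt image.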

Input Code

hub.read("0077.jpg").meta

Expected behavior/code

The image loads and is appended to the dataset as expected.

⚙️ Environment

  • Python version(s): 3.9.9
  • OS: macOS (M1)
  • IDE: VS Code

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 14 (2 by maintainers)

Top GitHub Comments

1 reaction
farizrahman4u commented, Apr 19, 2022

@pietz The workaround above decompresses the image and recompresses it, which can slow things down. A better workaround is to pass create_sample_info_tensor=False to your create_tensor call so that EXIF and other metadata are ignored. Example:

ds.create_tensor("image", sample_compression="jpeg", create_sample_info_tensor=False)

1 reaction
pietz commented, Apr 19, 2022

I’ll let you know when I see your avatar among the 8359 people, haha.

