[BUG] UnicodeDecodeError when accessing Tensor.meta
🐛🐛 Bug Report
⚗️ Current Behavior
I'm getting an error when reading an image file that otherwise seems to be fine.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte
It apparently happens when the sample's meta is accessed (see `sample.meta` in the trace below).
I don’t think I can attach files. I’ll happily share it. The image is 0077 from the NWPU crowd dataset.
Full trace:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
/var/folders/rm/y4mfyltx67d84r0n1h3ccgp80000gn/T/ipykernel_25645/1849073815.py in <module>
5 pts = [[]]
6
----> 7 ds.images.append(hub.read(img_paths[i]))
8 ds.points.append(np.array(pts, dtype=np.uint32))
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
14 type(x).__name__,
15 )
---> 16 return callable(x, *args, **kwargs)
17
18 return inner
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in append(self, sample)
321 sample (InputSample): The data to append to the tensor. `Sample` is generated by `hub.read`. See the above examples.
322 """
--> 323 self.extend([sample])
324
325 def clear(self):
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/util/invalid_view_op.py in inner(x, *args, **kwargs)
14 type(x).__name__,
15 )
---> 16 return callable(x, *args, **kwargs)
17
18 return inner
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in extend(self, samples)
265 """
266 self._write_initialization()
--> 267 self.chunk_engine.extend(
268 samples, link_callback=self._append_to_links if self.meta.links else None
269 )
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/chunk_engine.py in extend(self, samples, link_callback)
640 if link_callback:
641 for sample in samples:
--> 642 link_callback(sample, flat=None)
643
644 self.cache.autoflush = initial_autoflush
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor.py in _append_to_links(self, sample, flat)
682 for k, v in self.meta.links.items():
683 if flat is None or v["flatten_sequence"] == flat:
--> 684 self.dataset[k].append(get_link_transform(v["append"])(sample))
685
686 def _update_links(
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in __call__(self, *args, **kwargs)
21 out_kwargs = {k: v for k, v in kwargs.items() if k in self.kwargs}
22 return self.f(*args, **out_kwargs)
---> 23 return self.f(args[0])
24
25 def __str__(self):
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/tensor_link.py in append_info(sample)
48 def append_info(sample):
49 if isinstance(sample, hub.core.sample.Sample):
---> 50 meta = sample.meta
51 meta["modified"] = False
52 return meta
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in meta(self)
396 compression_type = get_compression_type(self.compression)
397 if compression_type == IMAGE_COMPRESSION:
--> 398 meta["exif"] = self._getexif()
399 if compression_type == VIDEO_COMPRESSION:
400 meta.update(self._get_video_meta())
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in _getexif(self)
386 else:
387 img = Image.open(BytesIO(self.buffer))
--> 388 return {
389 TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
390 for k, v in img.getexif().items()
~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/hub/core/sample.py in <dictcomp>(.0)
387 img = Image.open(BytesIO(self.buffer))
388 return {
--> 389 TAGS.get(k, k): f"{v.decode() if isinstance(v, bytes) else v}"
390 for k, v in img.getexif().items()
391 }
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte
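The last frames of the trace point at the dict comprehension in `_getexif`, which calls `v.decode()` on every EXIF value of type `bytes`. A strict UTF-8 decode raises as soon as one value contains bytes that are not valid UTF-8. The snippet below is only a minimal sketch of that failure and of one tolerant alternative. The helper name `decode_exif_value` and the sample bytes are made up for illustration and are not part of hub, and this is not necessarily how the library handles it.

```python
def decode_exif_value(value):
    """Return a string for an EXIF value, tolerating non-UTF-8 bytes.

    Hypothetical helper, not part of hub: bytes are decoded with
    errors="replace" so invalid sequences become U+FFFD instead of raising.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


# Made-up bytes that are not valid UTF-8 (0xea starts a multi-byte
# sequence, but the next byte is not a continuation byte).
raw = b"A\xeaB"

# The strict decode, as used in the traceback, fails on such input.
try:
    raw.decode()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The tolerant decode returns a string with a replacement character.
tolerant = decode_exif_value(raw)
print(strict_failed, repr(tolerant))
```

Replacing invalid bytes loses the original value of that EXIF field, which is probably acceptable for metadata that is only displayed, but it is a trade-off rather than a full fix.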
Input Code
hub.read("0077.jpg").meta
Expected behavior/code
The image loads and is appended to the dataset as expected.
⚙️ Environment
Python version(s): 3.9.9
OS: M1 macOS
IDE: VS-Code
Issue Analytics
- State:
- Created a year ago
- Comments:14 (2 by maintainers)
@pietz the above workaround decompresses the image and recompresses it, which can slow things down. A better workaround is to pass create_sample_info_tensor=False to your create_tensor call to ignore EXIF and other metadata. Example:
ds.create_tensor("image", sample_compression="jpeg", create_sample_info_tensor=False)
I’ll let you know when I see your avatar among the 8359 people, haha.