dvc.api.read() raises a "UnicodeDecodeError"
I am trying to access a DICOM file [image saved in the Digital Imaging and Communications in Medicine (DICOM) format]:
import dvc.api
path = 'dir/image.dcm'
remote = 'remote_name'
repo = 'git_repo'
mode = 'r'
data = dvc.api.read(path = path, remote = remote, repo = repo, mode = mode)
When I run the code above, once the download progress bar completes, I get the following error:
Traceback (most recent call last):
  File "draft.py", line 7, in <module>
    mode ='r')
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\api.py", line 91, in read
    return fd.read()
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 764: character maps to <undefined>
I tried to overcome this issue by using the encoding argument:
data = dvc.api.read(path = path, remote = remote, repo = repo, mode = mode, encoding='ANSI')
I chose this value because Notepad++ reports ANSI as the encoding when I open a DICOM file. However, it raises the error:
Exception ignored in: <bound method Pool.__del__ of <dvc.fs.pool.Pool object at 0x0000021D1347A160>>
Traceback (most recent call last):
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\pool.py", line 42, in __del__
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\pool.py", line 46, in close
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\ssh\connection.py", line 71, in close
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp_client.py", line 194, in close
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp_client.py", line 185, in _log
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp.py", line 158, in _log
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1372, in log
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1441, in _log
  File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1411, in makeRecord
TypeError: 'NoneType' object is not callable
I also tried encoding = 'utf-8', but the “UnicodeDecodeError” continues to appear:
Traceback (most recent call last):
  File "draft.py", line 7, in <module>
    mode ='r', encoding='utf-8')
  File "C:\Users\lbrandao\anaconda3\envs\ccab_env_dev\lib\site-packages\dvc\api.py", line 91, in read
    return fd.read()
  File "C:\Users\lbrandao\anaconda3\envs\ccab_env_dev\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 140: invalid continuation byte
Can anyone please help? Thanks.
@pared Here it is:
By the way, can the API work with lower DVC versions, e.g. 0.9.4?
@lilianabrandao We’ve migrated to sshfs (asyncssh instead of paramiko inside), so the ssh error that you were getting with rb mode should be resolved in recent dvc versions. Please give it a try and let us know if you still run into this issue. Thank you!
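
For anyone landing here with the same traceback: the UnicodeDecodeError appears to come from reading binary data in text mode, so no encoding value is likely to help. Below is a minimal sketch of that mechanism using plain Python file I/O. The byte string is a made-up stand-in for DICOM content, not real data, and the dvc.api.read call in the final comment simply reuses the arguments from the question with mode switched to 'rb'; it is not executed here.

```python
import os
import tempfile

# Hypothetical stand-in for binary DICOM content.
# Byte 0x81 has no mapping in cp1252, the codec named in the first traceback.
payload = b"DICM\x81\xd0\x00\x01"

with tempfile.NamedTemporaryFile(delete=False, suffix=".dcm") as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    # Text mode tries to decode every byte and fails on arbitrary binary data.
    try:
        with open(tmp_path, mode="r", encoding="cp1252") as fd:
            fd.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode skips decoding and returns the raw bytes unchanged.
    with open(tmp_path, mode="rb") as fd:
        raw = fd.read()
finally:
    os.remove(tmp_path)

# Analogous DVC call, assuming the same variables as in the question:
# data = dvc.api.read(path=path, remote=remote, repo=repo, mode="rb")
```

With mode 'rb' the result is a bytes object, which can then be handed to whatever DICOM parser you use.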