[python 3.8] UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 45: ordinal not in range(128)

Hello,

I’ve just made an updated conda environment for python 3.8 and I can’t read loom files using anndata.read_loom() anymore. It gives me this error (see full traceback below):

Traceback (most recent call last):

  File "<ipython-input-2-b0b79aae2f29>", line 1, in <module>
    adata = anndata.read_loom('/home/clarice/Documents/SingleCell_PseudoTime/data/CHLA9.loom')

  File "/home/clarice/.local/lib/python3.8/site-packages/anndata/_io/read.py", line 225, in read_loom
    var = dict(lc.row_attrs)

  File "/home/clarice/anaconda3/lib/python3.8/site-packages/loompy/attribute_manager.py", line 102, in __getitem__
    return self.__getattr__(thing)

  File "/home/clarice/anaconda3/lib/python3.8/site-packages/loompy/attribute_manager.py", line 119, in __getattr__
    vals = loompy.materialize_attr_values(self.ds._file[a][name][:])

  File "/home/clarice/anaconda3/lib/python3.8/site-packages/loompy/normalize.py", line 98, in materialize_attr_values
    result = np.array([html.unescape(x) for x in temp.astype(str)], dtype=object)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 45: ordinal not in range(128)

Of note: I can read the same file in my python3.7 environment, but it prints a message:

Variable names are not unique. To make them unique, call '.var_names_make_unique'.

It’s always been like this. After running .var_names_make_unique, it all works out perfectly.

Any idea why UnicodeDecoder is failing? Is there anything I can do?
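
For readers unfamiliar with this class of error, here is a minimal sketch, independent of loompy and anndata, of how the message arises: bytes that hold UTF-8 text are decoded with the ASCII codec. The sample value is made up for illustration. Byte 0xce is the lead byte of the two-byte UTF-8 encoding of Greek letters such as α, β and γ.

```python
# Hypothetical sample value: a short label containing a Greek letter,
# stored as UTF-8 bytes (this is not taken from the loom file itself).
raw = "FRα".encode("utf-8")
print(raw)  # b'FR\xce\xb1'

try:
    # ASCII only covers byte values 0-127, so 0xce cannot be decoded.
    raw.decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

# Decoding with the codec the bytes were written in succeeds.
decoded = raw.decode("utf-8")
print(decoded)
```

The position reported in the message is the zero-based offset of the first offending byte within the value being decoded.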

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

SergejN commented, May 17, 2021

Ok, I think I got it. It should also take care of #141. After some hours of debugging I realized that the file gencode.v31.metadata.tab, which I downloaded from https://storage.googleapis.com/linnarsson-lab-www-blobs/human_GRCh38_gencode.v31.tar.gz, contains non-ASCII characters:

[nowoshil@vieccews0302 human_GRCh38_gencode.v31.600]$ grep --color='auto' -P -n "[^\x00-\x7F]" gencode.v31.metadata.tab
33589:ENSG00000175634   ENSG00000175634.15      RPS6KB2 ribosomal protein S6 kinase B2  protein_coding  HGNC:10437      chr11   67428460        67435401   protein-coding gene     gene with protein product       11q13.2 11q13.2 "p70S6Kb|P70-BETA|STK14B|KLS|S6KB|S6Kbeta|S6Kβ" OTTHUMG00000167673uc001old.4       NM_003952       CCDS41677       Q9UBS0  "9878560|9804755"       MGI:1927343     RGD:1305144     RPS6KB2 608939          False
33759:ENSG00000110203   ENSG00000110203.9       FOLR3   folate receptor gamma   protein_coding  HGNC:3795       chr11   72114869        72139892  protein-coding gene      gene with protein product       11q13.4 11q13.4 "FR-G|FRγ"      OTTHUMG00000167870      uc031xur.2      NM_000804       CCDS73344  P41439  8110752                 FOLR3   602469          False
33764:ENSG00000110195   ENSG00000110195.13      FOLR1   folate receptor alpha   protein_coding  HGNC:3791       chr11   72189558        72196323  protein-coding gene      gene with protein product       11q13.4 11q13.4 FRα     OTTHUMG00000167876      uc001osa.3      NM_016725       CCDS8211  P15328   1717147 MGI:95568       RGD:71032       FOLR1   136430          False
33765:ENSG00000165457   ENSG00000165457.14      FOLR2   folate receptor beta    protein_coding  HGNC:3793       chr11   72216601        72221950  protein-coding gene      gene with protein product       11q13.4 11q13.4 FRβ     OTTHUMG00000150394      uc001ose.5      NM_000803       CCDS8212  P14207   "7698003|8110752"       MGI:95569       RGD:1308515     FOLR2   136425          False
44873:ENSG00000166501   ENSG00000166501.14      PRKCB   protein kinase C beta   protein_coding  HGNC:9395       chr16   23835983        24220611  protein-coding gene      gene with protein product       16p12.2-p12.1   16p12.2-p12.1   PKCβ    OTTHUMG00000131615      uc002dmd.4      NM_212535 "CCDS10618|CCDS10619"    P05771  3658678 MGI:97596       RGD:3396        PRKCB   176970          False
49067:ENSG00000154229   ENSG00000154229.12      PRKCA   protein kinase C alpha  protein_coding  HGNC:9393       chr17   66302613        66810743  protein-coding gene      gene with protein product       17q24.2 17q24.2 PKCα    OTTHUMG00000179533      uc002jfp.2      NM_002737       CCDS11664 P17252           MGI:97595       RGD:3395        PRKCA   176960          False
52643:ENSG00000105221   ENSG00000105221.17      AKT2    AKT serine/threonine kinase 2   protein_coding  HGNC:392        chr19   40230317        40285536   protein-coding gene     gene with protein product       19q13.2 19q13.2 PKBβ    OTTHUMG00000137375      uc002onf.3      NM_001626       "CCDS12552|CCDS82350"      P31751  1409633 MGI:104874      RGD:2082        AKT2    164731          False
53513:ENSG00000126583   ENSG00000126583.11      PRKCG   protein kinase C gamma  protein_coding  HGNC:9402       chr19   53879190        53907652  protein-coding gene      gene with protein product       19q13.42        19q13.42        "PKCC|MGC57564|PKCγ"    OTTHUMG00000064846      uc002qcq.2NM_002739        CCDS12867       P05129  "8432525|3755548"       MGI:97597       RGD:3397        PRKCG   176980          False
58592:ENSG00000089289   ENSG00000089289.16      IGBP1   immunoglobulin binding protein 1        protein_coding  HGNC:5461       chrX    70133447  70166324 protein-coding gene     gene with protein product       Xq13.1  Xq13.1  α4      OTTHUMG00000021767      uc004dxv.4      NM_001370192    CCDS14396  P78318  9441740 MGI:1346500     RGD:62011       IGBP1   300139          False
59609:ENSG00000129675   ENSG00000129675.16      ARHGEF6 Rac/Cdc42 guanine nucleotide exchange factor 6  protein_coding  HGNC:685        chrX    136665547  136780932       protein-coding gene     gene with protein product       Xq26.3  Xq26.3  "alphaPIX|Cool-2|KIAA0006|alpha-PIX|Cool2|αPix" OTTHUMG00000022518 uc004fab.5      NM_004840       "CCDS14660|CCDS78509"   Q15052  "7584048|9659915"       MGI:1920591     RGD:1359674     ARHGEF6 300267             False
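
The same check can be sketched in Python for anyone without GNU grep's -P option. This is only an illustration: `non_ascii_lines` is a hypothetical helper name, and the sample rows are heavily trimmed, made-up stand-ins for lines of the metadata file.

```python
def non_ascii_lines(lines):
    """Yield (line_number, line) for each line with a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        # str.isascii() (Python 3.7+) is False as soon as one character
        # falls outside the range U+0000 to U+007F.
        if not line.isascii():
            yield number, line.rstrip("\n")


# Trimmed, illustrative rows; the real file has many more columns.
sample = [
    "FOLR1\tfolate receptor alpha\tFRα\n",
    "EXAMPLE\tplain ASCII row\tnone\n",
    "PRKCB\tprotein kinase C beta\tPKCβ\n",
]

hits = list(non_ascii_lines(sample))
for number, line in hits:
    print(f"{number}:{line}")
```

To scan a real file, the helper can be given an open file object instead of a list, for example `open(path, encoding="utf-8")`, where `path` is whatever location the metadata file was extracted to.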

I played around with the locale settings of my Docker container, but that didn’t help much. I ended up patching the file normalize.py as follows:

--- /usr/local/lib/python3.9/site-packages/loompy/normalize.py  2021-05-17 13:00:47.120228000 +0200
+++ /usr/local/lib/python3.9/site-packages/loompy/normalize.py  2021-05-17 13:00:47.120228000 +0200
@@ -95,7 +95,10 @@
                else:
                        temp = a
                # Then unescape XML entities and convert to unicode
-               result = np.array([html.unescape(x) for x in temp.astype(str)], dtype=object)
+               try:
+                       result = np.array([html.unescape(x) for x in temp.astype(str)], dtype=object)
+               except:
+                       result = np.array([html.unescape(x.decode("utf-8")) for x in temp], dtype=object)
        elif np.issubdtype(a.dtype, np.str_) or np.issubdtype(a.dtype, np.unicode_):
                result = np.array(a.astype(str), dtype=object)
        else:

I’m not sure how x.decode("utf-8") affects performance, so the fallback branch runs only when the original conversion raises, that is, for attributes containing entries like the lines above that would otherwise trigger the UnicodeDecodeError.
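
The fallback idea in the patch above can be sketched without NumPy or loompy. This is an illustration under stated assumptions, not loompy's actual code: `unescape_attr_values` is a hypothetical helper, plain Python lists of bytes stand in for the NumPy array, and an explicit ASCII decode stands in for the conversion that fails in the traceback.

```python
import html


def unescape_attr_values(values):
    """Decode byte strings and unescape HTML/XML entities.

    Hypothetical helper, not part of loompy. The whole batch is first
    decoded as ASCII; only if that raises is it decoded again as UTF-8.
    """
    try:
        return [html.unescape(v.decode("ascii")) for v in values]
    except UnicodeDecodeError:
        # Fallback path: reached only when at least one value contains
        # a byte outside the ASCII range.
        return [html.unescape(v.decode("utf-8")) for v in values]


plain = unescape_attr_values([b"A&amp;B", b"FOLR1"])
greek = unescape_attr_values([b"FR\xce\xb1", b"A&amp;B"])
print(plain)
print(greek)
```

One difference from the patch is deliberate: the sketch catches UnicodeDecodeError rather than using a bare except, so unrelated failures are not silently routed into the fallback. Note also that the fallback re-decodes the whole batch, not just the offending entries.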

stela2502 commented, Sep 23, 2021

The fix is in your Git repository, but it is too new to be installed with “pip install loompy”, so I likely just need to wait for a release. Until then, installing from Git is sufficient to fix the problem. No pull request is necessary anymore. But thank you!
