Encoding error while reading database
Hi, I am experiencing an encoding error while reading the calima_star database in a Windows environment. The cause is that database.py relies on the default encoding:
with open(fpath, 'r') as dbfile:
Then, if I try db = CalimaStarDB.builtin_db(), it raises a UnicodeError, because the database is in UTF-8 (I guess) and Python's default encoding on Windows is that of the system locale (mine is CP1252). Is there any way I could specify the encoding while reading the builtin_db?
If I replace that line in database.py with the one below, the error is fixed, but patching the installed file is not a practical solution, since I intend to make camel_tools pip-installable through requirements.
with open(fpath, 'r', encoding="utf-8") as dbfile:
Thanks in advance
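The behaviour described above can be reproduced without camel_tools. The sketch below is only an illustration: the temporary file, the `read_text` helper, and the single-letter sample are hypothetical stand-ins, not the library's code or its real database. It writes one Arabic letter as UTF-8, then reads it back once with CP1252 (what a Windows machine with that locale would use by default) and once with UTF-8.

```python
import os
import tempfile


def read_text(fpath, encoding=None):
    """Read a whole text file; encoding=None falls back to the platform default."""
    with open(fpath, 'r', encoding=encoding) as dbfile:
        return dbfile.read()


# Hypothetical stand-in for the database file: the Arabic letter U+0641,
# which UTF-8 stores as the bytes D9 81. Byte 0x81 is undefined in CP1252.
sample = '\u0641'
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.db',
                                 delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Reading with CP1252 mimics the default on a CP1252 Windows machine.
try:
    read_text(sample_path, encoding='cp1252')
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with an explicit UTF-8 encoding succeeds on any platform.
utf8_text = read_text(sample_path, encoding='utf-8')

os.remove(sample_path)
```

As a possible stopgap that does not require editing the installed package, Python 3.7 and later offer UTF-8 mode (the `PYTHONUTF8=1` environment variable or the `-X utf8` command-line option), which makes `open()` default to UTF-8. Whether that is acceptable depends on the rest of the application, so treat it as a suggestion to test rather than a confirmed fix for this library.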
Issue Analytics
- State:
- Created 3 years ago
- Comments:8
Top GitHub Comments
Hi @alvelvis,
Just a quick update.
So camel-tools should be installable on plain Windows now through the master branch. I’ve added installation instructions. However, dialectid will not be available.
Also, camel_tools.calima_star is now camel_tools.morphology (these are the breaking changes I mentioned). The usage is almost the same, other than things being renamed (CalimaStarDB -> MorphologyDB, CalimaStarAnalyzer -> Analyzer, etc). Just take a look at the docs to see what changed.
A pip release should be coming in the next couple of weeks or so once everything else is stabilized.
Yes and no.
Yes, the "encoding=utf-8" will be present in a future version on pip. However, like I mentioned before, there will be breaking changes in the API you’ll need to account for. No, it will not be installable on “plain” Windows for the foreseeable future because of the issues I mentioned in my last response. However, I will consider with the team if we should have an option to install just the components that will work on a “plain” Windows setup. This means that users won’t have access to the Dialect Identification system (perhaps other components as well). The analyzer alone should work on a plain Windows setup without issues.
It will be some time from now before a new pip version is released, so please keep an eye out (you can also fill out the form in the README to get an email when the next version is released).