Encoding error while reading database
Hi, I am getting an encoding error while reading the calima_star database in a Windows environment, because database.py relies on the default encoding:
with open(fpath, 'r') as dbfile:
Then, if I call
db = CalimaStarDB.builtin_db()
it raises a UnicodeError, because the database is in UTF-8 (I guess) while Python's default encoding on Windows follows the system locale (mine is CP1252). Is there any way I could specify the encoding while reading the builtin_db?
If I replace that line in database.py with the one below, the error is fixed, but patching an installed file is not practical since I intend to make camel_tools pip-installable through a requirements file.
with open(fpath, 'r', encoding="utf-8") as dbfile:
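To illustrate the difference without touching camel_tools itself, here is a minimal sketch. It writes a short UTF-8 file and reads it back twice: once as CP1252 (standing in for a Windows locale default) and once as UTF-8. The file, the sample text, and the names read_text and demo are hypothetical and are not part of camel_tools.

```python
import os
import tempfile

# Hypothetical sample line. "\u0641" (Arabic FEH) is encoded in UTF-8 as the
# bytes D9 81, and byte 0x81 has no mapping in CP1252, so decoding fails.
SAMPLE = "\u0641\u064a\n"


def read_text(path, encoding=None):
    """Read a whole text file; encoding=None means the platform default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Return (cp1252_failed, utf8_text) for a temporary UTF-8 file."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)

        # Simulate a Windows default by asking for CP1252 explicitly.
        try:
            read_text(path, encoding="cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True

        # Passing the encoding explicitly works on any platform.
        utf8_text = read_text(path, encoding="utf-8")
    finally:
        os.remove(path)
    return cp1252_failed, utf8_text


if __name__ == "__main__":
    failed, text = demo()
    print("CP1252 read failed:", failed)
    print("UTF-8 read matches:", text == SAMPLE)
```

As a possible stopgap that does not require editing the installed package, Python 3.7 and later offer a UTF-8 mode (environment variable PYTHONUTF8=1, or the -X utf8 option) which makes open() default to UTF-8. I have not checked it against this particular database.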
Thanks in advance
- Created 3 years ago
Top GitHub Comments
Just a quick update.
So camel-tools should be installable on plain Windows now through the master branch. I’ve added installation instructions. However, dialectid will not be available.
Also, camel_tools.calima_star is now camel_tools.morphology (these were the breaking changes I mentioned). The usage is almost the same apart from the renaming (CalimaStarDB -> MorphologyDB, CalimaStarAnalyzer -> Analyzer, etc). Just take a look at the docs to see what changed.
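For code that has to run against both the old and the new names during the transition, one option is a small import shim. This is only a sketch: the package and class names come from the rename described above, but the full module paths (the .database submodule) are an assumption based on the database.py file mentioned in the issue, and load_morphology_db_class is a hypothetical helper. Check the docs for the actual layout.

```python
import importlib

# (module path, class name) pairs, newest first.
# ASSUMPTION: the ".database" submodule paths are inferred, not confirmed.
CANDIDATES = [
    ("camel_tools.morphology.database", "MorphologyDB"),
    ("camel_tools.calima_star.database", "CalimaStarDB"),
]


def load_morphology_db_class():
    """Return the first database class that can be imported, else None."""
    for module_name, class_name in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Package missing, or this version uses the other name.
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    return None


if __name__ == "__main__":
    db_class = load_morphology_db_class()
    if db_class is None:
        print("camel_tools is not installed, or the module layout differs")
    else:
        print("Using", db_class.__name__)
```

If a class is found, the builtin_db() call from the original report should apply to it under either name, since the usage is described as almost the same.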
A pip release should be coming in the next couple of weeks or so once everything else is stabilized.
Will it be available in the “pip” install in the future?
Yes and no.
"encoding=utf-8" will be present in a future version on pip. However, like I mentioned before, there will be breaking changes in the API you’ll need to account for. No, it will not be installable on “plain” Windows for the foreseeable future because of the issues I mentioned in my last response.
However, I will discuss with the team whether we should offer an option to install just the components that work on a “plain” Windows setup. This means that users won’t have access to the Dialect Identification system (and perhaps other components as well). The analyzer alone should work on a plain Windows setup without issues.
It will be some time from now before a new pip version is released, so please keep an eye out (you can also fill out the form in the README to get an email when the next version is released).