JSONDecodeError when running feat extract after doing kaldi import
Hi!
If I run lhotse kaldi import <kaldidir> 16000 <newdir>
and then
lhotse feat extract <newdir>/recordings.jsonl.gz <newdir>/feat
I get an error
[..]
File "/home/rudolf/.local/lib/python3.8/site-packages/lhotse/bin/modes/features.py", line 86, in extract
recordings: RecordingSet = RecordingSet.from_json(recording_manifest)
[..]
File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 285)
There’s nothing wrong with the kaldi dir (it was fixed and I’ve validated it). This is lhotse 1.2; I tried both pip install lhotse and pip install lhotse[orjson].
This is what the first 3 lines of the zcat output look like:
{"id": "5ccae615b4e948578998a20f", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/5ccae615b4e948578998a20f-wav.wav"}], "sampling_rate": 16000, "num_samples": 24992427, "duration": 1562.0266875}
{"id": "5ccae699b4e948578998a211", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/wav/5ccae699b4e948578998a211-wav.wav"}], "sampling_rate": 16000, "num_samples": 25830741, "duration": 1614.4213125}
{"id": "5ccae7b2b4e948578998a215", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/5ccae7b2b4e948578998a215-wav.wav"}], "sampling_rate": 16000, "num_samples": 11936427, "duration": 746.0266875}
Any ideas?
edit: It seems the manifest is being loaded as a single JSON document when it should be read line by line? Should I be passing a specific CLI flag?
edit2: I think the fix is to change from_json to from_file here. I could make a PR?
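For anyone hitting the same traceback, here is a minimal sketch of why "Extra data: line 2 column 1" appears. It uses only the Python standard library, not lhotse itself, and the record values and the read_jsonl helper are made up for illustration. Assuming the manifest is JSON Lines (one object per line, as the zcat output suggests), parsing the whole file as one JSON document stops right after the first object, while parsing line by line works:

```python
import gzip
import json
import os
import tempfile

# Hypothetical, shortened records in JSON Lines form: one JSON object per line.
records = [
    {"id": "rec-1", "sampling_rate": 16000, "num_samples": 16000, "duration": 1.0},
    {"id": "rec-2", "sampling_rate": 16000, "num_samples": 32000, "duration": 2.0},
]

# Write them to a temporary gzip-compressed .jsonl.gz file.
path = os.path.join(tempfile.mkdtemp(), "recordings.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Parsing the whole file as a single JSON document fails:
# the decoder finishes the first object and then finds a second one.
with gzip.open(path, "rt", encoding="utf-8") as f:
    text = f.read()
try:
    json.loads(text)
    whole_file_error = None
except json.JSONDecodeError as e:
    whole_file_error = e  # "Extra data", reported at line 2, column 1


def read_jsonl(path):
    """Illustrative helper: parse a gzip-compressed JSONL file line by line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


loaded = read_jsonl(path)
print(whole_file_error)
print(len(loaded), "records read line by line")
```

This only illustrates the difference between the two formats. Whether from_file is the right replacement in features.py is for the maintainers to confirm.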
Top GitHub Comments
Thanks guys!
PS the kaldi import function is suuuper useful, just wanted to highlight that.
You should be able to use both phone and bpe lang dirs but I’m not sure if the phone recipe is regularly tested (I recall fixing it once some time ago; it uses a bit different components).