JSONDecodeError when running feat extract after doing kaldi import

Hi! If I run lhotse kaldi import <kaldidir> 16000 <newdir> and then lhotse feat extract <newdir>/recordings.jsonl.gz <newdir>/feat, I get an error:

  File "/home/rudolf/.local/lib/python3.8/site-packages/lhotse/bin/modes/", line 86, in extract
    recordings: RecordingSet = RecordingSet.from_json(recording_manifest)
  File "/usr/lib/python3.8/json/", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 285)

There’s nothing wrong with the Kaldi dir (it was fixed and I’ve validated it). This is lhotse 1.2, and I tried both pip install lhotse and pip install lhotse[orjson].

This is what the first 3 lines of the zcat output for recordings.jsonl.gz look like:

{"id": "5ccae615b4e948578998a20f", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/5ccae615b4e948578998a20f-wav.wav"}], "sampling_rate": 16000, "num_samples": 24992427, "duration": 1562.0266875}
{"id": "5ccae699b4e948578998a211", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/wav/5ccae699b4e948578998a211-wav.wav"}], "sampling_rate": 16000, "num_samples": 25830741, "duration": 1614.4213125}
{"id": "5ccae7b2b4e948578998a215", "sources": [{"type": "file", "channels": [0], "source": "/path/to/wav/5ccae7b2b4e948578998a215-wav.wav"}], "sampling_rate": 16000, "num_samples": 11936427, "duration": 746.0266875}

Any ideas?

Edit: It seems to be loading the whole file as a single JSON document when it should be reading it line by line (the manifest is JSONL, one object per line). Should I be passing a specific CLI flag?

Edit 2: I think the fix is to change from_json to from_file in extract (the RecordingSet.from_json call shown in the traceback above). I could make a PR?
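
For anyone hitting the same message: the behaviour can be reproduced with the standard library alone, without lhotse. This is only a sketch of the failure mode, not lhotse's actual loading code. The records, the file name, and the helpers write_jsonl_gz and read_jsonl_gz are made up for illustration.

```python
import gzip
import json
import os
import tempfile

# Two illustrative records (hypothetical, much shorter than a real manifest).
records = [
    {"id": "rec-1", "sampling_rate": 16000, "num_samples": 16000, "duration": 1.0},
    {"id": "rec-2", "sampling_rate": 16000, "num_samples": 32000, "duration": 2.0},
]


def write_jsonl_gz(path, items):
    """Write one JSON object per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_jsonl_gz(path):
    """Parse a gzip-compressed JSONL file line by line, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recordings.jsonl.gz")
    write_jsonl_gz(path, records)

    # Parsing the whole file as one JSON document fails after the first
    # object, because a second object follows it on the next line.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    try:
        json.loads(text)
        whole_file_error = None
    except json.JSONDecodeError as err:
        whole_file_error = err.msg

    # Parsing each line separately works.
    loaded = read_jsonl_gz(path)

print(whole_file_error)
print(len(loaded))
```

The "line 2 column 1" position in the original error matches this: the parser finishes the first object and then finds a second one where it expected the end of the input.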

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

RuABraun commented, May 30, 2022

Thanks guys!

PS: the kaldi import function is suuuper useful, just wanted to highlight that.

pzelasko commented, May 30, 2022

You should be able to use both phone and bpe lang dirs, but I’m not sure if the phone recipe is regularly tested (I recall fixing it once some time ago; it uses slightly different components).
