Error: Job not requeued because: timed-out and not checkpointable.
When I execute:
python -m cc_net -l fa
It throws the following exception:
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 502, in readinto
n = self.fp.readinto(b)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
File "/usr/local/lib/python3.8/site-packages/submitit/core/job_environment.py", line 185, in checkpoint_and_try_requeue
raise utils.UncompletedJobError(message)
submitit.core.utils.UncompletedJobError: Job not requeued because: timed-out and not checkpointable.
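The error means submitit hit the SLURM time limit and could not requeue the job, because the submitted callable does not implement submitit's checkpoint protocol (a `checkpoint()` method that returns a resumable copy of the work, which submitit wraps in a `DelayedSubmission`). A minimal self-contained sketch of that resumable-callable pattern, without submitit itself; `ShardProcessor` and its fields are illustrative, not part of cc_net:

```python
# Sketch of the "checkpointable callable" pattern submitit looks for when a
# job times out. With submitit installed, checkpoint() would return
# submitit.helpers.DelayedSubmission(...) so the scheduler requeues the job;
# here we model only the state-carrying part so the idea runs standalone.

class ShardProcessor:
    """Processes a list of segments and remembers how far it got."""

    def __init__(self, segments, start=0):
        self.segments = segments
        self.done = start  # index of the next unprocessed segment

    def __call__(self, budget=None):
        # Process segments until finished, or until an optional per-call
        # budget runs out (standing in for a SLURM time limit).
        results = []
        for i in range(self.done, len(self.segments)):
            if budget is not None and len(results) >= budget:
                break
            results.append(self.segments[i].upper())  # placeholder "work"
            self.done = i + 1
        return results

    def checkpoint(self, *args, **kwargs):
        # With submitit this would be:
        #   return submitit.helpers.DelayedSubmission(
        #       ShardProcessor(self.segments, start=self.done), *args, **kwargs)
        # i.e. resubmit a copy that resumes from where we stopped.
        return ShardProcessor(self.segments, start=self.done)
```

If the function cc_net submits exposed such a `checkpoint()` method, submitit would requeue it on the timeout signal instead of raising `UncompletedJobError`.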
Here is the full log.err:
2021-01-20 22:08 INFO 20945:cc_net.process_wet_file - Parsed 1 / 16000 files. Estimated remaining time: 177.9h
2021-01-20 22:08 INFO 20945:root - Starting download of https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-09/segments/155024747>
2021-01-20 22:08 INFO 20945:root - Downloaded https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-09/segments/1550247479101.30/we>
2021-01-20 22:08 INFO 20945:cc_net.process_wet_file - Kept 41_939 documents over 44_039 (95.2%).
2021-01-20 22:08 INFO 20945:cc_net.process_wet_file - Parsed 2 / 16000 files. Estimated remaining time: 147.7h
2021-01-20 22:08 INFO 20945:root - Starting download of https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-09/segments/155024747>
submitit WARNING (2021-01-20 22:08:54,313) - Caught signal 10 on 8095a4502934: this job is timed-out.
2021-01-20 22:08 WARNING 20945:submitit - Caught signal 10 on 8095a4502934: this job is timed-out.
2021-01-20 22:08 INFO 20945:submitit - Job not requeued because: timed-out and not checkpointable.
2021-01-20 22:08 INFO 20945:root - Downloaded https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-09/segments/1550247479101.30/we>
submitit WARNING (2021-01-20 22:08:54,522) - Bypassing signal 15
submitit WARNING (2021-01-20 22:08:54,522) - Bypassing signal 15
2021-01-20 22:08 WARNING 20956:submitit - Bypassing signal 15
2021-01-20 22:08 WARNING 20957:submitit - Bypassing signal 15
2021-01-20 22:08 INFO 20945:Classifier - Processed 0 documents in 0.025h ( 0.0 doc/s).
2021-01-20 22:08 INFO 20945:Classifier - Kept 0 docs over 0 (0.0%)
2021-01-20 22:08 INFO 20945:Classifier - Found 0 language labels: {}
2021-01-20 22:08 INFO 20945:where - Selected 0 documents out of 0 ( 0.0%)
submitit ERROR (2021-01-20 22:08:54,541) - Submitted job triggered an exception
2021-01-20 22:08 ERROR 20945:submitit - Submitted job triggered an exception
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 851, in next
item = self._items.popleft()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
submitit_main()
File "/opt/conda/lib/python3.8/site-packages/submitit/core/submission.py", line 65, in submitit_main
process_job(args.folder)
File "/opt/conda/lib/python3.8/site-packages/submitit/core/submission.py", line 58, in process_job
raise error
File "/opt/conda/lib/python3.8/site-packages/submitit/core/submission.py", line 47, in process_job
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 1
- Comments: 11 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@sidsvash26: set num_segments_per_shard to some value in the above config (e.g., 10 instead of -1). This works for me as well, thanks!

Is it possible to download and run on only a small sample of the full data, say one million documents in one language? If yes, what would the config for that look like?
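For the small-sample question, a hedged sketch of what such a config might look like, assuming cc_net's JSON config format and the Config fields visible in cc_net/mine.py (num_shards, num_segments_per_shard, lang_whitelist); the exact field names, the dump label, and sensible values should be verified against the version of cc_net in use:

```json
{
  "config_name": "small_sample_fa",
  "dump": "2019-09",
  "num_shards": 1,
  "num_segments_per_shard": 10,
  "lang_whitelist": ["fa"],
  "mine_num_processes": 1,
  "execution": "local"
}
```

Saved to a file, this would presumably be passed with something like `python -m cc_net --config path/to/small_sample_fa.json`; limiting both the shard count and the segments per shard is what keeps the download small.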