Model Prediction call stuck at convert_examples_to_features when using `use_multiprocessing=True`

Describe the bug

Model prediction gets stuck in convert_examples_to_features when multiprocessing is enabled.

To Reproduce

Run a Gunicorn server that accepts prediction requests and call the predict method, classification_model.ClassificationModel().predict().

Expected behavior

The model should return predictions without any issues.

Desktop (please complete the following information):

Ubuntu 18.04

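Additional context

I observed that replacing the code at https://github.com/ThilinaRajapakse/simpletransformers/blob/master/simpletransformers/classification/classification_utils.py#L376 fixes the issue. The snippets below show the `with Pool(...)` block and variants that call p.close() and p.join() explicitly.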

    with Pool(process_count) as p:
        features = list(
            tqdm(
                p.imap(convert_example_to_feature, examples, chunksize=500),
                total=len(examples),
                disable=silent,
            )
        )

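    # Create the pool before the try block so that p is always bound in finally
    p = Pool(process_count)
    try:
        features = list(
            tqdm(
                p.imap(convert_example_to_feature, examples, chunksize=500),
                total=len(examples),
                disable=silent,
            )
        )
    finally:
        # Stop accepting new tasks, then wait for the worker processes to exit
        p.close()
        p.join()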

    with Pool(process_count) as p:
        features = list(
            tqdm(
                p.imap(convert_example_to_feature, examples, chunksize=500),
                total=len(examples),
                disable=silent,
            )
        )
        p.close()
        p.join()
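
The pool lifecycle used in the snippets above can be tried in isolation. The following is a minimal sketch, not the library's code: `map_with_pool` and `toy_feature` are hypothetical names, `toy_feature` merely stands in for convert_example_to_feature, and the "fork" start method is assumed because the report comes from an Ubuntu host.

```python
import multiprocessing as mp


def map_with_pool(func, items, process_count=2, chunksize=500):
    """Apply func to every item in worker processes and return a list."""
    # Assumption: the "fork" start method, available on Linux hosts such as
    # the Ubuntu 18.04 machine in the report.
    ctx = mp.get_context("fork")
    pool = ctx.Pool(process_count)
    try:
        # imap yields results in input order; list() drains the iterator
        return list(pool.imap(func, items, chunksize=chunksize))
    finally:
        # close() stops new task submission and join() waits for the workers
        # to exit, so they are cleaned up even if func raised an exception.
        pool.close()
        pool.join()


def toy_feature(text):
    # Hypothetical stand-in for convert_example_to_feature
    return len(text.split())


if __name__ == "__main__":
    print(map_with_pool(toy_feature, ["a b", "c d e", "f"], chunksize=1))
```

If patching the library is not an option, passing `"use_multiprocessing": False` in the model's args should skip this pool entirely, at the cost of slower feature conversion. That is an assumption based on the option named in the title, not something tested in this thread.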

Top GitHub Comments

Lysimachos commented, Feb 19, 2020

I have the same problem. The simpletransformers version I am using does not include the correction you are proposing. What machine are you running the code on?

nmvijay commented, Jun 11, 2020

Sometimes it just gets stuck (hangs), sometimes it throws an error (random behaviour):

    0%| | 0/72685 [00:00<?, ?it/s]
    0%| | 1/72685 [00:00<5:51:11, 3.45it/s]
    2%|▏ | 1501/72685 [00:00<4:00:47, 4.93it/s]
    4%|▍ | 3001/72685 [00:00<2:45:02, 7.04it/s]
    6%|▌ | 4501/72685 [00:00<1:53:03, 10.05it/s]
    8%|▊ | 6001/72685 [00:00<1:17:26, 14.35it/s]
    10%|█ | 7501/72685 [00:00<53:00, 20.49it/s]
    12%|█▏ | 9001/72685 [00:01<36:16, 29.25it/s]
    14%|█▍ | 10012/72685 [00:01<25:06, 41.60it/s]
    16%|█▌ | 11501/72685 [00:01<17:11, 59.34it/s]
    18%|█▊ | 13001/72685 [00:01<11:45, 84.58it/s]
    20%|█▉ | 14501/72685 [00:01<08:03, 120.40it/s]
    22%|██▏ | 16001/72685 [00:01<05:31, 171.21it/s]
    24%|██▍ | 17501/72685 [00:02<03:47, 243.03it/s]
    26%|██▌ | 19001/72685 [00:02<02:36, 343.75it/s]
    28%|██▊ | 20500/72685 [00:02<00:06, 8688.71it/s]
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/simpletransformers/classification/classification_utils.py", line 112, in convert_example_to_feature
        tokens_a = tokenizer.tokenize(example.text_a)
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1329, in tokenize
        tokenized_text = split_on_tokens(added_tokens, text)
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1303, in split_on_tokens
        if not text.strip():
    AttributeError: 'float' object has no attribute 'strip'
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "src/component_prediction_hf.py", line 53, in <module>
        model.train_model(train_data)
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py", line 274, in train_model
        train_dataset = self.load_and_cache_examples(train_examples, verbose=verbose)
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py", line 852, in load_and_cache_examples
        args=args,
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/simpletransformers/classification/classification_utils.py", line 391, in convert_examples_to_features
        disable=silent,
      File "/my_bundle_bundle/my_env/lib/python3.7/site-packages/tqdm/std.py", line 1129, in __iter__
        for obj in iterable:
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
        return (item for chunk in result for item in chunk)
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 748, in next
        raise value
    AttributeError: 'float' object has no attribute 'strip'
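
The AttributeError in the traceback above says the tokenizer received a float where it expected a string. One plausible cause, not confirmed in this thread, is a missing value (NaN) in the text column of the input data. A minimal sketch of a pre-check follows; `split_valid_rows` is a hypothetical helper, and the (text, label) row layout is an assumption about how the data is organised.

```python
import math


def split_valid_rows(rows):
    """Separate (text, label) rows into usable ones and rejected ones."""
    valid, rejected = [], []
    for text, label in rows:
        # Keep only non-empty strings; NaN is a float, so it is rejected here
        if isinstance(text, str) and text.strip():
            valid.append((text, label))
        else:
            rejected.append((text, label))
    return valid, rejected


if __name__ == "__main__":
    rows = [("good text", 1), (math.nan, 0), ("", 1)]
    valid, rejected = split_valid_rows(rows)
    print(len(valid), "usable,", len(rejected), "rejected")
```

Running such a check before calling train_model or predict keeps texts and labels aligned and lets the rejected rows be inspected rather than silently dropped.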