_pickle.UnpicklingError: pickle data was truncated
Describe the bug
I was running auto-sklearn for an hour on the kin8nm dataset (id 189), downloaded from OpenML, when it stopped with the exception _pickle.UnpicklingError: pickle data was truncated.
To Reproduce
Steps to reproduce the behavior:
- Download dataset 189 with the OpenML API
- Apply train_test_split
- Create the automl instance:
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=timelife * 60,
    per_run_time_limit=30,
    memory_limit=psutil.virtual_memory().available,
    n_jobs=-1,
    resampling_strategy_arguments={'cv': 10},
)
timelife in this case is equal to 60.
- Run the fit:
automl.fit(X_train, y_train)
Actual behavior, stacktrace or logfile
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 280, in main
code = _serve_one(child_r, fds,
File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 319, in _serve_one
code = spawn._main(child_r, parent_sentinel)
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 280, in main
code = _serve_one(child_r, fds,
File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 319, in _serve_one
code = spawn._main(child_r, parent_sentinel)
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Environment and installation:
Please give details about your installation:
- OS: Ubuntu 20.04.2 LTS
- Is your installation in a virtual environment or conda environment? Virtual environment
- Python version: 3.8.10 64-bit
- Auto-sklearn version: 0.12.6
Issue Analytics
- Created 2 years ago
- Comments: 15 (8 by maintainers)
Unfortunately I’m not aware of a way to compute the optimal amount of RAM per core for a specific dataset. As long as datasets are small it doesn’t really matter. However, as you realized, it has an impact once they get larger, and figuring out the right amount automatically would be great, but that is so far beyond the scope of Auto-sklearn.
So there is no configuration that maximizes the performance of the algorithm for every amount of RAM and number of cores. It depends on the size of the dataset and the other things we mentioned earlier. The configuration I wrote is optimal for the hardware in my PC, but perhaps it is not suitable for other amounts of RAM and cores. Or maybe, on average, the calculation you recommended turns out to be better for most setups?
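As a rough illustration of the per-core budget discussed above, one simple heuristic is to split the machine's memory evenly across the workers. This is only a sketch under stated assumptions: the 16 GiB total-RAM figure is invented for the example (it is not from this issue), and it relies on auto-sklearn's memory_limit being interpreted in MB per job, so the byte count returned by psutil.virtual_memory().available, as used in the snippet above, would let the workers collectively request far more memory than exists:

```python
import os

# Assumption for illustration: 16 GiB of total RAM (not taken from this issue).
total_ram_mb = 16 * 1024

# n_jobs=-1 in auto-sklearn means "use all cores"; approximate that here.
n_jobs = os.cpu_count() or 1

# Heuristic: divide the RAM evenly per worker, since memory_limit is
# applied per job and expected in MB, not bytes for the whole machine.
memory_limit_per_job_mb = total_ram_mb // n_jobs

print(memory_limit_per_job_mb)
```

With this value one would pass memory_limit=memory_limit_per_job_mb to AutoSklearnRegressor instead of the raw byte count from psutil; whether an even split is actually the best allocation for a given dataset is exactly the open question in the comments above.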