Ivis seems to provoke errors when composing a sklearn.pipeline.Pipeline passed to sklearn.model_selection.GridSearchCV and executed in parallel
The problem
I noticed that errors are thrown when Ivis is a step of a sklearn.pipeline.Pipeline that is passed to sklearn.model_selection.GridSearchCV to fine-tune hyper-parameters across all estimators/transformers, and GridSearchCV has n_jobs=-1 (i.e., when the fits within GridSearchCV run in parallel). This does not happen when n_jobs=1 (i.e., when the fits within GridSearchCV run sequentially).
Since n_jobs on GridSearchCV applies to every fit of the whole Pipeline and cannot be restricted to specific steps, this problem forces the global use of n_jobs=1, which noticeably slows down the fine-tuning process by underusing the computational power of the machine on which the script runs (even in parts where n_jobs=-1 would work).
Environment
A virtual environment was created specifically for this repository, in which all modules listed in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).
Runtime
python=3.8.4
Relevant modules
ivis=2.0.3
tensorflow=2.5.0
Minimal reproducible example
Code
if __name__ == "__main__":
    import tempfile
    import ivis
    from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing
    from os import environ

    environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

    X, y = datasets.load_iris(return_X_y=True)

    # Ivis sits between a scaler and a classifier; intermediate steps are cached on disk.
    pipeline_with_ivis = pipeline.Pipeline([
        ("normalize", preprocessing.MinMaxScaler()),
        ("project", ivis.Ivis()),
        ("classify", ensemble.RandomForestClassifier()),
    ], memory=tempfile.mkdtemp())

    parameter_grid = {
        "project__k": (15,),
        "project__verbose": (True,),
        "classify__random_state": (2021,)
    }

    # n_jobs=-1 triggers the error; n_jobs=1 does not.
    grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, n_jobs=-1,
                                               return_train_score=True, verbose=3).fit(X, y)
Error
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 212, in extract_knn
process.start()
File "C:\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\process.py", line 39, in _Popen
return Popen(process_obj)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\popen_loky_win32.py", line 70, in __init__
child_env.update(process_obj.env)
AttributeError: 'KnnWorker' object has no attribute 'env'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 341, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 303, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 591, in __call__
return self._cached_call(args, kwargs)[0]
File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 534, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 761, in call
output = self.func(*args, **kwargs)
File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 754, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "<REPOSITORY_ROOT>\ivis\ivis.py", line 350, in fit_transform
self.fit(X, Y, shuffle_mode)
File "<REPOSITORY_ROOT>\ivis\ivis.py", line 328, in fit
self._fit(X, Y, shuffle_mode)
File "<REPOSITORY_ROOT>\ivis\ivis.py", line 190, in _fit
self.neighbour_matrix = AnnoyKnnMatrix.build(X, path=self.annoy_index_path,
File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 63, in build
return cls(index, X.shape, path, k, search_k, precompute, include_distances, verbose)
File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 48, in __init__
self.precomputed_neighbours = self.get_neighbour_indices()
File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 96, in get_neighbour_indices
return extract_knn(
File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 236, in extract_knn
process.terminate()
File "C:\Python38\lib\multiprocessing\process.py", line 133, in terminate
self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
warnings.warn("Estimator fit failed. The score on this train-test"
[...]
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [nan]
warnings.warn(
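The second traceback above is a side effect of the first: process.start() failed, so the cleanup call process.terminate() ran on a process that was never started and raised its own AttributeError, pushing the original failure further up the output. The following is a minimal sketch of that behaviour using only the standard library. The helper terminate_if_started is a hypothetical name introduced for illustration and is not part of ivis; the sketch makes no claim about how ivis handles cleanup.

```python
import multiprocessing


def terminate_if_started(process):
    """Terminate `process` only if it is actually running.

    Hypothetical helper for illustration. Returns True when terminate()
    was called and False when there was nothing to terminate.
    """
    # is_alive() returns False for a process whose start() never succeeded.
    if process.is_alive():
        process.terminate()
        return True
    return False


def demo():
    # A process object that is created but never started.
    never_started = multiprocessing.Process(target=print, args=("hello",))

    # Unguarded cleanup: terminate() fails because no child process exists.
    try:
        never_started.terminate()
        unguarded = "no error"
    except AttributeError as exc:
        unguarded = str(exc)

    # Guarded cleanup: nothing to terminate, so no secondary exception.
    guarded = terminate_if_started(never_started)
    return unguarded, guarded


if __name__ == "__main__":
    print(demo())
```

Guarding cleanup this way would not fix the underlying failure in start(); it would only keep the first exception as the one reported.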
Discussion
By coding and playing with the example above, I came to the understanding that, since sklearn uses joblib and ivis uses multiprocessing, these modules might not be playing well with each other for some reason.
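The first traceback gives a hint about where the two meet: inside the joblib worker, starting the KnnWorker process ends up in loky's launcher, which reads process_obj.env, an attribute that a plain multiprocessing.Process subclass does not define. The sketch below reproduces only the shape of that failure with the standard library. PlainWorker, build_child_env and build_child_env_tolerant are hypothetical stand-ins written for illustration; they are not code from loky or ivis, and the tolerant variant is an assumption about one possible mitigation, not the fix that was applied.

```python
import multiprocessing
import os


class PlainWorker(multiprocessing.Process):
    """Hypothetical stand-in for a worker like KnnWorker: a plain Process subclass."""

    def run(self):
        pass


def build_child_env(process_obj):
    """Stand-in for the launcher step seen in the traceback.

    Mirrors the line `child_env.update(process_obj.env)`.
    """
    child_env = dict(os.environ)
    child_env.update(process_obj.env)  # fails if the object has no `env`
    return child_env


def build_child_env_tolerant(process_obj):
    """Variant that treats a missing `env` attribute as no extra variables."""
    child_env = dict(os.environ)
    child_env.update(getattr(process_obj, "env", {}))
    return child_env


def demo():
    worker = PlainWorker()  # created but never started

    try:
        build_child_env(worker)
        strict = "no error"
    except AttributeError as exc:
        strict = str(exc)

    tolerant = build_child_env_tolerant(worker)
    return strict, tolerant == dict(os.environ)


if __name__ == "__main__":
    print(demo())
```

This supports the reading that the clash depends on which process launcher is active inside the worker, rather than on nested parallelism as such.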
I would rule out nested estimators/transformers with parallel routines as the cause: estimators like sklearn.ensemble.RandomForestClassifier can be set to n_jobs=-1 without problems within the Pipeline passed to GridSearchCV.
I am particularly affected by this issue because I want to employ ivis in projects that involve hyper-parameter fine-tuning using cross-validation via GridSearchCV with concurrent executions. I attempted to diagnose the problem, but to no avail, which is why I bring this issue to your attention.
Observation: another part of this problem is a design choice that does not adhere to the sklearn API guidelines; I propose and detail a solution in #95. That design choice does not cause the aforementioned error, but might cause other errors that could affect the same usage scenario (Pipeline in GridSearchCV running in parallel).
Issue Analytics
- Created 2 years ago
- Comments:10 (5 by maintainers)
Top GitHub Comments
Just a quick update: I am still testing ivis on the minimal reproducible example, as well as on a pipeline I have been working on. I still managed to find some errors, but they seem to happen only when I am running GridSearchCV with n_jobs=-1 inside a docker container. I am just making sure that this is a docker problem, and not an ivis one.
In case it is of any use, here is the error I have been seeing under docker. It seems to happen whenever I run GridSearchCV with n_jobs != 1. It runs for some time without any problems, and then this happens:
I tried to find more information on this on the web, but the only thing I was able to find that resembles this problem was this unanswered question on StackOverflow. It seems to happen whenever the Pipeline includes ivis, and it does not seem to happen with other projectors (e.g., UMAP, PCA) in the few tests I made, which makes me wonder if ivis plays a part in this. If pertinent, I will produce another minimal reproducible example involving docker for you to try to reproduce this error on your end.
As I said, I am still running tests, so take everything I said above with a pinch of salt. And before I forget, thank you for the diligence with which you assisted me in solving this issue. I really appreciate it.
Hmm, it never occurred to me that this could be memory. Thank you for the clarification on this matter and for solving the issue with GridSearchCV; I really appreciate it. Feel free to close this issue if there is nothing else to be added.