n_jobs for DBSCAN
See original GitHub issue #16213
#### Describe the bug
The `n_jobs` argument doesn't seem to change the time it takes to run `DBSCAN.fit()`. Runtime is the same with and without `n_jobs`.
Is `n_jobs` actually implemented for `DBSCAN.fit()`?
#### Steps/Code to Reproduce
Example:

    import numpy as np
    from sklearn.metrics.pairwise import euclidean_distances
    from time import time
    from sklearn.cluster import DBSCAN

    # generate a symmetric distance matrix
    num_training_examples = 30000
    num_features = 10
    X = np.random.randint(5, size=(num_training_examples, num_features))
    D = euclidean_distances(X, X)

    # DBSCAN parameters
    eps = 0.25
    kmedian_thresh = 0.005  # not used below
    min_samples = 5

    # case 1: omit n_jobs arg from DBSCAN
    start = time()
    db = DBSCAN(eps=eps,
                min_samples=min_samples,
                metric='precomputed').fit(D)
    end = time()
    total_time = end - start
    print('DBSCAN took {} seconds for {} training examples without n_jobs arg'
          .format(total_time, num_training_examples))

    # case 2: add n_jobs arg to DBSCAN
    n_jobs = -1
    start = time()
    db = DBSCAN(eps=eps,
                min_samples=min_samples,
                metric='precomputed',
                n_jobs=n_jobs).fit(D)
    end = time()
    total_time = end - start
    # the original call passed n_jobs as a third, unused format argument
    print('DBSCAN took {} seconds for {} training examples with n_jobs arg'
          .format(total_time, num_training_examples))
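The comparison above takes several minutes per run at 30000 samples. A scaled-down version of the same comparison might look like the sketch below. It assumes Python 3 and a recent scikit-learn. The helper name `time_dbscan`, the fixed seed, and the smaller 2000 x 3 data set (chosen so that duplicate points form clusters) are illustrative choices, not part of the original report.

```python
import numpy as np
from time import perf_counter
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import euclidean_distances


def time_dbscan(D, n_jobs=None, eps=0.25, min_samples=5):
    """Fit DBSCAN on a precomputed distance matrix; return (seconds, labels)."""
    start = perf_counter()
    model = DBSCAN(eps=eps, min_samples=min_samples,
                   metric='precomputed', n_jobs=n_jobs).fit(D)
    return perf_counter() - start, model.labels_


# Assumption: a much smaller data set than in the report, so this runs quickly.
rng = np.random.RandomState(0)
X_small = rng.randint(5, size=(2000, 3))
D_small = euclidean_distances(X_small, X_small)

t_default, labels_default = time_dbscan(D_small)
t_parallel, labels_parallel = time_dbscan(D_small, n_jobs=-1)

print('default n_jobs: {:.3f} s'.format(t_default))
print('n_jobs=-1:      {:.3f} s'.format(t_parallel))
print('identical labels:', np.array_equal(labels_default, labels_parallel))
```

Checking that both runs return the same labels confirms that the two timings measure the same clustering result.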
#### Expected Results
Expected runtime to decrease with more processors.
#### Actual Results
Runtime did not decrease; in this run it was higher with `n_jobs=-1`:
DBSCAN took 285.76699996 seconds for 30000 training examples without n_jobs arg
DBSCAN took 363.289000034 seconds for 30000 training examples with n_jobs arg
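One plausible reading of these timings, offered as an assumption rather than a diagnosis confirmed in this thread: with `metric='precomputed'` and a dense matrix, all distances are already computed, so there is little neighbor-search work left for `n_jobs` to spread across processors. The DBSCAN documentation suggests precomputing sparse neighborhoods with `NearestNeighbors.radius_neighbors_graph` (`mode='distance'`) and then fitting with `metric='precomputed'`. The sketch below compares that route with passing the feature array directly. It assumes a recent scikit-learn, and it uses synthetic continuous data (`X_demo`, a hypothetical stand-in) rather than the integer data from the report.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

eps = 0.25
min_samples = 5

# Assumption: small synthetic 2-D data, not the data set from the report.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(2000, 2))

# Option 1: pass the feature array, so DBSCAN runs the radius search
# itself and forwards n_jobs to it.
labels_features = DBSCAN(eps=eps, min_samples=min_samples,
                         metric='euclidean', n_jobs=-1).fit(X_demo).labels_

# Option 2: build a sparse radius-neighbors graph with a parallel search,
# then hand that graph to DBSCAN as a precomputed (sparse) distance matrix.
nn = NearestNeighbors(radius=eps, n_jobs=-1).fit(X_demo)
graph = nn.radius_neighbors_graph(X_demo, mode='distance')
labels_graph = DBSCAN(eps=eps, min_samples=min_samples,
                      metric='precomputed').fit(graph).labels_

print('same labels:', np.array_equal(labels_features, labels_graph))
```

Either option keeps the expensive step inside a call that accepts `n_jobs`. Whether that lowers wall-clock time depends on the data size and the neighbor-search algorithm, so it is worth timing on the real data.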
#### Versions
Cython: None
scipy: 1.2.2
setuptools: 41.6.0
pip: 19.3.1
numpy: 1.16.5
pandas: 0.24.2
sklearn: 0.20.4
import sys; print("Python", sys.version)
('Python', '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]')
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (6 by maintainers)
Top GitHub Comments
go for it
On Fri, 31 Jan 2020 at 15:45, JohanWork notifications@github.com wrote:
– Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/
Should this issue be closed by #16615?