compute pairwise_distance with custom metric function for non-numeric data

This is a feature request. I’m working with strings and have a custom metric function that computes similarities between them. I would love to use the pairwise_distances function with this custom metric to compute the whole distance/similarity matrix for my data. However, I get a ValueError: could not convert string to float: 'some string' when the X and Y arrays are checked. It would be great if these checks could be made optional for custom metric functions.
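
To make the request concrete, here is a minimal sketch of the kind of matrix being asked for, built with the standard library only. It is not scikit-learn's implementation: levenshtein stands in for the poster's unspecified custom metric, pairwise_matrix is a hypothetical helper name, and the loop has none of the parallelization the library function offers.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (stand-in for the custom metric)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pairwise_matrix(items: Sequence[str],
                    metric: Callable[[str, str], float]) -> List[List[float]]:
    """Naive distance matrix over arbitrary objects; inputs are never cast to float."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: the metric is symmetric and metric(x, x) == 0,
            # so only the upper triangle is computed and then mirrored.
            matrix[i][j] = matrix[j][i] = metric(items[i], items[j])
    return matrix


words = ["kitten", "sitting", "mitten"]
distances = pairwise_matrix(words, levenshtein)
```

Because the helper never inspects the items, any objects the metric accepts can be passed in; the cost is a plain O(n²) Python loop.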

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
cod3licious commented, Dec 29, 2019

I wanted to use the function to compute a distance matrix that I was using elsewhere, i.e., outside of sklearn; I figured the function had some nice parallelization built in or was otherwise more efficient than a naive implementation. So yes, it’s probably of limited value in conjunction with sklearn models. But even if the better solution there is to pass a precomputed distance matrix, that matrix still has to be computed somehow. And since it’s probably just a matter of adding one parameter, check_input=True, and one if statement before the arrays are checked, I think it’s worth it, even if the benefit doesn’t extend to other sklearn models.
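
The proposal in this comment can be sketched as follows. Everything here is hypothetical: pairwise_custom, _validate_numeric and length_difference are illustrative names, the float conversion only mimics the kind of check that produces the reported error, and check_input is the parameter suggested in the comment rather than an existing scikit-learn argument.

```python
from typing import Callable, List, Sequence


def _validate_numeric(rows: Sequence) -> List[List[float]]:
    """Stand-in for array validation: every entry must convert to float."""
    return [[float(value) for value in row] for row in rows]


def pairwise_custom(X: Sequence, Y: Sequence, metric: Callable,
                    check_input: bool = True) -> List[List[float]]:
    """Hypothetical helper, not scikit-learn's API."""
    # The single `if` the comment describes: skip validation on request.
    if check_input:
        X, Y = _validate_numeric(X), _validate_numeric(Y)
    return [[metric(x, y) for y in Y] for x in X]


def length_difference(a: str, b: str) -> int:
    """Toy custom metric on strings, used only for illustration."""
    return abs(len(a) - len(b))


strings = ["apple", "fig", "banana"]
matrix = pairwise_custom(strings, strings, length_difference, check_input=False)
```

With the default check_input=True the same call raises a ValueError from the float conversion, which mirrors the behaviour reported in the issue.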

0 reactions
vnmabus commented, Dec 3, 2020

I have to add something to this topic. I am the main maintainer of scikit-fda, a project that implements functional data methods compatible with scikit-learn. In our case we do not even have arrays: our data represent functions, so we have our own objects, analogous to a 1d array of functions (but sharing common things between them). We have also developed functional metrics, such as the Lp metrics (which use integrals instead of sums). We even know how to compute the pairwise distance for some of these metrics more efficiently than the naive implementation (for example, by multiplying the weights of the quadrature and using einsum). Moreover, we want to apply some distance-based methods almost verbatim to our objects, such as knn and agglomerative clustering, an objective that we currently achieve by wrapping the estimators and using the “precomputed” distance.

In summary, it would be nice if we had support for the following things:

  • Computing the pairwise distances with our types and metrics, relying on the optimized implementation if available.
  • Using the distance-based classifiers directly with our types and metrics, again relying on the optimized implementation of the pairwise distances if available.
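
One way to picture the "optimized implementation if available" idea from the list above is a helper that checks whether the metric object offers a bulk method and otherwise loops over pairs. This is only a sketch under assumptions: the pairwise method name, L2Metric and pairwise_distances_generic are hypothetical, they are not taken from scikit-fda or scikit-learn, and the bulk method here is a plain loop where a real one would be vectorized.

```python
import math
from typing import List, Sequence

Samples = Sequence[float]  # one function, sampled on a shared grid


class L2Metric:
    """L2 distance between sampled functions using quadrature weights."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)

    def __call__(self, f: Samples, g: Samples) -> float:
        # Weighted sum approximates the integral of (f - g)^2.
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(self.weights, f, g)))

    def pairwise(self, X: Sequence[Samples],
                 Y: Sequence[Samples]) -> List[List[float]]:
        """Optional bulk method (hypothetical protocol)."""
        return [[self(f, g) for g in Y] for f in X]


def pairwise_distances_generic(X, Y, metric) -> List[List[float]]:
    """Use metric.pairwise when the metric provides it, else loop over pairs."""
    bulk = getattr(metric, "pairwise", None)
    if callable(bulk):
        return bulk(X, Y)
    return [[metric(x, y) for y in Y] for x in X]


# Trapezoid-rule weights for the grid [0, 0.5, 1].
metric = L2Metric([0.25, 0.5, 0.25])
functions = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.5, 1.0]]
fast = pairwise_distances_generic(functions, functions, metric)
slow = pairwise_distances_generic(functions, functions,
                                  lambda f, g: metric(f, g))
```

The resulting matrix is what would then be handed to an estimator that accepts a “precomputed” distance, which is the route the comment says is used today.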

Top Results From Across the Web

How can I construct a pairwise distance matrix using a custom ...
I would like to create a program that computes a distance matrix from the results of my calculations on sets. Data about these...

sklearn.metrics.pairwise_distances
Compute the distance matrix from a vector array X and optional Y. This method takes either a vector array or a distance matrix,...

dispRity: Measuring Disparity
If the dispRity data has custom subsets with a single group, ... Each method for calculating distance is expressed as a function of...

Pairwise Mahalanobis distances - Cross Validated
So, center columns of the data matrix, compute the hat matrix, ... of the cloud and replace each pairwise distance by the corresponding ......

Pairwise distance between pairs of observations - MATLAB pdist
Define a custom distance function that ignores coordinates with NaN values, and compute pairwise distance by using the custom distance function.
