MinMaxScaler output datatype

Describe the workflow you want to enable

Currently, applying a MinMaxScaler to data that includes features with small datatypes like int8 results in float64 output. I would like a way to output a similarly low-precision type such as float16. I don’t believe MinMaxScaler supports this today without applying another transformation after scaling. This likely applies to other scalers as well.

I would like this capability in order to avoid running out of memory on large-ish datasets that could be operated on in one VM, but can’t after many of my columns turn into higher-than-necessary-precision datatypes.
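As a stopgap, the memory concern above can be sketched like this: fit once, then transform a few rows at a time and write each chunk into a preallocated low-precision array, so that only one chunk exists as float64 at any moment. The helper name `transform_in_chunks` and its `chunk_rows` parameter are hypothetical and not part of scikit-learn; this is a rough illustration, not a proposed API.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def transform_in_chunks(scaler, X, dtype=np.float16, chunk_rows=2):
    """Hypothetical helper: scale X chunk by chunk into a low-precision array."""
    out = np.empty(X.shape, dtype=dtype)
    for start in range(0, X.shape[0], chunk_rows):
        stop = start + chunk_rows
        # transform() returns float64 for this chunk only; the assignment
        # casts it down to `dtype` as it is written into `out`.
        out[start:stop] = scaler.transform(X[start:stop])
    return out


data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
scaler = MinMaxScaler().fit(data)
scaled_small = transform_in_chunks(scaler, data)
print(scaled_small.dtype)
```

In practice `chunk_rows` would be much larger than 2; the value here is only sized for the toy data.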

Describe your proposed solution

I’d modify MinMaxScaler to accept an optional output data type argument and then cast values to that type while performing the necessary arithmetic for scaling.

Describe alternatives you’ve considered, if relevant

The cast could happen at several points, including after the arithmetic has been done in a higher-precision type like float64, which could matter in some use cases to avoid loss of precision.
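To make the trade-off between cast points concrete, here is a NumPy-only sketch that performs the min-max arithmetic by hand in two ways: casting to float16 before the arithmetic, and casting only the final result. Both function names are made up for this illustration, and neither reflects how scikit-learn implements scaling internally.

```python
import numpy as np


def minmax_cast_late(X, dtype=np.float16):
    """Do the arithmetic in float64, cast only the final result."""
    X64 = X.astype(np.float64)
    lo = X64.min(axis=0)
    span = X64.max(axis=0) - lo
    span[span == 0.0] = 1.0  # avoid division by zero for constant columns
    return ((X64 - lo) / span).astype(dtype)


def minmax_cast_early(X, dtype=np.float16):
    """Cast the input first, then do all arithmetic in the low-precision type."""
    Xlow = X.astype(dtype)
    lo = Xlow.min(axis=0)
    span = Xlow.max(axis=0) - lo
    span[span == 0] = 1
    return (Xlow - lo) / span


values = np.array([[0], [3001], [30000]], dtype=np.int16)
reference = (values - values.min(axis=0)) / (values.max(axis=0) - values.min(axis=0))

late = minmax_cast_late(values)
early = minmax_cast_early(values)

# Compare both variants against the float64 reference.
print("late cast max error: ", np.abs(late.astype(np.float64) - reference).max())
print("early cast max error:", np.abs(early.astype(np.float64) - reference).max())
```

The early cast uses less memory throughout but rounds the inputs before any arithmetic happens. The late cast keeps full precision until the end at the cost of a float64 intermediate.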

Additional context

Here’s an example based on the MinMaxScaler docs where applying the MinMaxScaler to int8 data produces a float64 result:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
scaler = MinMaxScaler()
scaler.fit(data)  # the scaler must be fitted before transform() can be called

assert data.dtype == np.int8
assert scaler.transform(data).dtype == np.float64

Top GitHub Comments

trewaite commented, Oct 6, 2020

Hey @nkelly13, if you comment with the single word “take” the bot will assign the task to you at which point you can develop and make a pull request when ready. Make sure to follow the contribution guidelines 😄.

rth commented, Sep 24, 2020

I think we could add a dtype=None init parameter (keeping the current behavior by default) that would need to be passed here, since there is indeed little control over how MinMaxScaler makes the dtype conversion.

Would you be interested in making a pull request?
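For illustration only, the dtype=None idea could be approximated outside the library with a small subclass. The class name `DtypeMinMaxScaler` and its `dtype` parameter are hypothetical and do not exist in scikit-learn. This sketch also casts after the float64 arithmetic, so it does not avoid the float64 intermediate the way a change inside the scaler could.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


class DtypeMinMaxScaler(MinMaxScaler):
    """Hypothetical subclass: MinMaxScaler with an optional output dtype."""

    def __init__(self, feature_range=(0, 1), *, copy=True, dtype=None):
        super().__init__(feature_range=feature_range, copy=copy)
        # dtype=None keeps the stock behavior (float64 output for int8 input).
        self.dtype = dtype

    def transform(self, X):
        Xt = super().transform(X)
        if self.dtype is None:
            return Xt
        # copy=False avoids an extra copy when Xt already has the target dtype.
        return Xt.astype(self.dtype, copy=False)


data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)

out_default = DtypeMinMaxScaler().fit_transform(data)
out32 = DtypeMinMaxScaler(dtype=np.float32).fit_transform(data)
print(out_default.dtype, out32.dtype)
```

Listing every constructor argument explicitly in `__init__` follows the scikit-learn convention that lets `get_params` discover them.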
