MinMaxScaler output datatype

Describe the workflow you want to enable

Currently, applying MinMaxScaler to data whose features use small datatypes such as int8 produces float64 output. I would like a way to get output in a low-precision type such as float16 instead. I don’t believe MinMaxScaler supports this today without applying another transformation after the scaler. This likely applies to other scaling functions as well.

I would like this capability in order to avoid running out of memory on large-ish datasets that could be operated on in one VM, but can’t after many of my columns turn into higher-than-necessary-precision datatypes.

Describe your proposed solution

I’d modify MinMaxScaler to accept an optional output data type argument and then cast values to that type while performing the necessary arithmetic for scaling.

Describe alternatives you’ve considered, if relevant

The casting operation could happen at many points, including after performing arithmetic in a higher precision type like float64, which could be important in some use cases to avoid loss of precision.
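To make the trade-off concrete, here is a minimal NumPy-only sketch of the two extremes: casting after the float64 arithmetic versus casting before it. The function names `minmax_cast_last` and `minmax_cast_first` are hypothetical and are not part of scikit-learn; they only mimic min-max scaling to the range [0, 1].

```python
import numpy as np

def minmax_cast_last(X, out_dtype=np.float16):
    """Scale in float64, then cast the finished result down (hypothetical helper)."""
    X64 = X.astype(np.float64)  # full-size float64 temporary
    lo, hi = X64.min(axis=0), X64.max(axis=0)
    # Use a span of 1 for constant columns to avoid dividing by zero.
    span = np.where(hi > lo, hi - lo, np.ones_like(lo))
    return ((X64 - lo) / span).astype(out_dtype)

def minmax_cast_first(X, out_dtype=np.float16):
    """Cast down first, then do all arithmetic in the low-precision type (hypothetical helper)."""
    Xl = X.astype(out_dtype)
    lo, hi = Xl.min(axis=0), Xl.max(axis=0)
    span = np.where(hi > lo, hi - lo, np.ones_like(lo))
    return (Xl - lo) / span

data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
last = minmax_cast_last(data)
first = minmax_cast_first(data)
```

For int8 input both variants return the same float16 values, because every int8 value is exactly representable in float16. With wider input types the two can diverge: casting first can round large integers, and values beyond the float16 range (about 65504) overflow to infinity before any scaling happens. Casting last avoids that but needs a float64 temporary the size of the input.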

Additional context

Here’s an example, based on the MinMaxScaler docs, where applying MinMaxScaler to int8 data produces a float64 result:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
scaler = MinMaxScaler()
scaler.fit(data)  # the scaler must be fitted before transform() can be called

assert data.dtype == np.int8
assert scaler.transform(data).dtype == np.float64
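Until something like this is supported directly, one possible workaround is to fit and transform in row chunks and write each chunk into a preallocated low-precision array, so only one chunk at a time exists as float64. The sketch below uses the existing `partial_fit` and `transform` methods of MinMaxScaler; the helper names `fit_in_chunks` and `transform_in_chunks` and the `chunk_rows` parameter are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def fit_in_chunks(scaler, X, chunk_rows=100_000):
    """Fit incrementally so the whole input is never converted to float64 at once."""
    for start in range(0, X.shape[0], chunk_rows):
        scaler.partial_fit(X[start:start + chunk_rows])
    return scaler

def transform_in_chunks(scaler, X, out_dtype=np.float16, chunk_rows=100_000):
    """Transform chunk by chunk, casting each float64 chunk into a low-precision output."""
    out = np.empty(X.shape, dtype=out_dtype)
    for start in range(0, X.shape[0], chunk_rows):
        stop = start + chunk_rows
        # The slice assignment casts the float64 chunk to out_dtype.
        out[start:stop] = scaler.transform(X[start:stop])
    return out

data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
scaler = fit_in_chunks(MinMaxScaler(), data, chunk_rows=2)
scaled = transform_in_chunks(scaler, data, chunk_rows=2)
```

The arithmetic still runs in float64 inside each chunk, so precision is only lost in the final cast. This is the "another transformation after the scaler" approach, arranged to limit peak memory.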

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 4
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
trewaite commented, Oct 6, 2020

Hey @nkelly13, if you comment with the single word “take” the bot will assign the task to you at which point you can develop and make a pull request when ready. Make sure to follow the contribution guidelines 😄.

1 reaction
rth commented, Sep 24, 2020

I think we could add a dtype=None init parameter (keeping the current behavior by default) that would need to be passed here, since there is indeed little control over how MinMaxScaler makes the dtype conversion.

Would you be interested in making a pull request?
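As a rough illustration of what a `dtype=None` parameter with the current behavior as its default could look like from the caller's side, here is a sketch of a wrapper estimator. `CastOutput` is a hypothetical class, not scikit-learn API, and this is not how the maintainers propose to implement the change: it casts only after the wrapped scaler has already produced its float64 result, so it does not lower peak memory.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import MinMaxScaler

class CastOutput(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: cast a transformer's output to `dtype` (None keeps it as-is)."""

    def __init__(self, transformer, dtype=None):
        self.transformer = transformer
        self.dtype = dtype

    def fit(self, X, y=None):
        # Fit a clone so the transformer passed to __init__ is left untouched.
        self.transformer_ = clone(self.transformer).fit(X, y)
        return self

    def transform(self, X):
        Xt = self.transformer_.transform(X)
        if self.dtype is None:
            return Xt  # default: same output as the wrapped transformer
        return Xt.astype(self.dtype, copy=False)

data = np.array([[-1, 2], [-1, 6], [0, 10], [1, 18]], dtype=np.int8)
default_out = CastOutput(MinMaxScaler()).fit_transform(data)
small_out = CastOutput(MinMaxScaler(), dtype=np.float16).fit_transform(data)
```

With `dtype=None` the output matches what MinMaxScaler returns today, so existing code would be unaffected; passing a dtype opts in to the cast.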
