OneHotEncoding issue

When I one-hot encode a column, the behaviour is as expected:

import vaex
import vaex.ml

# `data` is the vaex DataFrame that holds the "scp" column
one_hot_encoder = vaex.ml.OneHotEncoder(features=["scp"])
training_data = one_hot_encoder.fit_transform(data)

Calling training_data.get_column_names() also works as expected.

I get 'scp_0.0', 'scp_0.1', 'scp_0.3', 'scp_0.4', 'scp_0.5', 'scp_0.8', 'scp_0.9', 'scp_1.0', 'scp_1.1', 'scp_1.3', 'scp_1.8', 'scp_1.9',

But when I try training_data[['scp_0.0', 'scp_0.1']] or training_data[training_data.get_column_names()], I get an error message:

  File "C:\Program Files\Anaconda3\lib\ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    scp_0.0
          ^
SyntaxError: invalid syntax

However, training_data['scp_0.0'] returns the right values.

One workaround was training_data[training_data.column_names], but then I am unable to fit the data: training fails with the same error message. The columns have no missing values. Am I missing something?
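The SyntaxError in the traceback comes from Python's own ast.parse, so it can be reproduced without vaex at all. The sketch below is only an illustration of why a name such as 'scp_0.0' trips an expression parser; the helper is_plain_name is hypothetical and is not part of vaex.

```python
import ast


def is_plain_name(text):
    """Return True if `text` parses as a single bare Python name."""
    try:
        tree = ast.parse(text, mode="eval")
    except SyntaxError:
        # This is the same failure that appears in the traceback.
        return False
    return isinstance(tree.body, ast.Name)


# Column names as reported by get_column_names() in the issue,
# plus one name without a dot for comparison.
for column in ["scp_0.0", "scp_0.1", "scp_0"]:
    print(column, "->", is_plain_name(column), column.isidentifier())
```

The names containing a dot are neither valid identifiers nor valid expressions, which is consistent with the error appearing as soon as a list of such names is parsed as expressions.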

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
maartenbreddels commented, Jun 4, 2020

This is released now; you can try it out with:

$ pip install "vaex-core>=2.0.2"

1 reaction
JovanVeljanoski commented, Jun 3, 2020

Hi @arjunrao01

Thanks for the report. This is a rather complex issue related to how the Expression system works. We hope to make a better solution for this soon.

In the meantime, you can try using training_data[training_data.get_column_names(alias=False)].

That will give you the expression names that vaex understands and everything should work from there.
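To make the idea of "expression names" concrete, here is a rough sketch of how a display name like 'scp_0.0' could be mapped to a name that is safe to parse. This is an assumption for illustration only: the thread does not show the names vaex actually generates, and make_alias and build_aliases are hypothetical helpers, not vaex functions.

```python
import re


def make_alias(name, used):
    """Hypothetical: derive a valid, unused identifier from a column name."""
    # Replace every character that cannot appear in an identifier.
    alias = re.sub(r"[^0-9a-zA-Z_]", "_", name)
    # Identifiers cannot be empty or start with a digit.
    if not alias or alias[0].isdigit():
        alias = "_" + alias
    # Add a numeric suffix until the alias is unique.
    candidate = alias
    suffix = 1
    while candidate in used:
        candidate = f"{alias}_{suffix}"
        suffix += 1
    return candidate


def build_aliases(column_names):
    """Map each original column name to a parseable alias."""
    aliases = {}
    used = set()
    for name in column_names:
        alias = make_alias(name, used)
        used.add(alias)
        aliases[name] = alias
    return aliases


aliases = build_aliases(["scp_0.0", "scp_0.1", "scp_1.0"])
for original, alias in aliases.items():
    print(original, "->", alias, alias.isidentifier())
```

With a mapping like this, the user-facing name can keep the dot while every expression is built from the alias, which is roughly the distinction the alias=False argument exposes.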
