OneHotEncoding issue
When I one-hot encode a column, the behaviour is as expected:
one_hot_encoder = vaex.ml.OneHotEncoder(features=["scp"])
training_data = one_hot_encoder.fit_transform(data)
And this also works as expected
training_data.get_column_names()
I get
'scp_0.0', 'scp_0.1', 'scp_0.3', 'scp_0.4', 'scp_0.5', 'scp_0.8', 'scp_0.9', 'scp_1.0', 'scp_1.1', 'scp_1.3', 'scp_1.8', 'scp_1.9',
But when I try this
training_data[['scp_0.0', 'scp_0.1']]
or training_data[training_data.get_column_names()]
I get an error message:
File "C:\Program Files\Anaconda3\lib\ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
scp_0.0
^
SyntaxError: invalid syntax
But training_data['scp_0.0']
shows the right value.
One workaround for this was training_data[training_data.column_names]
But then I am unable to fit the data: training fails with the above message. The columns have no missing values. Am I missing something?
Issue Analytics
- Created 3 years ago
- Comments:5 (3 by maintainers)
This is released now, you can try it out with
$ pip install "vaex-core>=2.0.2"
Hi @arjunrao01
Thanks for the report. This is a rather complex issue related to how the Expression system works. We hope to make a better solution for this soon.
In the meantime, you can try using
training_data[training_data.get_column_names(alias=False)]
That will give you the expression names that vaex understands and everything should work from there.
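For anyone wondering why only some of the names trip the parser, here is a minimal sketch using just the standard library. It assumes, as the traceback above suggests, that the column names are passed to Python's `ast.parse` as if they were expressions. `parses_as_expression` is a hypothetical helper written for illustration and is not part of vaex.

```python
import ast


def parses_as_expression(name):
    """Return True if `name` is syntactically valid Python expression source.

    Hypothetical helper for illustration only; not a vaex API.
    """
    try:
        ast.parse(name, mode="eval")
        return True
    except SyntaxError:
        return False


# Two of the one-hot column names from the report, plus the original feature name.
column_names = ["scp_0.0", "scp_0.1", "scp"]

# "scp_0.0" tokenizes as the name `scp_0` followed by the float `.0`,
# which is not a valid expression, so parsing fails.
results = {name: parses_as_expression(name) for name in column_names}
print(results)  # {'scp_0.0': False, 'scp_0.1': False, 'scp': True}
```

A name containing a dot followed by digits is not a valid Python identifier, which would explain why the display names fail while the names returned with alias=False work.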