Variable types not preserved after call to normalize_entity()
Reproducible example:
import pandas as pd
import featuretools as ft
from featuretools.variable_types import IPAddress
from autonormalize import autonormalize as an
input_df = pd.DataFrame(
    {
        'ip_address': ['128.101.101.101', '1.120.0.0', '17.86.21.0', '23.1.23.255'],
        'length': [900, 60, 20, 30],
        'city': ['adl', 'syd', 'adl', 'syd'],
        'country': ['aus', 'aus', 'aus', 'aus'],
        'is_threat': [True, False, False, False]
    }
)
variable_types = {'ip_address': IPAddress}
es = ft.EntitySet()
es.entity_from_dataframe(entity_id='data',
                         dataframe=input_df,
                         index='index',
                         variable_types=variable_types,
                         make_index=True)
Column ip_address is set to dtype featuretools.variable_types.IPAddress:
print(es['data'].variables)
[<Variable: index (dtype = index)>,
<Variable: length (dtype = numeric)>,
<Variable: city (dtype = categorical)>,
<Variable: country (dtype = categorical)>,
<Variable: is_threat (dtype = boolean)>,
<Variable: ip_address (dtype = ip)>]
After normalisation, ip_address reverts to categorical:
normalized_es = an.normalize_entity(es)
for entity in normalized_es.entity_dict:
    # print the variables (and their types) of each entity in the new EntitySet
    print(normalized_es.entity_dict[entity].variables)
Entity: index
[<Variable: index (dtype = index)>,
<Variable: length (dtype = numeric)>,
<Variable: city (dtype = id)>,
<Variable: is_threat (dtype = boolean)>,
<Variable: ip_address (dtype = categorical)>]
Entity: city
[<Variable: city (dtype = index)>, <Variable: country (dtype = categorical)>]
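One plausible explanation (an assumption, not confirmed in this thread) is that the normalized entities are rebuilt from plain dataframes, so their variable types are inferred again from the values instead of being copied from the original entity. The toy inference below, which is hypothetical and far simpler than featuretools' own, shows why that would lose IPAddress: an IP address is just a string, so a generic inference step has nothing to distinguish it from any other categorical column.

```python
def infer_type(values):
    """Hypothetical stand-in for value-based type inference (not featuretools code)."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    # Any string column, including IP addresses, falls through to categorical.
    return "categorical"


columns = {
    "ip_address": ["128.101.101.101", "1.120.0.0", "17.86.21.0", "23.1.23.255"],
    "length": [900, 60, 20, 30],
    "is_threat": [True, False, False, False],
}
inferred = {name: infer_type(vals) for name, vals in columns.items()}
print(inferred)
# {'ip_address': 'categorical', 'length': 'numeric', 'is_threat': 'boolean'}
```

Under this assumption, the declared IPAddress type exists only in the EntitySet metadata, so it survives normalization only if it is explicitly passed through.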
To get the desired features, the variable types need to be preserved so that the right primitives are applied when running dfs. Is this the intended behaviour, or do the variable types need to be set manually again after normalization?
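A minimal, library-free sketch of the behaviour being asked for: when a table is split on a key, each resulting table copies the declared type of every column it keeps, and only the key's role changes (index in the new lookup table, id in the original). The function `split_table` and the type strings are hypothetical illustrations that mirror the dtype names printed above; they are not autonormalize's or featuretools' API.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Table = Tuple[List[Row], Dict[str, str]]  # (rows, column -> declared type)


def split_table(rows: List[Row], types: Dict[str, str],
                key: str, dependents: List[str]) -> Tuple[Table, Table]:
    """Move `dependents` (columns determined by `key`) into a lookup table.

    Declared types are copied from `types`, never re-inferred from values.
    """
    lookup: Dict[object, Row] = {}
    for row in rows:
        # Keep the first row seen for each key value (assumes a true dependency).
        lookup.setdefault(row[key], {key: row[key], **{c: row[c] for c in dependents}})
    parent_rows = list(lookup.values())
    child_rows = [{c: v for c, v in row.items() if c not in dependents} for row in rows]

    parent_types = {key: "index", **{c: types[c] for c in dependents}}
    child_types = {c: t for c, t in types.items() if c not in dependents}
    child_types[key] = "id"  # the key now references the lookup table
    return (child_rows, child_types), (parent_rows, parent_types)


columns = {
    "ip_address": ["128.101.101.101", "1.120.0.0", "17.86.21.0", "23.1.23.255"],
    "length": [900, 60, 20, 30],
    "city": ["adl", "syd", "adl", "syd"],
    "country": ["aus", "aus", "aus", "aus"],
    "is_threat": [True, False, False, False],
}
rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]
types = {"ip_address": "ip", "length": "numeric", "city": "categorical",
         "country": "categorical", "is_threat": "boolean"}

(child, child_types), (parent, parent_types) = split_table(
    rows, types, key="city", dependents=["country"])
print(child_types)   # ip_address keeps its declared "ip" type
print(parent_types)  # {'city': 'index', 'country': 'categorical'}
```

The design choice is simply that type information flows from the input schema to the outputs alongside the data, which is what would let dfs pick IP-specific primitives after normalization.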
Issue Analytics
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Thanks, created PR.
@j-grover can you create a fork to make the pull request?