Variable types not preserved after call to normalize_entity()

Reproducible example:

import pandas as pd
import featuretools as ft

from featuretools.variable_types import IPAddress
from autonormalize import autonormalize as an

input_df = pd.DataFrame(
    {
        'ip_address': ['128.101.101.101', '1.120.0.0', '17.86.21.0', '23.1.23.255'],
        'length': [900, 60, 20, 30],
        'city': ['adl', 'syd', 'adl', 'syd'],
        'country': ['aus', 'aus', 'aus', 'aus'],
        'is_threat': [True, False, False, False]
    }
)

variable_types = {'ip_address': IPAddress}

es = ft.EntitySet()
es.entity_from_dataframe(entity_id='data',
                         dataframe=input_df,
                         index='index',
                         variable_types=variable_types,
                         make_index=True)

Column ip_address is set to dtype featuretools.variable_types.IPAddress:

print(es['data'].variables)

[<Variable: index (dtype = index)>, 
<Variable: length (dtype = numeric)>, 
<Variable: city (dtype = categorical)>, 
<Variable: country (dtype = categorical)>, 
<Variable: is_threat (dtype = boolean)>, 
<Variable: ip_address (dtype = ip)>]

After normalisation, ip_address resolves back to categorical:

normalized_es = an.normalize_entity(es)

for entity in normalized_es.entity_dict:
    print(f'Entity: {entity}')
    print(normalized_es.entity_dict[entity].variables)

Entity: index
[<Variable: index (dtype = index)>, 
<Variable: length (dtype = numeric)>, 
<Variable: city (dtype = id)>, 
<Variable: is_threat (dtype = boolean)>, 
<Variable: ip_address (dtype = categorical)>]
Entity: city
[<Variable: city (dtype = index)>, <Variable: country (dtype = categorical)>]

To get the desired features, the variable types need to be preserved so that the right primitives are applied when running dfs. Is this the intended behaviour, or do the variable types need to be set manually again after normalisation?
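Until that is settled, one way to handle the manual route is to work out which custom types were lost and where they ended up. The sketch below is only an illustration, not something from the issue or from either library: `plan_type_restoration` is a hypothetical helper, the type is represented by a plain string stand-in, and the entity/column layout is copied from the output printed above.

```python
def plan_type_restoration(custom_types, entity_columns):
    """Return, per normalized entity, the custom types that should be re-applied.

    custom_types:   column name -> variable type originally passed in
    entity_columns: entity id -> list of column names in that entity
    """
    plan = {}
    for entity_id, columns in entity_columns.items():
        # Keep only the columns that had an explicit type before normalization.
        restore = {col: custom_types[col] for col in columns if col in custom_types}
        if restore:
            plan[entity_id] = restore
    return plan


# Stand-in for {'ip_address': IPAddress}; a string is used so the sketch
# runs without featuretools installed.
custom_types = {"ip_address": "IPAddress"}

# Layout taken from the normalized EntitySet printed in the issue.
entity_columns = {
    "index": ["index", "length", "city", "is_threat", "ip_address"],
    "city": ["city", "country"],
}

plan = plan_type_restoration(custom_types, entity_columns)
print(plan)
```

With the pre-1.0 featuretools API used in this issue, each entry of the plan could then presumably be applied with something like normalized_es[entity_id].convert_variable_type(column, variable_type). That call is an assumption about the installed version, so check it against your featuretools release before relying on it.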

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

j-grover commented, Oct 18, 2019 (1 reaction)

"@j-grover can you create a fork to make the pull request?"

Thanks, created PR.

kmax12 commented, Oct 17, 2019 (1 reaction)

@j-grover can you create a fork to make the pull request?

