[DOC] Example notebook for converting panel data to required data format
Hi there! Great project and work that you all have been doing 😃
As a newcomer to this package, I’ve been trying to figure out how to format my data so that it can be used by all of the awesome methods that you guys have made. I’ve taken a look at the load data example, which covers loading from a .ts file in detail as well as how to use long format data with sktime. I’ve also looked at the other examples, but they all start with the data already in its correct format.
It says the `from_long_to_nested` method converts rows from a long-table schema data frame, assuming each row contains information for `case_id, dimension_id, reading_id, value`. From my understanding (please correct me if I’m wrong), long format data should be in the format (cases, feature dim, time dim). For example, if I have 5 customers, 3 different columns for each one, and 6 time observations for each customer, the dimensions would be 5 x 3 x 6.
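To make the 5 x 3 x 6 arithmetic concrete, here is a minimal sketch (plain NumPy and pandas, no sktime calls) of how such a 3D array relates to a long table with one row per `case_id, dimension_id, reading_id, value`. The column names are taken from the description quoted above; the values are made-up placeholders, and the exact names sktime expects are an assumption here.

```python
import numpy as np
import pandas as pd

# Placeholder sizes from the example: 5 customers, 3 features, 6 time points.
n_cases, n_dims, n_readings = 5, 3, 6

# A 3D array indexed as (case, dimension, reading), filled with dummy values.
values = np.arange(n_cases * n_dims * n_readings, dtype=float).reshape(
    n_cases, n_dims, n_readings
)

# One long-table row per (case, dimension, reading) combination.
# from_product varies the last level fastest, matching C-order ravel().
index = pd.MultiIndex.from_product(
    [range(n_cases), range(n_dims), range(n_readings)],
    names=["case_id", "dimension_id", "reading_id"],
)
long_df = pd.DataFrame({"value": values.ravel()}, index=index).reset_index()

print(values.shape)   # shape of the 3D array
print(long_df.shape)  # rows x columns of the long table
```

So the same data has shape 5 x 3 x 6 as an array but 5 * 3 * 6 = 90 rows and 4 columns as a long table.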
If we begin with this data format (I feel like this is a pretty common initial representation):
CustomerId | date | feature1 | feature2 | feature3 | class |
---|---|---|---|---|---|
1 | 2010-10-15 | 1 | 2 | 3 | ‘a’ |
1 | 2010-10-16 | 3 | 4 | 5 | ‘b’ |
… | … | … | … | … | … |
2 | 2010-10-15 | 1 | 2 | 3 | ‘c’ |
2 | 2010-10-16 | 3 | 4 | 5 | ‘d’ |
… | … | … | … | … | … |
How do the above columns map to the `case_id, dimension_id, reading_id, value` representation?
CustomerId -> case_id, correct?
Could there be an example that uses some concrete examples of the mapping?
Could there also be an example that shows how to prepare this data for use by `from_long_to_nested`?
Even better might be a method `df_to_long(case_id_identifier, feature_name_list, date_column_identifier)` that converts a tidy dataframe to the required long format.
`case_id_identifier` would be the unique identifier (CustomerId in my case), `feature_name_list` an array of strings that identify the columns you want to use, and `date_column_identifier` a string that identifies the date column.
Thanks so much!
Issue Analytics
- Created 3 years ago
- Comments: 9 (1 by maintainers)
The loading_data.ipynb in examples has been updated based on the work in pull request #553. There are now additional functions to convert between a variety of data formats and the nested format currently used by sktime.
Hi @bendykstra94 thanks for creating this issue!
I think one problem is that there isn’t really a consensus representation for panel data.
What you call features is what is called dimensions in that notebook. So “melting” the feature columns into a feature name and value column should do the trick.
We’d appreciate a PR to add this discussion to the data loading notebook!
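For anyone landing here, a rough sketch of that melting step using plain pandas. `df_to_long` is the hypothetical helper proposed in the issue, not an sktime function, and the output column names (`case_id`, `dimension_id`, `reading_id`, `value`) are an assumption based on the description quoted above. It also assumes one row per customer and date, and it simply drops the `class` column.

```python
import pandas as pd


def df_to_long(df, case_id_identifier, feature_name_list, date_column_identifier):
    """Hypothetical helper: melt a tidy frame into a long table (not part of sktime)."""
    # Melt the feature columns into a name column and a value column.
    long_df = df.melt(
        id_vars=[case_id_identifier, date_column_identifier],
        value_vars=feature_name_list,
        var_name="dimension_id",
        value_name="value",
    ).rename(columns={case_id_identifier: "case_id"})

    # Order readings by date, then number them 0..n-1 within each case and dimension.
    long_df = long_df.sort_values(["case_id", "dimension_id", date_column_identifier])
    long_df["reading_id"] = long_df.groupby(["case_id", "dimension_id"]).cumcount()

    columns = ["case_id", "dimension_id", "reading_id", "value"]
    return long_df[columns].reset_index(drop=True)


# The four visible rows of the table from the issue.
df = pd.DataFrame(
    {
        "CustomerId": [1, 1, 2, 2],
        "date": pd.to_datetime(["2010-10-15", "2010-10-16"] * 2),
        "feature1": [1, 3, 1, 3],
        "feature2": [2, 4, 2, 4],
        "feature3": [3, 5, 3, 5],
        "class": ["a", "b", "c", "d"],
    }
)

long_df = df_to_long(df, "CustomerId", ["feature1", "feature2", "feature3"], "date")
print(long_df)
```

In this mapping CustomerId becomes `case_id`, each feature column name becomes a `dimension_id`, the position of the date within a customer becomes `reading_id`, and the cell contents become `value`. Whether the result can be passed to `from_long_to_nested` unchanged depends on the column names and order that function expects, so check the data loading notebook first.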