Example notebook for converting panel data to required data format [DOC]

Hi there! Great project and work that you all have been doing 😃

As a newcomer to this package, I’ve been trying to figure out how to format my data so that it can be used by all of the awesome methods that you guys have made. I’ve taken a look at the load data example, where there is extensive writing about loading from a .ts file as well as how to use long format data with sktime. I’ve also looked at the other examples, but they all have the data already in its correct format.

It says the from_long_to_nested method converts rows from a long-table schema data frame, assuming each row contains information for case_id, dimension_id, reading_id, value. From my understanding (please correct me if I’m wrong), long format data should be in the format (cases, feature dim, time dim). For example, if I have 5 customers, 3 different columns for each one, and 6 time observations for each customer, the dimensions would be 5 x 3 x 6.
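One way to see how the two descriptions relate: a long table has one row per (case, dimension, reading) combination, so a 5 x 3 x 6 panel unrolls into 90 rows of four columns. The sketch below uses only NumPy and pandas; the array contents are made up, and the column names simply mirror the four fields named above rather than any confirmed sktime default.

```python
import numpy as np
import pandas as pd

n_cases, n_dims, n_readings = 5, 3, 6

# Hypothetical panel stored as a 3D array: (cases, dimensions, time points).
panel = np.arange(n_cases * n_dims * n_readings, dtype=float).reshape(
    n_cases, n_dims, n_readings
)

# Index grids with the same shape as the panel, one per axis.
case_id, dimension_id, reading_id = np.indices(panel.shape)

# Flatten everything in the same (C) order: one row per array element.
long_df = pd.DataFrame(
    {
        "case_id": case_id.ravel(),
        "dimension_id": dimension_id.ravel(),
        "reading_id": reading_id.ravel(),
        "value": panel.ravel(),
    }
)

print(long_df.shape)  # (90, 4)
```

So the 5 x 3 x 6 shape describes the panel as a 3D array, while the long table carries the same values with the three axes spelled out as explicit columns.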

If we begin with this data format (I feel like this is a pretty common initial representation):

CustomerId  date        feature1  feature2  feature3  class
1           2010-10-15  1         2         3         'a'
1           2010-10-16  3         4         5         'b'
...
2           2010-10-15  1         2         3         'c'
2           2010-10-16  3         4         5         'd'
...

How do the above columns map to the case_id, dimension_id, reading_id, value representation? CustomerId -> case_id, correct?

Could there be an example that uses some concrete examples of the mapping?

Could there also be an example that shows how to prepare this data for use by from_long_to_nested?

Even better might be a method df_to_long(case_id_identifier, feature_name_list, date_column_identifier) that converts a dataframe that is tidy to the required long format.

case_id_identifier would be the unique identifier (CustomerId in my case), feature_name_list an array of strings that identify the columns you want to use, and date_column_identifier a string that identifies the date column.
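A minimal sketch of what such a helper could look like, built on pandas `DataFrame.melt`. `df_to_long` is the hypothetical function proposed above, not an existing sktime function, and the output column names are an assumption taken from the four fields named earlier in this issue. The class column is left out here; how labels should be handled is a separate question.

```python
import pandas as pd


def df_to_long(df, case_id_identifier, feature_name_list, date_column_identifier):
    """Hypothetical helper: melt a tidy frame into four long-format columns."""
    # Each feature column becomes rows of (feature name, value).
    long_df = df.melt(
        id_vars=[case_id_identifier, date_column_identifier],
        value_vars=feature_name_list,
        var_name="dimension_id",
        value_name="value",
    )
    # Assumed mapping: customer -> case_id, date -> reading_id.
    long_df = long_df.rename(
        columns={case_id_identifier: "case_id", date_column_identifier: "reading_id"}
    )
    return (
        long_df[["case_id", "dimension_id", "reading_id", "value"]]
        .sort_values(["case_id", "dimension_id", "reading_id"])
        .reset_index(drop=True)
    )


# The example rows from this issue, without the class column.
tidy = pd.DataFrame(
    {
        "CustomerId": [1, 1, 2, 2],
        "date": pd.to_datetime(["2010-10-15", "2010-10-16"] * 2),
        "feature1": [1, 3, 1, 3],
        "feature2": [2, 4, 2, 4],
        "feature3": [3, 5, 3, 5],
    }
)

long_df = df_to_long(tidy, "CustomerId", ["feature1", "feature2", "feature3"], "date")
print(long_df.head())
```

With 2 customers, 3 features, and 2 dates, the result has 12 rows. Whether reading_id should stay a date or be replaced by an integer position is not settled by this sketch.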

Thanks so much!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

3 reactions
RNKuhns commented, Jan 30, 2021

The loading_data.ipynb notebook in examples has been updated based on the work in pull request #553. There are now additional functions to convert between a variety of data formats and the nested format currently used by sktime.
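For orientation, the nested format is a DataFrame with one row per case and one column per dimension, where each cell holds a whole pd.Series of readings. The sketch below builds that layout by hand from a small made-up long table using plain pandas and NumPy. `long_to_nested` is an illustrative name, not one of the converters added in the notebook, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up long table: 2 cases, 2 dimensions, 3 readings each.
long_df = pd.DataFrame(
    {
        "case_id": [0] * 6 + [1] * 6,
        "dimension_id": (["a"] * 3 + ["b"] * 3) * 2,
        "reading_id": [0, 1, 2] * 4,
        "value": [float(v) for v in range(12)],
    }
)


def long_to_nested(long_df):
    """Illustrative only: pack each (case, dimension) group into one Series cell."""
    cases = sorted(long_df["case_id"].unique())
    dims = sorted(long_df["dimension_id"].unique())
    columns = {}
    for dim in dims:
        # Object array so that each element can hold an entire pd.Series.
        cells = np.empty(len(cases), dtype=object)
        for i, case in enumerate(cases):
            mask = (long_df["case_id"] == case) & (long_df["dimension_id"] == dim)
            rows = long_df.loc[mask].sort_values("reading_id")
            cells[i] = pd.Series(
                rows["value"].to_numpy(), index=rows["reading_id"].to_numpy()
            )
        columns[dim] = cells
    return pd.DataFrame(columns, index=cases)


nested = long_to_nested(long_df)
print(nested.shape)  # (2, 2): one row per case, one column per dimension
print(nested.loc[0, "a"].tolist())  # [0.0, 1.0, 2.0]
```

For real data, the converters in the updated notebook are the better starting point; this only shows what the target layout looks like.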

1 reaction
mloning commented, Dec 17, 2020

Hi @bendykstra94 thanks for creating this issue!

I think one problem is that there isn’t really a consensus representation for panel data.

What you call features are called dimensions in that notebook. So “melting” the feature columns into a feature name column and a value column should do the trick.

We’d appreciate a PR to add this discussion to the data loading notebook!
