ENH: DataFrame Constructions from Data Classes

Is your feature request related to a problem?

I would like to construct a pandas.DataFrame from an iterable of dataclasses.dataclass instances, in the same way that DataFrame.from_records builds one from an iterable of tuples. The rationale is that a data class carries more type information than a general tuple or dictionary. Data classes can also be memory efficient, for example when they define __slots__. This makes them attractive to use instead of dicts or tuples whenever the schema is known.

Describe the solution you’d like

I would like a class method, .from_dataclasses, that constructs a DataFrame and infers its column types from a uniform (for simplicity) sequence of data class instances. See the example below.

import pandas as pd
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    name: str
    constant: float

# Proposed class method (not an existing pandas API)
df = pd.DataFrame.from_dataclasses([
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
])

print(df.dtypes)
#  id            int64
#  name         object
#  constant    float64
#  dtype: object

In the example above, the schema of the DataFrame is inferred from the Record.__annotations__ dictionary, which contains the type information provided by the user. The API could also offer a way to validate the schema at runtime by comparing the actual type of each value with the type specified for its column.
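
The inference and validation described above can be approximated today with a small helper. This is only a sketch under stated assumptions: the free function from_dataclasses and its validate flag are hypothetical names, not part of pandas, and the type check handles plain classes such as int, str and float only (not typing generics or string annotations).

```python
from dataclasses import dataclass, fields, is_dataclass

import pandas as pd


@dataclass
class Record:
    id: int
    name: str
    constant: float


def from_dataclasses(records, validate=False):
    """Hypothetical helper (not a pandas API): build a DataFrame from a
    uniform sequence of dataclass instances."""
    records = list(records)
    if not records:
        raise ValueError("need at least one record to infer the schema")
    cls = type(records[0])
    if not is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")

    # The schema comes from the declared fields: column name -> annotated type.
    schema = {f.name: f.type for f in fields(cls)}

    if validate:
        # Runtime check: each value must be an instance of its annotated type.
        # Annotations that are not plain classes are skipped.
        for i, rec in enumerate(records):
            for name, expected in schema.items():
                value = getattr(rec, name)
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(
                        f"record {i}: field {name!r} expected "
                        f"{expected.__name__}, got {type(value).__name__}"
                    )

    rows = [tuple(getattr(r, name) for name in schema) for r in records]
    df = pd.DataFrame.from_records(rows, columns=list(schema))

    # Cast only int/float columns to their annotated types; other columns
    # keep whatever dtype pandas infers.
    numeric = {name: t for name, t in schema.items() if t in (int, float)}
    return df.astype(numeric)


df = from_dataclasses(
    [
        Record(0, "Landau", 3.1415926),
        Record(1, "Kapitsa", 2.718281828459045),
        Record(2, "Bogolyubov", 6.62607015),
    ],
    validate=True,
)
print(df.dtypes)
```

Note that pandas 1.1.0 and later also accept a list of data class instances directly in the pd.DataFrame(...) constructor, which covers the construction part without a helper; there the column dtypes are inferred from the values rather than from the annotations.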

API breaking implications

This does not break any existing API, but it does require Python 3.7 or later, the first version that includes the dataclasses module.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

TomAugspurger commented, Nov 2, 2020 (1 reaction)

taytzehao commented, Nov 8, 2020 (0 reactions)

Updated mistakes of the documentation update #37699
