Data irregularities cause epacems_to_parquet to fail

After (apparently) successfully running the new data-package-based ETL process on the full EPA CEMS dataset (all years, all states), I tried to run the epacems_to_parquet script, but it hit problems and ultimately failed. Several of the messages were warnings of the type:

sys:1: DtypeWarning: Columns (8,10,12,14) have mixed types. Specify dtype option on import or set low_memory=False.
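
For context, pandas typically emits this warning when it reads a large CSV in chunks and infers different types for the same column in different chunks. The sketch below shows the remedy the warning itself suggests, passing `dtype` to `pd.read_csv`. The column names and sample values are hypothetical and are not taken from the actual EPA CEMS files; a file this small would not trigger the warning on its own.

```python
import io

import pandas as pd

# Hypothetical sample: an ID column that mixes numeric-looking and text values.
csv_text = "unit_id,gross_load_mw\n1,100.5\n2,\nCT1,87.0\n"

# Declaring dtypes up front means pandas does not have to guess per chunk,
# so every value in unit_id is read as a string.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"unit_id": str, "gross_load_mw": "float64"},
)

print(df["unit_id"].tolist())
```

The column numbers in the warning (8, 10, 12, 14) identify which columns would need an explicit type in the real input files.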

But the thing that crashed it eventually was:

Traceback (most recent call last):
  File "/home/zane/miniconda3/envs/pudl-dev/bin/epacems_to_parquet", line 11, in <module>
    load_entry_point('catalystcoop.pudl', 'console_scripts', 'epacems_to_parquet')()
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 319, in main
    clobber=args.clobber)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 205, in epacems_to_parquet
    df = year_from_operating_datetime(df).astype(IN_DTYPES)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 123, in year_from_operating_datetime
    df['year'] = df.operating_datetime_utc.dt.year
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 343, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike " "values")
AttributeError: Can only use .dt accessor with datetimelike values
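
The AttributeError indicates that `operating_datetime_utc` was not a datetime column when `.dt.year` was called, which is consistent with the column having been read as generic objects (for example, because some values could not be parsed as timestamps). A minimal sketch of one way to diagnose and work around this, assuming unparseable values are the cause, is shown below. The sample data is invented, and this is not necessarily the fix that was applied in PUDL.

```python
import pandas as pd

# Hypothetical data: one value is not a valid timestamp.
df = pd.DataFrame(
    {
        "operating_datetime_utc": [
            "2018-01-01 05:00:00",
            "not a date",
            "2018-12-31 23:00:00",
        ]
    }
)

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(df["operating_datetime_utc"], errors="coerce", utc=True)

# Keep the offending rows for inspection before overwriting the column.
bad_rows = df[parsed.isna()]
print(bad_rows)

# With a true datetime column, the .dt accessor works.
df["operating_datetime_utc"] = parsed
df["year"] = df["operating_datetime_utc"].dt.year
```

Rows whose timestamp was coerced to NaT end up with a missing `year`, so they would need to be dropped or repaired before any integer cast.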

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 16 (12 by maintainers)

Top GitHub Comments

1 reaction
karldw commented, Oct 20, 2019

I wrote some code that should address this, but I’ll test it later this week before creating a PR.

0 reactions
roll commented, Oct 25, 2019

@cmgosnell Please try tableschema-pandas@1.1. I have added support for composite primary keys.
