
Kedro Tutorial pandas.CSVDataSet not found and kedro.extras ModuleNotFoundError


Description

`kedro ipython` fails with the following error when a dataset is referenced as `pandas.CSVDataSet`:

DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.

Context

I followed the tutorial up to the “Setting up the data” > “Reference all datasets” step. I referenced the two datasets as `pandas.CSVDataSet` and checked that they were correctly referenced by running `context.catalog.load("companies").head()` in a `kedro ipython` session.

Steps to Reproduce

  1. Create a new project and install dependencies as shown in the Kedro Spaceflights tutorial.
  2. Download the data files and reference them as shown in the “Setting up the data” section:

```yaml
companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataSet
  filepath: data/01_raw/reviews.csv
```

  3. Run `kedro ipython`.
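To illustrate why the `type:` string matters, here is a simplified, hypothetical sketch of prefix-based class lookup. It assumes, without checking any particular Kedro release, that the catalog tries a short list of package prefixes (the `DEFAULT_PACKAGES` tuple and the `load_class` helper are illustrative assumptions, not Kedro's API) and reports "not found" when none of them yields a class. Under that assumption, `pandas.CSVDataSet` can only resolve when a `kedro.extras.datasets.pandas` module exists in the installed version.

```python
import importlib
from typing import Optional, Sequence

# Hypothetical prefix list for illustration only; not copied from a Kedro release.
DEFAULT_PACKAGES = ("kedro.io.", "kedro.extras.datasets.", "")


def load_class(type_string: str, packages: Sequence[str] = DEFAULT_PACKAGES) -> Optional[type]:
    """Return the first class found at `prefix + type_string`, or None if none match."""
    for prefix in packages:
        module_path, _, class_name = (prefix + type_string).rpartition(".")
        if not module_path:
            continue  # a bare name without a module part cannot be imported
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # this module does not exist in the installed packages
        candidate = getattr(module, class_name, None)
        if isinstance(candidate, type):
            return candidate
    return None


# Demo with standard-library modules so the lookup runs without Kedro installed:
print(load_class("OrderedDict", packages=("collections.", "")))  # <class 'collections.OrderedDict'>
print(load_class("pandas.CSVDataSet", packages=("not_a_real_pkg.",)))  # None
```

Returning `None` instead of raising keeps the sketch short; a real catalog would turn that case into an error such as the `DataSetError` above.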

Expected Result

The `kedro ipython` session should start, and `context.catalog.load("companies").head()` should display the first rows of the dataset.

Actual Result

When I run `kedro ipython`, I get:

DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.

If the datasets are referenced as `CSVLocalDataSet`, the `kedro ipython` session starts correctly and `context.catalog.load("companies").head()` displays the first rows of the dataset. However, if I then run `from kedro.extras.datasets.pandas import CSVDataSet`, it fails with:

ModuleNotFoundError: No module named 'kedro.extras'
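One way to check which dataset type an installation supports is to probe it directly. The sketch below uses a hypothetical helper, `available_csv_type` (not a Kedro function): it tries the `kedro.extras.datasets.pandas` import that fails above, falls back to `kedro.io.CSVLocalDataSet`, and returns the matching catalog `type:` string, or `None` if neither class can be imported.

```python
import importlib
from typing import Optional

# (module to import, class expected in it, catalog `type:` string to use)
CANDIDATES = (
    ("kedro.extras.datasets.pandas", "CSVDataSet", "pandas.CSVDataSet"),
    ("kedro.io", "CSVLocalDataSet", "CSVLocalDataSet"),
)


def available_csv_type() -> Optional[str]:
    """Return the first CSV dataset type the installed Kedro can import, else None."""
    for module_name, class_name, type_string in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            continue
        if hasattr(module, class_name):
            return type_string
    return None


print(available_csv_type())
```

Based on the behaviour reported above for Kedro 0.15.5, this would be expected to return `"CSVLocalDataSet"` there, because `kedro.extras` cannot be imported.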

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (using pip show kedro): 0.15.5
  • Python version used (python -V): Python 3.7.6
  • Operating system and version: macOS High Sierra 10.13.6

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12

Top GitHub Comments

2 reactions
ghost commented, Feb 16, 2020

Hello @Mshindi777! 👋

Thank you for raising this issue. I’ve explained why this is happening in this PR comment here: https://github.com/quantumblacklabs/kedro/pull/222#issuecomment-586580697 and intend to fix it first thing Monday morning.

To view the correct documentation, you should be able to switch the documentation version from latest to stable at the bottom right of the sidebar on the left.

Hope that helps and sorry for the confusion re: docs!

1 reaction
ghost commented, Mar 3, 2020

Hi @pmbaumgartner,

We’re aware of this; it’s due to the way we handle our dependencies. As a stopgap for now, `pip install "kedro[all]"` should get you up and running.
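The comment does not say which dependencies are missing. As a hedged illustration, the sketch below checks whether some top-level modules can be found and, if any cannot, suggests the stopgap from the comment. The `REQUIRED` tuple (just `pandas` here) and the `missing_modules` helper are hypothetical examples, not Kedro's actual dependency list or API.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical example: a module that a pandas-backed dataset would need.
REQUIRED = ("pandas",)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    # find_spec returns None for a missing top-level module (dotted names could raise).
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print(f'Missing {missing}; try: pip install "kedro[all]"')
else:
    print("All listed modules are importable.")
```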


Top Results From Across the Web

Kedro Tutorial pandas.CSVDataSet not found and ... - GitHub
Description kedro ipython fails with DataSetError: An exception occurred when parsing config for DataSet `companies`: Class `pandas.
kedro.extras.datasets.pandas.CSVDataSet
Has no effect on the data set if versioning was not enabled. Return type. AbstractDataSet. Returns. An instance of an AbstractDataSet subclass.
kedro.extras.datasets.pandas.CSVDataSet - Read the Docs
from kedro.extras.datasets.pandas import CSVDataSet import pandas as pd data ... If prefix is not provided, file protocol (local filesystem) will be used.
Source code for kedro.extras.datasets.pandas.sql_dataset
Returns: Instructions for installing missing driver. An empty string is returned in case error is related to an unknown driver.
