
AttributeError: 'DatasetDict' object has no attribute 'train_test_split'


The following code fails with "'DatasetDict' object has no attribute 'train_test_split'". Am I doing something wrong?

from datasets import load_dataset
dataset = load_dataset('csv', data_files='data.txt')
dataset = dataset.train_test_split(test_size=0.1)

AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

6 reactions
mrscp commented, Jun 15, 2021

from datasets import load_dataset
dataset = load_dataset('csv', data_files=['files/datasets/dataset.csv'])
dataset = dataset['train']  # load_dataset returns a DatasetDict; select the Dataset first
dataset = dataset.train_test_split(test_size=0.1)

4 reactions
SBrandeis commented, Dec 18, 2020

Hi @david-waterworth!

As indicated in the error message, load_dataset("csv") returns a DatasetDict object, which is a mapping of str to Dataset objects. I believe that in this case it returns a single train split containing all the data. train_test_split is a method of the Dataset object, so you will need to select a split first, something like this:

from datasets import load_dataset
dataset_dict = load_dataset('csv', data_files='data.txt')
dataset = dataset_dict['train']  # use the name of your split, e.g. 'train'
dataset = dataset.train_test_split(test_size=0.1)

Please let me know if this helps. 🙂


Top Results From Across the Web

'DatasetDict' object has no attribute 'train_test_split' - Datasets
Hi @thecity2, as far as I know train_test_split operates on Dataset objects, not DatasetDict objects. For example, this works squad = ( ......
AttributeError: 'DatasetDict' object has no attribute 'load_metric'
I can't load metrics using DatasetDict.
'DataFrame' object has no attribute 'to_dataframe'
Here is my code up until the error I'm getting. # Load libraries import pandas as pd import numpy as np from pandas.tools.plotting...
torchtext.data - Read the Docs
Two fields with the same Field object will have a shared vocabulary. ... If the relative size for valid is missing, only the...
Train and Test Set in Python Machine Learning - How to Split
Can you please tell me how i can use this sklearn for training python with another language i have the dataset need i...