CSVLoader.loadDataSource() cannot load data without objective variable


Problem

CSVLoader.loadDataSource() cannot load data that has no objective (response) variable. For example:

// Load the Titanic test data downloaded from Kaggle (https://www.kaggle.com/c/titanic).
// The test file has no "Survived" column, so this call throws.
ListDataSource dataSource = csvLoader.loadDataSource(Paths.get("test.csv"), "Survived");
List<Prediction> predicts = model.predict(dataSource);

This code throws:

Exception in thread "main" java.lang.IllegalArgumentException: Response Survived not found in file file:/home/tamura/git/tribuo-examples/titanic/test_removed.csv
    at org.tribuo.data.csv.CSVLoader.validateResponseNames(CSVLoader.java:286)
    at org.tribuo.data.csv.CSVLoader.innerLoadFromCSV(CSVLoader.java:244)
    at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:238)
    at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:209)
    at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:184)
    at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:138)
    at TitanicSurvivalClassifier.main(TitanicSurvivalClassifier.java:74)

This is inconvenient when the data to predict on, like the Kaggle test set here, has no response column.

Solution

Add methods that load data without requiring a response variable name.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

Craigacp commented, Oct 2, 2020 (1 reaction)

The CSVLoader is designed for purely numerical data, not the mixture of numerical and text data that makes up the Kaggle Titanic dataset, so this will probably error out later on anyway. For mixed numerical and text data you should use CSVDataSource and construct a RowProcessor, which allows full control of how the features are extracted from the columnar data. CSVDataSource has a boolean outputRequired flag which, if set to false, will generate Examples that contain the unknown output for that output type (e.g. Label.UNKNOWN for classification tasks) if the output doesn’t exist in the CSV file.

However you’re right that there should be a way to load a numerical CSV which doesn’t have a response. We’ll fix that.
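The behavior described above for the outputRequired flag can be illustrated with a standard-library-only sketch. This is not Tribuo code and not its implementation: the class OptionalResponseCsv, the Row type, the parse method, and the UNKNOWN constant are all hypothetical names made up for illustration. It assumes a well-formed, purely numerical CSV with a header row and no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Tribuo.
public class OptionalResponseCsv {
    // Placeholder response, similar in spirit to an "unknown" output.
    static final String UNKNOWN = "UNKNOWN";

    // One parsed row: numeric features keyed by column name, plus a response.
    static final class Row {
        final Map<String, Double> features;
        final String response;

        Row(Map<String, Double> features, String response) {
            this.features = features;
            this.response = response;
        }
    }

    // Parses CSV lines (header first). If the response column is absent,
    // either fail (outputRequired == true) or fill in UNKNOWN (false).
    static List<Row> parse(List<String> lines, String responseName, boolean outputRequired) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("Empty CSV: no header row");
        }
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        int responseIdx = header.indexOf(responseName);
        if (responseIdx < 0 && outputRequired) {
            throw new IllegalArgumentException("Response " + responseName + " not found in header");
        }
        List<Row> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            Map<String, Double> features = new LinkedHashMap<>();
            String response = UNKNOWN;
            for (int i = 0; i < header.size(); i++) {
                if (i == responseIdx) {
                    response = cells[i];
                } else {
                    // Every non-response column is treated as numeric.
                    features.put(header.get(i), Double.parseDouble(cells[i]));
                }
            }
            rows.add(new Row(features, response));
        }
        return rows;
    }

    public static void main(String[] args) {
        // A test-style file with no "Survived" column.
        List<String> testCsv = Arrays.asList("Pclass,Age,Fare", "3,34.5,7.8292", "2,62.0,9.6875");
        for (Row row : parse(testCsv, "Survived", false)) {
            System.out.println(row.response + " " + row.features);
        }
    }
}
```

With outputRequired set to true, the same input fails with an IllegalArgumentException, which mirrors the failure in the stack trace from the original report. With it set to false, each row carries the placeholder response and can still be passed on for prediction.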

k-tamura commented, Oct 7, 2020 (0 reactions)

Thank you! I will first implement a FeatureProcessor.
