CSVLoader.loadDataSource() cannot load data without objective variable
Problem
CSVLoader.loadDataSource() cannot load data that does not have an objective (response) variable. For example:
// Load Titanic test data downloaded from Kaggle (https://www.kaggle.com/c/titanic)
ListDataSource dataSource = csvLoader.loadDataSource(Paths.get("test.csv"), "Survived");
List<Prediction> predicts = model.predict(dataSource);
This code results in:
Exception in thread "main" java.lang.IllegalArgumentException: Response Survived not found in file file:/home/tamura/git/tribuo-examples/titanic/test_removed.csv
at org.tribuo.data.csv.CSVLoader.validateResponseNames(CSVLoader.java:286)
at org.tribuo.data.csv.CSVLoader.innerLoadFromCSV(CSVLoader.java:244)
at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:238)
at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:209)
at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:184)
at org.tribuo.data.csv.CSVLoader.loadDataSource(CSVLoader.java:138)
at TitanicSurvivalClassifier.main(TitanicSurvivalClassifier.java:74)
It’s a little inconvenient.
Solution
Add methods to load data without a response variable name.
The CSVLoader is designed for purely numerical data, not the mixture of numerical and text data that comprises the Kaggle Titanic dataset, so this will probably error out later on anyway. For mixed numerical and text data you should use CSVDataSource and construct a RowProcessor, which allows full control of how the features are extracted from the columnar data. CSVDataSource has a boolean outputRequired flag which, if set to false, will generate Examples which contain the unknown output for that output type (e.g. Label.UNKNOWN for classification tasks) if the output doesn’t exist in the CSV file.

However, you’re right that there should be a way to load a numerical CSV which doesn’t have a response. We’ll fix that.
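To make the outputRequired behaviour described above concrete, here is a minimal plain-Java sketch. It does not use Tribuo and is not how Tribuo implements it: the class OptionalResponseCsv, its Row type, and the UNKNOWN placeholder are hypothetical names invented for illustration, and the naive comma split assumes a simple, unquoted, purely numerical CSV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Tribuo.
public class OptionalResponseCsv {
    /** Placeholder for a missing response, playing the role that Label.UNKNOWN plays in the comment above. */
    public static final String UNKNOWN = "UNKNOWN";

    /** One parsed row: numeric features plus a response, which may be UNKNOWN. */
    public static final class Row {
        public final double[] features;
        public final String response;

        Row(double[] features, String response) {
            this.features = features;
            this.response = response;
        }
    }

    /**
     * Parses CSV lines (header first). If the response column is absent and
     * outputRequired is true, this throws, mirroring the exception in the issue.
     * If outputRequired is false, every row gets the UNKNOWN placeholder instead.
     */
    public static List<Row> load(List<String> lines, String responseName, boolean outputRequired) {
        String[] header = lines.get(0).split(",");
        int responseIdx = Arrays.asList(header).indexOf(responseName);
        if (responseIdx < 0 && outputRequired) {
            throw new IllegalArgumentException("Response " + responseName + " not found in header");
        }
        List<Row> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",");
            boolean hasResponse = responseIdx >= 0 && responseIdx < cells.length;
            double[] features = new double[hasResponse ? cells.length - 1 : cells.length];
            String response = UNKNOWN;
            int f = 0;
            for (int i = 0; i < cells.length; i++) {
                if (i == responseIdx) {
                    response = cells[i];          // the response column is kept as text
                } else {
                    features[f++] = Double.parseDouble(cells[i]);
                }
            }
            rows.add(new Row(features, response));
        }
        return rows;
    }
}
```

With a header of "Age,Fare" and outputRequired set to false, each row comes back with the UNKNOWN placeholder, which is enough for prediction since the response is not needed there. Setting the flag to true on the same input reproduces the "Response ... not found" failure.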
Thank you! I will first implement a FeatureProcessor.