Can we have some more documentation of the predict function / data format?
I’m new to machine learning and have managed to train Ludwig using an F1 data set:
team,surname,position,track,year
Mercedes,Rosberg,1,Albert Park Grand Prix Circuit,2014
McLaren,Magnussen,2,Albert Park Grand Prix Circuit,2014
McLaren,Button,3,Albert Park Grand Prix Circuit,2014
...
My model definition is as follows:
input_features:
    -
        name: team
        type: category
    -
        name: track
        type: category
    -
        name: surname
        type: category
    -
        name: year
        type: category
output_features:
    -
        name: position
        type: numerical
training:
    epochs: 10
When I run the predict function, I am trying to get a driver/position prediction for each track. What do I need to do to make this happen? I tried making a new data file without the position column, but it falls over with a missing ‘position’ key, so clearly that column is needed even though it is the field I am trying to predict:
team,surname,track,year
Mercedes,Hamilton,Albert Park Grand Prix Circuit,2019
McLaren,Magnussen,Albert Park Grand Prix Circuit,2019
McLaren,Button,Albert Park Grand Prix Circuit,2019
Note that the data above is truncated; the full set covers the last 5 years, with positions of 10th and above. I can expand this if it helps training!
When I add some positions back in, the prediction just returns a list of numerical values:
1.7585578
2.0917244
1.6508131
I’m assuming these are the positions it would expect for each driver/track/team combination, but I’m not sure.
Can someone explain this to me further? Do I need a model without the ‘position’ key for prediction? Is there any way to tell Ludwig to use whole integers for positions, and perhaps assign each position only once per track (or per year)?
Alternatively, could someone point me to somewhere I can learn this myself?
Created 5 years ago; 7 comments (2 by maintainers)
Top GitHub Comments
For future reference, take a look at the user guide: https://uber.github.io/ludwig/user_guide/#predict
There are a couple of issues here:
Numerical may not be the best data type for position. Because position is discrete, categorical (type: category, as already used for the input features) would be more appropriate; a predicted position of 1.4, for instance, means nothing in this scenario.
For predict, the error occurs because, when Ludwig runs prediction, it also tries to evaluate performance, and evaluating performance requires the ground truth (here, the position column). If you just want to run predictions and not worry about evaluation, try this:
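If position is kept numerical, the raw outputs (like 1.7585578) can still be turned into whole, unique positions for each race by ranking them. This is plain post-processing, not a Ludwig feature. The sketch below is a minimal illustration under stated assumptions: the helper name `rank_positions` and the row layout (input features plus a float `predicted` field) are hypothetical, and lower predicted values are treated as better finishes. In the demo, the three values from the question are paired with the three 2019 example rows purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def rank_positions(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Assign unique integer positions per (track, year) race.

    Hypothetical helper, not part of Ludwig: each row is assumed to hold the
    input features plus a float 'predicted' value from a numerical output.
    Lower predicted values are treated as better finishes.
    """
    # Group row indices by race so each race is ranked independently.
    races: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        races[(row["track"], row["year"])].append(i)

    ranked = [dict(row) for row in rows]  # copy so the input is not mutated
    for indices in races.values():
        # Sort this race's drivers by predicted value (ascending); ties keep
        # their original input order because sorted() is stable.
        ordered = sorted(indices, key=lambda i: rows[i]["predicted"])
        for place, i in enumerate(ordered, start=1):
            ranked[i]["position"] = place
    return ranked


if __name__ == "__main__":
    track = "Albert Park Grand Prix Circuit"
    race = [
        {"surname": "Hamilton", "track": track, "year": "2019", "predicted": 1.7585578},
        {"surname": "Magnussen", "track": track, "year": "2019", "predicted": 2.0917244},
        {"surname": "Button", "track": track, "year": "2019", "predicted": 1.6508131},
    ]
    for row in rank_positions(race):
        print(row["surname"], row["position"])
```

Ranking guarantees each position is used once per race, which rounding the raw values would not. Switching the output feature to a category type avoids fractional positions but, on its own, does not enforce uniqueness within a race either.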
ludwig predict --only_predictions --model_path PATH_TO_THE_MODEL --data_csv PATH_TO_DATA
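The bare list of numbers from prediction is easier to read once each value is matched to the driver/track/team row it came from. The sketch below is a minimal illustration that assumes (not confirmed here) that the predictions file has one value per line, with no header, in the same order as the rows of the prediction input file; the function name `attach_predictions` and the column name `predicted_position` are hypothetical. Check the files your Ludwig version actually writes before relying on it.

```python
import csv
import io
from typing import Dict, List, TextIO


def attach_predictions(inputs: TextIO, predictions: TextIO,
                       column: str = "predicted_position") -> List[Dict[str, str]]:
    """Pair each input row with the prediction on the same line.

    Hypothetical helper. Assumes (unverified) that the predictions file holds
    one value per line, with no header, in the same order as the input rows.
    """
    rows = list(csv.DictReader(inputs))
    # Take the first field of each non-empty line as the predicted value.
    values = [line[0] for line in csv.reader(predictions) if line]
    if len(rows) != len(values):
        raise ValueError(f"{len(rows)} input rows but {len(values)} predictions")
    for row, value in zip(rows, values):
        row[column] = value
    return rows


if __name__ == "__main__":
    data = io.StringIO(
        "team,surname,track,year\n"
        "Mercedes,Hamilton,Albert Park Grand Prix Circuit,2019\n"
        "McLaren,Magnussen,Albert Park Grand Prix Circuit,2019\n"
    )
    preds = io.StringIO("1.7585578\n2.0917244\n")
    for row in attach_predictions(data, preds):
        print(row["surname"], row["track"], row["predicted_position"])
```

With real files, open both with `newline=""` and pass the file objects in; if the predictions file turns out to have a header line, skip it before pairing.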