Serve example doesn't work (empty feature name?)
Describe the bug
I cannot run the serve example successfully. I think the problem comes from the dataset: the name of the first column is empty. It should be movie_id, but the name seems to be missing. (I don’t think this feature is helpful at all; should we just remove it from the dataset?) As a result, the inference server thinks one feature is missing. The tricky part is that I don’t know how serve works with an empty feature name.
https://github.com/ludwig-ai/ludwig-docs/blob/master/docs/data/rotten_tomatoes_test.csv
To Reproduce
Steps to reproduce the behavior:
- train rotten_tomato.csv
- run
ludwig serve --model_path=/automl/model
- run prediction request
curl http://0.0.0.0:8000/predict -X POST -F "movie_title=Friends With Money" -F "content_rating=R" -F "genres=Art House & International, Comedy, Drama" -F "runtime=88.0" -F "top_critic=TRUE" -F "review_content=The cast is terrific, the movie isn't."
{"error":"entry must contain all input features"}%
doc: https://ludwig.ai/0.6/getting_started/serve/
Expected behavior
It should return a 200 response with the correct prediction result.
Environment (please complete the following information):
- Python version 3.8.13
- Ludwig version 0.6
Top GitHub Comments
Hi @Jeffwan,
Jumping on the thread here.
First, I’ll say that I was able to follow the serving section of the Getting Started guide to completion.
That said, there are ways we can clarify the walkthrough, and I also discovered a small unrelated bug (I doubt that it affected you). In any case, I’ll follow up on both of these action items.
Here’s a handful of pointers that might be relevant for you though, to help you get a better understanding of what you might be observing:
The first column of the rotten tomatoes dataset is an ID, and probably not a good feature for a model. This is probably why the Getting Started guide prompts you to train with a config that excludes the first column.
Features with empty column names are given a default column name, `Unnamed: 0`. Features with empty column names can be referenced by this name in the Ludwig config.
When pinging models deployed using `ludwig serve`, it’s possible to add additional feature values to the POST call. However, if these features weren’t listed in the original config that was used to train the model, the extra fields are simply ignored and have no impact on inference. Two curl requests that differ only in such extra fields should yield the exact same result.
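The "extra fields are ignored" behaviour can be pictured with a small sketch. This is only an illustration, not Ludwig's server code: `select_model_inputs` is a hypothetical helper, `extra_field` is a made-up field name, and the set of input features is assumed to match the fields in the curl request from the issue.

```python
# Hypothetical sketch, not Ludwig's actual implementation.
# Assumption: the model was trained on exactly these input features.
INPUT_FEATURES = {
    "movie_title", "content_rating", "genres",
    "runtime", "top_critic", "review_content",
}


def select_model_inputs(form_fields, input_features=INPUT_FEATURES):
    """Keep only the form fields that the model was trained on."""
    return {k: v for k, v in form_fields.items() if k in input_features}


base_request = {
    "movie_title": "Friends With Money",
    "content_rating": "R",
    "genres": "Art House & International, Comedy, Drama",
    "runtime": "88.0",
    "top_critic": "TRUE",
    "review_content": "The cast is terrific, the movie isn't.",
}

# Same request plus a field the model never saw during training.
extended_request = dict(base_request, extra_field="anything")

# Both requests reduce to the same model inputs.
print(select_model_inputs(base_request) == select_model_inputs(extended_request))
```

Under these assumptions the two requests are indistinguishable to the model, which is why they should return the same prediction.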
If you call `create_auto_config` from automl on this dataset, the returned config includes the empty “id”-like feature in its list of input features. (As a side note, Ludwig AutoML should probably leave out all-unique single-token id-like features by default.)
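To make the side note concrete, here is a rough sketch of how an all-unique, single-token column could be spotted before it ends up in a config. It is not what Ludwig AutoML does: `read_columns` and `find_id_like_columns` are hypothetical helpers, the sample rows are invented for illustration, and the `Unnamed: <index>` naming simply mirrors the default name mentioned above.

```python
import csv
import io

# Illustrative data only: a tiny CSV whose first header cell is empty,
# mimicking the layout described in the issue.
SAMPLE_CSV = """,movie_title,content_rating
0,Movie A,R
1,Movie B,PG
2,Movie C,R
"""


def read_columns(text):
    """Return {column_name: [values]}, naming empty headers 'Unnamed: <index>'."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [name if name else f"Unnamed: {i}" for i, name in enumerate(rows[0])]
    return {name: [row[i] for row in rows[1:]] for i, name in enumerate(header)}


def find_id_like_columns(columns):
    """Columns whose values are all unique and each a single token."""
    return [
        name
        for name, values in columns.items()
        if len(set(values)) == len(values)
        and all(len(v.split()) == 1 for v in values)
    ]


columns = read_columns(SAMPLE_CSV)
print(list(columns))                  # ['Unnamed: 0', 'movie_title', 'content_rating']
print(find_id_like_columns(columns))  # ['Unnamed: 0']
```

A column flagged this way could then be dropped from the list of input features before training, so that serving requests do not need to supply it.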
Curling the `ludwig serve` endpoint requires specifying all input features that were listed in the original config. If the model was trained with a feature with an empty column name, then there should be an `Unnamed: 0` field in your curl. Otherwise, you’ll get the `{"error":"entry must contain all input features"}%` error.
The models trained with and without the ID-like feature seem very different qualitatively, and thus will return different predictions and probabilities.
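The "all input features must be present" rule can be sketched as below. This is an assumption-laden illustration rather than Ludwig's code: `check_entry` and `build_curl` are hypothetical helpers, the list of trained features is assumed, and the value sent for `Unnamed: 0` is a placeholder.

```python
import shlex

ERROR = {"error": "entry must contain all input features"}


def check_entry(form_fields, input_features):
    """Return the error payload if any trained input feature is absent, else None."""
    missing = set(input_features) - set(form_fields)
    return ERROR if missing else None


def build_curl(url, form_fields):
    """Render a curl command that sends each field with -F (multipart form)."""
    parts = ["curl", url, "-X", "POST"]
    for name, value in form_fields.items():
        parts += ["-F", f"{name}={value}"]
    return " ".join(shlex.quote(p) for p in parts)


# Assumption: the model was trained with the empty-named column
# plus two other features.
input_features = ["Unnamed: 0", "content_rating", "runtime"]

request = {"content_rating": "R", "runtime": "88.0"}
print(check_entry(request, input_features))  # error payload: 'Unnamed: 0' is missing

request["Unnamed: 0"] = "0"  # placeholder value, for illustration only
print(check_entry(request, input_features))  # None
print(build_curl("http://0.0.0.0:8000/predict", request))
```

Because the field name contains a space, it has to be quoted on the command line, which `shlex.quote` takes care of in the rendered command.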
One other thing to watch out for is that when re-training a model, unless you clear your `results` directory after every run, the new model will be saved to `results/api_experiment_run_{n}`, where “n” increments after each subsequent run so as not to overwrite the artifacts of previous training runs. `ludwig serve` should also be initialized with the new model path.
@w4nderlust I trained the model with the original dataset from the Ludwig website and it doesn’t have field_id.
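On the earlier point about `results/api_experiment_run_{n}`: one way to avoid serving a stale model is to look up the most recent run directory before starting `ludwig serve`. The sketch below assumes the directory naming described in the comment and nothing more; `latest_run_dir` is a hypothetical helper, and the temporary directory only simulates a `results` folder.

```python
import re
import tempfile
from pathlib import Path

# Assumption: run directories follow the api_experiment_run_{n} pattern
# described in the comment above.
RUN_DIR_PATTERN = re.compile(r"^api_experiment_run_(\d+)$")


def latest_run_dir(results_dir):
    """Return the api_experiment_run_{n} directory with the highest n, or None."""
    runs = []
    for child in Path(results_dir).iterdir():
        match = RUN_DIR_PATTERN.match(child.name)
        if child.is_dir() and match:
            runs.append((int(match.group(1)), child))
    if not runs:
        return None
    # Compare numerically so that run_10 sorts after run_2.
    return max(runs, key=lambda item: item[0])[1]


# Simulate a results directory containing three training runs.
with tempfile.TemporaryDirectory() as tmp:
    for n in (0, 1, 10):
        (Path(tmp) / f"api_experiment_run_{n}").mkdir()
    print(latest_run_dir(tmp).name)  # api_experiment_run_10
```

The model path inside the returned directory can then be passed to `ludwig serve --model_path=...`.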