Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Serve example doesn't work (empty feature name?)

See original GitHub issue

Describe the bug I can not successfully run serve example. I think the problem is probably from the dataset? the first column is empty. it should be movie_id but seems it’s missing (I think this feature is not helpful at all? should we just remove it from dataset). then the inference server think it misses one feature. The tricky thing is I don’t know if how serve works with empty feature name?

https://github.com/ludwig-ai/ludwig-docs/blob/master/docs/data/rotten_tomatoes_test.csv

To Reproduce Steps to reproduce the behavior:

train rotten_tomato.csv
run ludwig serve --model_path=/automl/model
run prediction request

curl http://0.0.0.0:8000/predict -X POST -F "movie_title=Friends With Money" -F "content_rating=R" -F "genres=Art House & International, Comedy, Drama" -F "runtime=88.0" -F "top_critic=TRUE" -F "review_content=The cast is terrific, the movie isn't."
{"error":"entry must contain all input features"}%

doc: https://ludwig.ai/0.6/getting_started/serve/

Expected behavior It should return 200 request with correct prediction result.

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Python version 3.8.13
Ludwig version 0.6

Additional context Add any other context about the problem here.

Issue Analytics

State:
Created 10 months ago
Comments:5

Top GitHub Comments

2reactions

justinxzhaocommented, Nov 9, 2022

Hi @Jeffwan,

Jumping on the thread here.

First, I’ll say that I was able to follow the serving section of the Getting Started guide to completion.

That said, there are ways that we can clarify the walkthrough, and I also discovered a small unrelated bug (I’m doubtful that this impacted you). In any case, I’ll follow up on both of these AIs.

Here’s a handful of pointers that might be relevant for you though, to help you get a better understanding of what you might be observing:

The first column of the rotten tomatoes dataset is for an ID, and probably not a good feature for a model.

This is probably why in the getting started guide, we prompt you to train with a config that excludes the first column:

input_features:
    - name: genres
      type: set
      preprocessing:
          tokenizer: comma
    - name: content_rating
      type: category
    - name: top_critic
      type: binary
    - name: runtime
      type: number
    - name: review_content
      type: text
      encoder: embed
output_features:
    - name: recommended
      type: binary

Features with empty column names are given a default column name Unnamed: 0. Features with empty column names can be referenced with this name in the Ludwig config.

When pinging models deployed using ludwig serve, it’s possible to add additional feature values to the POST call. However, if these features weren’t listed in the original config that was used to train the model, then the extra fields are simply ignored and have no impact on inference, for example:

These two curls should yield the exact same result:

curl http://0.0.0.0:8000/predict -X POST \
  -F "movie_title=Friends With Money" \
  -F "content_rating=R" \
  -F "genres=Art House & International, Comedy, Drama" \
  -F "runtime=88.0" \
  -F "top_critic=TRUE" \
  -F "review_content=The cast is terrific, the movie isn't."

curl http://0.0.0.0:8000/predict -X POST \
  -F "movie_title=Friends With Money" \
  -F "content_rating=R" \
  -F "genres=Art House & International, Comedy, Drama" \
  -F "runtime=88.0" \
  -F "top_critic=TRUE" \
  -F "review_content=The cast is terrific, the movie isn't." \
  -F "extra_feature__1=123"

If you call create_auto_config from automl on this dataset, the returned config includes the empty “id”-like feature, and the list of input features will be:
- Unnamed: 0
- movie_title
- content_rating
- genres
- runtime
- top_critic
(as a side note, ludwig automl should probably leave out all-unique single-token id-like features by default)
Curling the ludwig serve endpoint requires specifying all input features that were listed in the original config. If the model was trained with a feature with an empty column name, then there should be an Unnamed: 0 field in your curl, e.g.:
```
curl http://0.0.0.0:8000/predict -X POST \
  -F "movie_title=Friends With Money" \
  -F "content_rating=R" \
  -F "genres=Art House & International, Comedy, Drama" \
  -F "runtime=88.0" \
  -F "top_critic=TRUE" \
  -F "review_content=The cast is terrific, the movie isn't." \
  -F "Unnamed: 0=123"
```
Otherwise, you’ll get the {"error":"entry must contain all input features"}% error.
The models trained with and without the ID-like feature seem very different qualitatively, and thus will return different predictions and probabilities.
One other thing to watch out for is that when re-training a model, unless you are clearing your results directory after every run, the new model will be saved to results/api_experiment_run_{n} where “n” increments after each subsequent run as to not override the artifacts of previous training runs. ludwig serve should also be initialized with the new model path.

0reactions

Jeffwancommented, Nov 8, 2022

@w4nderlust I trained the mode with original dataset from ludwig website and it doesn’t have field_id.

Top Results From Across the Web

Sometimes adding a WCF Service Reference generates an ...

As an enhancement here is a quick example of using svcutil. ... build: It turns out that the service reference's Reference.cs file was...

Service name must not be null or empty #73 - GitHub

I want to disable jaeger span using opentracing-jaeger-enabled: false and no jaeger service name is defined. However I'm getting following error java.lang.

Server names - Nginx.org

A special wildcard name in the form “ .example.org ” can be used to match both the ... in a server block then...

Dealing with errors - Power Query | Microsoft Learn

In the example above, the error message is The column 'Column' of the table ... reference to a column name that doesn't exist...

Publish hosted feature layers—ArcGIS Online Help

When you create an empty hosted feature layer, it inherits the visible range of the feature layer or template you used to create...