Serve example doesn't work (empty feature name?)

Describe the bug I cannot successfully run the serve example. I think the problem comes from the dataset: the first column has an empty name. It should be movie_id, but the name seems to be missing (I don't think this feature is helpful at all; should we just remove it from the dataset?). As a result, the inference server thinks one feature is missing. The tricky part is that I don't know how serve handles an empty feature name.

https://github.com/ludwig-ai/ludwig-docs/blob/master/docs/data/rotten_tomatoes_test.csv

To Reproduce Steps to reproduce the behavior:

  1. Train a model on the Rotten Tomatoes dataset.
  2. Run ludwig serve --model_path=/automl/model
  3. Send a prediction request:
curl http://0.0.0.0:8000/predict -X POST \
  -F "movie_title=Friends With Money" \
  -F "content_rating=R" \
  -F "genres=Art House & International, Comedy, Drama" \
  -F "runtime=88.0" \
  -F "top_critic=TRUE" \
  -F "review_content=The cast is terrific, the movie isn't."
{"error":"entry must contain all input features"}

doc: https://ludwig.ai/0.6/getting_started/serve/

Expected behavior The request should return a 200 response with the correct prediction result.

Environment (please complete the following information):

  • Python version 3.8.13
  • Ludwig version 0.6

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 5

Top GitHub Comments

2 reactions
justinxzhao commented, Nov 9, 2022

Hi @Jeffwan,

Jumping on the thread here.

First, I’ll say that I was able to follow the serving section of the Getting Started guide to completion.

That said, there are ways that we can clarify the walkthrough, and I also discovered a small unrelated bug (I doubt that it affected you). In any case, I'll follow up on both of these action items.

Here’s a handful of pointers that might be relevant for you though, to help you get a better understanding of what you might be observing:

  1. The first column of the rotten tomatoes dataset is for an ID, and probably not a good feature for a model.

    This is probably why in the getting started guide, we prompt you to train with a config that excludes the first column:

    input_features:
        - name: genres
          type: set
          preprocessing:
              tokenizer: comma
        - name: content_rating
          type: category
        - name: top_critic
          type: binary
        - name: runtime
          type: number
        - name: review_content
          type: text
          encoder: embed
    output_features:
        - name: recommended
          type: binary
    
  2. Features with empty column names are given a default column name Unnamed: 0. Features with empty column names can be referenced with this name in the Ludwig config.

  3. When pinging models deployed using ludwig serve, it's possible to add extra feature values to the POST call. If these features weren't listed in the config that was used to train the model, the extra fields are simply ignored and have no impact on inference.

    These two curls should yield the exact same result:

    curl http://0.0.0.0:8000/predict -X POST \
      -F "movie_title=Friends With Money" \
      -F "content_rating=R" \
      -F "genres=Art House & International, Comedy, Drama" \
      -F "runtime=88.0" \
      -F "top_critic=TRUE" \
      -F "review_content=The cast is terrific, the movie isn't."
    
    curl http://0.0.0.0:8000/predict -X POST \
      -F "movie_title=Friends With Money" \
      -F "content_rating=R" \
      -F "genres=Art House & International, Comedy, Drama" \
      -F "runtime=88.0" \
      -F "top_critic=TRUE" \
      -F "review_content=The cast is terrific, the movie isn't." \
      -F "extra_feature__1=123"
    
  4. If you call create_auto_config from automl on this dataset, the returned config includes the empty “id”-like feature, and the list of input features will be:

    • Unnamed: 0
    • movie_title
    • content_rating
    • genres
    • runtime
    • top_critic

    (as a side note, ludwig automl should probably leave out all-unique single-token id-like features by default)

  5. Curling the ludwig serve endpoint requires specifying all input features that were listed in the original config. If the model was trained with a feature with an empty column name, then there should be an Unnamed: 0 field in your curl, e.g.:

    curl http://0.0.0.0:8000/predict -X POST \
      -F "movie_title=Friends With Money" \
      -F "content_rating=R" \
      -F "genres=Art House & International, Comedy, Drama" \
      -F "runtime=88.0" \
      -F "top_critic=TRUE" \
      -F "review_content=The cast is terrific, the movie isn't." \
      -F "Unnamed: 0=123"
    

    Otherwise, you'll get the {"error":"entry must contain all input features"} error.

  6. The models trained with and without the ID-like feature seem very different qualitatively, and thus will return different predictions and probabilities.

  7. One other thing to watch out for when re-training a model: unless you clear your results directory after every run, the new model will be saved to results/api_experiment_run_{n}, where "n" increments after each run so as not to overwrite the artifacts of previous training runs. ludwig serve should then be initialized with the new model path.
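
To make points 2 through 5 concrete, here is a minimal standard-library sketch of the behavior described above. It is not Ludwig's actual code: header_names and missing_features are hypothetical helpers, and the "Unnamed: <index>" naming is assumed from the default column name mentioned in point 2.

```python
import csv
import io


def header_names(csv_text):
    """Return column names, giving blank headers an 'Unnamed: <index>' name.

    Assumption: this mirrors the default naming described in point 2.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name if name.strip() else f"Unnamed: {i}" for i, name in enumerate(header)]


def missing_features(payload, input_features):
    """List required input features absent from the payload; extra keys are ignored."""
    return [name for name in input_features if name not in payload]


# Tiny stand-in for the dataset: the first header cell is blank.
sample = ",movie_title,content_rating\n0,Friends With Money,R\n"
features = header_names(sample)

payload = {
    "movie_title": "Friends With Money",
    "content_rating": "R",
    "extra_feature__1": "123",  # not an input feature, so it is ignored
}
print(features)                             # ['Unnamed: 0', 'movie_title', 'content_rating']
print(missing_features(payload, features))  # ['Unnamed: 0']

payload["Unnamed: 0"] = "123"
print(missing_features(payload, features))  # []
```

A non-empty result from missing_features corresponds to the situation in which the server would answer with the "entry must contain all input features" error.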
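
For point 7, a small helper can pick the most recent run directory so that ludwig serve is not accidentally pointed at an older model. This is only a sketch: latest_run_dir is a hypothetical helper, and it assumes the results/api_experiment_run_{n} naming described above. If a run directory in your setup has no numeric suffix, adjust the pattern accordingly.

```python
import re
from pathlib import Path

# Assumed naming, taken from the comment above: api_experiment_run_{n}
RUN_PATTERN = re.compile(r"^api_experiment_run_(\d+)$")


def latest_run_dir(results_dir):
    """Return the api_experiment_run_{n} directory with the largest n, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    best, best_n = None, -1
    for child in root.iterdir():
        match = RUN_PATTERN.match(child.name)
        # Compare n numerically so that run_10 sorts after run_2.
        if child.is_dir() and match and int(match.group(1)) > best_n:
            best, best_n = child, int(match.group(1))
    return best


if __name__ == "__main__":
    print(latest_run_dir("results"))
```

The returned path, or the model directory inside it depending on how your results are laid out, is what --model_path should point to.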

0 reactions
Jeffwan commented, Nov 8, 2022

@w4nderlust I trained the model with the original dataset from the Ludwig website, and it doesn't have a field_id.
