API for preprocessing only?

Is your feature request related to a problem? Please describe.
Problem: getting access to preprocessed data without training.

Describe the use case
I would like to use ludwig just to preprocess my data using the yaml file, skipping the training part.

Describe the solution you’d like

  • CLI approach: ludwig preprocess --data_csv path/to/file.csv --dataset_definition path/to/def.yml
  • API approach:
from ludwig import LudwigPreprocessing

dataset_definition = {...}
preprocessor = LudwigPreprocessing(dataset_definition)
preprocessed_dataframe = preprocessor.preprocess(dataset_dataframe)
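
To make the request concrete, here is a minimal sketch of what a preprocess-only step driven by a declarative definition could look like. It is not Ludwig code: the class name SketchPreprocessor, the "input_features" layout, and the two feature types handled are assumptions chosen for illustration, and plain lists of dicts stand in for a dataframe so the example needs only the standard library.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class SketchPreprocessor:
    """Hypothetical stand-in for the requested preprocess-only API (not Ludwig code)."""

    def __init__(self, dataset_definition: Dict[str, Any]) -> None:
        # Assumed layout: {"input_features": [{"name": ..., "type": ...}, ...]}
        self.features = dataset_definition["input_features"]
        self.metadata: Dict[str, Dict[str, Any]] = {}

    def preprocess(self, rows: List[Row]) -> List[Row]:
        """Collect per-feature metadata, then transform every row. No training."""
        self._fit(rows)
        return [self._transform(row) for row in rows]

    def _fit(self, rows: List[Row]) -> None:
        for feature in self.features:
            name, ftype = feature["name"], feature["type"]
            values = [row[name] for row in rows]
            if ftype == "numerical":
                # Remember the range so values can be min-max scaled.
                self.metadata[name] = {"min": min(values), "max": max(values)}
            elif ftype == "category":
                # Map each distinct value to an integer index (sorted for determinism).
                vocab = {v: i for i, v in enumerate(sorted(set(values)))}
                self.metadata[name] = {"str2idx": vocab}
            else:
                raise ValueError(f"unsupported feature type: {ftype}")

    def _transform(self, row: Row) -> Row:
        out: Row = {}
        for feature in self.features:
            name, ftype = feature["name"], feature["type"]
            meta = self.metadata[name]
            if ftype == "numerical":
                span = meta["max"] - meta["min"]
                out[name] = (row[name] - meta["min"]) / span if span else 0.0
            else:
                out[name] = meta["str2idx"][row[name]]
        return out


# Illustrative data and definition (both made up for this sketch).
dataset_definition = {
    "input_features": [
        {"name": "age", "type": "numerical"},
        {"name": "color", "type": "category"},
    ]
}
rows = [
    {"age": 20, "color": "red"},
    {"age": 30, "color": "blue"},
    {"age": 40, "color": "red"},
]

preprocessor = SketchPreprocessor(dataset_definition)
preprocessed = preprocessor.preprocess(rows)
print(preprocessed)
```

The point of the sketch is the shape of the interface: the definition alone decides how each column is encoded, and the caller gets back both the transformed rows and the metadata (ranges, vocabularies) without a model ever being built.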

Describe alternatives you’ve considered
An alternative right now would be to use it as suggested, kill the process during the training step, and then use the generated preprocessed data.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6

Top GitHub Comments

6 reactions
w4nderlust commented, Mar 27, 2019

I see, good point. I will consider adding it in the next releases. The functions are all already there; it’s a matter of finding a nice way to expose them.

2 reactions
w4nderlust commented, Nov 22, 2019

Starting from v0.2, the preprocessing is available for users to play around with, as one can now import whatever they like from data.preprocessing. The only fully documented part of the code remains the API, though, so you can play with preprocessing, but it is not fully documented yet, in particular because we are working towards many changes in the preprocessing pipeline for v0.3.

API for preprocessing only ? · Issue #217 · ludwig-ai ... - GitHub
