API for preprocessing only?
Is your feature request related to a problem? Please describe.
Problem: getting access to preprocessed data without training.
Describe the use case
I would like to use Ludwig only to preprocess my data using the YAML file, skipping the training part.
Describe the solution you’d like
- CLI approach:
ludwig preprocess --data_csv path/to/file.csv --dataset_definition path/to/def.yml
- API approach:
from ludwig import LudwigPreprocessing
dataset_definition = {...}
preprocessor = LudwigPreprocessing(dataset_definition)
preprocessed_dataframe = preprocessor.preprocess(dataset_dataframe)
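To make the requested interface concrete, here is a minimal sketch of what a preprocessing-only object could look like. It is not Ludwig code: the class name SketchPreprocessor, the "input_features" layout of the definition, and the two supported feature types are all assumptions made for illustration, and it works on plain lists of dicts rather than a dataframe so that it runs without dependencies.

```python
# Hypothetical sketch: this class is NOT part of Ludwig. It only illustrates
# the "build from a definition, then preprocess without training" interface
# requested above, using plain Python lists of dicts instead of a dataframe.


class SketchPreprocessor:
    """Encode rows according to a dataset definition, with no training step."""

    def __init__(self, dataset_definition):
        # Assumed layout: {"input_features": [{"name": ..., "type": ...}, ...]}
        self.features = dataset_definition["input_features"]
        self.metadata = {}

    def preprocess(self, rows):
        """Return (encoded_rows, metadata) for a list of dict rows."""
        # First pass: collect per-feature metadata from the data.
        self.metadata = {}
        for feature in self.features:
            name, ftype = feature["name"], feature["type"]
            values = [row[name] for row in rows]
            if ftype == "category":
                # Map each distinct value to an integer id, in sorted order.
                vocab = sorted(set(values))
                self.metadata[name] = {
                    "str2idx": {value: idx for idx, value in enumerate(vocab)}
                }
            elif ftype == "numerical":
                # Record min and max for min-max scaling.
                self.metadata[name] = {"min": min(values), "max": max(values)}
            else:
                raise ValueError(f"unsupported feature type: {ftype}")

        # Second pass: encode every row using the collected metadata.
        encoded = []
        for row in rows:
            out = {}
            for feature in self.features:
                name, ftype = feature["name"], feature["type"]
                meta = self.metadata[name]
                if ftype == "category":
                    out[name] = meta["str2idx"][row[name]]
                else:
                    span = meta["max"] - meta["min"]
                    out[name] = (row[name] - meta["min"]) / span if span else 0.0
            encoded.append(out)
        return encoded, self.metadata


dataset_definition = {
    "input_features": [
        {"name": "color", "type": "category"},
        {"name": "size", "type": "numerical"},
    ]
}
rows = [
    {"color": "red", "size": 10},
    {"color": "blue", "size": 20},
    {"color": "red", "size": 30},
]
preprocessor = SketchPreprocessor(dataset_definition)
encoded_rows, metadata = preprocessor.preprocess(rows)
```

Returning the metadata alongside the encoded rows is a design choice of this sketch: the same category mapping and scaling bounds would be needed later to preprocess new data consistently.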
Describe alternatives you’ve considered
An alternative right now would be to run Ludwig as usual, kill the process during the training step, and then use the preprocessed data it has already generated.
Issue Analytics
- Created 5 years ago
- Comments: 6
Top GitHub Comments
I see, good point. I will consider adding it in one of the next releases. The functions are all already there, so it’s a matter of finding a nice way to expose them.
Starting from v0.2, preprocessing is available for users to experiment with, as you can now import whatever you like from data.preprocessing. However, the only fully documented part of the code remains the API, so the preprocessing functions are usable but not yet fully documented, in particular because we are working towards many changes in the preprocessing pipeline for v0.3.