Preprocessing module is very confusing
❓ Questions and Help
There seems to be a bit of code smell cropping up with the #770 issue related to to_onnx and how example inputs are used. I have a few questions.
What is your question?
- How does the DataPipeline class relate to the preprocess module? The documentation for this class is very sparse.
- Why is it the responsibility of the Preprocessor to instantiate the data source when from_data_source("something custom") is called to create a data module? This flow is very unintuitive.
- ApplyToKeys operates on dictionaries, and example_input_array can be a tensor, tuple, or dict. Where and how is this handled? For example, in to_onnx or summarize the example_input_array tensor is passed through the model, which works fine on the model alone, but the preprocessing mostly expects a dict, since the docs strongly encourage using that function and everything in Flash is supposed to be a dict until right before inference. This results in all kinds of strange errors.
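The mismatch described in the third question can be reproduced without Flash. The sketch below is only an illustration: `KeyedTransform` is a hypothetical stand-in for a dict-based transform (it is not Flash's ApplyToKeys implementation), the `"input"` key is an assumed name, and a plain list takes the place of a tensor.

```python
from typing import Any, Callable, Dict


class KeyedTransform:
    """Hypothetical stand-in for a dict-based transform; not Flash's ApplyToKeys."""

    def __init__(self, key: str, fn: Callable[[Any], Any]) -> None:
        self.key = key
        self.fn = fn

    def __call__(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        # Look up the value under `key`, transform it, and return a new dict.
        value = sample[self.key]
        return {**sample, self.key: self.fn(value)}


double = KeyedTransform("input", lambda xs: [2 * x for x in xs])

# A dict sample is what the transform expects.
print(double({"input": [1, 2, 3]}))  # {'input': [2, 4, 6]}

# A bare sequence (standing in for an example_input_array tensor) is rejected,
# because it cannot be indexed by a string key.
try:
    double([1, 2, 3])
except TypeError as err:
    print(f"bare input rejected: {type(err).__name__}")
```

The failure happens at the key lookup, before the wrapped function ever runs, which is one plausible reason the resulting errors look unrelated to the model itself.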
Code
The fix for question 3 seems to be to do this on my custom Task:

    def _apply_batch_transfer_handler(
        self, batch: Any, device: Optional[torch.device] = None, dataloader_idx: Optional[int] = None
    ) -> Any:
        if isinstance(batch, torch.Tensor):
            # Wrap the bare tensor in a dict so the dict-based transforms can run, then unwrap the result.
            return super()._apply_batch_transfer_handler(
                batch={DefaultDataKeys.INPUT: batch}, device=device, dataloader_idx=dataloader_idx
            )[DefaultDataKeys.INPUT]
        return super()._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
But I have no idea what kinds of side effects this would have and it feels like a strange fix to a fundamental problem.
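The wrap-and-unwrap idea behind that override can be isolated in a small helper, which makes it easier to reason about on its own. This is a minimal sketch under assumptions, not Flash code: `INPUT_KEY`, `call_with_dict`, and `scale_inputs` are hypothetical names, a plain string stands in for DefaultDataKeys.INPUT, and lists stand in for tensors. It says nothing about side effects inside Flash itself.

```python
from typing import Any, Callable, Dict

INPUT_KEY = "input"  # hypothetical stand-in for DefaultDataKeys.INPUT


def call_with_dict(handler: Callable[[Dict[str, Any]], Dict[str, Any]], batch: Any) -> Any:
    """Run a dict-only handler on any batch and return it in the caller's format."""
    if isinstance(batch, dict):
        # Already in the format the handler expects: pass it through unchanged.
        return handler(batch)
    # Wrap the bare batch, run the handler, then unwrap the single entry.
    return handler({INPUT_KEY: batch})[INPUT_KEY]


def scale_inputs(batch: Dict[str, Any]) -> Dict[str, Any]:
    """Example dict-only handler: halves every value stored under INPUT_KEY."""
    return {**batch, INPUT_KEY: [x / 2 for x in batch[INPUT_KEY]]}


print(call_with_dict(scale_inputs, [2.0, 4.0]))  # [1.0, 2.0]
print(call_with_dict(scale_inputs, {INPUT_KEY: [2.0, 4.0], "target": 1}))
```

Unlike the override above, which wraps only torch.Tensor batches, this sketch wraps every non-dict batch, tuples included. Whether that is appropriate depends on how the handler treats tuples, so it is a design choice rather than a recommendation.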
What have you tried?
This issue sprang up when I implemented a custom preprocess class and spiraled from there. to_onnx and summarize (and, by extension, training) are broken due to this bug. The ImageClassificationPreprocess won’t work for me because my outputs are continuous.
What’s your environment?
- OS: All
- Packaging: conda/pip
- Version: lightning 1.4.7, flash 0.5.0, bolts 0.4.0
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (7 by maintainers)

#816
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.