Better documentation for pipelines
Feature request
The introduction to pipelines documentation does not explain how additional parameters can be passed to the tokenizer during the preprocessing step. After walking through the source code, I can see that when instantiating a pipeline via transformers.pipeline(...), one can simply pass these arguments in as keyword arguments. This is not documented anywhere, and it is not included in any examples.
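To make the behaviour described above concrete, here is a minimal sketch of the forwarding pattern in plain Python. It is a toy model, not the transformers implementation: `toy_tokenizer` and `ToyPipeline` are hypothetical names invented for illustration, and the option names (`truncation`, `max_length`, `padding`) are only chosen to resemble common tokenizer arguments.

```python
# Toy stand-ins for illustration only; this is NOT the transformers code.

def toy_tokenizer(text, truncation=False, max_length=None, padding=False, pad_token="[PAD]"):
    """Whitespace tokenizer with a few options resembling common tokenizer arguments."""
    tokens = text.split()
    if truncation and max_length is not None:
        tokens = tokens[:max_length]
    if padding and max_length is not None:
        # Pad up to max_length; a negative count simply adds nothing.
        tokens = tokens + [pad_token] * (max_length - len(tokens))
    return tokens


class ToyPipeline:
    """Hypothetical pipeline that forwards keyword arguments to its tokenizer."""

    def __init__(self, tokenizer, **tokenizer_kwargs):
        self.tokenizer = tokenizer
        # Keyword arguments given at construction time become defaults.
        self.tokenizer_kwargs = tokenizer_kwargs

    def preprocess(self, text, **overrides):
        # Call-time keyword arguments take precedence over the stored defaults.
        params = {**self.tokenizer_kwargs, **overrides}
        return self.tokenizer(text, **params)

    def __call__(self, text, **overrides):
        return self.preprocess(text, **overrides)


pipe = ToyPipeline(toy_tokenizer, truncation=True, max_length=3)
print(pipe("a b c d e"))          # ['a', 'b', 'c']
print(pipe("a b", padding=True))  # ['a', 'b', '[PAD]']
```

With the real library, the analogous call would presumably look like passing the tokenizer options directly to transformers.pipeline(...), as described above; which options a given pipeline actually forwards is exactly what the documentation would need to spell out.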
This request is to have the documentation updated so future users don’t need to read the source code. The update should cover more than tokenization, since the same keyword arguments mechanism also handles post_processing, etc…
Motivation
It’s very often the case that a tokenizer is not called with the default arguments: padding, max length, etc… are often changed. The implementation for pipelines actually makes setting these arguments very simple, but it is not communicated so it is difficult to take advantage of.
Your contribution
I can contribute to the documentation if needed.
Issue Analytics
- Created a year ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@Narsil @stevhliu @sgugger could I get assigned to this?
@Narsil Your suggestions are very helpful.
Adding separate documentation for each pipeline makes sense. For example, in the TextClassificationPipeline the keyword arguments are both keyword arguments for the tokenizer’s call function and keyword arguments for the postprocess function. I think even a brief statement along the lines of (but not necessarily identical to): where the function names are clickable would be extremely helpful. Simply pointing to the recipient functions also makes this a beginner friendly task. I’m assuming of course that for each pipeline, the keyword arguments are only ever passed along to other functions.
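As a rough illustration of what "pointing to the recipient functions" means, the sketch below routes one set of keyword arguments to two recipients: a tokenizer call and a postprocess step. Everything here is hypothetical and written only for this example: `toy_tokenize`, `toy_postprocess`, `route_kwargs`, `toy_classify`, and the choice of `top_k` as the postprocessing option are assumptions, not the library's actual code.

```python
# Toy model of keyword-argument routing; NOT the transformers implementation.

# Assumption for this sketch: these names belong to the postprocess step,
# and every other keyword argument is handed to the tokenizer.
POSTPROCESS_KEYS = {"top_k"}


def toy_tokenize(text, truncation=False, max_length=None):
    """Lowercase whitespace tokenizer with optional truncation."""
    tokens = text.lower().split()
    if truncation and max_length is not None:
        tokens = tokens[:max_length]
    return tokens


def toy_postprocess(scores, top_k=1):
    """Return the top_k (label, score) pairs, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def route_kwargs(**kwargs):
    """Split keyword arguments into (tokenizer_params, postprocess_params)."""
    postprocess_params = {k: v for k, v in kwargs.items() if k in POSTPROCESS_KEYS}
    tokenizer_params = {k: v for k, v in kwargs.items() if k not in POSTPROCESS_KEYS}
    return tokenizer_params, postprocess_params


def toy_classify(text, **kwargs):
    tokenizer_params, postprocess_params = route_kwargs(**kwargs)
    tokens = toy_tokenize(text, **tokenizer_params)
    # Fake "model": count marker words per label.
    scores = {
        "positive": sum(t in {"good", "great"} for t in tokens),
        "negative": sum(t in {"bad", "awful"} for t in tokens),
    }
    return toy_postprocess(scores, **postprocess_params)


print(toy_classify("good great bad", truncation=True, max_length=2, top_k=2))
# [('positive', 2), ('negative', 0)]
```

In a setup like this, the documentation for the pipeline only needs to say which function receives which arguments and link to each one, which is why the task looks beginner friendly.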
Getting caught up on the documentation should probably be done over several commits: adding one commit at a time for each of the specific pipelines will be much easier to review. That’s just my two cents, though.
@DIvkov575 Keeping in mind that I’m not a maintainer of this repository, and therefore keeping in mind that my above suggestions are not necessarily ones that will be accepted, you can feel free to add documentation if you feel up to it.