Add augmentation probability to tf.keras.layers.BaseImageAugmentationLayer via the rate argument
System information.
- TensorFlow version (you are using): 2.9.1
- Are you willing to contribute it (Yes/No): Yes
Describe the feature and the current behavior/state.
Currently, all image augmentation layers inherit from an abstract base layer, `BaseImageAugmentationLayer`. This layer accepts a `rate` argument in its constructor; however, the value is never used by any of its subclasses.
I think it would be nice if this argument were used, as its name implies, to control the probability of a given augmentation being applied. This would be useful when the training/fine-tuning dataset is already large and only light data augmentation is needed: there is no reason to augment every image in the dataset and pay the full computational cost of doing so. A user-level workaround that illustrates the desired behaviour is sketched below.
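As an illustration of the desired behaviour (not part of the current API), a minimal wrapper layer can already apply an existing augmentation layer with a given probability. The `RandomApply` name and its implementation are hypothetical; only `tf.keras.layers.RandomFlip`, `tf.random.uniform`, and `tf.cond` are existing APIs.

```python
import tensorflow as tf


class RandomApply(tf.keras.layers.Layer):
    """Hypothetical wrapper: applies `layer` to a batch with probability `rate`."""

    def __init__(self, layer, rate=0.5, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.layer = layer
        self.rate = rate
        self.seed = seed

    def call(self, images, training=None):
        if not training:
            return images
        # Draw one scalar per call and either augment the whole batch or
        # pass it through unchanged, saving the augmentation compute.
        coin = tf.random.uniform([], seed=self.seed)
        return tf.cond(
            coin < self.rate,
            lambda: self.layer(images, training=True),
            lambda: images,
        )


# Usage: horizontally flip roughly 30% of the batches that pass through.
maybe_flip = RandomApply(tf.keras.layers.RandomFlip("horizontal"), rate=0.3)
augmented = maybe_flip(tf.zeros([8, 32, 32, 3]), training=True)
```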
Will this change the current api? How?
It shouldn’t change the API: all augmentation layers currently accept the `rate` argument as part of `**kwargs`, so `rate` is already set in the constructor of `BaseImageAugmentationLayer`. The only issue is that this value is currently unused.
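If the `**kwargs` forwarding described above is accurate for the 2.9.1 release, passing `rate` today should already be accepted (and silently ignored), for example:

```python
import tensorflow as tf

# Assuming `rate` is forwarded through **kwargs down to
# BaseImageAugmentationLayer, this constructs without error today,
# but the value has no effect on the augmentation yet.
flip = tf.keras.layers.RandomFlip("horizontal", rate=0.5)
```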
Who will benefit from this feature?
As mentioned before, this can reduce the computational cost of augmenting every image in a dataset, in cases where there is no real need to do so.
- Do you want to contribute a PR? (yes/no): yes
- If yes, please read this page for instructions: read and understood 👍
Briefly describe your candidate solution (if contributing):
My solution would be to add an `if` statement at the beginning of the `_augment` function of `BaseImageAugmentationLayer`, which would use the random number generator and the `rate` variable to decide whether or not to augment each image (whether individual images or each image in a batch). However, even though the straightforward solution would be to use the already existing random number generator for this decision, that would cause reproducibility issues when retraining existing models, since the same RNG would be called more times. The way I see it, there are two ways of fixing this (a rough sketch of the guard follows after the next paragraph):
- Adding a special `if self.rate == 1.0` condition to avoid calling the RNG at all. Assuming that nobody has passed the `rate` argument to image augmentation layers so far, this should avoid reproducibility issues.
- Adding another RNG dedicated to the augmentation probability, initialized with the same seed as the existing one. As before, if nobody has used the `rate` argument, this shouldn't create reproducibility issues.
Furthermore, I don’t know whether there would be concurrency issues if many “threads” try to use the same RNG. If you could clarify this for me, I would be relieved to know that reproducibility is maintained.
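A rough sketch of the proposed guard, under the assumption that `BaseImageAugmentationLayer` exposes a per-image `_augment` hook as described above. The helper name `_augment_with_rate`, its signature, and the direct `tf.random.uniform` call are illustrative only and do not correspond to any existing internal API.

```python
import tensorflow as tf


def _augment_with_rate(augment_single_image, image, rate, seed=None):
    """Illustrative guard: augment `image` with probability `rate`."""
    if rate >= 1.0:
        # Option 1 above: skip the coin flip entirely, so no extra RNG
        # draws happen and the existing random sequence (and hence the
        # reproducibility of previous trainings) is left untouched.
        return augment_single_image(image)
    # Otherwise, one Bernoulli draw decides whether this image is augmented.
    coin = tf.random.uniform([], seed=seed)
    return tf.cond(
        coin < rate,
        lambda: augment_single_image(image),
        lambda: image,
    )
```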
Top GitHub Comments
Yeah, KerasCV docs are out of date. Let’s definitely add it to KerasCV’s API docs.
Perfect then! Is there any way I could help with that?