Add staticmethods to reasons to prevent re-compute.
I really like the current design with reasons just being function calls.
However, when working with large datasets, or in cases where you already have a model's predictions, I wonder if you have thought about letting users pass either an sklearn model or the precomputed probas (for those Reasons where it makes sense). For threshold-based reasons and large datasets this could save time and compute, allow for faster iteration, and open up the possibility of using models beyond sklearn.
I understand that the design wouldn't be as clean as it is right now, and it might cause misalignments if users don't pass the correct shapes/orderings, but I wonder if you have considered this (or any other way to pass precomputed predictions).
Just to illustrate what I mean (apologies for the rough pseudocode):
import numpy as np

class ProbaReason:
    def __init__(self, model=None, probas=None, max_proba=0.55):
        # The original check `if not model or probas` had the wrong precedence;
        # we want to complain only when *neither* input is given.
        if model is None and probas is None:
            raise ValueError("You should pass at least a model or precomputed probas")
        self.model = model
        self.probas = probas
        self.max_proba = max_proba

    def __call__(self, X, y=None):
        # Reuse the precomputed probas when available, otherwise fall back to the model.
        probas = self.probas if self.probas is not None else self.model.predict_proba(X)
        result = probas.max(axis=1) <= self.max_proba
        return result.astype(np.float16)
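To make the compute-saving argument concrete, here is a hedged, self-contained sketch of the pattern the issue describes: compute predict_proba once and reuse it while iterating on the threshold. The helper name low_confidence_mask and the toy data are hypothetical, not part of doubtlab.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical standalone version of the reason above: flag rows whose
# highest predicted probability is at or below a confidence threshold.
def low_confidence_mask(probas, max_proba=0.55):
    return (probas.max(axis=1) <= max_proba).astype(np.float16)

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression().fit(X, y)

# predict_proba runs once; threshold iteration is then essentially free.
probas = model.predict_proba(X)
loose = low_confidence_mask(probas, max_proba=0.55)
strict = low_confidence_mask(probas, max_proba=0.30)

This is the iteration speed-up referred to above: for large datasets, re-running predict_proba on every threshold change is the dominant cost.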
Issue Analytics
- Created: 2 years ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
I’ll certainly explore it for another release then.
Odds are that I’ll also start thinking about adding support for spaCy. I jotted some ideas here: https://github.com/koaning/doubtlab/issues/4.
Yes, I think classmethods would be a good design choice. In any case, I understand why you designed it this way, and custom functions would indeed be a way to reuse preds and probas, at the cost of having to reimplement the reasons, although they are quite compact already.
As for batch computation, I agree that you could split the work like this, and maybe even filter your dataset beforehand, but it would add complexity if you want to iterate on the thresholds. Batch processing would also work for local methods, but global methods like cleanlab benefit from having the full data available. If you plan to add more global methods, the ability to instantiate reasons (via staticmethods) with precomputed outputs could make sense.
Again, these are only some thoughts after playing with the library. Finally, as a disclaimer, this feature might also be useful for a potential integration of doubtlab with Rubrix 😃
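As a sketch of the classmethod idea discussed above (a hypothetical API, not part of doubtlab), an alternative constructor could accept precomputed probabilities so the regular constructor stays model-based:

import numpy as np

class ProbaReason:
    """Flags rows whose top predicted probability is at or below a threshold."""

    def __init__(self, model=None, probas=None, max_proba=0.55):
        if model is None and probas is None:
            raise ValueError("Pass either a model or precomputed probas.")
        self.model = model
        self.probas = probas
        self.max_proba = max_proba

    @classmethod
    def from_probas(cls, probas, max_proba=0.55):
        # Alternative constructor: reuse probabilities computed elsewhere
        # (any model, any framework), skipping the predict_proba call.
        return cls(probas=np.asarray(probas), max_proba=max_proba)

    def __call__(self, X, y=None):
        probas = self.probas if self.probas is not None else self.model.predict_proba(X)
        return (probas.max(axis=1) <= self.max_proba).astype(np.float16)

reason = ProbaReason.from_probas([[0.6, 0.4], [0.5, 0.5]], max_proba=0.55)
doubt = reason(None)  # X is unused when probas were supplied

One design benefit: because from_probas only needs an array, this opens the door to non-sklearn models, as suggested in the issue.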