How to define new distributions?
I would like to define new distributions using `scipy.stats`. Such a distribution would have a set of parameters, and one could evaluate its PDF and CDF, create random variates, etc.
For continuous distributions, there is a short description under “Subclassing” in the documentation of `rv_continuous`. It says one should subclass `rv_continuous` and, as a minimum, redefine either `_pdf` or `_cdf`, plus possibly some methods from a list of others. The description alludes to a lot of things that I don’t understand, or understand only very superficially (e.g. “frozen distributions”).
I guess one could understand this description if there were some kind of general documentation of the design of `scipy.stats`, but I wasn’t able to find it.
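The subclassing pattern described above can be sketched roughly as follows, assuming a recent SciPy release. The distribution (density `c * x**(c - 1)` on [0, 1]), the class name `PowerLawGen`, the instance name `powerlaw_custom` and the shape name `c` are hypothetical choices for illustration. Only `_pdf` is overridden. Instead of redefining `__init__`, the sketch passes the support bounds `a`, `b` and a `name` to the inherited `rv_continuous` constructor. It relies on the extra argument of `_pdf` being picked up as a shape parameter, and it shows that `loc`/`scale` handling and freezing come from the base class.

```python
import numpy as np
from scipy import stats


class PowerLawGen(stats.rv_continuous):
    # Hypothetical example distribution: pdf(x; c) = c * x**(c - 1)
    # on the support [0, 1], for a shape parameter c > 0.

    def _pdf(self, x, c):
        # Standardized density (loc=0, scale=1). The argument after `x`
        # is deduced from this signature to be the shape parameter `c`.
        return c * x**(c - 1)


# __init__ is not redefined; support bounds and name go to the base class.
powerlaw_custom = PowerLawGen(a=0.0, b=1.0, name="powerlaw_custom")

# Shape parameters are passed on every call; loc and scale are optional and
# handled by the base class: pdf(x, c, loc, scale) = _pdf((x - loc) / scale, c) / scale.
p = powerlaw_custom.pdf(3.0, 2.0, loc=2.0, scale=2.0)
c_half = powerlaw_custom.cdf(0.5, 2.0)  # no _cdf given: integrates _pdf numerically

# Calling the instance "freezes" the parameters, returning an object
# with the usual methods (pdf, cdf, ppf, mean, rvs, ...).
frozen = powerlaw_custom(2.0)

print(powerlaw_custom.shapes, powerlaw_custom.numargs)
print(p, c_half, frozen.mean(), frozen.ppf(0.25))
```

Freezing is one way to "set the parameters upon instantiation" without touching `__init__`: the unfrozen object keeps taking shapes on each call, while the frozen one remembers them.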
Questions:
- Can I redefine `__init__`, and if so, how should I call `super().__init__`?
- If I am not supposed to redefine `__init__`, how do I pass parameters to my subclass?
- What are the constraints on the signatures of the methods to be redefined? For example, do I have to allow distribution parameters to be passed to them, even though I just want to set those parameters upon instantiation?
- Do I have to support `loc` and `scale`, even though I don’t want to translate and scale my distribution? The phrase “re-defining at least the `_pdf` or the `_cdf` method (normalized to location 0 and scale 1)” seems to suggest that these parameters are taken care of automatically. Is that right?
- If I implement `_rvs`, can I assume that `self.random_state` is defined? Should I implement a `random_state` parameter myself, or is that taken care of by `rv_continuous.rvs`?
- How does `rvs` pass the number of samples (`size`) to `_rvs`?
- How are the interface methods (e.g. `ppf`) built on the methods I redefine? If I have performance problems with some interface method, which internal method should I additionally redefine? A dependency graph with some indication of performance bottlenecks and loss of precision would be useful.
- The documentation says: “A note on shapes: subclasses need not specify them explicitly. In this case, shapes will be automatically deduced from the signatures of the overridden methods (pdf, cdf etc). If, for some reason, you prefer to avoid relying on introspection, you can specify shapes explicitly as an argument to the instance constructor.” I have only the faintest idea what this means. How exactly are shapes “automatically deduced from the signatures of the overridden methods”? How can I “specify shapes explicitly as an argument to the instance constructor” if `__init__` is not in the list of methods to be redefined?
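Several of these questions (`_rvs`, `size`, `random_state`, explicit shapes, which methods to override for speed) can be illustrated with one sketch. As far as I understand the current implementation (recent SciPy releases; older releases used a different `_rvs` convention), `cdf` falls back on numerically integrating `_pdf`, `ppf` falls back on root-finding on the CDF, and the default `_rvs` applies `_ppf` to uniform variates. Overriding `_cdf`, `_ppf` or `_rvs` with closed forms therefore removes the corresponding numerical step. The class `PowerLawFastGen`, the instance `dist` and the density `c * x**(c - 1)` on [0, 1] are hypothetical. The sketch also shows the `size` and `random_state` keywords that `rvs` passes to `_rvs`, and the `shapes` argument of the inherited `rv_continuous` constructor, which replaces signature introspection.

```python
import numpy as np
from scipy import stats


class PowerLawFastGen(stats.rv_continuous):
    # Hypothetical example: pdf(x; c) = c * x**(c - 1) on [0, 1], c > 0,
    # with closed forms that replace the generic numerical fallbacks.

    def _pdf(self, x, c):
        return c * x**(c - 1)

    def _cdf(self, x, c):
        # Used instead of numerically integrating _pdf.
        return x**c

    def _ppf(self, q, c):
        # Inverse of _cdf; used instead of root-finding on the CDF.
        return q**(1.0 / c)

    def _rvs(self, c, size=None, random_state=None):
        # rvs() passes the output shape as `size` and an already-validated
        # NumPy RandomState/Generator as `random_state`; loc and scale are
        # applied afterwards by rvs() itself.
        u = random_state.uniform(size=size)
        return u**(1.0 / c)


# `shapes` is an argument of the inherited constructor, so the shape names
# can be given explicitly without overriding __init__.
dist = PowerLawFastGen(a=0.0, b=1.0, shapes="c", name="powerlaw_fast")

sample = dist.rvs(2.0, size=(3, 4), random_state=123)
shifted = dist.rvs(2.0, loc=1.0, scale=3.0, size=5, random_state=123)
print(sample.shape, dist.cdf(0.5, 2.0), dist.ppf(0.25, 2.0))
```

Because `_rvs` is overridden here, sampling no longer goes through `_ppf`. Passing the same integer `random_state` twice reproduces the same draws, and no `self.random_state` attribute is touched inside `_rvs`.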
These are just the questions I came up with at my current level of understanding; I’m sure that if I understood more, I would have much more to ask.
- Created 3 years ago
- Comments: 28 (28 by maintainers)
@mdhaber, well, I certainly got a lot of information. But I asked eight questions, hinting that more might follow, and only some of them have been discussed.
More importantly, I still think one shouldn’t have to go through source code or issues to find out these details. There should be a page in the documentation with all the information a user needs to define a new distribution for their own use, so that the resulting object can be used in place of a `scipy.stats` distribution. There could also be a second page that explains how to construct a distribution so that it can be contributed to the package itself (a developer’s guide).
I agree. It looks like @WarrenWeckesser is starting to work on these things. His first PR towards this end is gh-12069. It may look more developer-oriented than you had in mind, but I agree that it would be a good idea to make this clearer for users. I recently started working with stats, too, and I also found it confusing. In the meantime, if you do get the answers to your questions and would like to share some of your hard-earned lessons, it would be great if you could draft the document you wish you had when you started! Perhaps the experts could help polish it, but the best perspective for that sort of thing comes from someone who recently learned everything the hard way. The wiki might be a good place to start before worrying about rendering with Sphinx. Thanks for your patience. Almost everything is done on a volunteer basis, so this might take a bit of time.