How to define new distributions?

I would like to define new distributions using scipy.stats. Such a distribution would have a set of parameters, and one could evaluate its PDF, CDF, create random variates etc.

For continuous distributions, there is a short description under “Subclassing” in the documentation of rv_continuous. It says one should subclass rv_continuous, and as a minimum redefine _pdf and _cdf, plus possibly a list of others. The description alludes to a lot of things that I don’t understand, or understand only very superficially (e.g. “frozen distributions”).
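
For reference, a minimal sketch of the subclassing pattern that documentation section describes. The class name `ramp_gen` and the density 2x on [0, 1] are made up for illustration; only `_pdf` and `_cdf` are overridden, and the support is passed through the `a` and `b` constructor arguments of rv_continuous.

```python
from scipy import stats


class ramp_gen(stats.rv_continuous):
    # Hypothetical example distribution: density 2*x on the interval [0, 1].
    # Both methods are written for the standard form (loc=0, scale=1).

    def _pdf(self, x):
        return 2.0 * x

    def _cdf(self, x):
        return x ** 2


# a and b set the support of the standard form; name is used in the docs.
ramp = ramp_gen(a=0.0, b=1.0, name="ramp")

pdf_mid = ramp.pdf(0.5)    # public method, calls _pdf internally
cdf_mid = ramp.cdf(0.5)    # public method, calls _cdf internally
q_mid = ramp.ppf(0.25)     # no _ppf defined, so _cdf is inverted numerically

# Calling the instance fixes loc and scale and returns a frozen distribution.
shifted = ramp(loc=1.0, scale=2.0)
shifted_cdf = shifted.cdf(2.0)   # same as ramp.cdf((2.0 - 1.0) / 2.0)
```

In this sketch the public methods (`pdf`, `cdf`, `ppf`) accept `loc` and `scale` even though the subclass never mentions them, and the frozen object `shifted` is simply the same distribution with those two values bound.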

I guess one could understand this description if there were some kind of general documentation of the design of scipy.stats, but I wasn’t able to find it.

Questions:

  • Can I redefine __init__, and if yes, how should I call super().__init__?

  • In case I am not supposed to redefine __init__, how do I pass parameters to my subclass?

  • What are the constraints on the signatures of the methods to be redefined? For example, do I have to provide for distribution parameters to be passed, even though I just want to define them upon instantiation?

  • Do I have to support loc and scale, though I don’t want to translate and scale my distribution? The phrase “re-defining at least the _pdf or the _cdf method (normalized to location 0 and scale 1)” seems to suggest that these parameters are automatically taken care of?

  • If I implement _rvs, can I assume that self.random_state is defined? Should I implement a parameter random_state myself, or is that taken care of by rv_continuous.rvs?

  • How does rvs pass the number of samples (size) to _rvs?

  • How are the interface methods (e.g. ppf) based on the methods I redefine? If I have performance problems with some interface method, which internal method should I additionally redefine? A dependency graph with some indication of performance bottlenecks and loss of precision would be useful.

  • A note on shapes: subclasses need not specify them explicitly. In this case, shapes will be automatically deduced from the signatures of the overridden methods (pdf, cdf etc). If, for some reason, you prefer to avoid relying on introspection, you can specify shapes explicitly as an argument to the instance constructor.

    I have only the faintest idea what this means. How exactly are shapes “automatically deduced from the signatures of the overridden methods”? How can I “specify shapes explicitly as an argument to the instance constructor” if __init__ is not in the list of methods to be redefined?
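
Several of these points (shapes, loc and scale, `_rvs`, and which internal method to override for speed) can be illustrated with a small sketch. It is not taken from the SciPy documentation: the class name `power_sketch_gen` and the density are hypothetical, and the `_rvs` signature assumes SciPy 1.5 or later, where `rvs` passes `size` and `random_state` as keyword arguments.

```python
from scipy import stats


class power_sketch_gen(stats.rv_continuous):
    # Hypothetical example: density c * x**(c - 1) on [0, 1], with c > 0.
    # No `shapes` argument is given to the constructor, so the single shape
    # parameter `c` is picked up from the signatures of _pdf and _cdf.

    def _pdf(self, x, c):
        return c * x ** (c - 1.0)

    def _cdf(self, x, c):
        return x ** c

    def _ppf(self, q, c):
        # Closed-form inverse of the CDF. Without this override, ppf falls
        # back to numerical root finding on _cdf, which is slower.
        return q ** (1.0 / c)

    def _rvs(self, c, size=None, random_state=None):
        # Assumption: SciPy 1.5+ calling convention. The public rvs method
        # resolves the random state and the output size, then passes both.
        u = random_state.uniform(size=size)
        return u ** (1.0 / c)


power_sketch = power_sketch_gen(a=0.0, b=1.0, name="power_sketch")

# Shape, loc and scale are bound by freezing; the subclass itself only
# implements the standard form (loc=0, scale=1).
frozen = power_sketch(2.0, loc=3.0, scale=2.0)
samples = frozen.rvs(size=1000, random_state=0)
```

Here the parameters are not passed through `__init__` at all: they are supplied when the instance is called, and the resulting frozen distribution plays the role of "a distribution with its parameters defined upon instantiation". The samples land in the shifted and scaled support [3, 5] without `_rvs` knowing about `loc` or `scale`.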

These are just the questions I came up with at my current state of understanding. I’m sure that if I understood more, I would have many more to ask.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 28 (28 by maintainers)

Top GitHub Comments

allefeld commented, Jul 18, 2020 (3 reactions)

@mdhaber, well I certainly got a lot of information. But I asked 8 questions with a hint there might be more to follow, and only some of them have been discussed.

More importantly, I still think one shouldn’t have to go through source code or issues in order to find out these details. There should be a page in the documentation that includes all the information a user needs to define a new distribution for their own use, so that the resulting object can be used in place of a scipy.stats distribution.

And there could also be a second page that explains how to construct it such that it can be contributed to the package itself (developer’s guide).

mdhaber commented, Jul 18, 2020 (1 reaction)

I agree. It looks like @WarrenWeckesser is starting to work on these things. His first PR towards this end is gh-12069. It may look more developer-oriented than you had in mind, but I agree that it would be a good idea to make it clearer for users. I recently started working with stats, too, and I also found it confusing. In the meantime, if you do get the answers to your questions and would like to share some of your hard-earned lessons, it would be great if you could draft the document you wish you had when you started! Perhaps the experts could help polish it, but the best perspective for that sort of thing is from someone who recently learned everything the hard way. The wiki might be a good place to start before worrying about rendering with Sphinx. Thanks for your patience. Almost everything is done on a volunteer basis, so this might take a bit of time.
