Easy access to citations for specific methods

Hello,

As a scientist, I would appreciate easy access to citation information for a specific estimator. A very good example is the citation function from R (https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/citation), which provides the citation for each package (and, even better, in BibTeX format!).

The citation information in sklearn is more or less available in the References sections of the docstrings, and I coded a short function that basically prints that section:

import inspect
import re

def _get_docstring_sections(txt):
    """Return the sections of a numpy-formatted docstring as a dict."""

    if not isinstance(txt, str):
        txt = ''

    # remove docstring indentation (works at any nesting depth)
    txt = inspect.cleandoc(txt)

    # split the docstring on section underlines (------)
    split_sec = re.split(r'\n-+\n', txt)

    # the last line of each chunk is the title of the next section
    secs = {}
    lines = split_sec[0].splitlines()
    if lines:
        title = lines[-1]
        for i in range(1, len(split_sec)):
            lines = split_sec[i].splitlines()
            if i == len(split_sec) - 1:
                # last section: no title follows, so keep every line
                secs[title] = '\n'.join(lines)
            else:
                secs[title] = '\n'.join(lines[:-1])
                title = lines[-1]

    return secs

def cite(obj=None,format=None):
    """Get citation information from object with numpy docstrings
    
    """

    txt="""No references found for this object"""

    if obj is not None:
        
        secs=_get_docstring_sections(obj.__doc__)
        
        if 'References' in secs:
            txt= secs['References']

    print(txt)

It seems to work rather well in practice and prints the references in rst format. In a perfect world we could also optionally return BibTeX format, but that seems quite hard without calling some reference web API.

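I used it on all estimators, using the function sklearn.utils.testing.all_estimators():

import sklearn.utils.testing

# list of (name, class) pairs; in newer scikit-learn releases this
# helper is available as sklearn.utils.all_estimators
lst_all = sklearn.utils.testing.all_estimators()

for name, cl in lst_all:
    fmt = '\n\n{} : {}'.format(name, cl)
    print(fmt)
    # underline the header (the two leading newlines are not counted)
    print('=' * (len(fmt) - 2))

    cite(cl)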

It returns a lot of nice references for the methods, with some notable empty results that are actually due to documentation errors (References in sklearn.svm.SVR, for instance, is not a proper section). It also works on any Python object using the numpy docstring format with a References section, so I find it quite practical.
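To illustrate the "any Python object" point, here is a minimal self-contained sketch, not the function from this issue. It assumes the docstring follows the numpy layout (a title line followed by a line of dashes). The names docstring_sections and toy_estimator are hypothetical, and the reference text is a placeholder rather than a real citation.

```python
import inspect
import re


def docstring_sections(obj):
    """Return {section title: section body} for a numpy-style docstring."""
    # inspect.getdoc dedents the docstring and returns None if there is none.
    lines = (inspect.getdoc(obj) or "").splitlines()
    sections, title, body = {}, None, []
    i = 0
    while i < len(lines):
        # A title is a non-empty line followed by a line made only of dashes.
        is_title = (
            i + 1 < len(lines)
            and lines[i].strip()
            and re.fullmatch(r"-{3,}", lines[i + 1].strip())
        )
        if is_title:
            if title is not None:
                sections[title] = "\n".join(body).strip()
            title, body = lines[i].strip(), []
            i += 2  # skip the title and its underline
        else:
            body.append(lines[i])
            i += 1
    if title is not None:
        sections[title] = "\n".join(body).strip()
    return sections


def toy_estimator(x):
    """Double the input (hypothetical example function).

    Parameters
    ----------
    x : float
        Input value.

    References
    ----------
    .. [1] A. Author, "A placeholder title", Placeholder Journal, 2000.
    """
    return 2 * x


sections = docstring_sections(toy_estimator)
print(sections.get("References", "No references found for this object"))
```

Objects without a docstring, or without a References section, fall through to the default message, just as with the cite function above.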

Would you be interested in this function as a PR, maybe in sklearn.utils?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

NicolasHug commented, Feb 29, 2020 (3 reactions)

Just found out about this https://github.com/duecredit/duecredit

rflamary commented, Feb 26, 2020 (1 reaction)

OK, I’m all for better documentation. I’m just saying that a lot of publications nowadays cite sklearn (one of the main papers) instead of the proper paper corresponding to the estimator, when they should cite both. This might be laziness, but that’s why a simple function that prints a list of references could help, assuming of course they know about it.

When there are several references, the function returns them all, and the user can look in the documentation to see which one corresponds to the parameters they are using.

Anyway, it was just an idea. I think I will implement it for my toolbox and see if I can find a way to add those awesome BibTeX entries that the guys from R have. The way it is implemented, it works on functions from numpy/scipy/sklearn and my toolbox, as long as they use the numpy docstring format.
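When a section holds several references, one possible way to pick among them is to split the section into individual entries. The sketch below assumes each entry starts with an rst-style ".. [label]" marker, which is common in numpy docstrings but not guaranteed. The name split_references and the sample text are hypothetical.

```python
import re


def split_references(section_text):
    """Split a References section into {label: entry text}."""
    # With a capturing group, re.split returns
    # [prefix, label1, body1, label2, body2, ...].
    parts = re.split(
        r"^[ \t]*\.\. \[([^\]]+)\][ \t]*", section_text, flags=re.MULTILINE
    )
    entries = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        # Join continuation lines and collapse runs of whitespace.
        entries[label] = " ".join(body.split())
    return entries


# Placeholder text, not real citations.
sample = (
    ".. [1] First placeholder reference,\n"
    "   continued on a second line.\n"
    "\n"
    ".. [2] Second placeholder reference."
)
entries = split_references(sample)
for label, text in entries.items():
    print(label, "->", text)
```

A section that does not use the ".. [label]" markers yields an empty dict, so a caller would want to fall back to printing the whole section in that case.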
