Fix broken links in the documentation

Below is the list of broken links in the documentation from a make linkcheck run, together with the file each link appears in and the error message.

If you want to work on this, please:

  • do one Pull Request per link
  • add a comment in this issue saying which link you want to tackle so that different people can work on this issue in parallel
  • mention this issue (#23631) in your Pull Request description so that progress on this issue can more easily be tracked

Possible solutions for a broken link include:

  • find a replacement for the broken link. In case of links to articles, being able to link to a resource where the article is openly accessible (rather than behind a paywall) would be nice.
  • add the link to the linkcheck_ignore variable: https://github.com/scikit-learn/scikit-learn/blob/59473a91d4528503c63d71ad5843dac1b20a3d67/doc/conf.py#L590. This is the only option when, for example:
    • the link is broken with no replacement (for example, in testimonials, some companies were acquired and their website no longer exists)
    • the link works fine in a browser but is flagged as broken by the make linkcheck tool. This may happen because some websites try to prevent bots from scraping their content
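
The linkcheck_ignore option can be sketched as follows. This is a minimal illustration, assuming that Sphinx treats each entry of linkcheck_ignore as a regular expression matched against the URL. The entries are built from URLs in the list below and are not copied from scikit-learn's conf.py, and the is_ignored helper is hypothetical, there only to try the patterns locally.

```python
import re

# Hypothetical excerpt of a Sphinx conf.py. The entries are illustrative and
# come from URLs reported in this issue, not from scikit-learn's conf.py.
linkcheck_ignore = [
    # Works in a browser but times out for the link checker.
    r"https://www\.microsoft\.com/en-us/research/uploads/prod/2006/01/"
    r"Bishop-Pattern-Recognition-and-Machine-Learning-2006\.pdf",
    # Works in a browser but returns 403 to the link checker.
    r"https://www\.researchgate\.net/publication/"
    r"4974606_Hedonic_housing_prices_and_the_demand_for_clean_air",
]


def is_ignored(url, patterns=linkcheck_ignore):
    """Return True if any pattern matches the beginning of the URL.

    Hypothetical helper for local experimentation, not a Sphinx function.
    """
    return any(re.match(pattern, url) for pattern in patterns)
```

Escaping the dots keeps each pattern from matching unrelated URLs, and listing full URLs rather than whole domains avoids hiding links that are truly broken.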

In complicated cases, it may help to search the Internet Archive for the broken link. Looking at the old content may help you find an appropriate replacement.
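
An Internet Archive lookup can also be scripted. The sketch below assumes the Wayback Machine availability endpoint at https://archive.org/wayback/available and the JSON layout shown in the comments. Both should be checked against the Internet Archive's own documentation, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(broken_url):
    """Build the availability query for a broken link."""
    return f"{WAYBACK_API}?{urlencode({'url': broken_url})}"


def closest_snapshot(payload):
    """Extract the closest snapshot URL from a decoded JSON response.

    Assumed layout:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(broken_url, timeout=10):
    """Query the API over the network; returns a snapshot URL or None."""
    with urlopen(build_query(broken_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling lookup("http://www.ats.ucla.edu/stat/r/dae/rreg.htm") needs network access. A snapshot only shows the old content, so a proper replacement link still has to be found by hand.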

  • http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.rst
    403 Client Error: Forbidden for url: http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
    
  • http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf modules/decomposition.rst
    404 Client Error: Not Found for url: https://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
    
  • http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py modules/generated/sklearn.datasets.make_swiss_roll.rst
    403 Client Error: Forbidden for url: http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
    
  • https://github.com/scikit-learn/scikit-learn/pull/23679 http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf modules/linear_model.rst
    HTTPConnectionPool(host='users.jyu.fi', port=80): Max retries exceeded with url: /~samiayr/pdf/ayramo_eurogen05.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02da35c340>, 'Connection to users.jyu.fi timed out. (connect timeout=10)'))
    
  • #23660 http://www.ats.ucla.edu/stat/r/dae/rreg.htm modules/linear_model.rst
    HTTPConnectionPool(host='www.ats.ucla.edu', port=80): Max retries exceeded with url: /stat/r/dae/rreg.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02dfd53a60>, 'Connection to www.ats.ucla.edu timed out. (connect timeout=10)'))
    
  • http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html datasets/real_world.rst
    404 Client Error: Not Found for url: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
    
  • http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf modules/decomposition.rst
    404 Client Error: Not Found for url: http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
    
  • http://www.iucnredlist.org/apps/redlist/details/3038/0 auto_examples/neighbors/plot_species_kde.rst
    404 Client Error: Not Found for url: https://www.iucnredlist.org/apps/redlist/details/3038/0
    
  • http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf modules/outlier_detection.rst
    HTTPSConnectionPool(host='www.recognition.mccme.ru', port=443): Max retries exceeded with url: /pub/papers/SVM/sch99estimating.pdf (Caused by SSLError(SSLCertVerificationError("hostname 'www.recognition.mccme.ru' doesn't match 'kvant.ras.ru'")))
    
  • http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf modules/generated/sklearn.metrics.hinge_loss.rst
    404 Client Error: Not Found for url: https://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf modules/decomposition.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf modules/kernel_approximation.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf modules/linear_model.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf modules/decomposition.rst
    HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://dev.pandas.io/docs/development/maintaining.html developers/bug_triaging.rst
    HTTPSConnectionPool(host='dev.pandas.io', port=443): Max retries exceeded with url: /docs/development/maintaining.html (Caused by SSLError(SSLCertVerificationError("hostname 'dev.pandas.io' doesn't match either of '*.numericable.fr', 'numericable.fr'")))
    
  • https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html developers/contributing.rst
    404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
    
  • #23697 https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html modules/grid_search.rst
    404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
    
  • #23739 https://doi.org/10.13140/RG.2.2.35280.02565 modules/generated/sklearn.cluster.spectral_clustering.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/354448354?channel=doi&linkId=6138e932a3a397270a8f1300&showFulltext=true
    
  • https://imageio.readthedocs.io/en/latest/userapi.html datasets/loading_other_datasets.rst
    404 Client Error: Not Found for url: https://imageio.readthedocs.io/en/latest/userapi.html
    
  • https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python presentations.rst
    HTTPSConnectionPool(host='newcircle.com', port=443): Max retries exceeded with url: /s/post/1152/scikit-learn_machine_learning_in_python (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f02da1007c0>, 'Connection to newcircle.com timed out. (connect timeout=10)'))
    
  • https://pythonhosted.org/joblib/memory.html modules/compose.rst
    404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html
    
  • https://staff.washington.edu/jakevdp presentations.rst
    404 Client Error:  for url: https://staff.washington.edu/jakevdp
    
  • https://trevorhastie.github.io modules/generated/sklearn.metrics.d2_absolute_error_score.rst
    404 Client Error: Not Found for url: https://trevorhastie.github.io/
    
  • https://users.soe.ucsc.edu/~optas/papers/jl.pdf modules/generated/sklearn.random_projection.SparseRandomProjection.rst
    404 Client Error: Not Found for url: https://users.soe.ucsc.edu/~optas/papers/jl.pdf
    
  • https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf modules/generated/sklearn.decomposition.IncrementalPCA.rst
    HTTPSConnectionPool(host='mic.net.technion.ac.il', port=443): Max retries exceeded with url: //doc/skl-ip.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
    
  • https://www.datascience-paris-saclay.fr/ about.rst
    HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
    
  • https://www.frs-fnrs.be/-fnrs about.rst
    404 Client Error: Not Found for url: https://www.frs-fnrs.be/fr/-fnrs
    
  • https://www.jstor.org/stable/2984099 modules/generated/sklearn.impute.IterativeImputer.rst
    403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099
    
  • This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in #23737: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf modules/svm.rst
    HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
    
  • https://www.numfocus.org/support-numfocus.html about.rst
    403 Client Error: Forbidden for url: https://www.flipcause.com/secure/cause_pdetails/MjM2OA==
    
  • https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis modules/clustering.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
    
  • This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in #23737: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air modules/generated/sklearn.datasets.load_boston.rst
    403 Client Error: Forbidden for url: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
    
  • https://www.sri.com/sites/default/files/publications/ransac-publication.pdf modules/generated/sklearn.linear_model.RANSACRegressor.rst
    404 Client Error: Not Found for url: https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
    
  • https://www.stat.washington.edu/research/reports/2000/tr371.pdf modules/cross_decomposition.rst
    HTTPSConnectionPool(host='www.stat.washington.edu', port=443): Max retries exceeded with url: /research/reports/2000/tr371.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
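
The error messages in the list above fall into a few recurring groups: 404, 403, SSL failures, and timeouts. A minimal sketch for sorting them, assuming the messages are available as plain strings, is shown here. The triage function, the group labels, and the suggested actions are illustrative and are not part of Sphinx or of scikit-learn's tooling.

```python
from collections import Counter

# Illustrative mapping from error group to a possible next step.
SUGGESTED_ACTION = {
    "not found": "look for a replacement, possibly via the Internet Archive",
    "forbidden": "check in a browser; may be a candidate for linkcheck_ignore",
    "ssl": "check the certificate or host name in a browser",
    "timeout": "retry later or check in a browser",
    "other": "inspect manually",
}


def triage(message):
    """Assign a linkcheck error message to a coarse group."""
    if message.startswith("404 Client Error"):
        return "not found"
    if message.startswith("403 Client Error"):
        return "forbidden"
    if "SSLError" in message:
        return "ssl"
    if "timed out" in message:
        return "timeout"
    return "other"


# Sample messages copied from the list above.
messages = [
    "404 Client Error: Not Found for url: "
    "https://pythonhosted.org/joblib/memory.html",
    "403 Client Error: Forbidden for url: "
    "https://www.jstor.org/stable/2984099",
    "HTTPSConnectionPool(host='www.microsoft.com', port=443): "
    "Read timed out. (read timeout=10)",
    "HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): "
    "Max retries exceeded with url: / (Caused by SSLError("
    "SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] "
    "certificate verify failed: certificate has expired (_ssl.c:1129)')))",
]

summary = Counter(triage(message) for message in messages)
for group, count in summary.items():
    print(f"{group}: {count} ({SUGGESTED_ACTION[group]})")
```

The grouping is only a first pass: a 403 or a timeout may still be a working link that blocks bots, so each link needs a manual check before it is replaced or ignored.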
    

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 44 (40 by maintainers)

Top GitHub Comments

1 reaction
ikeadeoyin commented, Jun 24, 2022

Alright @lesteve

I will be working on:

This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in https://github.com/scikit-learn/scikit-learn/pull/23737: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf modules/svm.rst

HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)

1 reaction
lesteve commented, Jun 23, 2022

@lesteve regarding this, I can’t find the rst file: there’s no generated folder under doc/modules, and searching the entire sklearn folder I couldn’t find any spectral_clustering.rst file either. Where can I find it?

Files under modules/generated are generated automatically during the documentation build, so you should look at the corresponding .py file: sklearn/cluster/spectral_clustering.py. The link is likely part of a docstring.

This link actually works fine in a browser (at least for me, but please double-check), so you should add it to linkcheck_ignore as in https://github.com/scikit-learn/scikit-learn/pull/23737
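
Because generated .rst files do not exist in the repository, one way to locate the source of a flagged link is to search the .py and .rst sources for the URL. A minimal sketch, assuming a local checkout of the repository; find_link is a hypothetical helper, not part of scikit-learn.

```python
from pathlib import Path


def find_link(root, url, suffixes=(".py", ".rst")):
    """Return (path, line_number) pairs for source lines containing url."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with a source or documentation suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if url in line:
                hits.append((path, number))
    return hits
```

For example, find_link(".", "https://doi.org/10.13140/RG.2.2.35280.02565"), run from the root of a checkout, would list the files and line numbers where that URL appears.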
