Fix broken links in the documentation
Below is the list of broken links in the documentation from a make linkcheck
run, together with the file the link appears in and the error message.
If you want to work on this, please:
- do one Pull Request per link
- add a comment in this issue saying which link you want to tackle so that different people can work on this issue in parallel
- mention this issue (#23631) in your Pull Request description so that progress on this issue can more easily be tracked
Possible solutions for a broken link include:
- find a replacement for the broken link. In case of links to articles, being able to link to a resource where the article is openly accessible (rather than behind a paywall) would be nice.
- add the link to the linkcheck_ignore variable: https://github.com/scikit-learn/scikit-learn/blob/59473a91d4528503c63d71ad5843dac1b20a3d67/doc/conf.py#L590. This is the only thing to do, for example, when:
  - the link is broken with no replacement (for example, in testimonials some companies were acquired and their website no longer exists)
  - the link works fine in a browser but is flagged as broken by the make linkcheck tool. This may happen because some websites try to prevent bots from scraping their content
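As a rough illustration of the linkcheck_ignore option described above: the variable in doc/conf.py holds regular expressions, and URLs that match one of them are skipped by the link checker. The two patterns and the `is_ignored` helper below are assumptions made for this sketch, not the actual entries in conf.py.

```python
import re

# Sketch of a doc/conf.py fragment. `linkcheck_ignore` is a Sphinx option
# holding regular expressions; URLs matching one of them are not checked.
# The two patterns are illustrative assumptions, not the project's entries.
linkcheck_ignore = [
    r"https://www\.jstor\.org/stable/2984099",
    r"https://www\.researchgate\.net/publication/.*",
]


def is_ignored(url, patterns=linkcheck_ignore):
    """Hypothetical helper: True if any pattern matches the start of `url`."""
    return any(re.match(pattern, url) for pattern in patterns)
```

A helper like this is only useful for checking a new pattern locally before opening a Pull Request; Sphinx itself only needs the list.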
Something that may be useful in the complicated cases is to search on the Internet Archive for the broken link. You may be able to look at the old content and it may help you to find an appropriate link replacement.
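The Internet Archive lookup can also be scripted. The sketch below assumes the public Wayback Machine availability endpoint and its JSON layout (an "archived_snapshots" object with a "closest" entry); the function names are hypothetical, and the result still needs a manual check that the archived page is the intended document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed layout, see lead-in).
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url):
    """Return the availability-API URL that asks about `url`."""
    return f"{WAYBACK_API}?{urlencode({'url': url})}"


def closest_snapshot(payload):
    """Extract the archived URL from a decoded API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timeout=10):
    """Query the API over the network and return a snapshot URL or None."""
    with urlopen(build_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

For example, `find_archived_copy("https://pythonhosted.org/joblib/memory.html")` would return a web.archive.org URL if a snapshot exists, and None otherwise.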
- http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
  modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.rst
  403 Client Error: Forbidden for url: http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
- http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
  modules/decomposition.rst
  404 Client Error: Not Found for url: https://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
- http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
  modules/generated/sklearn.datasets.make_swiss_roll.rst
  403 Client Error: Forbidden for url: http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
- https://github.com/scikit-learn/scikit-learn/pull/23679
  http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf
  modules/linear_model.rst
  HTTPConnectionPool(host='users.jyu.fi', port=80): Max retries exceeded with url: /~samiayr/pdf/ayramo_eurogen05.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02da35c340>, 'Connection to users.jyu.fi timed out. (connect timeout=10)'))
- #23660
  http://www.ats.ucla.edu/stat/r/dae/rreg.htm
  modules/linear_model.rst
  HTTPConnectionPool(host='www.ats.ucla.edu', port=80): Max retries exceeded with url: /stat/r/dae/rreg.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02dfd53a60>, 'Connection to www.ats.ucla.edu timed out. (connect timeout=10)'))
- http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
  datasets/real_world.rst
  404 Client Error: Not Found for url: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
- http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
  modules/decomposition.rst
  404 Client Error: Not Found for url: http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
- http://www.iucnredlist.org/apps/redlist/details/3038/0
  auto_examples/neighbors/plot_species_kde.rst
  404 Client Error: Not Found for url: https://www.iucnredlist.org/apps/redlist/details/3038/0
- http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf
  modules/outlier_detection.rst
  HTTPSConnectionPool(host='www.recognition.mccme.ru', port=443): Max retries exceeded with url: /pub/papers/SVM/sch99estimating.pdf (Caused by SSLError(SSLCertVerificationError("hostname 'www.recognition.mccme.ru' doesn't match 'kvant.ras.ru'")))
- http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
  modules/generated/sklearn.metrics.hinge_loss.rst
  404 Client Error: Not Found for url: https://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf
  modules/decomposition.rst
  HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf
  modules/kernel_approximation.rst
  HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf
  modules/linear_model.rst
  HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf
  modules/decomposition.rst
  HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://dev.pandas.io/docs/development/maintaining.html
  developers/bug_triaging.rst
  HTTPSConnectionPool(host='dev.pandas.io', port=443): Max retries exceeded with url: /docs/development/maintaining.html (Caused by SSLError(SSLCertVerificationError("hostname 'dev.pandas.io' doesn't match either of '*.numericable.fr', 'numericable.fr'")))
- https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
  developers/contributing.rst
  404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
- #23697
  https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
  modules/grid_search.rst
  404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
- #23739
  https://doi.org/10.13140/RG.2.2.35280.02565
  modules/generated/sklearn.cluster.spectral_clustering.rst
  403 Client Error: Forbidden for url: https://www.researchgate.net/publication/354448354?channel=doi&linkId=6138e932a3a397270a8f1300&showFulltext=true
- https://imageio.readthedocs.io/en/latest/userapi.html
  datasets/loading_other_datasets.rst
  404 Client Error: Not Found for url: https://imageio.readthedocs.io/en/latest/userapi.html
- https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python
  presentations.rst
  HTTPSConnectionPool(host='newcircle.com', port=443): Max retries exceeded with url: /s/post/1152/scikit-learn_machine_learning_in_python (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f02da1007c0>, 'Connection to newcircle.com timed out. (connect timeout=10)'))
- https://pythonhosted.org/joblib/memory.html
  modules/compose.rst
  404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html
- https://staff.washington.edu/jakevdp
  presentations.rst
  404 Client Error: for url: https://staff.washington.edu/jakevdp
- https://trevorhastie.github.io
  modules/generated/sklearn.metrics.d2_absolute_error_score.rst
  404 Client Error: Not Found for url: https://trevorhastie.github.io/
- https://users.soe.ucsc.edu/~optas/papers/jl.pdf
  modules/generated/sklearn.random_projection.SparseRandomProjection.rst
  404 Client Error: Not Found for url: https://users.soe.ucsc.edu/~optas/papers/jl.pdf
- https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf
  modules/generated/sklearn.decomposition.IncrementalPCA.rst
  HTTPSConnectionPool(host='mic.net.technion.ac.il', port=443): Max retries exceeded with url: //doc/skl-ip.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
- https://www.datascience-paris-saclay.fr/
  about.rst
  HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://www.frs-fnrs.be/-fnrs
  about.rst
  404 Client Error: Not Found for url: https://www.frs-fnrs.be/fr/-fnrs
- https://www.jstor.org/stable/2984099
  modules/generated/sklearn.impute.IterativeImputer.rst
  403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099
- This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in #23737
  https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
  modules/svm.rst
  HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
- https://www.numfocus.org/support-numfocus.html
  about.rst
  403 Client Error: Forbidden for url: https://www.flipcause.com/secure/cause_pdetails/MjM2OA==
- https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
  modules/clustering.rst
  403 Client Error: Forbidden for url: https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
- This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in #23737
  https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
  modules/generated/sklearn.datasets.load_boston.rst
  403 Client Error: Forbidden for url: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
- https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
  modules/generated/sklearn.linear_model.RANSACRegressor.rst
  404 Client Error: Not Found for url: https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
- https://www.stat.washington.edu/research/reports/2000/tr371.pdf
  modules/cross_decomposition.rst
  HTTPSConnectionPool(host='www.stat.washington.edu', port=443): Max retries exceeded with url: /research/reports/2000/tr371.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
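The error messages in the list above fall into a few recurring kinds (403, 404, SSL failures, timeouts), and grouping them may help when picking a link to work on. The sketch below is one possible way to do that; the category names and the `classify_error` helper are assumptions for illustration, and the kind of error alone does not decide whether a link needs a replacement or a linkcheck_ignore entry.

```python
from collections import Counter


def classify_error(message):
    """Hypothetical helper: map a linkcheck error message to a rough kind."""
    if "SSLError" in message:
        return "ssl"
    if "timed out" in message:
        return "timeout"
    if "403 Client Error" in message:
        return "forbidden"
    if "404 Client Error" in message:
        return "not_found"
    return "other"


def summarize(messages):
    """Count how many messages fall into each kind."""
    return Counter(classify_error(message) for message in messages)
```

Links in the "forbidden" group are worth opening in a browser first, since the issue notes that some sites reject automated requests while serving the page normally to people.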
Top GitHub Comments
Alright @lesteve
I will be working on:
This link works in a browser; it should be added to linkcheck_ignore, similarly to what was done in https://github.com/scikit-learn/scikit-learn/pull/23737
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
modules/svm.rst
HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
The modules/generated files are generated automatically during the documentation build; you should look at the corresponding .py file: sklearn/cluster/spectral_clustering.py. The link is likely part of a docstring.
This link actually works fine in a browser (at least for me, but please double-check); you should add it to linkcheck_ignore as in https://github.com/scikit-learn/scikit-learn/pull/23737
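Since pages under modules/generated are built from docstrings, one way to locate the source of a broken link is to search the checkout for the URL. This is a minimal sketch: the `find_url_in_sources` helper is hypothetical, and the choice of searching only .py and .rst files is an assumption.

```python
from pathlib import Path


def find_url_in_sources(root, url, suffixes=(".py", ".rst")):
    """Return sorted paths under `root` whose text contains `url`."""
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Ignore undecodable bytes so one odd file does not stop the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
            if url in text:
                matches.append(path)
    return sorted(matches)
```

Running it from the repository root, for example `find_url_in_sources(".", "https://doi.org/10.13140/RG.2.2.35280.02565")`, would list the files to edit, if any contain that URL.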