HTTP 500 error in fetch_mldata mauna-loa-atmospheric-co2
There have been several PRs (https://github.com/scikit-learn/scikit-learn/pull/11100#pullrequestreview-120737169, https://github.com/scikit-learn/scikit-learn/pull/11106) where CircleCI arbitrarily fails due to HTTP 500 errors when calling fetch_mldata('mauna-loa-atmospheric-co2').
Partial traceback below:
Traceback (most recent call last):
  File "/home/circleci/project/examples/gaussian_process/plot_gpr_co2.py", line 75, in <module>
    data = fetch_mldata('mauna-loa-atmospheric-co2').data
  File "/home/circleci/project/sklearn/datasets/mldata.py", line 154, in fetch_mldata
    mldata_url = urlopen(urlname)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/urllib/request.py", line 223, in urlopen
  [...]
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
If this keeps happening, possible workarounds include:
- increasing the number of download attempts in fetch_mldata
- investigating the failures upstream with mldata.org
- copying this particular dataset to figshare (https://github.com/scikit-learn/scikit-learn/issues/7425) which seems to have a better quality of service and adding a fallback URL there used if the download from the mldata website fails…
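The first and third options can be combined. Below is a minimal sketch, not the actual fetch_mldata code: `fetch_with_retries` and its `opener` and `sleep` parameters are hypothetical names introduced here for illustration, and only `urlopen`, `HTTPError` and `URLError` come from the standard library. It retries server-side (5xx) failures a few times with an increasing delay, then moves on to the next URL in the list. Whether this actually cures the CircleCI failures would still need to be checked.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_with_retries(urls, n_attempts=3, delay=1.0,
                       opener=urlopen, sleep=time.sleep):
    """Try each URL in order, retrying transient failures before moving on.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without network access or real waiting.
    """
    last_error = None
    for url in urls:
        for attempt in range(n_attempts):
            try:
                return opener(url)
            except HTTPError as exc:
                last_error = exc
                # A 4xx response is unlikely to succeed on retry,
                # so give up on this URL and try the next one.
                if exc.code < 500:
                    break
            except URLError as exc:
                # Connection-level problems are treated as transient.
                last_error = exc
            # Wait 1x, 2x, 4x, ... the base delay between attempts,
            # but do not sleep after the final attempt on a URL.
            if attempt < n_attempts - 1:
                sleep(delay * 2 ** attempt)
    if last_error is None:
        raise ValueError("no URLs given")
    raise last_error
```

A caller would pass the mldata.org URL first and the mirror second, for example `fetch_with_retries([primary_url, fallback_url])`, where both variables are placeholders for URLs not given in this issue.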
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (5 by maintainers)
My understanding is that the long-term solution is the openml fetcher #9543 (not 100% sure what the status is).
mldata.org has historically not been extremely reliable, but if these are just temporary glitches I would say we should ignore them as we have done so far. The feeling I got when investigating #8588 is that mldata.org maintenance is not very active (no disrespect intended, just saying that there is not a staff of 10 full-time people behind it). Edit: more details about who maintains mldata.org: https://github.com/scikit-learn/scikit-learn/issues/8588#issuecomment-292192727.
If it starts to be too annoying to be ignored, we could probably implement a retry mechanism, but someone should double-check that it actually fixes the problem. For example when a glitch happens it may actually last for a few minutes, in which case a retry mechanism may not be a great fit.
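To make that concern concrete, here is a small back-of-the-envelope sketch. The function name and the 1-second base delay with doubling are assumptions chosen for illustration, not values from scikit-learn. It computes how long a retry loop with exponential backoff waits in total, which can then be compared against an outage lasting a few minutes.

```python
def total_backoff_wait(n_attempts, base_delay=1.0, factor=2.0):
    """Total seconds slept between n_attempts tries with exponential backoff."""
    # There is one wait between each pair of consecutive attempts.
    return sum(base_delay * factor ** i for i in range(n_attempts - 1))


# 5 attempts starting at 1 s: waits of 1, 2, 4 and 8 s.
print(total_backoff_wait(5))  # 15.0
# 9 attempts are needed before the total wait exceeds 3 minutes.
print(total_backoff_wait(9))  # 255.0
```

Under these assumed settings, a handful of retries only spans a few seconds, so it would help with momentary glitches but not with an outage of several minutes unless the number of attempts or the base delay is raised considerably.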
or completing the openml PR and loading from there
On 18 May 2018 9:26 pm, “Roman Yurchak” notifications@github.com wrote: