HTTP 500 error in fetch_mldata mauna-loa-atmospheric-co2

There have been several PRs (https://github.com/scikit-learn/scikit-learn/pull/11100#pullrequestreview-120737169, https://github.com/scikit-learn/scikit-learn/pull/11106) where CircleCI arbitrarily fails due to HTTP 500 errors when calling fetch_mldata('mauna-loa-atmospheric-co2').

Partial traceback below:

Traceback (most recent call last):
  File "/home/circleci/project/examples/gaussian_process/plot_gpr_co2.py", line 75, in <module>
    data = fetch_mldata('mauna-loa-atmospheric-co2').data
  File "/home/circleci/project/sklearn/datasets/mldata.py", line 154, in fetch_mldata
    mldata_url = urlopen(urlname)
  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/urllib/request.py", line 223, in urlopen
[...]

  File "/home/circleci/miniconda/envs/testenv/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

If this keeps repeating, possible workarounds could be:

increasing the number of download attempts in fetch_mldata
investigating the failures upstream with mldata.org
copying this particular dataset to figshare (https://github.com/scikit-learn/scikit-learn/issues/7425), which seems to have a better quality of service, and adding a fallback URL there to be used if the download from the mldata website fails
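
The first and third workarounds could be combined roughly as follows. This is only a minimal sketch under stated assumptions, not how fetch_mldata is implemented: the function name fetch_with_fallback, its parameters and the defaults are all hypothetical, and the URLs would have to be supplied by the caller.

```python
import time
import urllib.error
import urllib.request


def fetch_with_fallback(urls, attempts=3, delay=1.0,
                        opener=urllib.request.urlopen, sleep=time.sleep):
    """Try each URL in order, retrying server-side (5xx) errors.

    Hypothetical helper for illustration; not part of scikit-learn.
    `opener` and `sleep` are injectable so the logic can be exercised
    without network access or real waiting.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    last_error = None
    for url in urls:
        for attempt in range(attempts):
            try:
                return opener(url)
            except urllib.error.HTTPError as exc:
                last_error = exc
                if exc.code < 500:
                    # Client error: retrying the same URL will not help,
                    # so move on to the next URL.
                    break
                if attempt < attempts - 1:
                    sleep(delay)
    # Every URL failed; surface the most recent error.
    raise last_error
```

A caller would pass the primary download URL first and the mirror second, so the mirror is only contacted once the primary has failed `attempts` times.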

Issue Analytics

Created 5 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions

lesteve commented, May 18, 2018

My understanding is that the long-term solution is the openml fetcher #9543 (not 100% sure what the status is).

mldata.org has historically not been extremely reliable, but if these are just temporary glitches I would say we should ignore them as we have done so far. The feeling I got when investigating #8588 is that mldata.org maintenance is not very active (no disrespect intended, just saying that there is not a staff of 10 full-time people behind it). Edit: more details about who maintains mldata.org: https://github.com/scikit-learn/scikit-learn/issues/8588#issuecomment-292192727.

If it starts to be too annoying to be ignored, we could probably implement a retry mechanism, but someone should double-check that it actually fixes the problem. For example when a glitch happens it may actually last for a few minutes, in which case a retry mechanism may not be a great fit.
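
To make that concern concrete, here is a rough back-of-the-envelope sketch. The helper names, the exponential backoff policy and the 180-second outage are all assumptions chosen for illustration; nothing here is measured from mldata.org.

```python
def backoff_delays(attempts, base=1.0, factor=2.0):
    """Seconds slept between `attempts` tries with exponential backoff."""
    return [base * factor ** i for i in range(attempts - 1)]


def covers_outage(attempts, outage_seconds, base=1.0, factor=2.0):
    """True if the total waiting time is at least as long as the outage."""
    return sum(backoff_delays(attempts, base, factor)) >= outage_seconds


# Three tries wait only 1 + 2 = 3 seconds in total, far short of a
# hypothetical 3-minute glitch.
print(backoff_delays(3), covers_outage(3, 180))
# Nine tries wait 1 + 2 + ... + 128 = 255 seconds, which would cover it,
# at the cost of a much slower CI job whenever the server is down.
print(sum(backoff_delays(9)), covers_outage(9, 180))
```

Under these assumptions, a handful of quick retries only helps if the failures are short-lived, which is the part that would need checking before adding a retry mechanism.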

0 reactions

jnothman commented, May 21, 2018

or completing the openml PR and loading from there
