Rework argument handling in astropy.utils.data functions that use download_file


Follow-up to #10434

There are several functions in astropy.utils.data that wrap the download_file function. Many of them take arguments that are passed to download_file, but this is handled inconsistently.

For example:

  • get_readable_fileobj largely copies the arguments of download_file, as well as their docstrings (unnecessary duplication). It could instead take **kwargs and document that they are passed to download_file.

  • download_files_in_parallel takes most of the same arguments as download_file, but not all of them.

  • get_pkg_data_filename and friends call download_file with some specific arguments, but it might be nice to be able to pass additional download_file arguments (such as allow_insecure). However, accepting arbitrary keyword arguments for download_file might not work here: for example, these functions already take a remote_timeout argument, which is passed to the corresponding timeout argument of download_file.

It might be nice to brainstorm a plan to clean this up.

I think in most cases it’s simply a matter of my suggestion for get_readable_fileobj: Allow it to take arbitrary keyword arguments, and document that they are passed to download_file. I’m less sure what to do in cases like get_pkg_data_filename.
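The **kwargs suggestion for get_readable_fileobj could look roughly like the sketch below. It is only an illustration: download_file_stub and get_readable_fileobj_sketch are hypothetical stand-ins, not the astropy functions, and the stub accepts only a few of the arguments named in this issue so that the example runs without network access.

```python
def download_file_stub(remote_url, cache=False, timeout=None, allow_insecure=False):
    """Hypothetical stand-in for download_file: reports what it received."""
    return {
        "remote_url": remote_url,
        "cache": cache,
        "timeout": timeout,
        "allow_insecure": allow_insecure,
    }


def get_readable_fileobj_sketch(name, encoding=None, **download_kwargs):
    """Hypothetical wrapper.

    Parameters
    ----------
    name : str
        URL of the resource to fetch.
    encoding : str, optional
        Wrapper-specific argument, not forwarded.
    **download_kwargs
        Passed unchanged to download_file_stub; see its signature.
    """
    # No argument list or docstring is duplicated here: whatever the
    # downloader accepts, the wrapper accepts too.
    return download_file_stub(name, **download_kwargs)


received = get_readable_fileobj_sketch(
    "https://example.invalid/data.fits", cache=True, allow_insecure=True
)
```

A side effect of this design is that a misspelled keyword is rejected by the downloader itself with a TypeError, so the wrapper needs no validation of its own.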

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
pllim commented, Nov 11, 2022

Thank you for the detailed analysis! Here are my personal comments. Then again, I have been using these functions for a long time, so I am hesitant to change things unless it is absolutely necessary or the long-term benefit outweighs the cost of near-term pain. Keep in mind that changing this API might break quite a few things downstream.

remote_timeout vs timeout

For me, I think it makes sense to name it only timeout in download_data. That function is all about downloading (remote) data, so having the keyword as remote_timeout is redundant. However, in a function like get_pkg_data_filename, that file could be local (package data) or remote (from Astropy data server).
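One way to keep remote_timeout on the package-data helpers while still forwarding other download arguments is sketched below. This is an assumption about how it could be done, not what astropy does: download_file_stub and get_pkg_data_filename_sketch are hypothetical names, and rejecting a direct timeout keyword is just one possible policy.

```python
def download_file_stub(remote_url, timeout=None, allow_insecure=False):
    """Hypothetical stand-in for download_file: reports what it received."""
    return {
        "remote_url": remote_url,
        "timeout": timeout,
        "allow_insecure": allow_insecure,
    }


def get_pkg_data_filename_sketch(data_name, remote_timeout=None, **download_kwargs):
    """Hypothetical wrapper that renames remote_timeout to timeout."""
    # The wrapper owns the timeout: a second value arriving through the
    # forwarded keywords would be ambiguous, so it is refused up front.
    if "timeout" in download_kwargs:
        raise TypeError("use remote_timeout instead of timeout")
    return download_file_stub(data_name, timeout=remote_timeout, **download_kwargs)


received = get_pkg_data_filename_sketch(
    "https://example.invalid/data.fits", remote_timeout=5.0, allow_insecure=True
)
```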

"This means to me there is room for more freedom to include changes, or am I understanding this wrong?"

More freedom, yes. But complete freedom, probably not. See my reasoning above.

Conf

I have a love-hate relationship with this one. It exists because of https://docs.astropy.org/en/latest/config/index.html . It is nice if you want to set a bunch of things for a system or a group of users. But it is also dangerous, because the setting is hidden and will surprise you if you are not familiar with the config or if someone else set it for you silently. That is why we also provide the keyword to override it, for when you don't care about a config value set somewhere else and you know what you want for that one call. So, delegating to Conf and only using that is probably a non-starter.
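The interplay described here, a config default that a per-call keyword can override, can be illustrated with a small sketch. ConfSketch and resolve_timeout are hypothetical names used only for illustration, and the default of 10.0 seconds is an assumed value, not one taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class ConfSketch:
    """Hypothetical stand-in for a package-level configuration object."""
    remote_timeout: float = 10.0  # assumed default, in seconds


conf = ConfSketch()


def resolve_timeout(timeout=None):
    """Return the per-call timeout if given, else the configured default."""
    # The config is read at call time, so a value changed elsewhere
    # (possibly without the caller knowing) silently applies here.
    return conf.remote_timeout if timeout is None else timeout


default_value = resolve_timeout()      # falls back to the config
explicit_value = resolve_timeout(3.0)  # the keyword wins for this one call
conf.remote_timeout = 30.0             # someone changes the hidden setting
changed_value = resolve_timeout()      # the new default is picked up
```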

OOP

data.py is a low-level module, so OOP seems like overkill to me, but I could be convinced otherwise.

Next step

Thanks again for the really valuable feedback, @davidmpaz. I will bring this up at the next dev telecon and see what the other maintainers think. 🙇‍♀️

1 reaction
davidmpaz commented, Nov 13, 2022

One of the ideas was to brainstorm 😃 I am glad we are doing it now. I would love to implement whatever is decided in the end.


