Change API to connect to a backend
Currently, the API to connect to a backend, with a backend-specific option, is as follows:
import ibis
ibis.options.impala.temp_db = 'foo'
conn = ibis.impala.connect(host='impala',
                           database='ibis_testing',
                           hdfs_client=ibis.hdfs_connect(host='impala', port=50070))
I think it has a few drawbacks, in particular:
- It’s not very intuitive, since `ibis.impala` seems to be an impala module in ibis, but it was originally `ibis/impala/api.py`, now it’s `ibis/backends/impala`, and potentially it can be a module in any other package
- The code becomes trickier, since we need to use `__getattr__` and load the backends dynamically, while ideally keeping the attributes of `ibis` introspectable, including backends
- The `ibis` namespace is already huge, and mixing in special attributes for backends makes things more complex
- When a backend is misspelled or nonexistent, the error message is not very intuitive, since we can’t know if the user was trying to call a backend or a regular attribute
An alternative API that I personally find easier is:
import ibis
conn = ibis.connect(engine='impala',
                    host='impala',
                    database='ibis_testing',
                    hdfs_client=ibis.hdfs_connect(host='impala', port=50070),
                    temp_db='foo')
I think this API reduces the magic significantly. There is a function that once called will look for the backend and connect to it. If the backend doesn’t exist, we can provide a clear error message.
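To make the idea concrete, here is a minimal sketch of how such a dispatching function could look. It is not the Ibis implementation: the `_BACKENDS` registry, `register_backend`, and the fake Impala connect function are all hypothetical names made up for illustration, and a real version would presumably discover backends some other way (for example via entry points).

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping engine names to connect functions.
# A plain dict keeps this sketch self-contained; it is an assumption,
# not how Ibis actually stores its backends.
_BACKENDS: Dict[str, Callable[..., Any]] = {}


def register_backend(name: str, connect_func: Callable[..., Any]) -> None:
    """Make a backend available under the given engine name."""
    _BACKENDS[name] = connect_func


def connect(engine: str, **kwargs: Any) -> Any:
    """Look up the backend by name and forward all other arguments to it."""
    try:
        backend_connect = _BACKENDS[engine]
    except KeyError:
        # We know the user wanted a backend, so the error can say so.
        available = ", ".join(sorted(_BACKENDS)) or "none"
        raise ValueError(
            f"Unknown backend {engine!r}. Available backends: {available}"
        ) from None
    return backend_connect(**kwargs)


def _fake_impala_connect(host, database, temp_db=None):
    # Stand-in for a real backend connect function (hypothetical).
    return {"host": host, "database": database, "temp_db": temp_db}


register_backend("impala", _fake_impala_connect)
conn = connect("impala", host="impala", database="ibis_testing", temp_db="foo")
```

With this shape, a misspelled engine such as `connect("impla")` fails with a message that names the unknown backend and lists the available ones, instead of a generic attribute error.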
The only drawback I see is that the signature of `connect` won’t be very specific: `ibis.connect(engine, **kwargs)`. The backend `connect` function will have the parameters clearly documented, so I don’t think it makes a big difference for the documentation. But for introspection we would have `**kwargs`.
Something that could make things simpler is to use a connection string instead of `**kwargs`:
conn = ibis.connect(conn_str='impala://user@impala/ibis_testing',
                    hdfs_client=ibis.hdfs_connect(host='impala', port=50070),
                    temp_db='foo')
This should work directly with SQLAlchemy backends and OmniSciDB. I guess some backends may need to parse the URL in the Ibis backend, but that doesn’t seem like a big deal.
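For backends that would need to parse the URL themselves, the standard library already does most of the work. The sketch below uses `urllib.parse.urlsplit`; the helper name `parse_conn_str` and the keys of the returned dict are assumptions for illustration, not an existing Ibis API.

```python
from urllib.parse import urlsplit


def parse_conn_str(conn_str):
    """Split a connection URL into an engine name and connection parameters.

    Hypothetical helper: the returned keys are illustrative only.
    """
    parts = urlsplit(conn_str)
    if not parts.scheme:
        raise ValueError(f"Connection string has no engine scheme: {conn_str!r}")
    return {
        "engine": parts.scheme,      # e.g. 'impala'
        "user": parts.username,      # None if the URL has no user part
        "host": parts.hostname,
        "port": parts.port,          # None if the URL has no port
        "database": parts.path.lstrip("/") or None,
    }


params = parse_conn_str("impala://user@impala/ibis_testing")
```

The scheme selects the backend, and the remaining fields can be passed to that backend’s own connect function.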
Somewhat unrelated, I’d move backend-specific options to the connection object, instead of having them as options. I think this will make things easier and clearer, both for Ibis maintainers and for users.
I’d add the `ibis.connect` method for 2.0, and a `FutureWarning` for `ibis.<backend>`, and remove the latter in Ibis 3.0.
@jreback happy with this?
Top GitHub Comments
Agree on that. But technically we can also evaluate lazily on `__getattr__` of `ibis`, and load the backends on `ibis.bigquery` or equivalent. The reason we’re not doing that is options. Setting a backend option would fail in that case, since the option doesn’t exist if the backend hasn’t been loaded. That’s the main reason I don’t think we should have backend options.

On the circular imports, I fail to see how having the backend as a separate repo is affecting it. Everything should be the same: backends are loaded as entry points even if they are in this repo. So nothing should have changed, I’d say. Maybe I’m missing something, but it seems to me like the cause should be something else.
Connection string logic seems like a good approach here. There’s a lot of discussion on this issue, which I appreciate, but it’s better to work out the details for something this complex against real code so folks can try it out and collaborate.