Add Apache Drill as a connector
C. Givre wrote in a different channel:
I’m working on a SQLAlchemy dialect for Apache Drill so that I can use Superset with Drill. My intention is to open source it once it works reasonably well and consistently, but I keep running into small issues and I wanted to ask if you would be willing to assist me from time to time by answering some small questions I have.
I’ll tell you that the biggest challenge I’ve encountered is that Drill doesn’t record the data types which are returned in a query. So for every query, I’ve had to create a secondary query using the typeof() function in order to obtain the data types. This works reasonably well, but I’m getting a lot of malformed queries when I try to generate charts from Superset.
My preference is not to put these on the Superset issue tracker, as the issues probably aren’t issues with Superset but rather with the dialect. Anyway, thanks in advance for whatever help you can give me.
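The sketch below shows one way to implement the secondary typeof() query described above. It is a minimal illustration, not code from the actual dialect. It assumes the column names are already known and that Drill's typeof() returns minor-type names such as INT, BIGINT, FLOAT8 and VARCHAR. The helpers build_typeof_query and to_sqlalchemy_type, and the DRILL_TYPES mapping, are all hypothetical. Probing with a low LIMIT also matches the "dummy query with a low limit" idea in the notes below.

```python
from typing import Dict, List, Type

from sqlalchemy import types as sqltypes

# Hypothetical, partial mapping from typeof() result names to SQLAlchemy types.
# The Drill type names are an assumption; extend as needed.
DRILL_TYPES: Dict[str, Type[sqltypes.TypeEngine]] = {
    "INT": sqltypes.Integer,
    "BIGINT": sqltypes.BigInteger,
    "FLOAT4": sqltypes.Float,
    "FLOAT8": sqltypes.Float,
    "VARCHAR": sqltypes.String,
    "BIT": sqltypes.Boolean,
    "DATE": sqltypes.Date,
    "TIME": sqltypes.Time,
    "TIMESTAMP": sqltypes.DateTime,
}


def build_typeof_query(inner_sql: str, columns: List[str], alias: str = "t") -> str:
    """Wrap a query so Drill reports the type of each column (hypothetical helper).

    LIMIT 1 keeps the probe cheap. typeof() describes a value, so a NULL in the
    probed row can hide the column's real type.
    """
    # Backticks are Drill's identifier quotes; every column is alias-qualified.
    probes = ", ".join(f"typeof({alias}.`{col}`) AS `{col}`" for col in columns)
    return f"SELECT {probes} FROM ({inner_sql}) AS {alias} LIMIT 1"


def to_sqlalchemy_type(drill_type_name: str) -> Type[sqltypes.TypeEngine]:
    """Map a typeof() result to a SQLAlchemy type class, defaulting to NullType."""
    return DRILL_TYPES.get(drill_type_name.upper(), sqltypes.NullType)
```

Falling back to NullType, rather than raising an error, lets an unrecognized or NULL-typed column still appear in the datasource, where it can be corrected by hand.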
Notes from @mistercrunch
- the datasource interface is not very well defined at the moment and I’d like to do some work in that direction. My vision is to have a clear base class for Datasource, Column and Metric that would need to be derived for each “connector”. In the meantime you kind of have to infer the interface, which isn’t ideal at all, especially as we add more connectors. The first step is to identify everything that is common to the Druid and SQLAlchemy classes and refactor that into a base class. Also probably breaking it down into more Python modules, maybe under a new connectors folder, with base.py, druid.py and sqlalchemy.py.
- what about inferring types when saving the Drill datasource somehow, maybe by running a dummy query of some sort with a low limit?
- it’s ok to add issues and PRs about Drill in this repository. Depending on your commitment and level of support, we may put it alongside the other connectors, or in a contrib folder. Though I think it would be great if you wanted to bring that in and essentially become the maintainer of that part of the code. Over time, and as trust builds, we can give you more rights on the repository.
- eventually we should write a “Connectors” page in the docs telling users what they need to know about connectors
- eventually we should add an “Adding a connector” section to CONTRIBUTING.md
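Here is a minimal sketch of the base-class idea from the first note, using plain Python dataclasses and abc. The names BaseDatasource, BaseColumn, BaseMetric, query, values_for_column and InMemoryDatasource are hypothetical. They are not Superset's actual classes; the sketch only illustrates the split between shared helpers and per-connector methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class BaseColumn:
    """Attributes every connector's column would share (hypothetical names)."""
    column_name: str
    type: str = ""
    groupby: bool = True
    filterable: bool = True


@dataclass
class BaseMetric:
    """A named aggregate; how `expression` is interpreted is connector-specific."""
    metric_name: str
    expression: str


@dataclass
class BaseDatasource(ABC):
    """Common surface that Druid, SQLAlchemy, Drill, ... would each derive from."""
    name: str
    columns: List[BaseColumn] = field(default_factory=list)
    metrics: List[BaseMetric] = field(default_factory=list)

    # Shared helpers live once on the base class instead of per connector.
    @property
    def groupby_column_names(self) -> List[str]:
        return [c.column_name for c in self.columns if c.groupby]

    @property
    def metric_names(self) -> List[str]:
        return [m.metric_name for m in self.metrics]

    # Connector-specific behaviour stays abstract, so subclasses must provide it.
    @abstractmethod
    def query(self, query_obj: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Run a visualization query and return rows."""

    @abstractmethod
    def values_for_column(self, column_name: str, limit: int = 100) -> List[Any]:
        """Return distinct values, e.g. to populate filter widgets."""


class InMemoryDatasource(BaseDatasource):
    """Toy connector over a list of dicts, showing what a subclass must supply."""

    def __init__(self, name, columns, metrics, rows):
        super().__init__(name, columns, metrics)
        self.rows = rows

    def query(self, query_obj):
        # Project each row onto the requested (or default group-by) columns.
        wanted = query_obj.get("columns", self.groupby_column_names)
        return [{k: row[k] for k in wanted} for row in self.rows]

    def values_for_column(self, column_name, limit=100):
        # Distinct values in first-seen order, capped at `limit`.
        values = []
        for row in self.rows:
            if row[column_name] not in values:
                values.append(row[column_name])
        return values[:limit]
```

Because the base class is abstract, instantiating it directly raises TypeError. A new connector therefore cannot silently miss part of the interface, which is the problem the note describes with inferring it today.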
Issue Analytics
- Created 7 years ago
- Reactions: 4
- Comments: 6 (4 by maintainers)
Top GitHub Comments
@mistercrunch @JohnOmernik Here is the DBAPI for PyDrill. I need to package it as its own module, but it does work. Simply install pydrill, then copy the contents of this archive into the <python-path>/site-packages/pydrill/ directory. My intent is to submit a PR to PyDrill to get the DBAPI incorporated into that module, but I have to write unit tests and test it on all the various Python versions, so that might take a while. Once you have this, install https://github.com/cgivre/sqlalchemy-drill/tree/pydrill using python setup.py install --force and you’ll be able to connect Superset and Drill.
Attachment: dbapi.zip
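A SQLAlchemy dialect sits on top of a PEP 249 (DB-API 2.0) module, which is the role the attached DBAPI plays. The sketch below is not the PyDrill DBAPI from the archive. It is a hypothetical, minimal illustration of the PEP 249 surface a Drill driver exposes. It assumes a caller-supplied transport function that POSTs {"queryType": "SQL", "query": ...} to Drill's REST /query.json endpoint and returns the decoded JSON, with "columns" and "rows" keys. Injecting the transport keeps the sketch runnable without a Drill server.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Module globals required by PEP 249.
apilevel = "2.0"
threadsafety = 1
paramstyle = "qmark"  # declared for completeness; binding is not implemented here


class Error(Exception):
    pass


class NotSupportedError(Error):
    pass


# Assumption: a callable that sends the payload to Drill and returns decoded JSON.
Transport = Callable[[Dict[str, Any]], Dict[str, Any]]


class Cursor:
    def __init__(self, transport: Transport):
        self._transport = transport
        self._rows: List[Tuple[Any, ...]] = []
        self.description: Optional[List[Tuple[Any, ...]]] = None
        self.rowcount = -1
        self.arraysize = 1

    def execute(self, operation: str, parameters=None) -> None:
        if parameters:
            raise NotSupportedError("parameter binding is not implemented in this sketch")
        result = self._transport({"queryType": "SQL", "query": operation})
        columns = result.get("columns", [])
        # Assuming the response carries names but no types, type_code is None
        # in each 7-item description tuple (the gap typeof() works around).
        self.description = [(name, None, None, None, None, None, True) for name in columns]
        self._rows = [tuple(row.get(name) for name in columns) for row in result.get("rows", [])]
        self.rowcount = len(self._rows)

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None

    def fetchmany(self, size=None):
        size = size or self.arraysize
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

    def fetchall(self):
        batch, self._rows = self._rows, []
        return batch

    def close(self):
        self._rows = []


class Connection:
    def __init__(self, transport: Transport):
        self._transport = transport

    def cursor(self) -> Cursor:
        return Cursor(self._transport)

    def commit(self):
        pass  # no-op: this sketch has no transactions

    def close(self):
        pass


def connect(transport: Transport) -> Connection:
    return Connection(transport)
```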
With all that said, the dialect and the DBAPI do work and I’ve been able to successfully create some charts and graphs with them. I haven’t been able to get any of the timeseries visualizations to work, and I’m not quite sure why yet. I’ve also been able to execute arbitrary SQL statements via SQL Lab but I do run into issues when I try to visualize these statements.
The issue is that when SQLAlchemy generates the query for the new table, it does so as follows: SELECT field1, field2 FROM (SELECT field1, field2 FROM table1) AS table1___
When using joins and subqueries, Drill requires all fields to be qualified with table names, so it throws an error. SQLAlchemy also seems to generate queries that use column aliases in GROUP BY clauses, which causes problems with Drill as well.
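The minimal SQLAlchemy Core sketch below contrasts the two query shapes. It assumes SQLAlchemy 1.4 or later, where select() takes columns positionally and Select.subquery() exists. It does not reproduce Superset's actual query builder; the table and column names are taken from the example above. Bare column() objects compile without a table prefix. Columns taken from the subquery's .c collection are qualified with the alias, which is the form the issue says Drill needs.

```python
from sqlalchemy import column, select, table

# Lightweight stand-in for the user's table, using the names from the example.
source = table("table1", column("field1"), column("field2"))

# The SQL Lab query, wrapped as a derived table under the generated alias.
inner = select(source.c.field1, source.c.field2).subquery("table1___")

# Bare column() objects compile without any table prefix: the shape Drill rejects.
unqualified = select(column("field1"), column("field2")).select_from(inner)

# Columns taken from the subquery's .c collection carry the alias prefix.
qualified = select(inner.c.field1, inner.c.field2)

print(unqualified)
print(qualified)
```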
I’ll go through all the visualizations and write up what works and what doesn’t so that others can take a look as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.