MsSqlHook.get_sqlalchemy_engine uses pyodbc instead of pymssql
Apache Airflow Provider(s)
microsoft-mssql
Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-mssql==2.0.1
Apache Airflow version
2.2.2
Operating System
Ubuntu 20.04
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
MsSqlHook.get_sqlalchemy_engine uses the default mssql driver, pyodbc, instead of pymssql.
- If pyodbc is installed, we get sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError)
- Otherwise, we get ModuleNotFoundError
PS: Looking at the code, this should still apply up to provider version 3.0.0 (the latest version).
What you think should happen instead
The default driver used by sqlalchemy.create_engine for mssql is pyodbc.
To use pymssql with create_engine, the URI needs to start with mssql+pymssql:// (currently the hook uses DBApiHook.get_uri, which returns a URI starting with mssql://).
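The scheme change described above can be sketched with the standard library alone. This is only an illustration of the URI rewrite, not the provider's actual fix: the helper name with_pymssql_driver is hypothetical, and the sample URI is the one from the demo in this issue.

```python
from urllib.parse import urlsplit, urlunsplit


def with_pymssql_driver(uri: str) -> str:
    """Return uri with an explicit pymssql driver if it uses the bare mssql scheme.

    Hypothetical helper for illustration; URIs that already name a driver
    (for example mssql+pyodbc://) are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme == "mssql":
        # SQLAlchemy reads the scheme as "dialect+driver".
        parts = parts._replace(scheme="mssql+pymssql")
    return urlunsplit(parts)


print(with_pymssql_driver("mssql://test:pwd@test:1433"))
# prints: mssql+pymssql://test:pwd@test:1433
```

Passing the rewritten URI to create_engine would make SQLAlchemy import pymssql rather than pyodbc, assuming pymssql is installed.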
How to reproduce
>>> from contextlib import closing
>>> from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
>>>
>>> hook = MsSqlHook()
>>> with closing(hook.get_sqlalchemy_engine().connect()) as c:
...     with closing(c.execute("SELECT SUSER_SNAME()")) as res:
...         r = res.fetchone()
Will raise an exception due to the wrong driver being used.
Anything else
Demo for sqlalchemy default mssql driver choice:
# pip install sqlalchemy
... Successfully installed sqlalchemy-1.4.39
# pip install pymssql
... Successfully installed pymssql-2.2.5
>>> from sqlalchemy import create_engine
>>> create_engine("mssql://test:pwd@test:1433")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2, in create_engine
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 309, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/create.py", line 560, in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/connectors/pyodbc.py", line 43, in dbapi
    return __import__("pyodbc")
ModuleNotFoundError: No module named 'pyodbc'
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Yep. It is normal for people to create scenarios in their heads that are not in the heads of others. It was just a question: I was trying to find out what your motivation is and whether PyODBC is not enough. I can very well imagine that you could use the ODBC one. Maintainers care about maintaining the project, but often, when they see a change or a proposal, they ask questions to find out what the motivations are and where things come from.
Just so you know: none of us knows everything about those two drivers by heart. Everything there is, is in the code.
If you imagine that any of the people here know all 3000+ classes and 75+ providers by heart, that is a wrong assumption. Much of the Airflow code has been contributed by people like you (we have > 2100 contributors), and there is not a single person who knows everything or who has plans for deprecating or removing any providers (this is also why we have unit tests: they ultimately check whether the contributed code still works).
If there are such plans, they are always public on the devlist, and the only way a deprecation can happen is by updating the code here and noting it in the release notes. There is no “secret organisation” that has plans for deprecation here.
But there is nothing wrong with asking “why” questions and drawing conclusions from the answers (though jumping to the conclusions you did, just from being asked a question, is a bit premature 😃).
I started working on a fork. I’ll make a PR next week once it’s ready.