BUG: using dtype=str in pd.read_sql_query casts nans to strings instead of nan

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

# Table contains columns with nulls 
import sqlite3
import pandas as pd

con = sqlite3.connect('example.db')
cur = con.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS Sample
               (col1,col2)''')
cur.execute("INSERT INTO Sample VALUES ('val1',NULL)")
con.commit()

df = pd.read_sql_query("SELECT * from Sample", con, dtype=str)
con.close()

df.info()  # info() prints its report itself and returns None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   col1    2 non-null      object
 1   col2    2 non-null      object
dtypes: object(2)
memory usage: 160.0+ bytes
print(df.to_markdown())

|    | col1   | col2   |
|---:|:-------|:-------|
|  0 | val1   | None   |
|  1 | val1   | None   |

Issue Description

Passing dtype=str for all columns ignores missing values: SQL NULLs are converted to strings instead of being kept as NaN.

Expected Behavior

The behavior should match pd.read_csv(…, dtype=str), where NaNs are preserved.
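
For comparison, here is a minimal sketch of the read_csv behavior referred to above. The CSV text is made up for illustration and mirrors the single row of the Sample table; the variable names are not from the original report.

```python
import io
import pandas as pd

# Hypothetical CSV with the same shape as the Sample table:
# one row whose second field is empty.
csv_text = "col1,col2\nval1,\n"

df_csv = pd.read_csv(io.StringIO(csv_text), dtype=str)

# The empty field is still reported as missing even though dtype=str was requested.
missing_in_csv = bool(df_csv["col2"].isna().all())
print(missing_in_csv)
```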

Installed Versions

pandas=1.3.5 Python=3.8 pyodbc=4.0.3

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
phofl commented, Dec 28, 2021

Hi,

Could you provide something reproducible/describe the steps necessary to reproduce?

0 reactions
Husseinjd commented, Dec 30, 2021

This is not related to read_sql. The same conversion happens with a plain astype call:

df = pd.DataFrame({"a": ["val"], "b": None})
df.astype(str)

In general I would recommend using object dtype instead of str. We don't have much support for str.

Using object solved the issue, thanks!
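
The workaround discussed in the thread can be sketched as follows. This is only an illustration under a few assumptions: it uses an in-memory SQLite database instead of example.db, and the final re-masking step (for cases where string values are still wanted) is an addition that was not suggested in the thread.

```python
import sqlite3
import pandas as pd

# In-memory database so the sketch leaves no file behind
# (an assumption; the original report used 'example.db').
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sample (col1, col2)")
con.execute("INSERT INTO Sample VALUES ('val1', NULL)")
con.commit()

# dtype=object leaves the SQL NULL as a missing value.
df_obj = pd.read_sql_query("SELECT * FROM Sample", con, dtype=object)
con.close()

null_kept = bool(df_obj["col2"].isna().all())

# If string values are required, cast and then re-mask the original nulls.
df_str = df_obj.astype(str).mask(df_obj.isna())
still_null = bool(df_str["col2"].isna().all())

print(null_kept, still_null)
```

Because the mask is taken from the frame before casting, only values that were genuinely missing are blanked out again; real strings such as 'val1' are left untouched.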
