pd.read_fwf removes leading and trailing whitespace

Code Sample, a copy-pastable example if possible

from io import StringIO
import pandas as pd

data = u""" a bbb
 ccdd """

df = pd.read_fwf(StringIO(data), widths=[3, 3], header=None)

The output is

>>> df.iloc[0,0]
u'a'

Expected Output

u' a '

Problem description

Leading and trailing whitespace is stripped, but I want to keep it. Passing the dtype or converters options does not solve the problem. Is this the expected behaviour?

I do not think this is intended, because the same example with pd.read_csv() preserves the whitespace.

from io import StringIO
import pandas as pd

data = u""" a ,bbb
 cc,dd """

df = pd.read_csv(StringIO(data), header=None)
>>> df.iloc[0, 0]
' a '

For consistency, the behaviour of the two readers should be identical.

The problem is also mentioned on Stack Overflow (https://stackoverflow.com/questions/41558138/pandas-read-fwf-removing-leading-and-trailing-whitespace).

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

diatomicDisaster commented, Aug 20, 2020 (3 reactions)

I know this issue is old and closed, but it’s the only place I could find where it’s been discussed. Is there a way to prevent read_fwf from trimming whitespace? In my particular case I’m trying to split a fixed-width string based on the index of the character in that string. If the first characters are whitespace then this breaks the indexing.

The way I see it, a fixed-width file is one where the character width of each column is fixed, regardless of whether the content of each ‘cell’ occupies the full width or not. Either way, there are databases that follow this format, so it would perhaps be good to have an option to switch stripping on or off?
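
For the index-based splitting described in this comment, one option that sidesteps read_fwf entirely is to slice each line by character position in plain Python. The following is only a minimal sketch: the helper name split_fixed_width is hypothetical (it is not part of pandas), and the sample data and widths are taken from the issue above.

```python
from itertools import accumulate


def split_fixed_width(line, widths):
    """Slice one line into fields by character position, keeping all padding.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Running totals of the widths give the end offset of each field.
    ends = list(accumulate(widths))
    # Each field starts where the previous one ended.
    starts = [0] + ends[:-1]
    return [line[start:end] for start, end in zip(starts, ends)]


data = " a bbb\n ccdd "
rows = [split_fixed_width(line, [3, 3]) for line in data.split("\n")]
print(rows)
```

Because nothing is stripped, the character offsets inside each field stay aligned with the original file. The resulting list of rows can be passed to pd.DataFrame(rows) if a DataFrame is needed.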

chris-b1 commented, Jun 26, 2017 (2 reactions)

I don’t think this is a bug - since fixed width files are by definition white-space padded, stripping that whitespace is a very sane default and probably what most people want.

That said, I think it would be reasonable to add an option to support this.
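
A workaround that does not need a new option may be to override read_fwf's documented delimiter argument, which names the filler characters of the fields. The sketch below rests on an assumption about pandas internals that should be verified against the installed version: that each field is stripped only of the characters listed in delimiter (plus line endings), so a delimiter without the space character leaves space padding in place. The value "\n\t" is an arbitrary choice for illustration.

```python
from io import StringIO

import pandas as pd

data = " a bbb\n ccdd "

# Assumption: read_fwf strips only the characters named in `delimiter`
# (plus line endings) from each field, so leaving the space out of it
# keeps space padding. Tabs would still be stripped with this value.
df = pd.read_fwf(StringIO(data), widths=[3, 3], header=None, delimiter="\n\t")

first_cell = df.iloc[0, 0]
print(repr(first_cell))
```

Explicit widths (or colspecs) are passed here on purpose: column inference also relies on the filler characters, so changing delimiter while leaving colspecs at its default of 'infer' may give different column boundaries.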
