pd.read_fwf removes leading and trailing whitespace

Code Sample, a copy-pastable example if possible

from io import StringIO
import pandas as pd

data = u""" a bbb
 ccdd """

df = pd.read_fwf(StringIO(data), widths=[3, 3], header=None)

The output is

>>> df.iloc[0,0]
u'a'

Expected Output

u' a '

Problem description

Apparently, leading and trailing whitespace is removed, but I want to keep it. Passing dtype or converters options does not solve the problem. Is this expected behaviour?

I do not think this is intended, because if we implement the same example with pd.read_csv(), whitespace is preserved.

from io import StringIO
import pandas as pd

data = u""" a ,bbb
 cc,dd """

df = pd.read_csv(StringIO(data), header=None)

>>> df.iloc[0, 0]
' a '

For consistency, the behaviour of pd.read_fwf() and pd.read_csv() should be identical.

The problem is also mentioned on Stack Overflow (https://stackoverflow.com/questions/41558138/pandas-read-fwf-removing-leading-and-trailing-whitespace).

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

State:
Created 6 years ago
Reactions: 3
Comments: 8 (2 by maintainers)

Top GitHub Comments

3 reactions

diatomicDisaster commented, Aug 20, 2020

I know this issue is old and closed, but it’s the only place I could find where it’s been discussed. Is there a way to prevent read_fwf from trimming whitespace? In my particular case I’m trying to split a fixed-width string based on the index of the character in that string. If the first characters are whitespace then this breaks the indexing.

The way I see it, a fixed-width file is one where the character width of each column is fixed, regardless of whether the content of each ‘cell’ occupies the full width or not. Either way, there are databases that follow this format, so it would perhaps be good to have an option to switch stripping on or off.
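For anyone who needs the exact character positions in the meantime, one possible workaround is to slice each line by hand instead of going through read_fwf. This is only a minimal sketch in plain Python: split_fixed_width is a hypothetical helper name, not part of pandas, and it assumes the column widths are known up front, as in the widths=[3, 3] example from the issue.

```python
from itertools import accumulate


def split_fixed_width(line, widths):
    """Slice one line into fields of the given widths, keeping all whitespace.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Column boundaries are the running totals of the widths,
    # e.g. widths [3, 3] give edges [0, 3, 6].
    edges = [0, *accumulate(widths)]
    return [line[start:stop] for start, stop in zip(edges, edges[1:])]


# Same sample data as in the issue, written as one string literal.
data = " a bbb\n ccdd "
rows = [split_fixed_width(line, [3, 3]) for line in data.splitlines()]
print(rows)
```

Because nothing is stripped, character offsets within each field still line up with the original file. If a DataFrame is needed afterwards, the resulting list of rows can be passed to pd.DataFrame(rows).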

2 reactions

chris-b1 commented, Jun 26, 2017

I don’t think this is a bug - since fixed-width files are by definition whitespace-padded, stripping that whitespace is a very sane default and probably what most people want.

That said, I think it would be reasonable to add an option to support this.
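To make the idea of such an option concrete, here is a rough sketch of what an opt-out switch could look like, written in plain Python rather than against the pandas internals. Both the function name parse_fixed_width and the strip flag are assumptions made for illustration; neither is an existing pandas parameter. The default keeps the stripping behaviour described above, and strip=False preserves the padding.

```python
def parse_fixed_width(text, colspecs, strip=True):
    """Parse fixed-width text into rows of string fields.

    `strip` is a hypothetical switch for illustration only; it is not a
    pandas parameter. `colspecs` is a list of (start, stop) character
    positions, half-open like Python slices.
    """
    rows = []
    for line in text.splitlines():
        fields = [line[start:stop] for start, stop in colspecs]
        if strip:
            # Default: drop the padding around each field.
            fields = [field.strip() for field in fields]
        rows.append(fields)
    return rows


data = " a bbb\n ccdd "
colspecs = [(0, 3), (3, 6)]

stripped = parse_fixed_width(data, colspecs)                # padding removed
preserved = parse_fixed_width(data, colspecs, strip=False)  # padding kept
print(stripped)
print(preserved)
```

Keeping stripping as the default would leave existing behaviour unchanged for most users, while the flag covers the cases raised in this thread where the padding itself carries meaning.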