TypeError: assign() keywords must be strings

I am running into issues converting a dataframe to parquet on Heroku (Ubuntu 20.04). The code works perfectly on my local Windows machine. The dataframe has a multi-index whose levels have dtypes datetime and str. I receive the error below.

File "/app/.heroku/python/lib/python3.8/site-packages/fastparquet/writer.py", line 935, in write
2021-12-31T14:17:00.091844+00:00 app[web.1]: data = data.assign(**{name: pd.Categorical.from_codes(codes, cats)})
2021-12-31T14:17:00.091845+00:00 app[web.1]: TypeError: assign() keywords must be strings

df.to_parquet()

"""
                 close
2007-01-01 SPY  140.54
2007-01-08 SPY  143.24
2007-01-15 SPY  142.82
2007-01-22 SPY  142.13
2007-01-29 SPY  144.81
…                    …
2021-11-29 SPY  453.42
2021-12-06 SPY  470.74
"""
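
For context, the message in the traceback comes from Python's keyword-argument unpacking rather than from parquet itself: a call of the form f(**{key: value}) is rejected whenever key is not a string. The sketch below reproduces that with a stand-in function (the local assign here is hypothetical, not the pandas method), assuming the name fastparquet passes along is None, as the comments further down suggest.

```python
def assign(**kwargs):
    """Stand-in for any function that accepts keyword arguments."""
    return kwargs


# A string key unpacks into a keyword argument without trouble.
ok = assign(**{"close": 1})

# A None key (for example, an unnamed index level) cannot become a keyword.
try:
    assign(**{None: 1})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it always contains "keywords must be strings".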

Environment:

  • fastparquet: 0.7.2
  • Python version: 3.8 (per the traceback path; originally given as 1.3.5, which looks like the pandas version)
  • Operating System: Ubuntu 20.04
  • Install method (conda, pip, source): pip, pypi

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Dec 31, 2021

OK, leaving this open, but I have no plans to work on it in the near term. As you say, the workaround is simple.

0 reactions
yohplala commented, Dec 31, 2021

I think @yohplala fixed the issue with None names of multi-index levels recently.

Hi. What I solved for column multi-indexes, in the PR that was just merged, is handling an empty string '' as a column name.

But I understand that the trouble here is with None being used for level names. I confirm that, in a dummy branch that was never merged, I also 'proposed a fix' for it. I did not port that fix to PR #729, as it would have been 'yet another item', one whose possible side effects I am not sure about, and I did not want to spend too much time on it (the workaround is simple: it is enough to give the levels a name).

So basically, the trouble arises when you define your column multi-index without names for its levels.

import pandas as pd

# Notice that no 'names' parameter is provided.
# This is fully acceptable to pandas, but not to fastparquet.
cmidx = pd.MultiIndex.from_tuples([('a', '1'), ('b', '1')])
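
As a minimal sketch of the workaround mentioned above (giving the levels a name), the snippet below shows that an unnamed multi-index reports None for every level name, and that set_names fills them in. The level names "letter" and "number" are made up for illustration; any strings should do.

```python
import pandas as pd

# A column multi-index built without names: every level name is None.
cmidx = pd.MultiIndex.from_tuples([("a", "1"), ("b", "1")])
unnamed = list(cmidx.names)

# Workaround: give each level a string name before writing to parquet.
# "letter" and "number" are hypothetical names chosen for this example.
named = cmidx.set_names(["letter", "number"])
df = pd.DataFrame([[1, 2]], columns=named)

print(unnamed)
print(list(df.columns.names))
```

The same idea applies to a row multi-index, for example df.index = df.index.set_names([...]) with one string per level.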

In fastparquet, an exception is then raised in util.get_column_metadata(), line 335.

    if isinstance(name, tuple):
        name = str(name)
    elif not isinstance(name, str):
        raise TypeError(
            'Column name must be a string. Got column {} of type {}'.format(
                name, type(name).__name__
            )
        )

I ‘solved’ it this way, with an additional elif branch to handle the None case.

    if isinstance(name, tuple):
        name = str(name)
    elif name is None:
        name = ''
    elif not isinstance(name, str):
        raise TypeError(
            'Column name must be a string. Got column {} of type {}'.format(
                name, type(name).__name__
            )
        )
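
To make the effect of that extra branch easier to see, here is the same logic wrapped in a standalone function. This is only a sketch: normalize_column_name is a hypothetical helper written for this illustration, not a function in fastparquet, and it says nothing about side effects elsewhere in the library.

```python
def normalize_column_name(name):
    """Mirror the branching above: tuples are stringified, None becomes ''."""
    if isinstance(name, tuple):
        return str(name)
    if name is None:
        return ""
    if not isinstance(name, str):
        raise TypeError(
            "Column name must be a string. Got column {} of type {}".format(
                name, type(name).__name__
            )
        )
    return name


print(repr(normalize_column_name(None)))
print(repr(normalize_column_name("close")))
print(repr(normalize_column_name(("a", None))))
```

Anything that is neither a tuple, None, nor a string (an integer, say) still raises the original TypeError.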

Here, if I understand correctly, name is the name of the column index, not of a specific column. For a column multi-index, that name is a tuple. Each value in the tuple is actually a level name, and it can be None.

With the fix above, the exception was no longer raised, and I could read the dataframe back correctly. What I am not comfortable with is whether name is used elsewhere (does setting it to '' have a side effect?).

At least, I should have:

  • created a test case with an assert checking df_init.equals(df_recorded)
  • run all the other test cases

But I have not delved into that yet, one thing at a time 😃 (I want to move on with PR #712 first 😃)

I propose to keep the ticket open until we come back to this and solve it (I think we just need to confirm the fix is OK). Best, and happy new year soon!!! 😃
