Fails with "_wrap_applied_output() missing 1 required positional argument" where a simple pandas apply succeeds


Hello,

I’m using Python 3.8.10 (Anaconda distribution, GCC 7.5.10) on Ubuntu 20 LTS (64-bit x86).

From my pip freeze:

pandarallel 1.5.2
pandas 1.3.0
numpy 1.20.3

I’m working with a DataFrame that looks like this one:

HoleID scaffold tpl strand base score tMean tErr modelPrediction ipdRatio coverage isboundary identificationQv context experiment isbegin_bondary isend_boundary isin_IES uniqueID No_known_IES_retention_this_CCS detailed_classif
1025444 70189477 scaffold_024_with_IES 688203 0 T 2 0.517 0.190 0.555 0.931 11 True NaN TTAAATAGAAATTAAAATCAGCTGC NM9_10 False False False NM9_10_70189477 False POTENTIALLY_RETAINED_MACIES_OUTIES
1025446 70189477 scaffold_024_with_IES 688204 0 A 4 1.347 0.367 1.251 1.077 13 True NaN TAAATAGAAATTAAAATCAGCTGCT NM9_10 False False False NM9_10_70189477 False POTENTIALLY_RETAINED_MACIES_OUTIES
1025448 70189477 scaffold_024_with_IES 688205 0 A 5 1.913 0.779 1.464 1.307 16 True NaN AAATAGAAATTAAAATCAGCTGCTT NM9_10 False False False NM9_10_70189477 False POTENTIALLY_RETAINED_MACIES_OUTIES
1025450 70189477 scaffold_024_with_IES 688206 0 A 4 1.535 0.712 1.328 1.156 18 True NaN AATAGAAATTAAAATCAGCTGCTTA NM9_10 False False False NM9_10_70189477 False POTENTIALLY_RETAINED_MACIES_OUTIES
1025452 70189477 scaffold_024_with_IES 688207 0 A 5 1.655 0.565 1.391 1.190 18 True NaN ATAGAAATTAAAATCAGCTGCTTAA NM9_10 False False False NM9_10_70189477 False POTENTIALLY_RETAINED_MACIES_OUTIES

I defined the following function:

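import numpy as np
import pandas as pd


def get_distance_from_nearest_criteria(df, criteria):
    # Rows of this group where the boolean column `criteria` is True
    begins = df[df[criteria]].copy()

    if len(begins) == 0:
        # No matching row in this group: every distance is undefined
        return pd.Series([np.nan for x in range(len(df))])
    else:
        list_return = []

        # For each row, keep the smallest absolute "tpl" distance to a matching row
        for idx, nt in df.iterrows():
            distances = [abs(nt["tpl"] - x) for x in begins["tpl"]]
            mindistance = min(distances, default=np.nan)
            list_return.append(mindistance)

        return pd.Series(list_return)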

Then running:

from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False, nb_workers=12)
out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

leads to:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-02fc7c0589e3> in <module>
----> 1 out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/pandarallel.py in closure(data, func, *args, **kwargs)
    463             )
    464 
--> 465             return reduce(results, reduce_meta_args)
    466 
    467         finally:

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/data_types/dataframe_groupby.py in reduce(results, df_grouped)
     14         keys, values, mutated = zip(*results)
     15         mutated = any(mutated)
---> 16         return df_grouped._wrap_applied_output(
     17             keys, values, not_indexed_same=df_grouped.mutated or mutated
     18         )

TypeError: _wrap_applied_output() missing 1 required positional argument: 'values'

The error is not clear to me: I can’t tell what’s happening.

However, when I run the same code with a plain pandas apply, it succeeds and returns:

uniqueID           
HT2_10354935    0      297.0
                1      297.0
                2      296.0
                3      296.0
                4      295.0
                       ...  
NM9_10_9568952  502      NaN
                503      NaN
                504      NaN
                505      NaN
                506      NaN
Length: 1028437, dtype: float64

I’m running all of this in a jupyter notebook

ipykernel 5.3.4
ipython 7.22.0
ipython-genutils 0.2.0
notebook 6.4.0
jupyter 1.0.0
jupyter-client 6.1.12
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyter-dash 0.4.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0

I was wondering if someone could explain to me what’s happening, and how to fix it if the error is mine. Because it works out of the box with a plain pandas apply, I suppose there is a small problem in pandarallel.

NB: This code also leaves processes running even after I interrupt or restart the IPython kernel.

EDIT: Could this be linked to the fact that I’m using a lambda function?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 8
  • Comments: 18 (5 by maintainers)

Top GitHub Comments

3 reactions
Kr4t0n commented, Aug 11, 2021

One solution is to use a pandas version earlier than v1.3.0, for example v1.2.5.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.2.5'
>>> from pandarallel import pandarallel
>>> pandarallel.initialize()
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
>>> df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])
>>> df
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').parallel_apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
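
The downgrade advice can also be expressed as a guard in your own code. The sketch below is only an illustration: the helper names `parse_version` and `groupby_parallel_apply_is_safe` are hypothetical, and the 1.3.0 cutoff is an assumption taken from this thread for pandarallel 1.5.2, not something checked against other pandarallel releases.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.3.0' into a 3-tuple of ints.

    Only the leading digits of each dotted part are used, so a
    pre-release string like '1.3.0rc1' is read as (1, 3, 0).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.3' compares the same way as '1.3.0'
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def groupby_parallel_apply_is_safe(pandas_version: str) -> bool:
    # Assumption from this thread: with pandarallel 1.5.2, groupby
    # parallel_apply only works on pandas versions older than 1.3.0.
    return parse_version(pandas_version) < (1, 3, 0)


# Intended use (requires pandas and pandarallel, so left as a comment):
#   if groupby_parallel_apply_is_safe(pd.__version__):
#       out = df.groupby(["uniqueID"]).parallel_apply(func)
#   else:
#       out = df.groupby(["uniqueID"]).apply(func)
```

Falling back to the plain `apply` loses the parallelism but still produces the result, which may be acceptable until the environment can be changed.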

From pandas v1.3.0 onward, the _wrap_applied_output function in pandas/core/groupby/groupby.py takes one additional positional argument, data. pandarallel still calls it with the old argument list (keys, values, not_indexed_same), which causes this error.
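
To see why an extra leading parameter produces exactly this message, here is a small stand-alone sketch. The two classes are hypothetical stand-ins, not pandas' real classes; they only mimic the two signatures described in this thread, and `reduce_like_pandarallel` copies the call shape shown in the traceback.

```python
class OldGroupBy:
    # Stand-in for the older signature described in this thread
    def _wrap_applied_output(self, keys, values, not_indexed_same=False):
        return "ok"


class NewGroupBy:
    # Stand-in for the newer signature, with `data` added in front
    def _wrap_applied_output(self, data, keys, values, not_indexed_same=False):
        return "ok"


def reduce_like_pandarallel(df_grouped, keys, values):
    # Same call shape as in the traceback: two positional arguments
    # plus not_indexed_same passed by keyword.
    return df_grouped._wrap_applied_output(
        keys, values, not_indexed_same=False
    )


def try_reduce(df_grouped):
    try:
        return reduce_like_pandarallel(df_grouped, ("k",), ("v",))
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_reduce(OldGroupBy()))
print(try_reduce(NewGroupBy()))
```

With the newer signature, the caller's `keys` is bound to `data` and its `values` is bound to `keys`, so nothing is left for `values`. That is why Python reports `values` as the missing argument even though the caller did pass one.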

3 reactions
winglight commented, Jul 25, 2021

I have exactly the same problem with a normally defined function (not a lambda). I have no idea how to fix it; it seems like a bug in pandarallel.
