Fails with "_wrap_applied_output() missing 1 required positional argument" where a simple pandas apply succeeds

Hello,

I’m using Python 3.8.10 (Anaconda distribution, GCC 7.5.10) on Ubuntu 20 LTS, 64-bit x86.

From my pip freeze:

pandarallel 1.5.2
pandas 1.3.0
numpy 1.20.3

I’m working with a DataFrame that looks like this one:

	HoleID	scaffold	tpl	base	score	tMean	tErr	modelPrediction	ipdRatio	coverage	isboundary	identificationQv	context	experiment	isbegin_bondary	isend_boundary	isin_IES	uniqueID	No_known_IES_retention_this_CCS	detailed_classif
1025444	70189477	scaffold_024_with_IES	688203	T	2	0.517	0.190	0.555	0.931	11	True	NaN	TTAAATAGAAATTAAAATCAGCTGC	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025446	70189477	scaffold_024_with_IES	688204	A	4	1.347	0.367	1.251	1.077	13	True	NaN	TAAATAGAAATTAAAATCAGCTGCT	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025448	70189477	scaffold_024_with_IES	688205	A	5	1.913	0.779	1.464	1.307	16	True	NaN	AAATAGAAATTAAAATCAGCTGCTT	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025450	70189477	scaffold_024_with_IES	688206	A	4	1.535	0.712	1.328	1.156	18	True	NaN	AATAGAAATTAAAATCAGCTGCTTA	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025452	70189477	scaffold_024_with_IES	688207	A	5	1.655	0.565	1.391	1.190	18	True	NaN	ATAGAAATTAAAATCAGCTGCTTAA	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES

I defined the following function:

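import numpy as np
import pandas as pd

def get_distance_from_nearest_criteria(df, criteria):
    # Rows of this group where the boolean column `criteria` is True
    begins = df[df[criteria]].copy()

    if len(begins) == 0:
        # No flagged row in this group: every distance is undefined
        return pd.Series([np.nan for x in range(len(df))])
    else:
        list_return = []

        for idx, nt in df.iterrows():
            # Distance from this row's tpl to every flagged row's tpl
            distances = [abs(nt["tpl"] - x) for x in begins["tpl"]]
            mindistance = min(distances, default=np.nan)
            list_return.append(mindistance)

        return pd.Series(list_return)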
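
As a side note, the per-row loop above can be expressed with NumPy broadcasting, which may make the groupby fast enough that parallelism is not needed. This is only a sketch, not the code from the report: the name nearest_distance_vectorized and the toy data are hypothetical, it always returns floats, and it builds a full pairwise matrix, so memory grows with (rows in group) x (flagged rows in group).

```python
import numpy as np
import pandas as pd


def nearest_distance_vectorized(group: pd.DataFrame, criteria: str) -> pd.Series:
    """Hypothetical vectorized variant of get_distance_from_nearest_criteria."""
    positions = group["tpl"].to_numpy(dtype=float)
    # tpl values of the rows where the boolean column `criteria` is True
    flagged = group.loc[group[criteria].astype(bool), "tpl"].to_numpy(dtype=float)

    if flagged.size == 0:
        # No flagged row in this group: every distance is undefined
        return pd.Series(np.full(len(group), np.nan))

    # Pairwise |tpl_i - flagged_j| matrix, then the minimum along each row
    distances = np.abs(positions[:, None] - flagged[None, :])
    return pd.Series(distances.min(axis=1))


# Toy data (made up for illustration), using the column names from the report
toy = pd.DataFrame(
    {"tpl": [10, 12, 15, 20], "isbegin_bondary": [False, True, False, True]}
)
result = nearest_distance_vectorized(toy, "isbegin_bondary")
```

It can be passed to groupby(...).apply in the same way as the original function.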

Then running:

from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False, nb_workers=12)
out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

leads to:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-02fc7c0589e3> in <module>
----> 1 out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/pandarallel.py in closure(data, func, *args, **kwargs)
    463             )
    464 
--> 465             return reduce(results, reduce_meta_args)
    466 
    467         finally:

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/data_types/dataframe_groupby.py in reduce(results, df_grouped)
     14         keys, values, mutated = zip(*results)
     15         mutated = any(mutated)
---> 16         return df_grouped._wrap_applied_output(
     17             keys, values, not_indexed_same=df_grouped.mutated or mutated
     18         )

TypeError: _wrap_applied_output() missing 1 required positional argument: 'values'

To me, the error message is not clear enough: I can’t tell what’s happening.

However, when I run the same call with a plain pandas apply, it works and returns:

uniqueID           
HT2_10354935    0      297.0
                1      297.0
                2      296.0
                3      296.0
                4      295.0
                       ...  
NM9_10_9568952  502      NaN
                503      NaN
                504      NaN
                505      NaN
                506      NaN
Length: 1028437, dtype: float64

I’m running all of this in a Jupyter notebook:

ipykernel 5.3.4
ipython 7.22.0
ipython-genutils 0.2.0
notebook 6.4.0
jupyter 1.0.0
jupyter-client 6.1.12
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyter-dash 0.4.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0

I was wondering if someone could explain to me what’s happening, and how to fix it if the error is mine. Because it works out of the box with a plain pandas apply, I suspect there is a small problem in pandarallel.

NB: This code also leaves unkilled processes behind, even after I interrupt or restart the IPython kernel.

EDIT: Could it be linked to the fact that I’m using a lambda function?

Top GitHub Comments

3 reactions

Kr4t0n commented, Aug 11, 2021

One workaround is to use a pandas version earlier than v1.3.0, for example v1.2.5.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.2.5'
>>> from pandarallel import pandarallel
>>> pandarallel.initialize()
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
>>> df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])
>>> df
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').parallel_apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
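
If pinning pandas is not an option, another possibility is to fall back to the plain apply when an affected pandas version is detected. The following is a minimal sketch that assumes the pandarallel 1.5.2 release from the report; the helper names pandas_major_minor and apply_per_group are hypothetical, and other pandarallel releases may behave differently.

```python
import pandas as pd


def pandas_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '1.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def apply_per_group(grouped, func):
    """Use parallel_apply only when it exists and pandas predates 1.3."""
    # parallel_apply is only present after pandarallel.initialize() has run
    parallel_apply = getattr(grouped, "parallel_apply", None)
    # Assumption: pandas 1.3 and later trigger the TypeError from this thread
    affected = pandas_major_minor(pd.__version__) >= (1, 3)
    if parallel_apply is None or affected:
        return grouped.apply(func)
    return parallel_apply(func)


# Toy data (made up for illustration)
toy = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})
totals = apply_per_group(toy.groupby("g")["v"], lambda s: s.sum())
```

The serial path is slower but returns the same result as the plain pandas apply shown in the report.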

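From v1.3.0 onward, the _wrap_applied_output function in pandas/core/groupby/groupby.py takes one additional positional argument, data. pandarallel still passes only keys and values, so the arguments shift by one position, nothing is left to fill values, and the TypeError above is raised.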
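
The mechanism can be reproduced without pandas at all. In the sketch below, wrap_before_1_3 and wrap_from_1_3 are hypothetical stand-ins that only mimic the two parameter lists; they are not the real pandas functions.

```python
def wrap_before_1_3(keys, values, not_indexed_same=False):
    """Stand-in for the older parameter list: keys, values."""
    return "ok"


def wrap_from_1_3(data, keys, values, not_indexed_same=False):
    """Stand-in for the newer parameter list: data, keys, values."""
    return "ok"


keys, values = ("k1", "k2"), ("v1", "v2")

# The old-style call matches the older parameter list
old_result = wrap_before_1_3(keys, values, not_indexed_same=False)

# The same call against the newer parameter list binds keys to `data`
# and values to `keys`, leaving `values` without an argument
try:
    wrap_from_1_3(keys, values, not_indexed_same=False)
except TypeError as exc:
    message = str(exc)
```

Here message ends with "missing 1 required positional argument: 'values'", the same wording as in the traceback.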

3 reactions

winglight commented, Jul 25, 2021

I have exactly the same problem with a normally defined function, so the lambda is not the cause. I have no idea how to fix it; it looks like a bug in pandarallel.
