`Groupby` behaves differently depending on the order of the columns

Describe the bug: When creating a DataFrame, the groupby() function either works properly or throws an error, depending on the order of the columns.

To reproduce: This column order works perfectly:

let data = {
    worker: ["david", "david", "john", "alice", "john", "david"],
    hours: [5, 6, 2, 8, 4, 3],
    day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};
let df = new dfd.DataFrame(data);

df.groupby(["day"]).col(["hours"]).sum().print()

// ╔════════════╤═══════════════════╤═══════════════════╗
// ║            │ day               │ hours_sum         ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 0          │ monday            │ 5                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 1          │ tuesday           │ 6                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 2          │ wednesday         │ 2                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 3          │ thursday          │ 8                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 4          │ friday            │ 7                 ║
// ╚════════════╧═══════════════════╧═══════════════════╝

df.groupby(["worker"]).count().print()
// ╔════════════╤═══════════════════╤═══════════════════╤═══════════════════╗
// ║            │ worker            │ hours_count       │ day_count         ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 0          │ david             │ 3                 │ 3                 ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 1          │ john              │ 2                 │ 2                 ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 2          │ alice             │ 1                 │ 1                 ║
// ╚════════════╧═══════════════════╧═══════════════════╧═══════════════════╝

But when I change the column order to the following, it doesn’t work:

let data = {
    hours: [5, 6, 2, 8, 4, 3],
    worker: ["david", "david", "john", "alice", "john", "david"],
    day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};
let df = new dfd.DataFrame(data);

df.groupby(["day"]).col(["hours"]).sum().print()
// Uncaught Error: Can't perform math operation on column hours
//    arithemetic groupby.ts:266
//    operations groupby.ts:417
//    count groupby.ts:431

df.groupby(["worker"]).count().print()
// Uncaught Error: Can't perform math operation on column hours
//    arithemetic groupby.ts:266
//    operations groupby.ts:417
//    count groupby.ts:431
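The first key order works and the second fails. One possible stopgap until a fix is available is to rebuild the input object in the working order before constructing the DataFrame. This assumes the failure depends only on key order, as the report suggests, and it has not been verified here. JavaScript preserves insertion order for non-integer string keys, so copying the columns one at a time fixes their order. `withColumnOrder` is a hypothetical helper, not a danfo.js API.

```javascript
// Hypothetical helper (not part of danfo.js): return a copy of a column-oriented
// object whose keys follow `order`; any columns not listed keep their relative
// order and go last. Non-integer string keys preserve insertion order, so the
// copy's keys come out in exactly this sequence.
function withColumnOrder(data, order) {
  const missing = order.filter((name) => !(name in data));
  if (missing.length > 0) {
    throw new Error(`Unknown column(s): ${missing.join(", ")}`);
  }
  const extra = Object.keys(data).filter((name) => !order.includes(name));
  const reordered = {};
  for (const name of [...order, ...extra]) {
    reordered[name] = data[name]; // same arrays, new key order
  }
  return reordered;
}

const failingOrder = {
  hours: [5, 6, 2, 8, 4, 3],
  worker: ["david", "david", "john", "alice", "john", "david"],
  day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};

// Use the key order that worked in the report.
const fixed = withColumnOrder(failingOrder, ["worker", "hours", "day"]);
console.log(Object.keys(fixed)); // [ 'worker', 'hours', 'day' ]
// Then, in the browser: new dfd.DataFrame(fixed)
```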

Expected behavior: I would expect changing the order of the columns to have no effect on the result.
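The expected behavior can be stated as a property: every ordering of the input object's keys should give the same group-by result. The sketch below is a hypothetical, dependency-free harness. `permutations`, `isOrderIndependent`, and both aggregate functions are illustrative names, not danfo.js code. The positional variant shows in general terms how reading a column by its index instead of its name makes a result depend on column order. It makes no claim about where the bug lies in danfo.js.

```javascript
// Hypothetical test harness (not danfo.js code).

// All orderings of an array.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest]),
  );
}

// Run `aggregate` on every key ordering of `data` and check that all results agree.
function isOrderIndependent(data, aggregate) {
  const results = permutations(Object.keys(data)).map((order) => {
    const reordered = Object.fromEntries(order.map((name) => [name, data[name]]));
    return JSON.stringify(aggregate(reordered));
  });
  return results.every((r) => r === results[0]);
}

const data = {
  worker: ["david", "david", "john", "alice", "john", "david"],
  hours: [5, 6, 2, 8, 4, 3],
  day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};

// Sum of hours per day, reading columns by name.
function hoursPerDay(d) {
  const sums = {};
  d.day.forEach((day, i) => {
    sums[day] = (sums[day] ?? 0) + d.hours[i];
  });
  return sums;
}

// Deliberately fragile: assumes the numeric column is always the second key.
function hoursPerDayByPosition(d) {
  const cols = Object.values(d);
  const sums = {};
  d.day.forEach((day, i) => {
    sums[day] = (sums[day] ?? 0) + cols[1][i];
  });
  return sums;
}

console.log(isOrderIndependent(data, hoursPerDay)); // true
console.log(isOrderIndependent(data, hoursPerDayByPosition)); // false
```

If `aggregate` throws for some ordering, as the calls in the report do, the error propagates out of `isOrderIndependent` rather than being reported as `false`.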

Desktop:

  • OS: Windows 11
  • Browser: Firefox v97.0.1, Chrome v98.0.4758.102, Edge v98.0.1108.56
  • Version: -

Additional context: I’m using the browser version, not the Node.js one.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

igonro commented, Feb 22, 2022

Now I’m having a problem with ChromeHeadless in WSL2.

No binary for ChromeHeadless browser on your platform

I saw that @sponsfreixes had a similar issue in #173; I will try to fix it myself 😅

igonro commented, Feb 22, 2022

Thanks @risenW, I will try it!
