Enable dictionary format on `dask.dataframe.DataFrame.to_bag` output


The current dask.dataframe.DataFrame.to_bag behavior is as follows:

import pandas as pd
import dask.dataframe as dd
import dask.bag as db

my_dict = {
    'Name' : ['John', 'Lara', 'James', 'Pablo'],
    'Age' : [23, 21, 22, 21],
    'Fruit' : ['orange', 'apple', 'apricot', 'cherry'],
}

df = pd.DataFrame(my_dict)
ddf = dd.from_pandas(df, npartitions=3)

ddf_bag = ddf.to_bag()
ddf_bag.compute()
[('John', 23, 'orange'),
 ('Lara', 21, 'apple'),
 ('James', 22, 'apricot'),
 ('Pablo', 21, 'cherry')]

This loses the column names. I would like to see output like this instead:

[{'Name': 'John', 'Age': 23, 'Fruit': 'orange'},
 {'Name': 'Lara', 'Age': 21, 'Fruit': 'apple'},
 {'Name': 'James', 'Age': 22, 'Fruit': 'apricot'},
 {'Name': 'Pablo', 'Age': 21, 'Fruit': 'cherry'}]
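As a minimal sketch of the requested conversion, the tuples shown above can be paired with the column names using plain Python. The helper name `row_to_dict` is hypothetical and is not part of Dask; the rows and column names are copied from the example in this issue.

```python
# Column names and row tuples copied from the example above.
columns = ["Name", "Age", "Fruit"]
rows = [
    ("John", 23, "orange"),
    ("Lara", 21, "apple"),
    ("James", 22, "apricot"),
    ("Pablo", 21, "cherry"),
]


def row_to_dict(row, columns=columns):
    """Pair each value in a row tuple with its column name (hypothetical helper)."""
    return dict(zip(columns, row))


# Convert every tuple into a dictionary keyed by column name.
records = [row_to_dict(row) for row in rows]
print(records[0])
```

On a real bag, the same helper could presumably be applied with `ddf.to_bag().map(row_to_dict)`, taking the names from `ddf.columns`. Later Dask releases also document a `format` argument on `to_bag` (for example `format="dict"`); check the installed version before relying on it.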

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
jsignell commented, Aug 18, 2021

Closing this - thanks @rajagurunath

1 reaction
mrocklin commented, Jul 24, 2021

Welcome! I encourage you to submit a pull request, even if it is not finished yet. Pull requests make it easier for maintainers to view the changes and to engage in conversation.

On Sat, Jul 24, 2021, 1:13 PM gurunath @.***> wrote:

Hi @jrbourbeau https://github.com/jrbourbeau @mrocklin https://github.com/mrocklin @ncclementi https://github.com/ncclementi

Thanks for the detailed discussion. Based on the comments above, I implemented an initial draft version here: @.*** https://github.com/rajagurunath/dask/commit/ce6812f69c3f0384e79b2927addbd0025d6bf520

I am fairly new to the Dask community. Could someone take a look at the above commit and help me with further suggestions, best practices, etc.? I am very happy to take this as my good first issue ❤️

Thanks in advance


Top Results From Across the Web

DataFrame.to_bag - Dask documentation
Create Dask Bag from a Dask DataFrame. Parameters. index: bool, optional ... Whether to return a bag of tuples, dictionaries, or dataframe-like objects.

DataFrame from bag of sequence of dicts can only infer ...
When creating a dask DataFrame from a bag pointing to a sequence of dicts, columns are only computed based on the first dict...

convert dask.bag of dictionaries to dask.dataframe using dask ...
I am struggling to convert a dask.bag of dictionaries into dask.delayed pandas.DataFrames into a final dask.dataframe.

Filtering Syntax | Dash for Python Documentation | Plotly
... conditional formatting, editing, sorting, filtering, and more. ... from dash.dependencies import Input, Output import datetime import pandas as pd df ...

Dask method on dataframe to return a dictionary of applied ...
Coding example for the question Dask method on dataframe to return a dictionary of applied method results-Pandas,Python.
