Add method to parse string columns into list of string columns

One of the most needed functions in data analysis is operating across multiple columns to generate new columns (and rows). The need can be abstracted into a unified method such as List<Column> columnOperate(List<Column>) that handles inter-column operations. I have now run into an inter-column operation problem that cannot be solved efficiently and elegantly with only a few methods. In fact, I found I couldn’t meet this basic need using the methods provided by tablesaw. Details are provided below.

Let’s say I have a Table named “df” with two columns “multi_ratio” and “amount”.

          df          
 amount  |  multi_ratio  |
--------------------------
    100  |      0.8,0.2  |
    200  |      0.5,0.5  |

Now I need to

  1. split every value in column “multi_ratio” into multiple values (e.g., convert “0.8,0.2” to a List of 0.8, 0.2),
  2. multiply amount by each split multi_ratio value (e.g., 100 * List of 0.8, 0.2 -> List of 80, 20),
  3. expand the result of step 2 into multiple rows.

So the final result I need would be.

                      df2                       
 amount  |  multi_ratio_single  |  multiply_result  |
-----------------------------------------------------
    100  |                 0.8  |               80  |
    100  |                 0.2  |               20  |
    200  |                 0.5  |              100  |
    200  |                 0.5  |              100  |
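
The three steps above can be sketched independently of tablesaw. The following is a minimal plain-Java illustration, not tablesaw API: the class name RatioExpander and the method expand are hypothetical, and each output row is represented as a double[] of {amount, ratio, amount * ratio} purely for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the split / multiply / expand steps (hypothetical helper, no tablesaw involved). */
final class RatioExpander {

    private RatioExpander() {}

    /**
     * Returns one {amount, ratio, amount * ratio} triple per comma-separated
     * ratio, in input order.
     */
    static List<double[]> expand(int[] amounts, String[] multiRatios) {
        if (amounts.length != multiRatios.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < amounts.length; i++) {
            // Step 1: split the cell into individual ratios.
            for (String part : multiRatios[i].split(",")) {
                double ratio = Double.parseDouble(part.trim());
                // Steps 2 and 3: multiply, then emit one output row per ratio.
                rows.add(new double[] {amounts[i], ratio, amounts[i] * ratio});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] amounts = {100, 200};
        String[] multiRatios = {"0.8,0.2", "0.5,0.5"};
        for (double[] row : expand(amounts, multiRatios)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

With the sample data from df, this yields four rows, one per ratio, which is the shape of df2 above.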

To achieve this goal, I first made an empty copy of df and added empty columns.

Table df2 = df.emptyCopy();

df2.addColumns(
        StringColumn.create("multi_ratio_single"),
        DoubleColumn.create("result")
);

Then I tried to operate on each row of df, generating new rows and adding them to df2, but it just does not seem to work.

Does anybody have suggestions for making this happen efficiently?

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
lwhite1 commented, Sep 13, 2021

Hi @eric-liuyd,

I would look at the Table:melt() method, which implements the “tidy” melt operation, but I don’t have time to confirm and write it up.

OTOH, if I were doing it myself, I might do something like what @ccleva suggests: basically copying into a new table in a loop.

Hope one of these works.

larry

On Mon, Sep 13, 2021 at 6:13 AM ccleva wrote:

Hi @eric-liuyd,

For this you can write a slightly different (double) loop:

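import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.IntColumn;
import tech.tablesaw.api.Row;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

// Build the sample input table.
Table df = Table.create("orig", IntColumn.create("amount"), StringColumn.create("multi_ratio"));
df.intColumn("amount").append(100).append(200);
df.stringColumn("multi_ratio").append("0.8,0.2").append("0.5,0.5");

// Append one output row per comma-separated ratio in each input row.
Table df2 = Table.create("result", IntColumn.create("amount"), DoubleColumn.create("multi_ratio_single"));
for (Row row : df) {
    for (String s : row.getString("multi_ratio").split(",")) {
        df2.intColumn("amount").append(row.getInt("amount"));
        df2.doubleColumn("multi_ratio_single").append(Double.parseDouble(s));
    }
}

// Multiply the two columns element-wise and add the product as a new column.
df2.addColumns(df2.doubleColumn("multi_ratio_single").multiply(df2.intColumn("amount")).setName("multiply_result"));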

Note that this code is not production grade …

You can probably also implement it by extracting the columns as in @lwhite1's example and doing some pivoting. Not sure which is better.
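
One possible reading of the "not production grade" caveat is that the loop trusts every cell to be well formed. The sketch below shows, in plain Java, how the parsing step might be hardened. The class RatioParser and method parseRatios are hypothetical names, and the policy chosen here (empty or null cells give no ratios, blank tokens are skipped, non-numeric tokens are rejected) is an assumption, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: defensive parsing of a comma-separated ratio cell. */
final class RatioParser {

    private RatioParser() {}

    static List<Double> parseRatios(String cell) {
        List<Double> ratios = new ArrayList<>();
        // Assumed policy: a missing or blank cell contributes no rows.
        if (cell == null || cell.trim().isEmpty()) {
            return ratios;
        }
        for (String part : cell.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue; // tolerate inputs such as "0.8,,0.2"
            }
            try {
                ratios.add(Double.parseDouble(token));
            } catch (NumberFormatException e) {
                // Report the offending token together with the whole cell.
                throw new IllegalArgumentException(
                        "Not a number: '" + token + "' in cell '" + cell + "'", e);
            }
        }
        return ratios;
    }

    public static void main(String[] args) {
        System.out.println(parseRatios(" 0.8, 0.2 ,"));
    }
}
```

In the double loop above, the inner loop could then iterate over the parsed ratios for each row's "multi_ratio" value instead of calling split(",") and Double.parseDouble directly.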

0 reactions
eric-liuyd commented, Sep 14, 2021

Hi ccleva. Thanks for your code. It really works. I didn’t get the use of append() before.
