Second concatenation of datasets produces errors
Hi,
I need to concatenate my dataset with others several times, and after the second concatenation the features of features (e.g. the tag names) collapse. This breaks, for instance, applying a tokenize function with data.map.
from datasets import load_dataset, concatenate_datasets
data = load_dataset('trec')['train']
concatenated = concatenate_datasets([data, data])
concatenated_2 = concatenate_datasets([concatenated, concatenated])
print('True features of features:', concatenated.features)
print('\nProduced features of features:', concatenated_2.features)
This outputs:
True features of features: {'label-coarse': ClassLabel(num_classes=6, names=['DESC', 'ENTY', 'ABBR', 'HUM', 'NUM', 'LOC'], names_file=None, id=None), 'label-fine': ClassLabel(num_classes=47, names=['manner', 'cremat', 'animal', 'exp', 'ind', 'gr', 'title', 'def', 'date', 'reason', 'event', 'state', 'desc', 'count', 'other', 'letter', 'religion', 'food', 'country', 'color', 'termeq', 'city', 'body', 'dismed', 'mount', 'money', 'product', 'period', 'substance', 'sport', 'plant', 'techmeth', 'volsize', 'instru', 'abb', 'speed', 'word', 'lang', 'perc', 'code', 'dist', 'temp', 'symbol', 'ord', 'veh', 'weight', 'currency'], names_file=None, id=None), 'text': Value(dtype='string', id=None)}
Produced features of features: {'label-coarse': Value(dtype='int64', id=None), 'label-fine': Value(dtype='int64', id=None), 'text': Value(dtype='string', id=None)}
I am using datasets v1.11.0.
Hi @Aktsvigun, thanks for reporting.
I’m investigating this.
@albertvillanova