Unable to rename columns in streaming dataset
Describe the bug
Renaming a column in a streaming dataset destroys the features object.
Steps to reproduce the bug
The following code illustrates the error:
from datasets import load_dataset
dataset = load_dataset('mc4', 'en', streaming=True, split='train')
print(dataset.info.features)
# {'text': Value(dtype='string', id=None), 'timestamp': Value(dtype='string', id=None), 'url': Value(dtype='string', id=None)}
dataset = dataset.rename_column("text", "content")
print(dataset.info.features)
# None: the features are lost after renaming!
Expected behavior
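Only the renamed column should change: the features should be preserved, with "text" renamed to "content" and "timestamp" and "url" left as they are.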
Environment info
datasets 2.6.1
Top GitHub Comments
If we know the features before renaming, then we know the features after renaming, so we can pass the new features to the returned dataset in rename_column.
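A minimal sketch of the idea in the comment above: renaming only changes a key, so the new features can be derived from the old ones without reading any data. `rename_features` is a hypothetical helper, not part of the `datasets` API, and plain strings stand in for `Value` objects so the example runs without the library installed. Since `datasets.Features` is a `dict` subclass, the same logic should carry over, with the result wrapped as `Features(...)`.

```python
from typing import Any, Dict, Optional


def rename_features(
    features: Optional[Dict[str, Any]],
    original_column_name: str,
    new_column_name: str,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return a copy of `features` with one column renamed.

    Column order and feature types are preserved. If the features are
    unknown (None), they stay unknown instead of raising.
    """
    if features is None:
        return None
    if original_column_name not in features:
        raise ValueError(f"Column {original_column_name!r} is not in the features")
    if new_column_name != original_column_name and new_column_name in features:
        raise ValueError(f"Column {new_column_name!r} already exists")
    # Rebuild the mapping in its original order, swapping only the renamed key.
    return {
        (new_column_name if name == original_column_name else name): feature
        for name, feature in features.items()
    }


# Usage: strings stand in for Value(dtype='string') objects.
features = {"text": "string", "timestamp": "string", "url": "string"}
renamed = rename_features(features, "text", "content")
print(list(renamed))  # ['content', 'timestamp', 'url']
```

The input mapping is left untouched, so the original dataset's features stay valid after the rename.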
Indeed! If anyone is interested in contributing, feel free to open a PR and I'd be happy to help / give some pointers 😃
#self-assign