How to choose proper download_mode in function load_dataset?
Hi, I am a beginner with datasets and I am trying to use it to load my CSV file. My CSV file looks like this:
text,label
"Effective but too-tepid biopic",3
"If you sometimes like to go to the movies to have fun , Wasabi is a good place to start .",4
"Emerges as something rare , an issue movie that 's so honest and keenly observed that it does n't feel like one .",5
First, I tried this command to load my CSV file:
from datasets import load_dataset
dataset = load_dataset('csv', data_files=['sst_test.csv'])
That works, but then I tried to override convert_options to convert the 'label' column from int64 to float32, like this:
import pyarrow as pa
from pyarrow import csv
read_options = csv.ReadOptions(block_size=1024*1024)
parse_options = csv.ParseOptions()
convert_options = csv.ConvertOptions(column_types={'text': pa.string(), 'label': pa.float32()})
dataset = load_dataset('csv', data_files=['sst_test.csv'], read_options=read_options,
                       parse_options=parse_options, convert_options=convert_options)
The 'label' dtype stays the same:
Dataset(features: {'text': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None)}, num_rows: 2210)
I think this is caused by the download_mode parameter, which defaults to REUSE_DATASET_IF_EXISTS, because after I delete the cache_dir the result is correct.
Is this a bug? How should I choose the proper download_mode to avoid this issue?
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (5 by maintainers)
It’s no big deal, but since it can be confusing to users I think it’s worth renaming it, and deprecate GenerateMode until datasets 2.0 at least. IMO it’s confusing to have download_mode=GenerateMode.something
download_mode=datasets.GenerateMode.FORCE_REDOWNLOAD should work. This makes me think we should rename this to DownloadMode.FORCE_REDOWNLOAD. Currently that’s confusing.
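For anyone landing here, a minimal sketch of the suggestion above. It assumes the enum is exposed either as datasets.GenerateMode (the name in this thread) or as datasets.DownloadMode (the rename proposed above), depending on the installed version. The helper names force_redownload_mode and load_csv_ignoring_cache are made up for illustration and are not part of the library.

```python
from types import SimpleNamespace


def force_redownload_mode(datasets_module):
    """Return FORCE_REDOWNLOAD from whichever enum name the module exposes.

    Hypothetical helper: tries the proposed name DownloadMode first, then
    falls back to GenerateMode, the name used in this thread.
    """
    for enum_name in ("DownloadMode", "GenerateMode"):
        enum_cls = getattr(datasets_module, enum_name, None)
        if enum_cls is not None:
            return enum_cls.FORCE_REDOWNLOAD
    raise AttributeError("no DownloadMode or GenerateMode enum found")


def load_csv_ignoring_cache(data_files, **csv_kwargs):
    """Load CSV files while asking datasets not to reuse the cached copy.

    Extra keyword arguments (for example convert_options) are forwarded
    unchanged; whether the csv builder accepts them depends on the version.
    """
    import datasets  # imported lazily so the helper above works without it

    return datasets.load_dataset(
        "csv",
        data_files=data_files,
        download_mode=force_redownload_mode(datasets),
        **csv_kwargs,
    )


# Stand-in for an older release that only has GenerateMode.
_fake_old = SimpleNamespace(GenerateMode=SimpleNamespace(FORCE_REDOWNLOAD="force"))
print(force_redownload_mode(_fake_old))
```

Called as load_csv_ignoring_cache(['sst_test.csv'], convert_options=convert_options), this should rebuild the dataset from the file instead of reusing the cached one, so the changed options get a chance to apply. The cost is that the CSV is reprocessed on every call, so it is probably best used only when the loading options have changed.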