TensorFlow Question-Answering example fails to run (cardinality error)

Environment info

  • transformers version: 4.4.0.dev0
  • Platform: Linux-4.15.0-111-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Information

Model I am using (Bert, XLNet …): bert-base-uncased or roberta-base

The problem arises when using:

  • the official example scripts: question-answering (run_tf_squad.py)

Error message:

Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
87599it [01:00, 1437.03it/s]
10570it [00:11, 958.83it/s]
convert squad examples to features: 2%|_ | 1697/87599 [00:13<10:43, 133.40it/s]
[WARNING|squad.py:118] 2021-02-17 22:20:03,736 >> Could not find answer: 'municipal building and' vs. 'a municipal building'
convert squad examples to features: 50%|_____ | 43393/87599 [05:04<05:04, 145.24it/s]
[WARNING|squad.py:118] 2021-02-17 22:24:55,103 >> Could not find answer: 'message stick,' vs. 'a message stick'
convert squad examples to features: 100%|| 87599/87599 [10:10<00:00, 143.59it/s]
add example index and unique id: 100%|| 87599/87599 [00:00<00:00, 784165.53it/s]
convert squad examples to features: 100%|| 10570/10570 [01:14<00:00, 140.99it/s]
add example index and unique id: 100%|| 10570/10570 [00:00<00:00, 510000.04it/s]
[WARNING|integrations.py:60] 2021-02-17 22:31:16,214 >> Using the WAND_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[INFO|trainer_tf.py:125] 2021-02-17 22:31:16,214 >> To use comet_ml logging, run pip/conda install comet_ml see https://www.comet.ml/docs/python-sdk/huggingface/
Traceback (most recent call last):
  File "run_tf_squad.py", line 256, in <module>
    main()
  File "run_tf_squad.py", line 250, in main
    trainer.train()
  File "/home/transformers/src/transformers/trainer_tf.py", line 457, in train
    train_ds = self.get_train_tfdataset()
  File "/home/transformers/src/transformers/trainer_tf.py", line 141, in get_train_tfdataset
    self.num_train_examples = self.train_dataset.cardinality().numpy()
AttributeError: '_AssertCardinalityDataset' object has no attribute 'cardinality'

The task I am working on is:

  • an official GLUE/SQUaD task: SQUaD v1

To reproduce

  1. Use the latest master from huggingface/transformers
  2. Go to examples/question-answering
  3. Run WANDB_DISABLED=true python run_tf_squad.py --model_name_or_path roberta-base --output_dir model --max_seq_length 384 --num_train_epochs 2 --per_device_train_batch_size 8 --per_device_eval_batch_size 16 --do_train --do_eval --logging_dir logs --logging_steps 10 --learning_rate 3e-5 --no_cuda=True --doc_stride 128

Could you take a look @sgugger?
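
One possible reading of the traceback above: the trainer calls the method form `self.train_dataset.cardinality()`, and, as far as I know, TensorFlow 2.2.0 (the version listed under Environment info) only offers the function form `tf.data.experimental.cardinality(dataset)`, with the method on `tf.data.Dataset` arriving in a later release. If that is the cause, upgrading TensorFlow is probably the simplest route. The sketch below shows a version-tolerant lookup instead. `dataset_cardinality` is a hypothetical helper name used for illustration; it is not part of transformers or TensorFlow, and this is not the fix the maintainers applied.

```python
def dataset_cardinality(dataset):
    """Return the element count of a tf.data.Dataset-like object as an int.

    Hypothetical helper: prefers the Dataset.cardinality() method when the
    installed TensorFlow provides it, and otherwise falls back to the
    function form tf.data.experimental.cardinality(dataset).
    """
    method = getattr(dataset, "cardinality", None)
    if callable(method):
        # Method form, available on newer TensorFlow releases.
        result = method()
    else:
        # Function form; imported lazily so the method path needs no import.
        import tensorflow as tf

        result = tf.data.experimental.cardinality(dataset)

    # Eager tensors expose .numpy(); plain integers pass straight through.
    # Note: TensorFlow reports negative sentinel values for infinite or
    # unknown cardinality, so callers may want to check for result < 0.
    return int(result.numpy()) if hasattr(result, "numpy") else int(result)
```

With a helper like this, the failing line in the traceback could in principle read `self.num_train_examples = dataset_cardinality(self.train_dataset)`, assuming the dataset really is a `tf.data.Dataset`.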

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

6 reactions
jeehunkang commented, Jul 9, 2022

Hello, I am getting the same error.

It says “AttributeError: ‘Dataset’ object has no attribute ‘cardinality’” when I train it. Does anyone know how I should address this issue?

0 reactions
github-actions[bot] commented, Apr 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
