Trying to use metric.compute but get OSError

I want to use metric.compute from load_metric('accuracy') to get the training accuracy, but I receive an OSError. What is the mechanism behind the metric calculation, and why would it raise an OSError?

195     for epoch in range(num_train_epochs):
196         model.train()
197         for step, batch in enumerate(train_loader):
198             # print(batch['input_ids'].shape)
199             outputs = model(**batch)
200
201             loss = outputs.loss
202             loss /= gradient_accumulation_steps
203             accelerator.backward(loss)
204
205             predictions = outputs.logits.argmax(dim=-1)
206             metric.add_batch(
207                 predictions=accelerator.gather(predictions),
208                 references=accelerator.gather(batch['labels'])
209             )
210             progress_bar.set_postfix({'loss': loss.item(), 'train batch acc.': train_metrics})
211
212             if (step + 1) % 50 == 0 or step == len(train_loader) - 1:
213                 train_metrics = metric.compute()

The error message is below:

Traceback (most recent call last):
  File "run_multi.py", line 273, in <module>
    main()
  File "/home/yshuang/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/yshuang/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/yshuang/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/yshuang/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "run_multi.py", line 213, in main
    train_metrics = metric.compute()
  File "/home/yshuang/.local/lib/python3.8/site-packages/datasets/metric.py", line 391, in compute
    self._finalize()
  File "/home/yshuang/.local/lib/python3.8/site-packages/datasets/metric.py", line 342, in _finalize
    self.writer.finalize()
  File "/home/yshuang/.local/lib/python3.8/site-packages/datasets/arrow_writer.py", line 370, in finalize
    self.stream.close()
  File "pyarrow/io.pxi", line 132, in pyarrow.lib.NativeFile.close
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: error closing file

Environment info

datasets version: 1.6.1
Platform: Linux NAME="Ubuntu" VERSION="20.04.1 LTS (Focal Fossa)"
Python version: 3.8.5
PyArrow version: 4.0.0

Top GitHub Comments

lhoestq commented, Sep 6, 2021

Hi! By default, the metric caches the predictions and references used to compute it in ~/.cache/huggingface/datasets/metrics (not ~/.datasets/). Let me update the documentation, @bhavitvyamalik.

The cache stores all the predictions and references passed to add_batch, for example, so that the metric can be computed later when compute is called.
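To make the caching idea concrete, here is a toy sketch of a file-backed metric. It is not the library's implementation (the traceback shows that datasets writes through an Arrow writer); the class name FileBackedAccuracy and the JSON-lines cache file are assumptions made purely for illustration. It shows why a problem with the cache location can surface as an OSError inside compute rather than inside add_batch.

```python
import json
import os
import tempfile


class FileBackedAccuracy:
    """Toy metric (hypothetical): add_batch appends to a cache file, compute reads it back."""

    def __init__(self, cache_dir):
        os.makedirs(cache_dir, exist_ok=True)
        self.cache_file = os.path.join(cache_dir, "accuracy_cache.jsonl")
        self.writer = None

    def add_batch(self, predictions, references):
        if self.writer is None:
            # Opening raises OSError (e.g. PermissionError) if the directory is not writable.
            self.writer = open(self.cache_file, "w")
        for pred, ref in zip(predictions, references):
            self.writer.write(json.dumps([pred, ref]) + "\n")

    def compute(self):
        if self.writer is None:
            return None  # nothing was added since the last compute
        # Closing flushes buffered rows to disk; a failed flush surfaces here as OSError.
        self.writer.close()
        self.writer = None
        with open(self.cache_file) as f:
            rows = [json.loads(line) for line in f]
        os.remove(self.cache_file)
        if not rows:
            return None
        correct = sum(pred == ref for pred, ref in rows)
        return {"accuracy": correct / len(rows)}


if __name__ == "__main__":
    metric = FileBackedAccuracy(tempfile.mkdtemp())
    metric.add_batch(predictions=[1, 0, 1], references=[1, 1, 1])
    metric.add_batch(predictions=[0], references=[0])
    print(metric.compute())
```

Because rows are buffered and only finalized when the file is closed, the failure in this sketch can appear at the compute step, which matches the shape of the traceback in the question.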

I think the issue might come from the default cache directory. Can you check that you have the right permissions? Otherwise, feel free to set cache_dir to another location.
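A minimal sketch of that check, assuming the default path quoted above: it tests whether the metrics cache directory is writable and otherwise falls back to a temporary directory, then passes the result to load_metric through its cache_dir argument. The helpers is_writable_dir, pick_cache_dir, and load_accuracy_metric are hypothetical names introduced here, not part of datasets.

```python
import os
import tempfile
from pathlib import Path

# Default metrics cache location named in the comment above.
DEFAULT_METRICS_CACHE = Path.home() / ".cache" / "huggingface" / "datasets" / "metrics"


def is_writable_dir(path):
    """Return True if `path`, or its nearest existing ancestor, is a writable directory."""
    path = Path(path)
    while not path.exists():
        if path.parent == path:
            return False
        path = path.parent
    return path.is_dir() and os.access(path, os.W_OK | os.X_OK)


def pick_cache_dir(preferred=DEFAULT_METRICS_CACHE):
    """Hypothetical helper: use `preferred` if writable, else a fresh temporary directory."""
    if is_writable_dir(preferred):
        return str(preferred)
    return tempfile.mkdtemp(prefix="hf_metrics_")


def load_accuracy_metric():
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_metric

    return load_metric("accuracy", cache_dir=pick_cache_dir())


if __name__ == "__main__":
    print("default cache writable:", is_writable_dir(DEFAULT_METRICS_CACHE))
    print("cache_dir to use:", pick_cache_dir())
```

Note that os.access only checks permission bits; a full disk or a misbehaving network filesystem can still fail at write time, so pointing cache_dir at a local directory you own is the more direct test.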

lvwerra commented, Jun 14, 2022

Closing this for now. Will re-open it should the issue still persist.