
I/O operation on closed file

See original GitHub issue

What happened:

I want to open a NetCDF file from S3 using xarray while preserving a lazy reference to the underlying data. This is what I tried:

import xarray as xr
import s3fs

url = 'noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc'
fs = s3fs.S3FileSystem(anon=True)
with fs.open(url) as f:
    ds = xr.open_dataset(f)

This raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-20010df1a169> in <module>
      5 fs = s3fs.S3FileSystem(anon=True)
      6 with fs.open(url) as f:
----> 7     ds = xr.open_dataset(f)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    543 
    544     with close_on_error(store):
--> 545         ds = maybe_decode_store(store)
    546 
    547     # Ensure source filename always stored in dataset object (GH issue #2550)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
    457             drop_variables=drop_variables,
    458             use_cftime=use_cftime,
--> 459             decode_timedelta=decode_timedelta,
    460         )
    461 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    594         drop_variables=drop_variables,
    595         use_cftime=use_cftime,
--> 596         decode_timedelta=decode_timedelta,
    597     )
    598     ds = Dataset(vars, attrs=attrs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    496             stack_char_dim=stack_char_dim,
    497             use_cftime=use_cftime,
--> 498             decode_timedelta=decode_timedelta,
    499         )
    500         if decode_coords:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime, decode_timedelta)
    336         var = times.CFTimedeltaCoder().decode(var, name=name)
    337     if decode_times:
--> 338         var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
    339 
    340     dimensions, data, attributes, encoding = variables.unpack_for_decoding(var)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode(self, variable, name)
    425             units = pop_to(attrs, encoding, "units")
    426             calendar = pop_to(attrs, encoding, "calendar")
--> 427             dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
    428             transform = partial(
    429                 decode_cf_datetime,

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     71     values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
     72     example_value = np.concatenate(
---> 73         [first_n_items(values, 1) or [0], last_item(values) or [0]]
     74     )
     75 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/formatting.py in first_n_items(array, n_desired)
     68         indexer = _get_indexer_at_least_n_items(array.shape, n_desired, from_end=False)
     69         array = array[indexer]
---> 70     return np.asarray(array).flat[:n_desired]
     71 
     72 

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    489 
    490     def __array__(self, dtype=None):
--> 491         return np.asarray(self.array, dtype=dtype)
    492 
    493     def __getitem__(self, key):

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    555     def __array__(self, dtype=None):
    556         array = as_indexable(self.array)
--> 557         return np.asarray(array[self.key], dtype=None)
    558 
    559     def transpose(self, order):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in __getitem__(self, key)
     27     def __getitem__(self, key):
     28         return indexing.explicit_indexing_adapter(
---> 29             key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem
     30         )
     31 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    835     """
    836     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 837     result = raw_indexing_method(raw_key.tuple)
    838     if numpy_indices.tuple:
    839         # index the loaded np.ndarray

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in _getitem(self, key)
     36         with self.datastore.lock:
     37             array = self.get_array(needs_lock=False)
---> 38             return array[key]
     39 
     40 

/srv/conda/envs/notebook/lib/python3.7/site-packages/h5netcdf/core.py in __getitem__(self, key)
    144 
    145     def __getitem__(self, key):
--> 146         return self._h5ds[key]
    147 
    148     def __setitem__(self, key, value):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/srv/conda/envs/notebook/lib/python3.7/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    541             arr = numpy.ndarray(selection.mshape, dtype=new_dtype)
    542             for mspace, fspace in selection:
--> 543                 self.id.read(mspace, fspace, arr, mtype)
    544             if len(names) == 1:
    545                 arr = arr[names[0]]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

h5py/defs.pyx in h5py.defs.H5Dread()

h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in readinto(self, b)
   1246         https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
   1247         """
-> 1248         data = self.read(len(b))
   1249         memoryview(b).cast("B")[: len(data)] = data
   1250         return len(data)

/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/spec.py in read(self, length)
   1232             length = self.size - self.loc
   1233         if self.closed:
-> 1234             raise ValueError("I/O operation on closed file.")
   1235         logger.debug("%s read: %i - %i" % (self, self.loc, self.loc + length))
   1236         if length == 0:

ValueError: I/O operation on closed file.

What you expected to happen:

I would like this to work without an error. I can make it work by doing:

ds = xr.open_dataset(f).load()

However, that loses the lazy access. I can also make it work by avoiding the context manager:

ds = xr.open_dataset(fs.open(url))

However, I’m not sure this is a recommended practice.
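
A minimal sketch of that second workaround with explicit cleanup, assuming the h5netcdf backend that shows up in the traceback above (the explicit engine argument and the close() calls are additions for illustration, not from the issue):

import s3fs
import xarray as xr

url = 'noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc'
fs = s3fs.S3FileSystem(anon=True)

f = fs.open(url)                             # keep a reference so the handle stays open
ds = xr.open_dataset(f, engine='h5netcdf')   # lazy dataset backed by the open handle

# ... lazy access to ds ...

ds.close()                                   # release the dataset first,
f.close()                                    # then close the underlying s3fs file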

For context, I need to open 30,000 such files and pass them around a dask distributed cluster.
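
One common pattern at that scale, sketched here as an assumption rather than the resolution of this issue, is to open, read, and close each file inside the dask task itself, so every worker manages its own short-lived handle (the summarize helper and the 'RRQPE' variable name are hypothetical):

import dask
import s3fs
import xarray as xr

@dask.delayed
def summarize(url, var='RRQPE'):             # 'RRQPE' is an assumed variable name
    fs = s3fs.S3FileSystem(anon=True)
    with fs.open(url) as f:
        # .load() pulls the values while the handle is still open,
        # so nothing lazy escapes the context manager
        return xr.open_dataset(f, engine='h5netcdf')[var].mean().load()

# results = dask.compute(*[summarize(u) for u in urls])   # urls: the ~30,000 object keys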

Environment:

  • s3fs version: 0.4.2
  • Dask version: 2.21.0
  • Python version: 3.7.6
  • Operating System: Linux
  • Install method (conda, pip, source): conda

cc @rrsignell-usgs

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Nov 3, 2021

If not the context manager, then the close should still trigger when the file is garbage collected - that’s the intended behaviour.

0 reactions
betolink commented, Nov 3, 2021

Oh right my bad! I just realized that when it closes the file there is no buffer to read.
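
That matches what the bottom of the traceback shows: fsspec's read() raises as soon as the file has been closed. A tiny standalone illustration (not from the thread):

import s3fs

fs = s3fs.S3FileSystem(anon=True)
f = fs.open('noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc')
f.close()
f.read(4)   # ValueError: I/O operation on closed file.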

Read more comments on GitHub >

