[KED-2292] Updating pyarrow version constraint
Description
The pyarrow version constraint for ParquetDataSets is currently set to >=0.12.0, <1.0.0. Pyarrow has since had several major version upgrades (the current release is 2.0.0). I am wondering if the version constraint on pyarrow could be relaxed, so we could get more recent versions of pyarrow.
Context
Many other Python packages already depend on pyarrow >= 1.0.0, or even >= 2.0.0. An example would be awswrangler. Restricting pyarrow to <1.0.0 means we either get version conflicts or have to use increasingly outdated packages.
Possible Implementation
I would suggest relaxing the pyarrow version constraint to >=0.12.0, <3.0.0. As per the arrow documentation, files created with any pyarrow version since 0.8.0 should stay readable in versions >= 1.0.0. Files created with pyarrow >= 1.0.0 are, however, not readable for versions < 1.0.0. Version 2.0.0 does not change the data format at all. It does, however, deprecate some functionality in the library (pyarrow.filesystem, pyarrow.serialize, pyarrow.deserialize). I’m not sure if Kedro uses this functionality.
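To make the proposed range concrete, here is a minimal sketch of which pyarrow releases >=0.12.0, <3.0.0 would admit. The helper names `parse_version` and `satisfies` are hypothetical and only handle plain X.Y.Z release strings; pip itself resolves constraints using the full PEP 440 rules, so treat this as an illustration rather than how the constraint is actually evaluated.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Hypothetical helper: pre-releases, post-releases, etc. are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, lower: str = "0.12.0", upper: str = "3.0.0") -> bool:
    """Return True if lower <= version < upper (the proposed constraint)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    # Releases on either side of the current (<1.0.0) and proposed (<3.0.0) bounds.
    for candidate in ("0.11.1", "0.12.0", "0.17.1", "1.0.0", "2.0.0", "3.0.0"):
        status = "allowed" if satisfies(candidate) else "rejected"
        print(f"pyarrow {candidate}: {status}")
```

Calling `satisfies(version, upper="1.0.0")` reproduces the current constraint, which makes it easy to see that 1.0.0 and 2.0.0 are the releases the relaxed range would newly allow.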
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:5 (4 by maintainers)

@debugger24 it’s been fixed by https://github.com/quantumblacklabs/kedro/commit/9acca4688389930b1744e241a94dd20cc5918bb3, will be available in 0.17.1. 😃
Hi @sndrtj thanks a lot for bringing this to our attention! That sounds like a very reasonable request to me. Happy for you to make this contribution if you like, otherwise someone in the team will pick it up. 😊