Format Breaking Changes Candidates
This is a meta issue describing changes that we would like to make now but that are incompatible with the current format.
Type Normalization
The following type normalizations are currently not implemented but could be:
- decimal128[P, S] -> decimal128[38, S] (38 is the max for 128 bits)
- date{32, 64} -> date64
- time{32, 64}[U] -> time64[U]
- structs (nested normalization)
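The rules listed above can be sketched as a small recursive function. This is a minimal illustration only: the `Decimal128`, `Date`, `Time`, `Struct` and `Other` classes are hypothetical stand-ins for Arrow types, not the pyarrow API, and `normalize` is not an existing function of this project.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical stand-ins for Arrow types, used only to illustrate the rules.


@dataclass(frozen=True)
class Decimal128:
    precision: int
    scale: int


@dataclass(frozen=True)
class Date:
    bits: int  # 32 or 64


@dataclass(frozen=True)
class Time:
    bits: int  # 32 or 64
    unit: str  # the unit U is carried over unchanged


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "FieldType"], ...]


@dataclass(frozen=True)
class Other:
    name: str  # any type that has no normalization rule


FieldType = Union[Decimal128, Date, Time, Struct, Other]

MAX_DECIMAL128_PRECISION = 38  # the max for 128 bits


def normalize(t: FieldType) -> FieldType:
    """Apply the normalization rules, descending into structs."""
    if isinstance(t, Decimal128):
        return Decimal128(MAX_DECIMAL128_PRECISION, t.scale)
    if isinstance(t, Date):
        return Date(64)
    if isinstance(t, Time):
        return Time(64, t.unit)
    if isinstance(t, Struct):
        return Struct(tuple((name, normalize(ft)) for name, ft in t.fields))
    return t  # everything else is left untouched


nested = Struct((("price", Decimal128(10, 2)), ("day", Date(32))))
normalized = normalize(nested)
```

The recursion in the `Struct` branch is what "nested normalization" refers to here: each field of a struct is normalized with the same rules as a top-level column.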
Index Handling
Reject non-integer/range indices, use reset_index and drop index information before writing data. Always restore as normalized (reset_index) indices, even when applying predicates.
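A minimal sketch of this rule with pandas, assuming that "non-integer/range" means anything other than a single-level integer or range index. The helper name `normalize_index` is hypothetical and not part of any existing API.

```python
import pandas as pd


def normalize_index(df: pd.DataFrame) -> pd.DataFrame:
    """Reject unsupported indices and drop index information before writing.

    Assumption: only single-level integer indices (including RangeIndex)
    are accepted; everything else is rejected.
    """
    index = df.index
    is_plain_integer = (
        not isinstance(index, pd.MultiIndex)
        and pd.api.types.is_integer_dtype(index.dtype)
    )
    if not is_plain_integer:
        raise ValueError(
            f"Unsupported index of type {type(index).__name__}; "
            "move it into a column with reset_index() first."
        )
    # drop=True discards the old index instead of turning it into a column,
    # so the result always carries a fresh 0..n-1 RangeIndex.
    return df.reset_index(drop=True)


df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
clean = normalize_index(df)
```

Applying the same `reset_index(drop=True)` after filtering on read would give the "always restore as normalized" behaviour, since a predicate otherwise leaves gaps in the row labels.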
Pandas-specific Metadata
Pandas-specific metadata is part of the Arrow schema but is not part of the Arrow type system. It captures information like the index type. If Index Handling is implemented, we could drop the entire pandas metadata field. This would simplify interop with other languages/frameworks.
Labels
Use UUIDs everywhere and reject user-provided labels.
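As a rough sketch, label generation under this rule could look like the following. The function name `new_partition_label` and its `user_label` parameter are hypothetical and only illustrate the rejection of user-provided labels.

```python
import uuid
from typing import Optional


def new_partition_label(user_label: Optional[str] = None) -> str:
    """Return a fresh UUID-based label; user-provided labels are rejected."""
    if user_label is not None:
        raise ValueError("User-provided labels are not accepted.")
    # uuid4 is random, so no coordination between writers is needed.
    return uuid.uuid4().hex


label = new_partition_label()
```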
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (4 by maintainers)
If partition labels and partition indices (recently deprecated) are both removed, it would be nice for UX if parse_input_to_metapartition accepted what is currently the value of the data argument, given that data would be the only key remaining in the input dictionary. That is, instead of the current dictionary with a single data key, allow passing that value directly.
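The idea in this comment can be sketched with a small helper that accepts both forms. `unwrap_data` is a hypothetical name and not the actual parse_input_to_metapartition implementation; the payload below is a placeholder for whatever the data argument currently holds.

```python
from typing import Any


def unwrap_data(obj: Any) -> Any:
    """Return the payload whether it is wrapped in {"data": ...} or passed bare.

    Caveat: if the payload may itself be a dict whose only key is "data",
    the two forms cannot be told apart and an explicit flag would be needed.
    """
    if isinstance(obj, dict) and set(obj) == {"data"}:
        return obj["data"]
    return obj


payload = object()  # placeholder for the current value of the data argument

wrapped_form = unwrap_data({"data": payload})  # current dictionary form
bare_form = unwrap_data(payload)               # proposed direct form
```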
We use the largest because the common metadata then describes a type that can hold all values of all partitions (aka container type). That’s the whole point of the type system documentation and also explains why ints cannot be packed into floats or the other way around.
Depends. It would make sense to upcast to int128 in that case, but we might not want to do that if it means that all libs break because they cannot handle this type (NumPy, for example). In other words (as also described in the type system docs): find a container type that still doesn’t break the ecosystem.
Pandas is exactly NOT a good blueprint for the type system (see the float vs. int discussion again). Most common means that “the container / common type can hold all values of all types that are upcast into that exact container / common type” (which is not the case for int<->float) and that the upcast “makes semantic sense” (see the discussion in the type system docs on why bools should not be upcast to ints).
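To make the container-type argument concrete, here is a minimal sketch. Types are modelled as hypothetical `(kind, bit_width)` tuples rather than real Arrow types, signedness is ignored, and `common_type` is an illustrative name only.

```python
from typing import Iterable, Tuple

NumType = Tuple[str, int]  # (kind, bit width), e.g. ("int", 32)


def common_type(types: Iterable[NumType]) -> NumType:
    """Pick the widest type of a single kind; never mix kinds."""
    types = list(types)
    if not types:
        raise ValueError("At least one type is required.")
    kinds = {kind for kind, _ in types}
    if len(kinds) != 1:
        # No cross-kind upcast: neither int nor float can hold all
        # values of the other.
        raise TypeError(f"No container type for kinds {sorted(kinds)}")
    return (kinds.pop(), max(width for _, width in types))


container = common_type([("int", 8), ("int", 64), ("int", 32)])

# Why a 64-bit int does not fit into a 64-bit float: the float has only
# 53 bits of mantissa, so the round trip loses the last bit.
big = 2**53 + 1
lossy = int(float(big)) != big
```

The widest type of one kind can represent every value of the narrower ones, which is the property the container type needs. The `lossy` check shows that the same does not hold when going from int to float.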