Format Breaking Changes Candidates


This is a meta issue describing which changes we would like to make now but which are incompatible with the current format.

Type Normalization

The following type normalizations are currently not implemented but could be:

  • decimal128[P, S] -> decimal128[38, S] (38 is the max for 128 bits)
  • date{32, 64} -> date64
  • time{32, 64}[U] -> time64[U]
  • structs (nested normalization)
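As a rough illustration of the rules listed above, the sketch below models Arrow types as plain Python tuples. The tuple encoding and the name normalize_type are assumptions made for this example; they are not part of pyarrow or of the project's code.

```python
# Hypothetical encoding: ("decimal128", precision, scale), ("date32",),
# ("time32", unit), ("struct", {field_name: field_type}).
MAX_DECIMAL128_PRECISION = 38  # largest precision that fits into 128 bits


def normalize_type(t):
    """Return the normalized form of a type tuple (illustrative only)."""
    kind = t[0]
    if kind == "decimal128":
        # Keep the scale, widen the precision to the maximum.
        _, _precision, scale = t
        return ("decimal128", MAX_DECIMAL128_PRECISION, scale)
    if kind in ("date32", "date64"):
        return ("date64",)
    if kind in ("time32", "time64"):
        # Keep the unit, widen the storage.
        return ("time64", t[1])
    if kind == "struct":
        # Nested normalization: apply the same rules to every field.
        return ("struct", {name: normalize_type(ft) for name, ft in t[1].items()})
    # Types without a rule are returned unchanged.
    return t
```

The model follows the list literally. In Arrow itself, time64 only supports microsecond and nanosecond units, so a real implementation would also have to choose a target unit for time32 inputs.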

Index Handling

Reject non-integer/range indices, use reset_index and drop index information before writing data. Always restore as normalized (reset_index) indices, even when applying predicates.
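A minimal sketch of what this could look like on the write path, assuming pandas is available. The helper name prepare_for_write is hypothetical and is not part of the project's API.

```python
import pandas as pd


def prepare_for_write(df: pd.DataFrame) -> pd.DataFrame:
    """Reject non-integer indices and drop index information (illustrative)."""
    # A RangeIndex has an integer dtype, so this check covers both
    # range and plain integer indices.
    if not pd.api.types.is_integer_dtype(df.index.dtype):
        raise ValueError(
            f"only integer/range indices are accepted, got {type(df.index).__name__}"
        )
    # Drop the index values; the result carries a fresh 0..n-1 RangeIndex.
    return df.reset_index(drop=True)
```

For example, a frame built with index=[5, 7] comes back with index 0 and 1, while a frame indexed by strings is rejected.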

Pandas-specific Metadata

Pandas-specific metadata is part of the Arrow schema but is not part of the Arrow type system. It captures information like the index type. If Index Handling is implemented, we could drop the entire pandas metadata field. This would simplify interop with other languages/frameworks.

Labels

Use UUIDs everywhere and reject user-provided labels.
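One way this could be enforced, sketched with the standard library only. The function name new_partition_label and its user_label parameter are made up for the example.

```python
import uuid


def new_partition_label(user_label=None):
    """Return a fresh UUID-based label; refuse any caller-supplied one."""
    if user_label is not None:
        raise ValueError("user-provided labels are not accepted")
    # uuid4 is random, so labels need no coordination between writers.
    return uuid.uuid4().hex
```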

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

lr4d commented, Jun 6, 2019 (1 reaction)

If partition labels and partition indices (recently deprecated) are both removed, it will be nice for UX if parse_input_to_metapartition accepts what is currently the value of the data argument, given that data would be the only key remaining in the input dictionary.

That is, instead of the current:

import pandas as pd

dfs = [
    {
        "data": {
            "core-table": pd.DataFrame({"col1": ["x"]}),
            "aux-table": pd.DataFrame({"f": [1.1]}),
        }
    },
    {
        "data": {
            "core-table": pd.DataFrame({"col1": ["y"]}),
            "aux-table": pd.DataFrame({"f": [1.2]}),
        }
    },
]

Allow:

dfs = [
    {
        "core-table": pd.DataFrame({"col1": ["x"]}),
        "aux-table": pd.DataFrame({"f": [1.1]}),
    },
    {
        "core-table": pd.DataFrame({"col1": ["y"]}),
        "aux-table": pd.DataFrame({"f": [1.2]}),
    },
]
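
A small sketch of how both shapes could be accepted during a transition period. The helper normalize_partition_input is hypothetical; it does not reflect the actual signature or behavior of parse_input_to_metapartition.

```python
def normalize_partition_input(obj):
    """Return the table-name -> table mapping for either input shape."""
    # Wrapped form: the only key is "data" and its value is itself a mapping.
    # The isinstance check keeps a flat input with a table that happens to
    # be named "data" from being unwrapped by mistake.
    if set(obj.keys()) == {"data"} and isinstance(obj["data"], dict):
        return obj["data"]
    # Flat form: the dictionary already maps table names to tables.
    return obj
```

Applied to each element of either dfs list above, this yields the same mapping with the keys "core-table" and "aux-table".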

crepererum commented, May 24, 2019 (1 reaction)

Why do we upcast to the broadest width? Why not the smallest?

We use the largest because the common metadata then describes a type that can hold all values of all partitions (a container type). That is the whole point of the type system documentation, and it also explains why ints cannot be packed into floats or the other way around.

What would we do if pyarrow introduces an int96 or int128? Do we change our casting rules?

It depends. It would make sense to upcast to int128 in that case, but we might not want to do that if it means that all libraries break because they cannot handle this type (numpy, for example). In other words (as also described in the type system docs): find a container type that still doesn’t break the ecosystem.

Most common in the sense that these types are used by pandas as defaults for the given type family.

Pandas is exactly NOT a good blueprint for the type system (see the float vs. int discussion again). Most common means that “the container / common type can hold all values of all types that are upcast into that exact container / common type” (which is not the case for int<->float) and that the upcast “makes sense semantically” (see the discussion in the type system docs on why bools should not be upcast to ints).
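
To make the container-type idea concrete, here is a small sketch using type names as strings. The family tables and function names are assumptions for illustration, including the choice to treat signed and unsigned integers as separate families; they are not taken from the project's code.

```python
# Widest member of each family acts as the container type (assumed tables).
WIDEST = {"int": "int64", "uint": "uint64", "float": "float64"}
FAMILY_OF = {
    "int8": "int", "int16": "int", "int32": "int", "int64": "int",
    "uint8": "uint", "uint16": "uint", "uint32": "uint", "uint64": "uint",
    "float16": "float", "float32": "float", "float64": "float",
}


def container_type(type_name):
    """Upcast a type to the widest member of its own family."""
    family = FAMILY_OF.get(type_name)
    if family is None:
        # Types outside the tables (for example bool) are never upcast.
        return type_name
    return WIDEST[family]


def common_container(type_names):
    """Return the single container type shared by all partitions."""
    containers = {container_type(t) for t in type_names}
    if len(containers) != 1:
        # Mixing families (int with float, bool with int) is an error,
        # because no member of one family can hold all values of the other.
        raise TypeError(f"no common container type for {sorted(containers)}")
    return containers.pop()
```

With these tables, partitions typed int8 and int32 share the container int64, while int32 combined with float32 is rejected instead of being packed into a float.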
