API: Specify the behaviour for operating on empty objects
There isn’t an issue for empty inputs.
There is a need to specify the behaviour for empty input. Note that “empty” input can mean many things, and even combinations of them:
- zero rows but non-zero columns
- zero columns but non-zero rows
- zero rows and zero columns
- other: e.g., an empty list provided to agg()
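To make the first three cases concrete, here is a minimal sketch of how each kind of "empty" frame can be constructed in pandas. The variable names are illustrative only and do not come from the issue.

```python
import pandas as pd

# Zero rows but non-zero columns
zero_rows = pd.DataFrame({"a": [], "b": []})

# Zero columns but non-zero rows (an index with no columns)
zero_cols = pd.DataFrame(index=range(3))

# Zero rows and zero columns
zero_both = pd.DataFrame()

for name, frame in [("zero_rows", zero_rows), ("zero_cols", zero_cols), ("zero_both", zero_both)]:
    # DataFrame.empty is True whenever any axis has length 0
    print(name, frame.shape, frame.empty)
```

Note that all three report `empty` as True, so that attribute alone does not distinguish between the cases; the shape does.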
As suggested by @shwina in the Pandas Standardisation docs, the idea is to start by writing a bunch of tests across many different Pandas operations and see how many pass across libraries.
Classic example (quiz: what should this do?)
    import pandas as pd
    df = pd.DataFrame({"a": [], "b": []})
    df.groupby("a").agg({})
A starting list of operations where tests are needed would be particularly helpful. I plan to go through them and add all the tests.
- The above example raises ValueError: No objects to concatenate. Is this the expected behaviour?
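One possible way to start collecting results is a small probe that runs a set of operations on an empty frame and records what each one does. This is only a sketch: the `probe` helper and the choice of operations are hypothetical, and it deliberately records outcomes rather than asserting what the "right" answer is, since that is what the issue is trying to decide.

```python
import pandas as pd

def probe(ops, df):
    """Run each operation on df and record the result type or the exception raised."""
    report = {}
    for name, op in ops.items():
        try:
            report[name] = type(op(df)).__name__
        except Exception as exc:
            report[name] = f"raises {type(exc).__name__}"
    return report

empty = pd.DataFrame({"a": [], "b": []})

# Hypothetical starting list; extend with more operations as needed
ops = {
    "sum": lambda df: df.sum(),
    "groupby_sum": lambda df: df.groupby("a").sum(),
    "groupby_agg_empty_dict": lambda df: df.groupby("a").agg({}),
}

report = probe(ops, empty)
for name, outcome in report.items():
    print(f"{name}: {outcome}")
```

Running the same `ops` dictionary against another library that mimics the pandas API would give a second report to compare against, which is roughly the cross-library comparison suggested above.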
Issue Analytics
- State:
- Created a year ago
- Comments:10 (9 by maintainers)
Top GitHub Comments
I believe this issue is meant only for methods that take a UDF (e.g. agg, apply, transform), is that correct? For other methods (e.g. sum, mean, fillna) I think there is a well-defined answer. It’s only when given an unknown UDF that there is no “right” answer, but we should certainly strive for a consistent one.
One idea expressed in another issue (I’ll have to track down where it is) is to call the UDF with an empty object, returning a default result if it raises. Something very roughly like:
and documenting this behavior for working with empty objects. I’m attracted to the idea because it would allow users to write UDFs as:
allowing them to have complete control over the result. Also, if a user does not opt-in to implementing the “empty-object” path, it returns a default which is at least easy to reason about.
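A minimal sketch of this idea, under stated assumptions: `call_on_empty` is a hypothetical helper, not an existing pandas function, and the default value passed in is chosen by the caller. It shows a UDF that ignores the empty case falling back to the default, and a UDF that opts in to the empty-object path controlling its own result.

```python
import pandas as pd

def call_on_empty(udf, empty_obj, default):
    """Hypothetical helper: try the UDF on an empty object, fall back to a default if it raises."""
    try:
        return udf(empty_obj)
    except Exception:
        return default

def naive_udf(s):
    # Not written with empty input in mind: position 0 does not exist on an empty Series
    return s.iloc[0]

def empty_aware_udf(s):
    # Opts in to the empty-object path and decides the result itself
    if s.empty:
        return -1
    return s.iloc[0]

empty = pd.Series([], dtype="float64")
default = pd.Series([], dtype="float64")

naive_result = call_on_empty(naive_udf, empty, default)        # falls back to the default
aware_result = call_on_empty(empty_aware_udf, empty, default)  # the UDF's own choice
```

Catching every `Exception` is a simplification for the sketch; a real implementation would need to decide which errors should trigger the fallback and which should propagate.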
For agg we would use .grouper.result_index and for transform we could use .obj.index for a more accurate result. But for apply there is no clear default because it can be used with reducers, transformers, and anything else for that matter.
CC @jakirkham @jcrist. I know that Dask uses empty DataFrames for metadata. Are there any Dask specific uses of empty dataframes that should be kept in mind here, or any other comments you have on how empty dataframes should behave?