question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

API: Specify the behaviour for operating on empty objects

See original GitHub issue

There isn’t an issue for empty inputs.

There is a need to specify the behaviour for empty input. Note that “empty” input can mean many things, and even combinations of them:

  • zero rows but non-zero columns
  • zero columns but non-zero rows
  • zero rows and zero columns
  • other: e.g., empty list provided to agg()

As suggested by @shwina in the Pandas Standardisation docs, the idea is to start by writing a bunch of tests across many different Pandas operations and see how many pass across libraries.

Classic example (quiz: what should this do?)

df = pd.DataFrame({“a”: [], “b”: []}) df.groupby(“a”).agg({})

A starting list of operations where tests are needed would be particularly helpful, I plan to start going through them and adding all the tests.

  • The above example returns a ValueError: No objects to concatenate, is this the expected behaviour?

Issue Analytics

  • State:open
  • Created a year ago
  • Comments:10 (9 by maintainers)

github_iconTop GitHub Comments

2reactions
rhshadrachcommented, Aug 8, 2022

I believe this issue is meant only for methods that take a UDF (e.g. agg, apply, transform), is that correct? For other methods (e.g. sum, mean, fillna) I think there is a well-defined answer. It’s only when given an unknown UDF that there is no “right” answer, but we should certainly strive for a consistent one.

One idea expressed in another issue (I’ll have to track down where it is) is to call the UDF with an empty object, returning a default result if it raises. Something very roughly like:

# data is an empty DataFrame
try:
    result = self.call_method(data)
except Exception:
    result = data.copy()
return result

and documenting this behavior for working with empty objects. I’m attracted to the idea because it would allow users to write UDFs as:

def my_udf(x):
    if x.empty:
        return my_default_object
    ...

allowing them to have complete control over the result. Also, if a user does not opt-in to implementing the “empty-object” path, it returns a default which is at least easy to reason about.

For agg we would use .grouper.result_index and for transform we could use .obj.index for a more accurate result. But for apply there is no clear default because it can be used with reducers, transformers, and anything else for that matter.

1reaction
asmeurercommented, Aug 25, 2022

CC @jakirkham @jcrist. I know that Dask uses empty DataFrames for metadata. Are there any Dask specific uses of empty dataframes that should be kept in mind here, or any other comments you have on how empty dataframes should behave?

Read more comments on GitHub >

github_iconTop Results From Across the Web

Use empty string, null or remove empty property in API request ...
TLDR; Remove null properties. The first thing to bear in mind is that applications at their edges are not object-oriented (nor functional if ......
Read more >
Special behavior of a stream if there are no elements
class ItemConsumer implements Consumer<Object> { private volatile boolean ... b, c (or empty stream when it is empty) // for type inference static...
Read more >
Object - JavaScript - MDN Web Docs
Chrome Edge Object Full support. Chrome1. Toggle history Full support. Edge12. Toggle hi... Object() constructor Full support. Chrome1. Toggle history Full support. Edge12. Toggle hi... assign...
Read more >
RESTful web API design - Best Practices - Microsoft Learn
Organize the API design around resources; Define API operations in terms of HTTP methods; Conform to HTTP semantics; Filter and paginate ...
Read more >
ObjectUtils (Apache Commons Lang 3.12.0 API)
Operations on Object . ... Each method documents its behavior in more detail. ... Checks that the specified object reference is not null...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found