OmniSci Histogram not working

I am trying to do a histogram, to replicate this geospatial analysis:

import ibis

conn = ibis.omniscidb.connect(
    host='metis.mapd.com', user='mapd', password='HyperInteractive',
    port=443, database='mapd', protocol='https'
)
t = conn.table("tweets_nov_feb")
x, y = t.goog_x, t.goog_y

WIDTH = 385
HEIGHT = 564
X_DOMAIN = [
    -3650484.1235206556,
    7413325.514451755
]
Y_DOMAIN = [
    -5778161.9183506705,
    10471808.487466192
]

t[(X_DOMAIN[0] < x) & (x < X_DOMAIN[1])].group_by(
    t.goog_x.histogram(WIDTH).name("x_bin")
).aggregate(t.count()).execute()

However it fails with:

Exception: Exception: Inconsistent return type for FLOOR: SELECT floor((t0."goog_x" - (t1."min_1a6124" - 1e-13)) / ((t1."max_1a6124" - (t1."min_1a6124" - 1e-13)) / 384)) AS x_bin,
       count(*) AS "count"
FROM tweets_nov_feb t0
  JOIN (
    SELECT min("goog_x") AS min_1a6124, max("goog_x") AS max_1a6124
    FROM tweets_nov_feb
  ) t1    ON TRUE
WHERE (t0."goog_x" > -3650484.1235206556) AND
      (t0."goog_x" < 7413325.514451755)
GROUP BY x_bin
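
For reference, the bin expression inside the FLOOR call of the generated SQL can be traced in plain Python. This is a minimal sketch of the arithmetic only, with no ibis or OmniSci involved, and it does not reproduce the type error itself. The name `sql_bin_index` is hypothetical, and using the X_DOMAIN bounds as stand-ins for the table's actual min/max of goog_x is an assumption.

```python
import math

def sql_bin_index(x, col_min, col_max, nbins):
    """Mirror the FLOOR(...) expression from the failing query (arithmetic only)."""
    base = col_min - 1e-13                      # the query nudges the minimum down slightly
    binwidth = (col_max - base) / (nbins - 1)   # the query divides by 384 for histogram(385)
    return math.floor((x - base) / binwidth)

# Stand-in bounds: the real query takes min/max of goog_x from the table.
COL_MIN, COL_MAX = -3650484.1235206556, 7413325.514451755
print(sql_bin_index(0.0, COL_MIN, COL_MAX, 385))
```

Note that the query derives the bin width from the full column range rather than from the filtered X_DOMAIN range, so the bins do not line up with the WHERE clause bounds unless a base and binwidth are given explicitly.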

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

2 reactions
saulshanabrook commented, Aug 21, 2019

So I think what would be nice on ibis’s side would be to do this cast stuff for me.

2 reactions
saulshanabrook commented, Aug 21, 2019

OK fixed that one by specifying binwidth manually:

t[
    (X_DOMAIN[0] < x) & (x < X_DOMAIN[1]) &
    (Y_DOMAIN[0] < y) & (y < Y_DOMAIN[1])
].group_by([
    t.goog_x.histogram(
        base=ibis.literal(X_DOMAIN[0], 'float64').cast('float32'),
        binwidth=ibis.literal((X_DOMAIN[1] - X_DOMAIN[0]) / WIDTH, 'float64').cast('float32')
    ).name("x_bin"),
    t.goog_y.histogram(
        base=ibis.literal(Y_DOMAIN[0], 'float64').cast('float32'),
        binwidth=ibis.literal((Y_DOMAIN[1] - Y_DOMAIN[0]) / HEIGHT, 'float64').cast('float32')
    ).name("y_bin")
]).aggregate(
    t.count()
).execute()
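
As a rough illustration of what an explicit base and binwidth mean for the result, here is a pure-Python sketch of fixed-width 2D binning over the same domains. It assumes each bin index is floor((value - base) / binwidth); whether ibis compiles `histogram(base=..., binwidth=...)` to exactly this expression is an assumption, the float32 casts are not reproduced, and `bin_index`, `bin_points`, and the sample points are made up for the example.

```python
import math
from collections import Counter

WIDTH, HEIGHT = 385, 564
X_DOMAIN = [-3650484.1235206556, 7413325.514451755]
Y_DOMAIN = [-5778161.9183506705, 10471808.487466192]

def bin_index(value, base, binwidth):
    # Fixed-width bin: how many whole binwidths the value lies above base.
    return math.floor((value - base) / binwidth)

def bin_points(points):
    """Count (x, y) points per (x_bin, y_bin) cell, keeping only points
    strictly inside both domains (same filter as the query above)."""
    x_width = (X_DOMAIN[1] - X_DOMAIN[0]) / WIDTH
    y_width = (Y_DOMAIN[1] - Y_DOMAIN[0]) / HEIGHT
    counts = Counter()
    for x, y in points:
        if X_DOMAIN[0] < x < X_DOMAIN[1] and Y_DOMAIN[0] < y < Y_DOMAIN[1]:
            cell = (bin_index(x, X_DOMAIN[0], x_width),
                    bin_index(y, Y_DOMAIN[0], y_width))
            counts[cell] += 1
    return counts

# Made-up sample points, not data from tweets_nov_feb.
sample = [(0.0, 0.0), (10.0, 10.0), (8e6, 0.0)]
print(bin_points(sample))
```

With the base set to the lower domain bound and the bin width set to the domain span divided by WIDTH or HEIGHT, every point that passes the filter lands in a cell of a WIDTH by HEIGHT grid, which is what the pixel-sized binning in the workaround is after.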
