Left Join becomes Inner Join for inequality conditions

Code:

import pandas as pd
import numpy as np
from dask.distributed import Client
from dask_sql import Context
client = Client()
cont = Context()

df1 = pd.DataFrame({
    'dated': pd.date_range(pd.Timestamp('2021-01-01'), pd.Timestamp('2021-01-10')),
    'var1': np.ones(10)})
df2 = pd.DataFrame({
    'startdate': [pd.Timestamp('2020-12-30'), pd.Timestamp('2021-01-09')],
    'enddate': [pd.Timestamp('2021-01-03'), pd.Timestamp('2021-01-20')],
    'var2': np.array([2.0, 3.0])})

cont.create_table('df1', df1)
cont.create_table('df2', df2)

df3 = cont.sql(
    """select a.*, b.var2
    from df1 a left join df2 b
    on b.startdate<=a.dated and a.dated<=b.enddate""").compute()

Results:

  • df1:
       dated  var1
0 2021-01-01   1.0
1 2021-01-02   1.0
2 2021-01-03   1.0
3 2021-01-04   1.0
4 2021-01-05   1.0
5 2021-01-06   1.0
6 2021-01-07   1.0
7 2021-01-08   1.0
8 2021-01-09   1.0
9 2021-01-10   1.0
  • df2:
   startdate    enddate  var2
0 2020-12-30 2021-01-03   2.0
1 2021-01-09 2021-01-20   3.0
  • df3:
        dated  var1  var2
0  2021-01-01   1.0   2.0
2  2021-01-02   1.0   2.0
4  2021-01-03   1.0   2.0
17 2021-01-09   1.0   3.0
19 2021-01-10   1.0   3.0

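This is the result of an inner join, not a left join: the df1 rows for 2021-01-04 through 2021-01-08, which fall inside no date range in df2, are missing from df3.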

The correct output, obtained by running the same query with sqlite3, is as follows:

import sqlite3
# Connect database
conn = sqlite3.connect(':memory:')
df1.to_sql('df1', conn, index=False)
df2.to_sql('df2', conn, index=False)

df3 = pd.read_sql_query(
    """select a.*, b.var2
    from df1 a left join df2 b
    on b.startdate<=a.dated and a.dated<=b.enddate""", 
    conn, 
    parse_dates=['dated'])

where df3 is

       dated  var1  var2
0 2021-01-01   1.0   2.0
1 2021-01-02   1.0   2.0
2 2021-01-03   1.0   2.0
3 2021-01-04   1.0   NaN
4 2021-01-05   1.0   NaN
5 2021-01-06   1.0   NaN
6 2021-01-07   1.0   NaN
7 2021-01-08   1.0   NaN
8 2021-01-09   1.0   3.0
9 2021-01-10   1.0   3.0
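
For comparison, the same expected result can be reproduced in plain pandas, without any SQL engine. The following is only a minimal sketch of left-join semantics for a range condition, not the dask-sql fix: it assumes pandas 1.2 or later (for `how='cross'`), and the names `pairs`, `matched` and `expected` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'dated': pd.date_range(pd.Timestamp('2021-01-01'), pd.Timestamp('2021-01-10')),
    'var1': np.ones(10)})
df2 = pd.DataFrame({
    'startdate': [pd.Timestamp('2020-12-30'), pd.Timestamp('2021-01-09')],
    'enddate': [pd.Timestamp('2021-01-03'), pd.Timestamp('2021-01-20')],
    'var2': np.array([2.0, 3.0])})

# Keep the original row position of df1 in an 'index' column to join on later
left = df1.reset_index()

# Pair every df1 row with every df2 row (cross join),
# then keep only the pairs that satisfy the range condition
pairs = left.merge(df2, how='cross')
matched = pairs[(pairs['startdate'] <= pairs['dated'])
                & (pairs['dated'] <= pairs['enddate'])]

# Left-merge the matches back onto df1, so that rows without
# a match are kept and get NaN in var2
expected = (left
            .merge(matched[['index', 'var2']], on='index', how='left')
            .drop(columns='index'))
print(expected)
```

The cross join materializes len(df1) * len(df2) rows, so this is suitable only as a small-data check against the query result, not as a scalable replacement.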

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9

Top GitHub Comments

flcong commented, Aug 19, 2021 (1 reaction)

Hi @nils-braun. Yeah, I’ve finished editing the join.py file to make it work, but I have not fully tested it yet. I plan to add more unit tests in the next week.

nils-braun commented, Aug 19, 2021 (0 reactions)

Hi @flcong! Did you have time to look into the issue with the joins further? Is there anything I can help you with?
