Joins don't work when one table has only one column

Hi guys! I’m trying to do an inner join on simple tables:

        val table1 = Table.create(
            "table1",
            StringColumn.create("customer_id", arrayOf("1", "2", "3", "4", "5")),
            DoubleColumn.create("amount", arrayOf(1, 2, 3, 4, 5))
        )

        val table2 = Table.create(
            "table2",
            DoubleColumn.create("amount", arrayOf(1, 2, 3, 4, 5))
        )

        val joined = table1
            .join("amount")
            .inner(table2, "amount")

        log(table1.shape())
        log(table2.shape())
        log(joined.shape())

and here is the result:

2018-09-06 17:17:39,117 [main] INFO  Script - 5 rows X 2 cols 
2018-09-06 17:17:39,117 [main] INFO  Script - 5 rows X 1 cols 
2018-09-06 17:17:39,117 [main] INFO  Script - 0 rows X 2 cols 

The joined table is empty. Am I doing it wrong? I’m using v0.25.2.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (1 by maintainers)

Top GitHub Comments

1 reaction
lwhite1 commented, Sep 7, 2018

it doesn’t work if the 2nd table only has the join column and no other columns

yes. this is exactly my case

Is this a real case? It doesn’t seem too useful unless you’re using the join as a filtering mechanism.

As for the tests, they still don’t pass on Windows:

java.lang.ExceptionInInitializerError
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:217)
	at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:266)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.IllegalStateException: tech.tablesaw.io.csv.AddCellToColumnException: Error while adding cell from row 1 and column Date(position:0): Text 'Dec 1, 2017' could not be parsed, unparsed text found at index 0
	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:41)
	at tech.tablesaw.joining.DataFrameJoinerTest.<clinit>(DataFrameJoinerTest.java:15)
	... 21 more
Caused by: tech.tablesaw.io.csv.AddCellToColumnException: Error while adding cell from row 1 and column Date(position:0): Text 'Dec 1, 2017' could not be parsed, unparsed text found at index 0
	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:237)
	at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:155)
	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:54)
	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:39)
	... 22 more
Caused by: java.time.format.DateTimeParseException: Text 'Dec 1, 2017' could not be parsed, unparsed text found at index 0
	at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1952)
	at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
	at java.time.LocalDate.parse(LocalDate.java:400)
	at tech.tablesaw.columns.dates.DateParser.parse(DateParser.java:64)
	at tech.tablesaw.columns.dates.DateParser.parse(DateParser.java:14)
	at tech.tablesaw.api.DateColumn.appendCell(DateColumn.java:357)
	at tech.tablesaw.api.DateColumn.appendCell(DateColumn.java:54)
	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:235)
	... 27 more

Looks like a type autodetection or date conversion issue. It was working fine when I forced all columns to STRING.

On the surface, this seems like a bug to me. We changed our parsing logic recently and that may have introduced something, but autodetection does not always work out of the box.

To check whether it’s a bug, it would be helpful to get a sample of a few lines of the file. Is that possible?

When autodetection does not work in general, there are a few workarounds.

  1. The reader uses a sample of the data to detect types. You can tell it to examine the entire file instead. If your file is not too big, you might try this. You need to set the sample option to “false”.
  2. More often with dates, it’s possible the date is not in a format that the parser handles. You can pass in your own date parser, again using CsvReadOptions.
  3. Again with dates and times, you might try passing a locale if you’re not in the US.
  4. Finally, you can try telling it what the fields really are, as you did when you forced everything to String. If the file has many columns, you can use the method CsvReader.printColumnTypes() to get the parser’s best guesses, edit the result by hand and paste it into your code.

In this case, again, I suspect this is a bug of some kind since it’s trying to parse a date (or date time, I can’t tell without the data) and the value looks like a date.
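
The locale workaround can be illustrated with plain java.time, independent of Tablesaw. The sketch below assumes the pattern "MMM d, yyyy" for text such as 'Dec 1, 2017'; the class and the `tryParse` helper are hypothetical names for illustration only. It shows that the same text parses under a US locale but not under a German one, which is one plausible way a test can pass on one machine and fail on another. The thread does not confirm that this is the cause here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

class LocaleDateParsing {

    // Hypothetical helper: parse text like "Dec 1, 2017" using the month
    // names of the given locale; returns empty if the text does not match.
    static Optional<LocalDate> tryParse(String text, Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("MMM d, yyyy", locale);
        try {
            return Optional.of(LocalDate.parse(text, formatter));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // "Dec" is the US short name for December, so this parses.
        System.out.println(tryParse("Dec 1, 2017", Locale.US));      // Optional[2017-12-01]
        // The German short month name differs, so "Dec" is not recognised.
        System.out.println(tryParse("Dec 1, 2017", Locale.GERMANY)); // Optional.empty
    }
}
```

If the locale is the culprit, passing an explicit locale to the reader (workaround 3) rather than relying on the machine default should make the result reproducible across environments.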

0 reactions
ryancerf commented, Aug 1, 2019

I found this surprising and I think it is worth fixing.

People familiar with SQL will always assume SELECT * FROM table1 INNER JOIN table2 will equal SELECT * FROM table2 INNER JOIN table1. This is currently not true when the only columns in the right hand side table are the join columns.
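
To make that expectation concrete, here is a minimal plain-Java sketch of a nested-loop inner join over the data from the original report. It is not Tablesaw's implementation: `innerJoin`, `table1`, and `table2` are hypothetical helpers, and rows are modelled as maps purely for illustration. Under SQL-style semantics, joining in either order yields the same five rows, even though one table contains only the join column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InnerJoinSketch {

    // Hypothetical helper: nested-loop inner join on a single shared key column.
    static List<Map<String, Object>> innerJoin(
            List<Map<String, Object>> left, List<Map<String, Object>> right, String key) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> l : left) {
            for (Map<String, Object> r : right) {
                if (l.get(key).equals(r.get(key))) {
                    // Start from the left row, then add any right-hand columns
                    // that are not already present (the key column is shared).
                    Map<String, Object> row = new LinkedHashMap<>(l);
                    for (Map.Entry<String, Object> e : r.entrySet()) {
                        row.putIfAbsent(e.getKey(), e.getValue());
                    }
                    result.add(row);
                }
            }
        }
        return result;
    }

    // Mirrors table1 from the report: customer_id "1".."5", amount 1.0..5.0.
    static List<Map<String, Object>> table1() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("customer_id", String.valueOf(i));
            row.put("amount", (double) i);
            rows.add(row);
        }
        return rows;
    }

    // Mirrors table2 from the report: only the join column, amount 1.0..5.0.
    static List<Map<String, Object>> table2() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("amount", (double) i);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(innerJoin(table1(), table2(), "amount").size()); // 5
        System.out.println(innerJoin(table2(), table1(), "amount").size()); // 5
    }
}
```

Because maps compare by content rather than column order, the two results are equal row for row, which is the symmetry a SQL user would expect from the library.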
