DISTINCT function broken since 3.1.3

CrateDB version: 3.1.3

Environment description: any environment

Problem description: The DISTINCT function in SELECT statements has not worked since 3.1.3. Results are not unique whenever more than one field is selected in the query. Restricting DISTINCT with parentheses, i.e., DISTINCT(field), does not help.

Steps to reproduce:

  1. Import sample data (tweets).

  2. Run the queries. select distinct created_at, id from tweets limit 100; returns all tweets, whereas select distinct created_at from tweets limit 100; returns only a few tweets.

  3. Run the same queries on 3.1.2: both queries yield the same results.
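
For reference, in standard SQL, DISTINCT applies to the whole select list, not to a single column, and parentheses around one column do not change that. The following is only a minimal sketch of that behavior. It uses SQLite through Python's sqlite3 module as a stand-in for CrateDB, and the tweets table and its five rows are made up for illustration, not the real sample data.

```python
import sqlite3

# Assumption: SQLite stands in for CrateDB; the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id TEXT PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO tweets (id, created_at) VALUES (?, ?)",
    [("a", 1), ("b", 1), ("c", 2), ("d", 2), ("e", 2)],
)

# DISTINCT over one column: repeated created_at values collapse.
one_col = conn.execute("SELECT DISTINCT created_at FROM tweets").fetchall()

# DISTINCT over two columns: each (created_at, id) pair is already unique
# because id is unique, so no rows collapse.
two_cols = conn.execute("SELECT DISTINCT created_at, id FROM tweets").fetchall()

# Parentheses do not narrow DISTINCT; (created_at) is just a
# parenthesized expression in the select list.
parens = conn.execute("SELECT DISTINCT (created_at), id FROM tweets").fetchall()

print(len(one_col), len(two_cols), len(parens))  # prints: 2 5 5
```

If CrateDB follows the same semantics, a unique column such as id in the select list would be enough to make every row distinct.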

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
seut commented, Nov 30, 2018

@hnstbndr Ok thanks, we’ll look into this asap.

0 reactions
hnstbndr commented, Dec 8, 2018

I did some digging this morning. Below are the tables and data from my experiment.

create table c_user (
    id string,
    name string,

    primary key (id)
);

create table c_comment (
    comment string INDEX OFF,
    id string,
    user_id string,
    
    primary key (id, user_id)
);
INSERT INTO c_user (id, name) VALUES ('667D594D-1BD5-45C6-BFB3-ACA92C61EAE8', '1'), ('B7F2D1A6-BCE2-4C7E-8836-AD3C5ABFB572', '2'), ('56D38E1C-9F68-4A3A-8020-4F51D5431B1A', '3'), ('5E13AEF4-0B25-4F72-9E11-13C4EDC907A7', '4'), ('A7C06E8F-BD28-4E2D-9C0D-E2E6B94F70C9', '5'), ('EB64BEF0-D25F-4BD3-86EC-7BFB1B2A611E', '6'), ('2BB6748B-B836-443A-8AA4-F3802D90BCBB', '7'), ('58A03093-F3EB-477B-8C99-858DA55A0BD7', '8'), ('B102598E-B82F-4A66-B6F0-02A521EE02F0', '9'), ('96DED9A0-75B3-48C3-A442-98D68F8FEA6C', '10');

INSERT INTO c_comment (comment, id, user_id) VALUES ('1000', '56C318DE-B92E-408D-BFF8-275C228AA6AA', '667D594D-1BD5-45C6-BFB3-ACA92C61EAE8'), ('2000', 'E8020588-AE76-4286-AB79-B8ACD1B94541', 'B7F2D1A6-BCE2-4C7E-8836-AD3C5ABFB572'), ('3000', '01693D4A-8A42-4497-9005-C164CF9761EF', '56D38E1C-9F68-4A3A-8020-4F51D5431B1A'), ('4000', 'ECE80A51-D742-44FE-99B7-D118663273AD', '5E13AEF4-0B25-4F72-9E11-13C4EDC907A7'), ('5000', '679BEA61-5CD4-4A5C-9BBD-85F5685C27DA', 'A7C06E8F-BD28-4E2D-9C0D-E2E6B94F70C9'), ('6000', '4C75EC2F-886A-4F0C-9756-D6A5EE6601B5', 'EB64BEF0-D25F-4BD3-86EC-7BFB1B2A611E'), ('7000', '74C8B57D-1C11-4FD8-9D45-6150598A36F8', '2BB6748B-B836-443A-8AA4-F3802D90BCBB'), ('8000', 'CCF67996-877E-4D28-9328-8137E04A3674', '58A03093-F3EB-477B-8C99-858DA55A0BD7'), ('9000', '54D26220-E972-4570-87FA-CD1A917F100C', 'B102598E-B82F-4A66-B6F0-02A521EE02F0'), ('10000', '76090566-69F4-4A95-8D01-88D10BA7056E', '96DED9A0-75B3-48C3-A442-98D68F8FEA6C');

INSERT INTO c_comment (comment, id, user_id) VALUES ('9999', '47FC2636-0316-4031-B017-0F01ED3FF86A', '667D594D-1BD5-45C6-BFB3-ACA92C61EAE8'), ('19998', 'B9C01484-C13C-492F-B807-0B9DA6CE330D', 'B7F2D1A6-BCE2-4C7E-8836-AD3C5ABFB572'), ('29997', 'CCD53409-ABB4-4B20-AD73-C6968FE63133', '56D38E1C-9F68-4A3A-8020-4F51D5431B1A'), ('39996', '237E9395-5815-4499-9804-7853B1BA5AF8', '5E13AEF4-0B25-4F72-9E11-13C4EDC907A7'), ('49995', '5491393B-5DA7-4940-9B4E-97FB2A48CDCA', 'A7C06E8F-BD28-4E2D-9C0D-E2E6B94F70C9'), ('59994', 'E4B0D400-B5CE-4D21-A6A6-5B3E33F16C7E', 'EB64BEF0-D25F-4BD3-86EC-7BFB1B2A611E'), ('69993', '90974138-0ECC-4AEF-AB91-CBC9700836BD', '2BB6748B-B836-443A-8AA4-F3802D90BCBB'), ('79992', '883CE173-A72C-4319-AABF-CAF9C905EE87', '58A03093-F3EB-477B-8C99-858DA55A0BD7'), ('89991', '08CCD65E-4622-4CC4-ADCA-10798C1FD8CC', 'B102598E-B82F-4A66-B6F0-02A521EE02F0'), ('99990', '12368890-CB1D-46D2-BA78-4E43F4D801B4', '96DED9A0-75B3-48C3-A442-98D68F8FEA6C');

The query is:

SELECT DISTINCT (c_user.id),
       c_comment.id
FROM c_comment
INNER JOIN c_user
    ON c_user.id = c_comment.user_id
ORDER BY c_user.id
LIMIT 50;

If c_comment.id is included in the select list, c_user.id is not unique in the returned set, regardless of the CrateDB version. In my real query I use _score in the select statement, and I think this is the problem. On CrateDB 3.1.2, _score was not unique in the returned set, so CrateDB was able to consolidate the rows. When I switched to CrateDB 3.1.3, _score was unique and the rows were not consolidated. I have the same data in both versions… Thus, I thought there was an issue 😃 Turns out there is no issue. Closing this issue!
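
The effect described above can be sketched on a small scale. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3) stands in for CrateDB, the ids are shortened placeholders, and the ordinary score column is a hypothetical substitute for CrateDB's _score system column.

```python
import sqlite3

# Assumption: SQLite stands in for CrateDB; "score" is a hypothetical
# regular column standing in for CrateDB's _score.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c_user (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE c_comment (comment TEXT, id TEXT, user_id TEXT, score REAL,
                        PRIMARY KEY (id, user_id));
INSERT INTO c_user VALUES ('u1', '1'), ('u2', '2');
INSERT INTO c_comment VALUES
  ('1000', 'c1', 'u1', 1.0), ('9999',  'c2', 'u1', 1.0),
  ('2000', 'c3', 'u2', 1.0), ('19998', 'c4', 'u2', 1.0);
""")

QUERY = """
SELECT DISTINCT c_user.id, c_comment.score
FROM c_comment
INNER JOIN c_user ON c_user.id = c_comment.user_id
ORDER BY c_user.id
"""

# Every comment of a user has the same score, so the (id, score) rows
# are duplicates and DISTINCT consolidates them to one row per user.
tied = conn.execute(QUERY).fetchall()

# Give the second comment of each user a different score: the rows now
# differ, so DISTINCT keeps all four.
conn.execute("UPDATE c_comment SET score = 2.0 WHERE id IN ('c2', 'c4')")
unique = conn.execute(QUERY).fetchall()

print(len(tied), len(unique))  # prints: 2 4
```

The query and the data are the same in both runs except for the score values, which matches the observation that the same data gave different results once _score became unique per row.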

Apologies for the delay…
