Composite and partial unique constraints


_See previous discussion on https://github.com/frictionlessdata/goodtables-py/pull/252#issuecomment-369336642_

The use case is where you want to validate that multiple fields are unique as a whole. Given the following table:

id_1 id_2
1    1
1    null
null 1

Both id_1 and id_2 are nullable, but we want (id_1, id_2) to be unique. None of the following rows could be added to this table:

id_1 id_2
1    1
1    null
null 1

But these would be valid:

id_1 id_2
1    2
2    1
2    null
null 2

If this were SQL, I would do this by creating the following indexes:

-- Each index needs its own name; the two partial indexes cover rows where one field is null.
CREATE UNIQUE INDEX my_table_unique_both ON my_table (id_1, id_2);
CREATE UNIQUE INDEX my_table_unique_id_1 ON my_table (id_1) WHERE id_2 IS NULL;
CREATE UNIQUE INDEX my_table_unique_id_2 ON my_table (id_2) WHERE id_1 IS NULL;
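
To check that these three indexes give the behaviour described above, here is a minimal sketch using Python's built-in sqlite3 module (SQLite also supports partial indexes). The index names are made distinct here, since an index name can only be used once, and the helper `try_insert` is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id_1 INTEGER, id_2 INTEGER);
    CREATE UNIQUE INDEX my_table_unique_both ON my_table (id_1, id_2);
    CREATE UNIQUE INDEX my_table_unique_id_1 ON my_table (id_1) WHERE id_2 IS NULL;
    CREATE UNIQUE INDEX my_table_unique_id_2 ON my_table (id_2) WHERE id_1 IS NULL;
    -- The starting table from the issue: (1, 1), (1, null), (null, 1).
    INSERT INTO my_table VALUES (1, 1), (1, NULL), (NULL, 1);
""")


def try_insert(row):
    """Return True if the row is accepted, False if a unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO my_table VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Rows the issue says must be rejected.
rejected = [try_insert(r) for r in [(1, 1), (1, None), (None, 1)]]
# Rows the issue says must be accepted.
accepted = [try_insert(r) for r in [(1, 2), (2, 1), (2, None), (None, 2)]]

print(rejected)
print(accepted)
```

The composite index alone would let the (1, null) and (null, 1) duplicates through, because SQLite treats each null as distinct; the two partial indexes are what catch them.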

Currently there’s no way of doing this with Table Schema. For the simple case where we just want (id_1, id_2) to be unique, we could follow a pattern similar to primaryKey, like:

"schema": {
  "unique": ["id_1", "id_2"]
}

But it wouldn’t work for partial indexes. In that case, we also need a WHERE clause. Maybe something like:

"schema": {
  "unique": [
    { "fields": ["id_1", "id_2"] },
    { "fields": ["id_1"], "where": "id_2 IS NULL" },
    { "fields": ["id_2"], "where": "id_1 IS NULL" }
  ]
}

But this can grow the implementation complexity pretty quickly, as now we have to handle the where clause somehow.

Maybe a good middle ground is just configuring what happens when some of the unique fields are null? The default behaviour of a composite unique index in SQL is that if any field is null, the index isn’t enforced for that row, so these two rows would both be valid:

id_1 id_2
2    null
2    null

Maybe this behaviour could be controlled by a boolean flag. Something like:

"schema": {
  "unique": [
    { "fields": ["id_1", "id_2"], "ignoreNullValues": true }
  ]
}

The naming can be improved.
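
As a rough illustration of what such a switch could control, here is a minimal Python sketch of a composite uniqueness check. The function `find_duplicates` and its `nulls_are_values` parameter are hypothetical stand-ins, not part of Table Schema or any existing library, and the issue does not pin down the exact semantics of the proposed flag.

```python
from collections import Counter


def find_duplicates(rows, fields, nulls_are_values=True):
    """Return the key tuples that occur more than once in `rows`.

    `nulls_are_values` is a hypothetical switch:
      True  -> null (None) is compared like any other value.
      False -> SQL-like: a key containing a null never collides.
    """
    keys = []
    for row in rows:
        key = tuple(row.get(field) for field in fields)
        if not nulls_are_values and any(value is None for value in key):
            continue  # skip keys with a null, as most SQL engines do
        keys.append(key)
    return [key for key, count in Counter(keys).items() if count > 1]


rows = [
    {"id_1": 1, "id_2": 1},
    {"id_1": 2, "id_2": None},
    {"id_1": 2, "id_2": None},
]

# Null treated as a regular value: the repeated (2, null) key is reported.
strict = find_duplicates(rows, ["id_1", "id_2"], nulls_are_values=True)
# SQL-like handling: keys containing null are never compared.
sql_like = find_duplicates(rows, ["id_1", "id_2"], nulls_are_values=False)

print(strict)
print(sql_like)
```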

cc @roll @Stephen-Gates @hydrosquall

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 26 (25 by maintainers)

Top GitHub Comments

1 reaction
ezwelty commented, Sep 20, 2019

What follows is a bit long, but hopefully clarifies the conversation.

What is being asked here and in https://github.com/frictionlessdata/goodtables-py/pull/252 is a row uniqueness constraint that treats null as a regular value. In that sense, it is a unique key that can also serve as a primary key, despite the presence of null, and can apply to both single and multiple fields.

For a single field, the requested constraint is met by the following:

x
1
2
null

but not met by:

x
1
null
null

Yet this x with repeating null is considered unique in PostgreSQL (and in MySQL, Oracle, Firebird, and SQLite): https://dbfiddle.uk/?rdbms=postgres_11&fiddle=21d04f1b1d151f5d0180a7f753544f95
But not in Microsoft SQL Server: https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=21d04f1b1d151f5d0180a7f753544f95

For multiple fields, the requested constraint is met by the following:

x y
1 1
2 1
null 2

but not met by:

x y
1 1
2 null
2 null

Yet this x, y with repeating 2, null is considered unique by PostgreSQL (and by MySQL, Oracle, Firebird, and SQLite): https://dbfiddle.uk/?rdbms=postgres_11&fiddle=5845f4945ba2fcd20ab710530b2348de
But not by Microsoft SQL Server: https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=5845f4945ba2fcd20ab710530b2348de
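
The SQLite case can be reproduced locally without the fiddles. The following is a small sketch using Python's built-in sqlite3 module; the table name `t` and the helper `accepts` are illustrative only, and it says nothing about how the other engines listed behave.

```python
import sqlite3


def accepts(rows):
    """Return True if every row fits into a table declared with UNIQUE (x, y)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER, y INTEGER, UNIQUE (x, y))")
    try:
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# The repeated (2, null) rows from the example above.
with_repeated_null = accepts([(1, 1), (2, None), (2, None)])
# The same shape with a non-null repeat, for contrast.
with_repeated_value = accepts([(1, 1), (2, 1), (2, 1)])

print(with_repeated_null)
print(with_repeated_value)
```

SQLite accepts the repeated (2, null) rows but rejects the repeated (2, 1) rows, which is the gap the requested constraint is meant to close.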

So I believe Table Schema needs to be amended in two ways:

  • Clarify that the fields in primaryKey cannot contain null (i.e. an implicit required: true).
  • Then:
    • Allow null in primaryKey fields (with either explicit required: false in field constraints or a boolean switch on primaryKey) and clarify that null should be treated as a regular value (per Microsoft SQL Server).
    • and/or Add a new unique key constraint which treats null as a regular value either by default or with a boolean switch.

1 reaction
vitorbaptista commented, Sep 12, 2019

Hello all! o/

I agree with @roll that primary keys shouldn’t be nullable. It seems to me that something like:

"schema": {
  "unique": ["col_1", "col_2"]
}

that treats null as any other value could solve this. This is different from SQL, where a unique constraint isn’t enforced when some of its components are null, but I don’t see a problem in deviating from that. WDYT?
