Composite and partial unique constraints

_See previous discussion on https://github.com/frictionlessdata/goodtables-py/pull/252#issuecomment-369336642_

The use case is where you want to validate that multiple fields are unique as a whole. Given the following table:

id_1	id_2
1	1
1
	1

Both id_1 and id_2 are nullable, but we want (id_1, id_2) to be unique. None of the following rows could be added to this table:

id_1	id_2
1	1
1
	1

But these would be valid:

id_1	id_2
1	2
2	1
2
	2

If this were SQL, I would do this by creating the following indexes:

CREATE UNIQUE INDEX my_table_unique_keys ON my_table (id_1, id_2);
CREATE UNIQUE INDEX my_table_unique_id_1 ON my_table (id_1) WHERE id_2 IS NULL;
CREATE UNIQUE INDEX my_table_unique_id_2 ON my_table (id_2) WHERE id_1 IS NULL;
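
To check that these indexes really give the behaviour described above, here is a minimal sketch using Python's built-in sqlite3 module (SQLite supports partial indexes). The index names are made distinct because index names must be unique within a database, and the try_insert helper is a made-up name for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id_1 INTEGER, id_2 INTEGER);
    CREATE UNIQUE INDEX my_table_unique_both ON my_table (id_1, id_2);
    CREATE UNIQUE INDEX my_table_unique_id_1 ON my_table (id_1) WHERE id_2 IS NULL;
    CREATE UNIQUE INDEX my_table_unique_id_2 ON my_table (id_2) WHERE id_1 IS NULL;
    INSERT INTO my_table VALUES (1, 1), (1, NULL), (NULL, 1);
""")


def try_insert(row):
    """Return True if the row is accepted, False if a unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO my_table VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Rows that repeat an existing (id_1, id_2) combination, nulls included.
rejected = [try_insert(row) for row in [(1, 1), (1, None), (None, 1)]]
# Rows from the "valid" table above.
accepted = [try_insert(row) for row in [(1, 2), (2, 1), (2, None), (None, 2)]]
```

In this sketch every entry of rejected should be False and every entry of accepted should be True.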

Currently there’s no way of doing this with Table Schema. For the simple case where we just want (id_1, id_2) to be unique, we could follow a pattern similar to primaryKey, like:

"schema": {
  "unique": ["id_1", "id_2"]
}

But it wouldn’t work for partial indexes. In that case, we also need a WHERE clause. Maybe something like:

"schema": {
  "unique": [
    { "fields": ["id_1", "id_2"] },
    { "fields": ["id_1"], "where": "id_2 IS NULL" },
    { "fields": ["id_2"], "where": "id_1 IS NULL" }
  ]
}

But this can grow the implementation complexity pretty quickly, as now we have to handle the where clause somehow.
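
To give a feel for that complexity, here is a rough sketch of a validator for the proposed structure. Everything in it is hypothetical: the "where" key is only a proposal, parse_where and violations are made-up names, and the only clause form handled is "<field> IS NULL". Supporting arbitrary expressions would need a real parser. Keys containing null are skipped to mimic the usual SQL unique index behaviour.

```python
def parse_where(clause):
    """Turn a '<field> IS NULL' clause into a row predicate (only form supported)."""
    field, _, rest = clause.partition(" ")
    if rest.strip().upper() != "IS NULL":
        raise ValueError(f"unsupported where clause: {clause!r}")
    return lambda row: row.get(field) is None


def violations(rows, constraints):
    """Return (row_number, fields) pairs for rows that break a unique constraint."""
    found = []
    for constraint in constraints:
        fields = constraint["fields"]
        applies = parse_where(constraint["where"]) if "where" in constraint else None
        seen = set()
        for number, row in enumerate(rows, start=1):
            # A partial constraint only looks at rows matching its where clause.
            if applies is not None and not applies(row):
                continue
            key = tuple(row.get(field) for field in fields)
            # Assumption: like a SQL unique index, a key containing null never collides.
            if None in key:
                continue
            if key in seen:
                found.append((number, tuple(fields)))
            seen.add(key)
    return found


# The three constraints from the proposal above, as Python data.
constraints = [
    {"fields": ["id_1", "id_2"]},
    {"fields": ["id_1"], "where": "id_2 IS NULL"},
    {"fields": ["id_2"], "where": "id_1 IS NULL"},
]
table = [
    {"id_1": 1, "id_2": 1},
    {"id_1": 1, "id_2": None},
    {"id_1": None, "id_2": 1},
]
duplicate_found = violations(table + [{"id_1": 1, "id_2": None}], constraints)
no_duplicate = violations(table + [{"id_1": 2, "id_2": None}], constraints)
```

Even this restricted version needs a small clause parser and one pass over the data per constraint, which is the cost the paragraph above is worried about.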

Maybe a good middle ground is just configuring what happens when some of the unique fields are null? The default behaviour of a composite index in SQL is that if any field is null, the index isn’t applied to that row, so these two rows would be valid:

id_1	id_2
2
2

Maybe this behaviour could be controlled by a boolean flag. Something like:

"schema": {
  "unique": [
    { "fields": ["id_1", "id_2"], "ignoreNullValues": true }
  ]
}
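
A minimal sketch of how such a flag might behave, assuming that ignoreNullValues: true means "skip any row whose key contains a null" (the SQL-like behaviour) and that false means "treat null as a regular value". The function name duplicate_rows and this reading of the flag are assumptions, not anything defined by Table Schema.

```python
def duplicate_rows(rows, fields, ignore_null_values=False):
    """Return the 1-based numbers of rows whose key repeats an earlier row's key."""
    seen = set()
    duplicates = []
    for number, row in enumerate(rows, start=1):
        key = tuple(row.get(field) for field in fields)
        # Assumed meaning of the flag: rows with any null key field are not checked.
        if ignore_null_values and None in key:
            continue
        if key in seen:
            duplicates.append(number)
        seen.add(key)
    return duplicates


# The two rows from the example above: id_1 is 2 and id_2 is null in both.
rows = [{"id_1": 2, "id_2": None}, {"id_1": 2, "id_2": None}]
sql_like = duplicate_rows(rows, ["id_1", "id_2"], ignore_null_values=True)
null_as_value = duplicate_rows(rows, ["id_1", "id_2"], ignore_null_values=False)
```

With the flag on, the two rows pass; with it off, the second row is reported as a duplicate of the first.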

The naming can be improved.

cc @roll @Stephen-Gates @hydrosquall

Issue Analytics

State:
Created 6 years ago
Reactions:1
Comments:26 (25 by maintainers)

Top GitHub Comments

1 reaction

ezwelty commented, Sep 20, 2019

What follows is a bit long, but hopefully clarifies the conversation.

What is being asked here and in https://github.com/frictionlessdata/goodtables-py/pull/252 is a row uniqueness constraint that treats null as a regular value. In that sense, it is a unique key that can also serve as a primary key, despite the presence of null, and can apply to both single and multiple fields.

For a single field, the requested constraint is met by the following:

x
1
2
`null`

but not met by:

x
1
`null`
`null`

Yet this x with repeating null is considered unique in PostgreSQL (as well as MySQL, Oracle, Firebird, and SQLite): https://dbfiddle.uk/?rdbms=postgres_11&fiddle=21d04f1b1d151f5d0180a7f753544f95 But not in Microsoft SQL Server: https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=21d04f1b1d151f5d0180a7f753544f95

For multiple fields, the requested constraint is met by the following:

x	y
1	1
2	1
`null`	2

but not met by:

x	y
1	1
2	`null`
2	`null`

Yet this x, y with repeating 2, null is considered unique by PostgreSQL (as well as MySQL, Oracle, Firebird, and SQLite): https://dbfiddle.uk/?rdbms=postgres_11&fiddle=5845f4945ba2fcd20ab710530b2348de But not by Microsoft SQL Server: https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=5845f4945ba2fcd20ab710530b2348de
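
The SQLite side of this can be reproduced locally. The sketch below uses Python's built-in sqlite3 module; the table name t is made up. It shows a UNIQUE (x, y) constraint accepting the repeated 2, null rows, and then applies the requested rule (null compared like any other value) in plain Python to the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER, UNIQUE (x, y))")

# SQLite treats nulls as distinct from each other, so the repeated (2, NULL) is accepted.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (2, None), (2, None)])
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# The requested constraint: compare whole rows, with None treated as a regular value.
rows = conn.execute("SELECT x, y FROM t").fetchall()
has_duplicates = len(set(rows)) < len(rows)
```

Here all three rows are stored without error, while the null-as-a-regular-value check flags the table as containing a duplicate.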

So I believe Table Schema needs to be amended in two ways:

Clarify that the fields in primaryKey cannot contain null (i.e. an implicit required: true).
Then:
- Allow null in primaryKey fields (with either explicit required: false in field constraints or a boolean switch on primaryKey) and clarify that null should be treated as a regular value (per Microsoft SQL Server).
- and/or Add a new unique key constraint which treats null as a regular value either by default or with a boolean switch.

1 reaction

vitorbaptista commented, Sep 12, 2019

Hello all! o/

I agree with @roll that primary keys shouldn’t be nullable. It seems to me that something like:

"schema": {
  "unique": ["col_1", "col_2"]
}

that treats null as any other value could solve this. This is different from SQL, where unique constraints aren’t applied when some of their components are null, but I don’t see a problem in deviating from that. WDYT?