Labels query performance
Requests to Cromwell’s query endpoint that specify label-related parameters perform poorly. JMUI will rely on this endpoint being performant with label parameters.
Plan for investigation:
- Set up a local CromIAM and Cromwell writing to a local MySQL instance.
- Submit a workflow to this CromIAM specifying a non-default collection name to associate the requesting auth with an additional collection besides the auth’s email address.
- Capture the query actually issued by Cromwell for the JMUI request and analyze its performance on real-world-sized databases.
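The issue does not say how the query was captured. One way to do it on a local MySQL instance is the general query log, sketched below. This assumes an account allowed to set global variables and logging to the mysql.general_log table; the LIKE filter is only an illustrative way to find the labels query.

```sql
-- Sketch only: assumes a local MySQL instance and sufficient privileges.
SET GLOBAL log_output = 'TABLE';   -- send the general log to mysql.general_log
SET GLOBAL general_log = 'ON';

-- ... issue the JMUI request against the local CromIAM here ...

-- Inspect the most recent statements that touched the labels table.
-- CONVERT is used because argument may be stored as a binary column.
SELECT event_time,
       CONVERT(argument USING utf8mb4) AS statement_text
FROM mysql.general_log
WHERE command_type IN ('Query', 'Execute')
  AND argument LIKE '%CUSTOM_LABEL_ENTRY%'
ORDER BY event_time DESC
LIMIT 10;

-- Turn logging back off; the general log is expensive on a busy server.
SET GLOBAL general_log = 'OFF';
```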
The captured query appears at the end of this document since it takes up a ton of vertical space when linted. 🙂 This query takes about 0.7 seconds to execute on CaaS Prod and around 13 seconds on FC Prod. The EXPLAIN output on CaaS Prod looks like:
mysql> EXPLAIN select...
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+
| 1 | PRIMARY | x2 | ALL | NULL | NULL | NULL | NULL | 185900 | Using where |
| 4 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070 | const,cromwell.x2.WORKFLOW_EXECUTION_UUID | 1 | Using index condition; Using where |
| 3 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070 | const,cromwell.x2.WORKFLOW_EXECUTION_UUID | 1 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070 | const,cromwell.x2.WORKFLOW_EXECUTION_UUID | 1 | Using index condition; Using where |
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+
The referenced index on CUSTOM_LABEL_ENTRY is UC_CUSTOM_LABEL_ENTRY_CLK_WEU, which looks like:
mysql> show index from CUSTOM_LABEL_ENTRY;
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| CUSTOM_LABEL_ENTRY | 0 | PRIMARY | 1 | CUSTOM_LABEL_ENTRY_ID | A | 531285 | NULL | NULL | | BTREE | | |
| CUSTOM_LABEL_ENTRY | 0 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1 | CUSTOM_LABEL_KEY | A | 31 | NULL | NULL | YES | BTREE | | |
| CUSTOM_LABEL_ENTRY | 0 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 2 | WORKFLOW_EXECUTION_UUID | A | 531285 | NULL | NULL | | BTREE | | |
| CUSTOM_LABEL_ENTRY | 1 | SYS_IDX_11226 | 1 | WORKFLOW_EXECUTION_UUID | A | 132821 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
So MySQL appears to be table scanning WORKFLOW_METADATA_SUMMARY_ENTRY and then, for each label parameter, finding the at-most-one matching row in CUSTOM_LABEL_ENTRY using the unique index on CUSTOM_LABEL_KEY + WORKFLOW_EXECUTION_UUID. So the labels table access should be fast, but the summary table is being scanned in full.
I experimented with adding a non-unique index on CUSTOM_LABEL_ENTRY for CUSTOM_LABEL_KEY + CUSTOM_LABEL_VALUE in the hope that MySQL could use that first and then join back to the summary table on workflow ID. However, I haven’t had any luck getting MySQL to use this index for even the simplest possible queries:
mysql> create index IDX_KEY_VALUE on CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, CUSTOM_LABEL_VALUE);
.
.
.
mysql> explain select WORKFLOW_EXECUTION_UUID from CUSTOM_LABEL_ENTRY where CUSTOM_LABEL_KEY = 'caas-collection-name' AND CUSTOM_LABEL_VALUE = 'miguel-collection';
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | CUSTOM_LABEL_ENTRY | NULL | ref | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,IDX_KEY_VALUE | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1023 | const | 1 | 10.00 | Using where |
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
So MySQL sees the new index I created but then uses the unique CUSTOM_LABEL_KEY + WORKFLOW_EXECUTION_UUID index instead, even though I don’t see how that’s applicable here.
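In the EXPLAIN output above, ref is const and key_len is 1023, which is consistent with MySQL using only the leading CUSTOM_LABEL_KEY column of the unique index. A possible next experiment, not something reported in this issue, is to refresh the table statistics and then use an index hint to see what plan and row estimate MySQL produces when it is steered to IDX_KEY_VALUE. ANALYZE TABLE and FORCE INDEX are standard MySQL; the outcome here is untested.

```sql
-- Sketch only: refresh index statistics, since a freshly created index
-- may be costed with stale or missing cardinality estimates.
ANALYZE TABLE CUSTOM_LABEL_ENTRY;

-- Steer the optimizer to the key+value index and compare the plan
-- (key, key_len, rows) with the unhinted EXPLAIN above.
EXPLAIN
SELECT WORKFLOW_EXECUTION_UUID
FROM CUSTOM_LABEL_ENTRY FORCE INDEX (IDX_KEY_VALUE)
WHERE CUSTOM_LABEL_KEY = 'caas-collection-name'
  AND CUSTOM_LABEL_VALUE = 'miguel-collection';
```

If the hinted plan shows a longer key_len and a comparable or lower row estimate, that would suggest the optimizer's default choice is a costing decision rather than the index being unusable.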
Actual CromIAM + Cromwell query issued against local MySQL:
select
x2.`WORKFLOW_EXECUTION_UUID`,
x2.`WORKFLOW_NAME`,
x2.`WORKFLOW_STATUS`,
x2.`START_TIMESTAMP`,
x2.`END_TIMESTAMP`,
x2.`SUBMISSION_TIMESTAMP`,
x2.`WORKFLOW_METADATA_SUMMARY_ENTRY_ID`
from
`WORKFLOW_METADATA_SUMMARY_ENTRY` x2
where
(
(
(
(
(
(
(
(
(
true
and true
)
and true
)
and exists(
select
`CUSTOM_LABEL_KEY`,
`CUSTOM_LABEL_VALUE`,
`WORKFLOW_EXECUTION_UUID`,
`CUSTOM_LABEL_ENTRY_ID`
from
`CUSTOM_LABEL_ENTRY`
where
(
(
`WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
)
and (
`CUSTOM_LABEL_KEY` = 'submissionId'
)
)
and (
`CUSTOM_LABEL_VALUE` = 'submissionIdValue'
)
)
)
and (
exists(
select
`CUSTOM_LABEL_KEY`,
`CUSTOM_LABEL_VALUE`,
`WORKFLOW_EXECUTION_UUID`,
`CUSTOM_LABEL_ENTRY_ID`
from
`CUSTOM_LABEL_ENTRY`
where
(
(
`WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
)
and (
`CUSTOM_LABEL_KEY` = 'caas-collection-name'
)
)
and (
`CUSTOM_LABEL_VALUE` = 'me@gmail.com'
)
)
or exists(
select
`CUSTOM_LABEL_KEY`,
`CUSTOM_LABEL_VALUE`,
`WORKFLOW_EXECUTION_UUID`,
`CUSTOM_LABEL_ENTRY_ID`
from
`CUSTOM_LABEL_ENTRY`
where
(
(
`WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
)
and (
`CUSTOM_LABEL_KEY` = 'caas-collection-name'
)
)
and (
`CUSTOM_LABEL_VALUE` = 'miguel-collection'
)
)
)
)
and true
)
and true
)
and true
)
and true
)
and true
)
and true
Top GitHub Comments
@danbills I just used SAM dev. 🙂
Contradicting what I said in standup today, the performant rewrites of the labels query actually are using the new non-unique key+value index I created on CUSTOM_LABEL_ENTRY (see the fourth row of the EXPLAIN above referencing IDX_KEY_VALUE as its key). I confirmed that without that index performance reverts to being terrible. The version of the query generated by Slick doesn’t use the index and still performs terribly.
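The rewrites themselves are not shown in this thread. As an illustration only, one shape such a rewrite could take is to drive the query from CUSTOM_LABEL_ENTRY and join back to the summary table, so that a key+value lookup comes first. The aliases sub and coll are made up for this sketch, and it assumes the summary table has an index on WORKFLOW_EXECUTION_UUID; whether MySQL actually picks IDX_KEY_VALUE for it would need to be checked with EXPLAIN.

```sql
-- Sketch only, not the rewrite referred to in the comment above.
-- The unique (CUSTOM_LABEL_KEY, WORKFLOW_EXECUTION_UUID) index guarantees
-- at most one 'submissionId' row per workflow, so the join cannot
-- duplicate summary rows.
SELECT
  x2.WORKFLOW_EXECUTION_UUID,
  x2.WORKFLOW_NAME,
  x2.WORKFLOW_STATUS,
  x2.START_TIMESTAMP,
  x2.END_TIMESTAMP,
  x2.SUBMISSION_TIMESTAMP,
  x2.WORKFLOW_METADATA_SUMMARY_ENTRY_ID
FROM CUSTOM_LABEL_ENTRY sub
JOIN WORKFLOW_METADATA_SUMMARY_ENTRY x2
  ON x2.WORKFLOW_EXECUTION_UUID = sub.WORKFLOW_EXECUTION_UUID
WHERE sub.CUSTOM_LABEL_KEY = 'submissionId'
  AND sub.CUSTOM_LABEL_VALUE = 'submissionIdValue'
  -- The two OR'ed collection checks collapse into a single IN list.
  AND EXISTS (
    SELECT 1
    FROM CUSTOM_LABEL_ENTRY coll
    WHERE coll.WORKFLOW_EXECUTION_UUID = x2.WORKFLOW_EXECUTION_UUID
      AND coll.CUSTOM_LABEL_KEY = 'caas-collection-name'
      AND coll.CUSTOM_LABEL_VALUE IN ('me@gmail.com', 'miguel-collection')
  );
```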