Labels query performance

Requests to Cromwell’s query endpoint specifying label-related parameters perform poorly. JMUI will rely on this endpoint being performant with label parameters.

Plan for investigation:

  1. Set up a local CromIAM and Cromwell writing to a local MySQL instance.
  2. Submit a workflow to this CromIAM specifying a non-default collection name to associate the requesting auth with an additional collection besides the auth’s email address.
  3. Capture the query actually issued by Cromwell for the JMUI request and analyze its performance on real-world-sized databases.

The captured query appears at the end of this document since it takes up a ton of vertical space when linted. 🙂 This query takes about 0.7 seconds to execute on CaaS Prod and around 13 seconds on FC Prod. The EXPLAIN output on CaaS Prod looks like:

mysql> EXPLAIN select...
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+
| id | select_type        | table              | type | possible_keys                               | key                           | key_len | ref                                       | rows   | Extra                              |
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+
|  1 | PRIMARY            | x2                 | ALL  | NULL                                        | NULL                          | NULL    | NULL                                      | 185900 | Using where                        |
|  4 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref  | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070    | const,cromwell.x2.WORKFLOW_EXECUTION_UUID |      1 | Using index condition; Using where |
|  3 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref  | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070    | const,cromwell.x2.WORKFLOW_EXECUTION_UUID |      1 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | CUSTOM_LABEL_ENTRY | ref  | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,SYS_IDX_11226 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1070    | const,cromwell.x2.WORKFLOW_EXECUTION_UUID |      1 | Using index condition; Using where |
+----+--------------------+--------------------+------+---------------------------------------------+-------------------------------+---------+-------------------------------------------+--------+------------------------------------+

The index referenced on CUSTOM_LABEL_ENTRY is UC_CUSTOM_LABEL_ENTRY_CLK_WEU, which looks like:

mysql> show index from CUSTOM_LABEL_ENTRY;
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table              | Non_unique | Key_name                      | Seq_in_index | Column_name             | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| CUSTOM_LABEL_ENTRY |          0 | PRIMARY                       |            1 | CUSTOM_LABEL_ENTRY_ID   | A         |      531285 |     NULL | NULL   |      | BTREE      |         |               |
| CUSTOM_LABEL_ENTRY |          0 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU |            1 | CUSTOM_LABEL_KEY        | A         |          31 |     NULL | NULL   | YES  | BTREE      |         |               |
| CUSTOM_LABEL_ENTRY |          0 | UC_CUSTOM_LABEL_ENTRY_CLK_WEU |            2 | WORKFLOW_EXECUTION_UUID | A         |      531285 |     NULL | NULL   |      | BTREE      |         |               |
| CUSTOM_LABEL_ENTRY |          1 | SYS_IDX_11226                 |            1 | WORKFLOW_EXECUTION_UUID | A         |      132821 |     NULL | NULL   |      | BTREE      |         |               |
+--------------------+------------+-------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

So MySQL appears to be table scanning WORKFLOW_METADATA_SUMMARY_ENTRY (type ALL, an estimated 185900 rows) and then, for each label parameter, finding the at-most-one matching row in CUSTOM_LABEL_ENTRY using the unique index on CUSTOM_LABEL_KEY + WORKFLOW_EXECUTION_UUID. The labels table access should therefore be fast; the cost comes from scanning the whole summary table.
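
To make the cost of that plan concrete, here is a minimal Python sketch of the access pattern the EXPLAIN describes: one pass over every summary row, with unique-index probes per label condition. The function name `summary_first`, the in-memory dictionary standing in for the unique index, the sample data, and the short-circuit order of the probes are all assumptions for illustration; this is not Cromwell or MySQL code.

```python
# Hypothetical in-memory model of the plan in the EXPLAIN output.
# by_key_uuid stands in for UC_CUSTOM_LABEL_ENTRY_CLK_WEU: (label key, workflow UUID) -> value.

def summary_first(summary_uuids, by_key_uuid, submission_id, collections):
    """Scan every summary row and probe the label index for each label condition."""
    probes, matches = 0, []
    for uuid in summary_uuids:                      # type=ALL: every summary row is visited
        probes += 1                                 # probe for the submissionId label
        if by_key_uuid.get(("submissionId", uuid)) != submission_id:
            continue
        for name in collections:                    # one probe per collection name
            probes += 1
            if by_key_uuid.get(("caas-collection-name", uuid)) == name:
                matches.append(uuid)
                break
    return matches, probes


# Illustrative data only.
summary_uuids = ["wf-1", "wf-2", "wf-3", "wf-4"]
by_key_uuid = {
    ("submissionId", "wf-1"): "sub-a",
    ("caas-collection-name", "wf-1"): "miguel-collection",
    ("submissionId", "wf-2"): "sub-a",
    ("caas-collection-name", "wf-2"): "other-collection",
    ("submissionId", "wf-3"): "sub-b",
}
matches, probes = summary_first(
    summary_uuids, by_key_uuid, "sub-a", ["me@gmail.com", "miguel-collection"]
)
print(matches, probes)
```

However cheap each probe is, the number of probes can never drop below the number of summary rows, which is why the query time tracks the size of the summary table rather than the number of matching workflows.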

I experimented with adding a non-unique index on CUSTOM_LABEL_ENTRY for CUSTOM_LABEL_KEY + CUSTOM_LABEL_VALUE in the hope that MySQL could use that first and then join back to the summary table on workflow ID. However, I haven’t had any luck getting MySQL to use this index, even for the simplest possible queries:

mysql> create index IDX_KEY_VALUE on CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, CUSTOM_LABEL_VALUE); 
.
.
.
mysql> explain select WORKFLOW_EXECUTION_UUID from CUSTOM_LABEL_ENTRY where CUSTOM_LABEL_KEY = 'caas-collection-name' AND CUSTOM_LABEL_VALUE = 'miguel-collection';
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| id | select_type | table              | partitions | type | possible_keys                               | key                           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | CUSTOM_LABEL_ENTRY | NULL       | ref  | UC_CUSTOM_LABEL_ENTRY_CLK_WEU,IDX_KEY_VALUE | UC_CUSTOM_LABEL_ENTRY_CLK_WEU | 1023    | const |    1 |    10.00 | Using where |
+----+-------------+--------------------+------------+------+---------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+

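So MySQL sees the new index I created (it is listed in possible_keys) but then uses the unique CUSTOM_LABEL_KEY + WORKFLOW_EXECUTION_UUID index instead, even though I don’t see how that’s applicable here. The ref of const and the shorter key_len (1023 here versus 1070 in the plan above) suggest it is matching on the leading CUSTOM_LABEL_KEY column only and filtering on CUSTOM_LABEL_VALUE afterwards.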
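
One way to check whether the new index helps at all is to pin the query to it with an index hint and compare plans. The sketch below does this in SQLite through Python's `sqlite3` module, purely so that it is runnable without a MySQL server; SQLite's planner is not MySQL's, so the plan text is not comparable to the EXPLAIN output above. The column types and the helper `plan_details` are assumptions. In MySQL the analogous hint would be `FORCE INDEX (IDX_KEY_VALUE)` after the table name, which was not tried in this investigation.

```python
import sqlite3

# SQLite stands in for MySQL here; column types are assumed, index names follow the issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOM_LABEL_ENTRY (
        CUSTOM_LABEL_ENTRY_ID   INTEGER PRIMARY KEY,
        CUSTOM_LABEL_KEY        TEXT,
        CUSTOM_LABEL_VALUE      TEXT,
        WORKFLOW_EXECUTION_UUID TEXT NOT NULL
    );
    CREATE UNIQUE INDEX UC_CUSTOM_LABEL_ENTRY_CLK_WEU
        ON CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, WORKFLOW_EXECUTION_UUID);
    CREATE INDEX IDX_KEY_VALUE
        ON CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, CUSTOM_LABEL_VALUE);
""")


def plan_details(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# INDEXED BY is SQLite's hint; the MySQL spelling would be:
#   ... FROM CUSTOM_LABEL_ENTRY FORCE INDEX (IDX_KEY_VALUE) WHERE ...
hinted = plan_details(
    "SELECT WORKFLOW_EXECUTION_UUID FROM CUSTOM_LABEL_ENTRY INDEXED BY IDX_KEY_VALUE "
    "WHERE CUSTOM_LABEL_KEY = ? AND CUSTOM_LABEL_VALUE = ?",
    ("caas-collection-name", "miguel-collection"),
)
for detail in hinted:
    print(detail)
```

If the hinted plan is not faster on a production-sized copy of the data, the optimizer's choice is probably reasonable and the index is not the fix; if it is faster, the difference points at the optimizer's statistics or at the shape of the query.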

Actual CromIAM + Cromwell query issued against local MySQL:

select
        x2.`WORKFLOW_EXECUTION_UUID`,
        x2.`WORKFLOW_NAME`,
        x2.`WORKFLOW_STATUS`,
        x2.`START_TIMESTAMP`,
        x2.`END_TIMESTAMP`,
        x2.`SUBMISSION_TIMESTAMP`,
        x2.`WORKFLOW_METADATA_SUMMARY_ENTRY_ID` 
    from
        `WORKFLOW_METADATA_SUMMARY_ENTRY` x2 
    where
        (
            (
                (
                    (
                        (
                            (
                                (
                                    (
                                        (
                                            true 
                                            and true
                                        ) 
                                        and true
                                    ) 
                                    and exists(
                                        select
                                            `CUSTOM_LABEL_KEY`,
                                            `CUSTOM_LABEL_VALUE`,
                                            `WORKFLOW_EXECUTION_UUID`,
                                            `CUSTOM_LABEL_ENTRY_ID` 
                                        from
                                            `CUSTOM_LABEL_ENTRY` 
                                        where
                                            (
                                                (
                                                    `WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
                                                ) 
                                                and (
                                                    `CUSTOM_LABEL_KEY` = 'submissionId'
                                                )
                                            ) 
                                            and (
                                                `CUSTOM_LABEL_VALUE` = 'submissionIdValue'
                                            )
                                    )
                                ) 
                                and (
                                    exists(
                                        select
                                            `CUSTOM_LABEL_KEY`,
                                            `CUSTOM_LABEL_VALUE`,
                                            `WORKFLOW_EXECUTION_UUID`,
                                            `CUSTOM_LABEL_ENTRY_ID` 
                                        from
                                            `CUSTOM_LABEL_ENTRY` 
                                        where
                                            (
                                                (
                                                    `WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
                                                ) 
                                                and (
                                                    `CUSTOM_LABEL_KEY` = 'caas-collection-name'
                                                )
                                            ) 
                                            and (
                                                `CUSTOM_LABEL_VALUE` = 'me@gmail.com'
                                            )
                                    ) 
                                    or exists(
                                        select
                                            `CUSTOM_LABEL_KEY`,
                                            `CUSTOM_LABEL_VALUE`,
                                            `WORKFLOW_EXECUTION_UUID`,
                                            `CUSTOM_LABEL_ENTRY_ID` 
                                        from
                                            `CUSTOM_LABEL_ENTRY` 
                                        where
                                            (
                                                (
                                                    `WORKFLOW_EXECUTION_UUID` = x2.`WORKFLOW_EXECUTION_UUID`
                                                ) 
                                                and (
                                                    `CUSTOM_LABEL_KEY` = 'caas-collection-name'
                                                )
                                            ) 
                                            and (
                                                `CUSTOM_LABEL_VALUE` = 'miguel-collection'
                                            )
                                    )
                                )
                            ) 
                            and true
                        ) 
                        and true
                    ) 
                    and true
                ) 
                and true
            ) 
            and true
        ) 
        and true 
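
Stripped of the redundant `true` conjuncts, the query above is three EXISTS checks against CUSTOM_LABEL_ENTRY. The sketch below compares that shape with one possible label-first rewrite that starts from CUSTOM_LABEL_ENTRY and joins back to the summary table. It is not necessarily the rewrite the maintainers tested. It runs against an in-memory SQLite database through Python's `sqlite3` module so that the equivalence of the two result sets can be checked without a MySQL server. Column types, the table aliases, and the sample rows are assumptions, and the sketch says nothing about MySQL's plan for either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WORKFLOW_METADATA_SUMMARY_ENTRY (
        WORKFLOW_METADATA_SUMMARY_ENTRY_ID INTEGER PRIMARY KEY,
        WORKFLOW_EXECUTION_UUID TEXT NOT NULL
    );
    CREATE TABLE CUSTOM_LABEL_ENTRY (
        CUSTOM_LABEL_ENTRY_ID INTEGER PRIMARY KEY,
        CUSTOM_LABEL_KEY TEXT,
        CUSTOM_LABEL_VALUE TEXT,
        WORKFLOW_EXECUTION_UUID TEXT NOT NULL,
        UNIQUE (CUSTOM_LABEL_KEY, WORKFLOW_EXECUTION_UUID)
    );
    CREATE INDEX IDX_KEY_VALUE ON CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, CUSTOM_LABEL_VALUE);
    INSERT INTO WORKFLOW_METADATA_SUMMARY_ENTRY (WORKFLOW_EXECUTION_UUID)
        VALUES ('wf-1'), ('wf-2'), ('wf-3');
    INSERT INTO CUSTOM_LABEL_ENTRY (CUSTOM_LABEL_KEY, CUSTOM_LABEL_VALUE, WORKFLOW_EXECUTION_UUID)
        VALUES ('submissionId', 'submissionIdValue', 'wf-1'),
               ('caas-collection-name', 'miguel-collection', 'wf-1'),
               ('submissionId', 'submissionIdValue', 'wf-2'),
               ('caas-collection-name', 'someone-else', 'wf-2'),
               ('submissionId', 'otherSubmission', 'wf-3'),
               ('caas-collection-name', 'me@gmail.com', 'wf-3');
""")

# Same logic as the generated query, minus the "and true" noise.
EXISTS_FORM = """
SELECT x2.WORKFLOW_EXECUTION_UUID
FROM WORKFLOW_METADATA_SUMMARY_ENTRY x2
WHERE EXISTS (SELECT 1 FROM CUSTOM_LABEL_ENTRY l
              WHERE l.WORKFLOW_EXECUTION_UUID = x2.WORKFLOW_EXECUTION_UUID
                AND l.CUSTOM_LABEL_KEY = 'submissionId'
                AND l.CUSTOM_LABEL_VALUE = 'submissionIdValue')
  AND (EXISTS (SELECT 1 FROM CUSTOM_LABEL_ENTRY l
               WHERE l.WORKFLOW_EXECUTION_UUID = x2.WORKFLOW_EXECUTION_UUID
                 AND l.CUSTOM_LABEL_KEY = 'caas-collection-name'
                 AND l.CUSTOM_LABEL_VALUE = 'me@gmail.com')
       OR EXISTS (SELECT 1 FROM CUSTOM_LABEL_ENTRY l
                  WHERE l.WORKFLOW_EXECUTION_UUID = x2.WORKFLOW_EXECUTION_UUID
                    AND l.CUSTOM_LABEL_KEY = 'caas-collection-name'
                    AND l.CUSTOM_LABEL_VALUE = 'miguel-collection'))
ORDER BY x2.WORKFLOW_EXECUTION_UUID
"""

# Label-first rewrite: start from the label rows, then join back to the summary table.
# The unique (key, UUID) constraint means each join matches at most one label row.
JOIN_FORM = """
SELECT x2.WORKFLOW_EXECUTION_UUID
FROM CUSTOM_LABEL_ENTRY s
JOIN CUSTOM_LABEL_ENTRY c
  ON c.WORKFLOW_EXECUTION_UUID = s.WORKFLOW_EXECUTION_UUID
 AND c.CUSTOM_LABEL_KEY = 'caas-collection-name'
 AND c.CUSTOM_LABEL_VALUE IN ('me@gmail.com', 'miguel-collection')
JOIN WORKFLOW_METADATA_SUMMARY_ENTRY x2
  ON x2.WORKFLOW_EXECUTION_UUID = s.WORKFLOW_EXECUTION_UUID
WHERE s.CUSTOM_LABEL_KEY = 'submissionId'
  AND s.CUSTOM_LABEL_VALUE = 'submissionIdValue'
ORDER BY x2.WORKFLOW_EXECUTION_UUID
"""

exists_rows = conn.execute(EXISTS_FORM).fetchall()
join_rows = conn.execute(JOIN_FORM).fetchall()
print(exists_rows, join_rows)
```

The join form gives the optimizer the option of driving from a key + value lookup on CUSTOM_LABEL_ENTRY instead of scanning the summary table; whether MySQL actually takes that option has to be confirmed with EXPLAIN on real data.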

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

mcovarr commented, Jan 30, 2019 (1 reaction)

@danbills I just used SAM dev. 🙂

mcovarr commented, Jan 31, 2019 (0 reactions)

Contradicting what I said in standup today, the performant rewrites of the labels query actually are using the new non-unique key+value index I created on CUSTOM_LABEL_ENTRY (see the fourth row of the EXPLAIN above referencing IDX_KEY_VALUE as its key). I confirmed that without that index performance reverts to being terrible. The version of the query generated by Slick doesn’t use the index and still performs terribly.
