Slow, expensive metadata endpoint

What happens

When a large workflow is queried for metadata, Cromwell spends a considerable amount of time preparing the response. This usually results in a timeout for the caller. In some cases, the preparation is so expensive that Cromwell either runs out of memory or enters a zombie-like state (#4105).

What should happen

The caller should receive a timely response, and Cromwell should not be endangered by operations on large workflows.

Speculation: Construction of result

The result is constructed in a two-phase manner: gather all the data, then produce a structured response.

This is done for two reasons:

  1. Unstructured metadata is difficult for a human to understand.
  2. There are possibly many duplicates due to the way restarts are handled.

Recommendation

~Stream results (using doobie SQL library?) and construct response while gathering data. This should mean that a large pool of data is never present in memory, only the current result set and the partial response.~

Not streaming for now. Instead, going to foldMap the large sequence of rows into a Map monoid, then combine all those maps together into a final result.
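A minimal sketch of the foldMap idea, written in Python for illustration rather than in Cromwell's own code. The row shape `(call_name, metadata_key, value)` and the helper names `to_map`, `combine` and `fold_map` are assumptions, not taken from the Cromwell source. Each row is mapped to a tiny nested dict, and the dicts are merged with an associative combine whose identity is the empty dict:

```python
from functools import reduce

# Hypothetical row shape: (call_name, metadata_key, value).
# The real metadata table has more columns; this is only an illustration.
rows = [
    ("wf.align", "start", "2018-09-20T10:00:00+00:00"),
    ("wf.align", "executionStatus", "Done"),
    ("wf.call_variants", "executionStatus", "Running"),
]

def to_map(row):
    """Map step: turn one flat row into a small nested dict."""
    call, key, value = row
    return {call: {key: value}}

def combine(left, right):
    """Monoid combine: merge nested dicts; on a clash the right-hand value wins."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = combine(merged[key], value)
        else:
            merged[key] = value
    return merged

def fold_map(items, f):
    """foldMap: map each item with f and combine, starting from the empty dict."""
    return reduce(lambda acc, item: combine(acc, f(item)), items, {})

# Fold the whole sequence at once...
result = fold_map(rows, to_map)

# ...or fold chunks separately and combine the partial maps afterwards.
partials = [fold_map(chunk, to_map) for chunk in (rows[:2], rows[2:])]
combined = reduce(combine, partials, {})
```

Because the combine is associative, folding chunk by chunk and then merging the partial maps gives the same result as a single fold over all rows.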

Some manipulation is still needed after the results are combined:

  1. Sort calls by time
  2. Prune duplicates by taking the most recent. This has some special cases that need to be considered.
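The two steps above can be sketched as follows. This is an illustration under stated assumptions: the entry fields (`call`, `start`, `status`) and the function `sort_and_prune` are hypothetical, and the special cases mentioned in step 2 are not handled.

```python
from datetime import datetime

# Hypothetical call entries; "wf.align" appears twice, as it might after a restart.
calls = [
    {"call": "wf.align", "start": "2018-09-20T10:05:00+00:00", "status": "Done"},
    {"call": "wf.align", "start": "2018-09-20T10:00:00+00:00", "status": "Failed"},
    {"call": "wf.sort", "start": "2018-09-20T10:02:00+00:00", "status": "Done"},
]

def start_time(entry):
    """Parse the ISO 8601 start timestamp of an entry."""
    return datetime.fromisoformat(entry["start"])

def sort_and_prune(entries):
    """Sort entries by start time and keep only the most recent entry per call."""
    latest = {}
    for entry in sorted(entries, key=start_time):
        # Later entries overwrite earlier ones for the same call name.
        latest[entry["call"]] = entry
    return sorted(latest.values(), key=start_time)

pruned = sort_and_prune(calls)
```

Here the earlier, failed `wf.align` entry is dropped and the remaining calls come back in start-time order.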

Speculation: Database table

The metadata table is currently an unindexed monster, comprising 10^6 to 10^9 rows and 2 to 3 TB of data. The query has historically been surprisingly performant, but it is likely to degrade over time.

Recommendation

Punt on DB changes.

Believed to be related to #4093 and #4105.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
Horneth commented, Sep 20, 2018

FWIW Slick also supports streaming

0 reactions
davidangb commented, Dec 12, 2018

You’ve done a bunch of investigation on this already, but adding for posterity: the metadata endpoint is particularly vulnerable to the joint-calling use case. In this use case (and in similar workflows), calls can scatter widely, and each call can have many inputs, each of which is a substantial value. So, calls * scatter * inputs * value length makes for a lot of data.
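To make the multiplication concrete, here is a back-of-the-envelope sketch. The function name and every number below are made up for illustration and are not measurements from a real workflow:

```python
def estimated_metadata_chars(calls, scatter_width, inputs_per_call, value_length):
    """Rough size of the input metadata: calls * scatter * inputs * value length."""
    return calls * scatter_width * inputs_per_call * value_length

# Hypothetical workflow: 10 calls, each scattered 1,000 ways,
# 20 inputs per call, 500 characters per input value.
size = estimated_metadata_chars(
    calls=10, scatter_width=1000, inputs_per_call=20, value_length=500
)
print(size)  # 100000000
```

Even with these modest assumed figures, the input values alone come to about 100 million characters before any other metadata is counted.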
