Help wanted: Catalog performance testing
Hi,
We are looking for help measuring the performance of the software catalog so that we can find room for improvement. There have been improvements for large catalogs in recent months, and we hope to improve the situation further, but we would like to back that work up with data. We have therefore added experimental Prometheus metrics to help with measuring. If you run a large software catalog, we’re interested in hearing from you!
Currently these metrics, among others, are emitted when enabled:
catalog_registered_locations_count
catalog_relations_count
catalog_entities_count
catalog_processing_queue_delay_seconds - The amount of delay between being scheduled for processing, and the start of actually being processed
catalog_processors_duration_seconds - Time spent executing catalog processors
catalog_processing_duration_seconds - Time spent executing the full processing flow
Information we would like to receive in your reply:
- Catalog version
- Number of entities in the catalog, output from the catalog_entities_count metric
- Relations, output from the catalog_relations_count metric
- Queue delay, output from catalog_processing_queue_delay_seconds. It’s best to leave the catalog running for a while first, as this number will be high for a newly started catalog with a backlog of work.
- Processing duration, output from catalog_processing_duration_seconds{quantile=~'0.99|0.5'}
- CPU & memory pressure of the catalog/database instance together with available cores/memory.
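If you would rather collect these numbers from the raw metrics output than from a Prometheus query, the following is a minimal sketch of how that could look. It assumes the body is in the Prometheus text exposition format and that the count metrics may be split across labels (so they are summed). The names parseMetrics and catalogReport are hypothetical helpers, not part of Backstage, and reporting the queue delay at the same 0.5 and 0.99 quantiles is an assumption on our part.

```typescript
// One parsed sample line from the Prometheus text exposition format.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parses lines such as `metric_name{label="value"} 12.5`.
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
// Simplification: label values containing commas or escaped quotes are not handled.
function parseMetrics(body: string): Sample[] {
  const samples: Sample[] = [];
  for (const rawLine of body.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = /^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (!match) continue;
    const [, name = "", labelText = "", valueText = ""] = match;
    const labels: Record<string, string> = {};
    for (const pair of labelText.split(",")) {
      const kv = /^\s*([A-Za-z_][A-Za-z0-9_]*)="(.*)"\s*$/.exec(pair);
      if (!kv) continue;
      const [, key = "", val = ""] = kv;
      labels[key] = val;
    }
    samples.push({ name, labels, value: Number(valueText) });
  }
  return samples;
}

// Hypothetical helper: collects the values requested in the list above.
// Count metrics are summed over all label combinations; duration metrics
// are kept only for the 0.5 and 0.99 quantiles.
function catalogReport(body: string): Record<string, number> {
  const counts = ["catalog_entities_count", "catalog_relations_count"];
  const durations = [
    "catalog_processing_queue_delay_seconds", // quantiles here are an assumption
    "catalog_processing_duration_seconds",
  ];
  const report: Record<string, number> = {};
  for (const sample of parseMetrics(body)) {
    const quantile = sample.labels["quantile"];
    if (counts.includes(sample.name)) {
      report[sample.name] = (report[sample.name] ?? 0) + sample.value;
    } else if (
      durations.includes(sample.name) &&
      (quantile === "0.5" || quantile === "0.99")
    ) {
      report[`${sample.name}{quantile="${quantile}"}`] = sample.value;
    }
  }
  return report;
}
```

Passing the text returned by your metrics endpoint to catalogReport gives a small object you can paste into your reply alongside the catalog version and resource figures.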
How do I instrument?
The process for adding them is currently manual, as we do not make any guarantees about the future format. We of course want instrumentation for all parts of the backend, but agnostic tools like OpenTelemetry are not yet far enough along in their development. For now we have settled on prom-client to get Prometheus metrics, but this WILL change in the future.
Add dependencies
"express-prom-bundle": "^6.3.6",
"prom-client": "^13.2.0",
Create metrics.ts next to your catalog with this content: https://github.com/backstage/backstage/blob/6277060d72748ebd4edcf6dae4fd8eac1d23d470/packages/backend/src/metrics.ts
Initialise the metric: https://github.com/backstage/backstage/blob/6277060d72748ebd4edcf6dae4fd8eac1d23d470/packages/backend/src/index.ts#L76
Add the metrics endpoint: https://github.com/backstage/backstage/blob/6277060d72748ebd4edcf6dae4fd8eac1d23d470/packages/backend/src/index.ts#L129
Configure your Prometheus instance to scrape the metrics.
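Before pointing Prometheus at the backend, it can help to confirm that the endpoint really exposes the catalog metrics. The sketch below checks a metrics response body for the six metric names listed earlier. It is only an illustration: missingCatalogMetrics is a hypothetical helper, and treating a `_count` or `_sum` series as proof that a metric exists is our assumption about how summary-style metrics appear in the output.

```typescript
// Metric names taken from the list earlier in this issue.
const EXPECTED_METRICS: string[] = [
  "catalog_registered_locations_count",
  "catalog_relations_count",
  "catalog_entities_count",
  "catalog_processing_queue_delay_seconds",
  "catalog_processors_duration_seconds",
  "catalog_processing_duration_seconds",
];

// Hypothetical helper: returns the expected metric names that do not
// appear anywhere in the given metrics response body.
function missingCatalogMetrics(body: string): string[] {
  const seen = new Set<string>();
  for (const rawLine of body.split("\n")) {
    // Sample lines start with the metric name; comment lines start with '#'
    // and therefore never match this pattern.
    const match = /^([A-Za-z_:][A-Za-z0-9_:]*)/.exec(rawLine.trim());
    if (match) seen.add(match[1] ?? "");
  }
  return EXPECTED_METRICS.filter(
    (name) =>
      // Assumption: a summary-style metric may only show up through its
      // _count or _sum series, so those count as present too.
      !seen.has(name) && !seen.has(`${name}_count`) && !seen.has(`${name}_sum`),
  );
}
```

An empty result means all six metrics were found. Note that some metrics may only appear after the catalog has processed at least one entity, so a non-empty result right after startup is not necessarily a setup error.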
Thanks!
@Rugvip I just edited the comment with correct data. Sorry about earlier, I copied the wrong data
Sharing metrics from our org: