
Improve ClickHouse performance


What do we show currently?

  • total number of requests
  • requests per minute
  • number of unique operations
  • success and failure rates
  • p90, p95, p99 of latency (see the sketch after this list)
  • top 5 client names (with number of requests)
  • top 5 client versions (with number of requests)
  • operations over time (total and failures)
  • RPM over time
  • latency over time
  • latency histogram (super heavy)
  • list of unique operations (with p90, p95, p99, number of requests, failure rate)
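
For context, the latency percentiles above map directly onto ClickHouse's quantile functions. A minimal sketch, assuming a duration column in milliseconds on the raw operations table (the column name is an assumption, not confirmed in this issue):

-- p90/p95/p99 latency over the last day; 'duration' is a hypothetical column
SELECT
    quantile(0.9)(duration) AS p90,
    quantile(0.95)(duration) AS p95,
    quantile(0.99)(duration) AS p99
FROM operations
WHERE timestamp >= now() - INTERVAL 1 DAY;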

What filters do we have?

  • date range
  • operations

What filters do we want to have?

  • client names
  • date range
  • operations (if more than half of all operations are selected, filter with NOT IN (not-selected-list) instead of IN (selected-list), so the value list stays short; see the sketch below)
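
A sketch of the two equivalent filters, assuming the operation filter is applied on the hash column of operations_registry (the actual filter column isn't specified in the issue; the hash values are placeholders):

-- few operations selected: pass them directly
SELECT sum(total) FROM operations_registry WHERE hash IN ('hash1', 'hash2');
-- most operations selected: send the shorter complement instead
SELECT sum(total) FROM operations_registry WHERE hash NOT IN ('hash3');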

What else do we want to show?

  • hide histogram
  • dedicated page for a single operation

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 3
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
kamilkisiela commented, Aug 19, 2022

The migration plan (long, but stable and with no data loss):

  1. Create the new tables
  2. Insert rows into the new tables (operations and operations_registry) while continuing to write to the old ones
  3. Wait 31 days, so the new tables cover the full 30-day retention window
  4. Switch reads to the new tables
  5. Wait 31 days
  6. Drop the old tables
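
Before step 4, a parity check over the overlap window can confirm the new tables have caught up. A minimal sketch, assuming a hypothetical old table default.operations_old with one row per operation (the issue does not name the old tables):

SELECT
    (
        SELECT sum(total)
        FROM default.operations_registry
        WHERE timestamp >= now() - INTERVAL 30 DAY
    ) AS new_total,
    (
        SELECT count()
        FROM default.operations_old
        WHERE timestamp >= now() - INTERVAL 30 DAY
    ) AS old_total;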

0 reactions
kamilkisiela commented, Aug 19, 2022

TODO:

  • write an updated version of OperationsReader
  • write an updated version of the usage-ingestor service

  • the structure of schema_coordinates_daily (with TTL and total)
  • schema coordinates in operations_registry + expires_at

-- Registry of unique operations. SummingMergeTree collapses rows that share
-- the full sorting key and sums the numeric columns (here: total) at merge time.
CREATE TABLE IF NOT EXISTS default.operations_registry
  (
    target LowCardinality(String),
    hash String,
    name String,
    body String,
    operation_kind String,
    coordinates Array(String) CODEC(ZSTD(1)),
    total UInt32 CODEC(ZSTD(1)),
    timestamp DateTime('UTC'),
    expires_at DateTime('UTC'),
    INDEX idx_operation_kind (operation_kind) TYPE set(0) GRANULARITY 1
  )
  ENGINE = SummingMergeTree
  PARTITION BY toYYYYMMDD(timestamp)
  PRIMARY KEY (target, hash)
  ORDER BY (target, hash, timestamp, expires_at)
  -- rows are dropped automatically once expires_at passes
  TTL expires_at
  SETTINGS index_granularity = 8192;

-- Per-day roll-up of schema coordinate usage, populated automatically on every
-- insert into operations_registry (one output row per coordinate).
CREATE MATERIALIZED VIEW IF NOT EXISTS default.schema_coordinates_daily
  (
    target LowCardinality(String) CODEC(ZSTD(1)),
    hash String CODEC(ZSTD(1)),
    timestamp DateTime('UTC'),
    expires_at DateTime('UTC'),
    total UInt32 CODEC(ZSTD(1)),
    coordinate String CODEC(ZSTD(1))
  )
  ENGINE = SummingMergeTree
  PARTITION BY toYYYYMMDD(timestamp)
  PRIMARY KEY (target, coordinate, hash)
  ORDER BY (target, coordinate, hash, timestamp, expires_at)
  TTL expires_at
  SETTINGS index_granularity = 8192
  AS
  SELECT
    target,
    hash,
    toStartOfDay(timestamp) AS timestamp,
    toStartOfDay(expires_at) AS expires_at,
    sum(total) AS total,
    coordinate
  FROM default.operations_registry
  ARRAY JOIN coordinates AS coordinate
  GROUP BY
    target,
    coordinate,
    hash,
    timestamp,
    expires_at;
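
For readers unfamiliar with ARRAY JOIN: it expands each registry row into one row per array element, which is what lets the view aggregate per schema coordinate. An illustrative, self-contained example (the coordinate values are made up):

SELECT coordinate
FROM (SELECT ['Query.user', 'User.id'] AS coordinates)
ARRAY JOIN coordinates AS coordinate;
-- returns two rows: 'Query.user' and 'User.id'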

An example insert

INSERT INTO operations_registry (
  target,
  hash,
  name,
  body,
  operation_kind,
  coordinates,
  total,
  timestamp,
  expires_at
) VALUES (
  'target1',
  'hash1',
  'name1',
  'body1',
  'query',
  array('coordinate1', 'coordinate2'),
  -- the number of operations in the report that this row represents
  10,
  now(),
  now() + INTERVAL 30 DAY
);

How to get data?

SELECT
    name,
    target,
    body,
    sum(total) AS total,
    coordinates
FROM operations_registry
GROUP BY name, body, target, hash, coordinates;

SELECT
    coordinate,
    sum(total) AS total,
    hash
FROM schema_coordinates_daily
GROUP BY
    coordinate,
    hash;
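
Combining this with the filters listed earlier, a sketch of the same aggregation restricted to a date range and a set of operations (the dates and hash values are placeholders):

SELECT
    coordinate,
    sum(total) AS total,
    hash
FROM schema_coordinates_daily
WHERE timestamp >= toDateTime('2022-07-01 00:00:00', 'UTC')
  AND timestamp < toDateTime('2022-08-01 00:00:00', 'UTC')
  AND hash IN ('hash1', 'hash2')
GROUP BY coordinate, hash;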

