tune GUC defaults for remote ClickHouse targets #39

Open

iskakaushik wants to merge 2 commits into main from tune-guc-defaults

Conversation

iskakaushik (Collaborator) commented Feb 14, 2026

Summary

  • queue_capacity: 65,536 → 131,072 (~600MB shmem, buffers ~0.5s at 260K events/sec)
  • batch_max: 10,000 → 200,000 (6x drain rate to remote ClickHouse)
  • flush_interval_ms: 1,000 → 200 (faster wakeup for bursty workloads)

Previous defaults were tuned for local ClickHouse (sub-ms INSERT latency). With ClickHouse Cloud (~80ms RTT), the old defaults caused 53% event loss at 37K TPS. The new defaults reduce loss to ~9% for the same workload. That is still not zero (eliminating loss entirely requires architectural changes such as disk buffering), but it is a significant improvement that needs no configuration changes from users.
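As a sanity check on those figures (illustrative arithmetic only, using the numbers quoted above and in the benchmark table below):

```sql
-- Burst absorption: how long the enlarged queue holds events before dropping any
SELECT round(131072 / 260000.0, 2) AS queue_buffer_seconds;  -- ~0.50 s at 260K events/s

-- Drain speedup from the old default (10K batches) to the new one (200K batches)
SELECT round(125000 / 20000.0, 1) AS drain_speedup;          -- ~6.3x, the "6x" above
```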

Benchmark results (32 clients, 30s pgbench, ClickHouse Cloud us-west-2):

| batch_max | Event loss | Drain rate |
| --- | --- | --- |
| 10,000 (old default) | 53.2% | ~20K events/s |
| 100,000 | 23.6% | ~80K events/s |
| 200,000 (new default) | 8.7% | ~125K events/s |
| 500,000 | 2.5% | ~134K events/s |

Users targeting remote ClickHouse with high-throughput workloads should further increase batch_max and queue_capacity based on their load profile.
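For example, a deployment aiming for the 500K row in the table above might set something like the following (a hypothetical sketch: the `pg_stat_ch.` GUC prefix is assumed from the extension name, and the doubled queue_capacity is only an illustration):

```sql
-- Hypothetical overrides for a high-throughput remote ClickHouse target.
-- The "pg_stat_ch." prefix is an assumption; verify names against the extension docs.
ALTER SYSTEM SET pg_stat_ch.batch_max = 500000;
ALTER SYSTEM SET pg_stat_ch.queue_capacity = 262144;  -- 2x the new default (illustrative)

-- batch_max can likely be picked up on reload, but queue_capacity sizes
-- shared memory and should be expected to require a full server restart.
SELECT pg_reload_conf();
```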

Test plan

  • Builds clean (`mise run build`)
  • Regression tests pass
  • Verified with local ClickHouse Docker (no behavior change expected)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 14, 2026 23:15

Commit message:

    Previous defaults were tuned for local ClickHouse (sub-millisecond INSERT
    latency). With remote targets like ClickHouse Cloud (~80ms RTT), the old
    defaults caused 53% event loss under moderate load (37K TPS pgbench).

    Changes:
    - queue_capacity: 65536 -> 131072 (shmem: ~600MB)
    - batch_max: 10000 -> 100000 (amortizes per-INSERT network overhead)
    - flush_interval_ms: 1000 -> 200 (faster drain wakeup for bursty workloads)

    Benchmarked against ClickHouse Cloud (us-west-2, ~80ms RTT):
      batch_max=10K  -> 53% event loss, 20K events/s drain
      batch_max=100K -> 24% event loss, 80K events/s drain

Copilot AI left a comment


Pull request overview

This PR tunes the GUC (Grand Unified Configuration) defaults for pg_stat_ch to optimize performance for remote ClickHouse targets (e.g., ClickHouse Cloud), which have significantly higher network latency (~80ms RTT) than local ClickHouse instances (sub-ms latency). The changes increase queue capacity and batch size while reducing the flush interval, improving throughput and reducing event loss under high load.

Changes:

  • Increased queue_capacity from 65,536 to 131,072 (2x increase, ~600MB shmem)
  • Increased batch_max from 10,000 to 100,000 (10x increase for higher drain rate)
  • Reduced flush_interval_ms from 1,000ms to 200ms (5x faster wake-up for bursty workloads)
