Diagnosing high parts count in ClickHouse

Warnings about high part or partition counts shouldn’t be ignored. They can indicate a systemic problem with ClickHouse or your ingestion pipeline. Here’s how to diagnose it.

Check actual part counts

Start by verifying partition and part numbers across the cluster:

SELECT
    `table`,
    uniq(partition_id),
    count()
FROM clusterAllReplicas(default, system.parts)
WHERE database = 'my_database'
GROUP BY ALL

    ┌─table──────────────────┬─uniq(partition_id)─┬─count()─┐
 1. │ samples                │                  1 │     837 │
 2. │ hourly__agg            │                 83 │    6363 │
 3. │ daily__agg             │                 83 │    6065 │
 4. │ rt_events_src          │                  3 │   57206 │
 5. │ rt_stats_hourly__agg   │                  1 │   52850 │
    └────────────────────────┴────────────────────┴─────────┘

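For non-clustered setups, drop clusterAllReplicas() and query system.parts directly. Note that count() here is summed across all replicas and includes inactive parts that have been merged but not yet cleaned up; add AND active to the WHERE clause to count only live parts.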

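In this example, the rt_ tables (real-time ingestion) stand out: rt_events_src has 57,206 parts across 3 partitions, and rt_stats_hourly__agg has 52,850 parts in a single partition. Part counts of 50k+ are a problem.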
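
To see where those parts actually live, it can help to break the count down per replica and partition and to count only active parts. The query below is a sketch under the same assumptions as above (a cluster named default, a database named my_database); the column aliases are illustrative.

```sql
-- Active parts per replica, table and partition, worst offenders first.
-- Inactive parts (already merged, waiting for cleanup) are excluded.
SELECT
    hostName() AS host,
    `table`,
    partition_id,
    count() AS active_parts,
    sum(rows) AS total_rows
FROM clusterAllReplicas(default, system.parts)
WHERE database = 'my_database' AND active
GROUP BY host, `table`, partition_id
ORDER BY active_parts DESC
LIMIT 20
```

A single partition holding most of a table's active parts is where inserts are likely to be delayed or rejected first, because the part limits are applied per partition.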

Check replication queue

See if the replication queue is backed up:

SELECT count()
FROM clusterAllReplicas(default, system.replication_queue)
WHERE database = 'my_database'

   ┌─count()─┐
1. │   10296 │
   └─────────┘

Run this a few times over several minutes. The count should be decreasing. If it’s stuck or growing, you have a problem - replicas can’t work through merges and fetches as fast as ingestion creates them.
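
A bare count does not say what is stuck. As a rough sketch, the same table can be grouped by entry type to show whether the backlog is mostly fetches or merges and whether entries are failing repeatedly. The aliases are illustrative, and the cluster and database names are the same assumptions as above.

```sql
-- Replication queue entries grouped by table and entry type
-- (for example GET_PART or MERGE_PARTS), with retry and age information.
SELECT
    `table`,
    type,
    count() AS entries,
    max(num_tries) AS max_tries,
    min(create_time) AS oldest_entry,
    anyIf(last_exception, last_exception != '') AS sample_exception
FROM clusterAllReplicas(default, system.replication_queue)
WHERE database = 'my_database'
GROUP BY `table`, type
ORDER BY entries DESC
```

A high max_tries together with a non-empty sample_exception suggests entries that keep failing rather than a queue that is merely slow.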

Check active merges

SELECT count()
FROM clusterAllReplicas(default, system.merges)
WHERE database = 'my_database'

   ┌─count()─┐
1. │      10 │
   └─────────┘

You want small numbers here - dozens, not hundreds. If merges are piling up, ClickHouse is struggling to consolidate parts.
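
If the count looks high, listing the longest-running merges can show whether a few very large merges are occupying the pool. This is a minimal sketch using columns from system.merges; the aliases are illustrative.

```sql
-- Longest-running merges and mutations, with progress and size.
SELECT
    hostName() AS host,
    `table`,
    round(elapsed) AS elapsed_seconds,
    round(progress * 100, 1) AS progress_percent,
    num_parts,
    formatReadableSize(total_size_bytes_compressed) AS compressed_size,
    is_mutation
FROM clusterAllReplicas(default, system.merges)
WHERE database = 'my_database'
ORDER BY elapsed DESC
LIMIT 10
```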

Check the logs

Look for serious errors in the ClickHouse logs:

tail -n 100 -f /var/log/clickhouse-server/clickhouse-server.err.log

You’re looking for:

Memory errors (OOM, RSS limits)
Large stack traces
Repeated failures
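
When shell access to every node is inconvenient, error counters can also be read over SQL. The sketch below assumes your ClickHouse version provides the system.errors table with the last_error_time and last_error_message columns; treat it as a summary that complements the log file, not a replacement for it.

```sql
-- Errors seen in the last hour on each replica, most recent first.
-- Assumes system.errors is available in this ClickHouse version.
SELECT
    hostName() AS host,
    name,
    value AS occurrences,
    last_error_time,
    last_error_message
FROM clusterAllReplicas(default, system.errors)
WHERE last_error_time > now() - INTERVAL 1 HOUR
ORDER BY last_error_time DESC
LIMIT 20
```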

These warnings are normal and can be ignored:

<Warning> ... (ReplicatedMergeTreePartCheckThread): We have part 20251027_129034_129044_2
covering part 20251027_129040_129040_1, will not check

This just means a merge already happened and the old part is covered by a newer merged part.

Common causes

If parts are accumulating:

Ingestion too fast - you’re inserting faster than merges can consolidate
Too many partitions - use a coarse partition key (by day or month, not by hour or minute)
Merge threads starved - check the background_pool_size setting
Disk I/O bottleneck - merges are I/O heavy
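
One way to tell small, frequent inserts apart from a merge-side bottleneck is to look at freshly inserted parts. The query below is a sketch that runs on the local node. It only sees parts that are still unmerged, so it undercounts the true insert rate, and the one-hour window is an arbitrary choice.

```sql
-- Freshly inserted parts (level = 0 means not yet merged) from the last hour.
-- Many parts per minute with few rows each points to inserts that are too small.
SELECT
    `table`,
    toStartOfMinute(modification_time) AS minute,
    count() AS new_parts,
    round(avg(rows)) AS avg_rows_per_part
FROM system.parts
WHERE database = 'my_database'
  AND active
  AND level = 0
  AND modification_time > now() - INTERVAL 1 HOUR
GROUP BY `table`, minute
ORDER BY minute DESC, new_parts DESC
LIMIT 50
```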

Quick fixes

Force a merge on a specific table. This is expensive on large tables, because FINAL rewrites each partition into a single part:

OPTIMIZE TABLE my_database.my_table FINAL
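
On a large table, merging one partition at a time is usually gentler than optimizing everything at once. A minimal sketch, where '20251027' is a hypothetical partition ID (look up real ones in system.parts):

```sql
-- Merge a single partition instead of the whole table.
-- '20251027' is a hypothetical partition ID; take real ones from system.parts.
OPTIMIZE TABLE my_database.my_table PARTITION ID '20251027' FINAL
```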

Check merge settings:

SELECT name, value
FROM system.settings
WHERE name LIKE '%merge%' OR name LIKE '%background%'
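
MergeTree-specific settings are listed in system.merge_tree_settings rather than system.settings, so it is worth checking there as well. The sketch below picks a few settings related to part limits; it shows server-wide defaults only, not per-table overrides set in a table's SETTINGS clause.

```sql
-- MergeTree-level defaults that govern when inserts are delayed or rejected
-- because of too many parts. `changed` shows whether the default was overridden.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN (
    'parts_to_delay_insert',
    'parts_to_throw_insert',
    'max_parts_in_total',
    'old_parts_lifetime'
)
```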

If you’re consistently hitting high part counts, you likely need to either slow down ingestion, batch inserts more aggressively, or tune your merge settings.
