
feat(replication): add sequence padding to avoid full sync on misaligned WriteBatch sequence #3340

Open

sryanyuan wants to merge 5 commits into apache:unstable from sryanyuan:feat-replication-sequence-padding

Conversation

sryanyuan (Contributor) commented Jan 13, 2026

In Kvrocks replication, a replica's requested incremental sequence must align with the start sequence of a WriteBatch in the master's WAL. For example, if a WriteBatch starts at sequence 10 and contains 3 records, valid starting points are 10 or 13. Requests starting at 11 or 12 are considered invalid, causing incremental sync to fail and triggering a costly full sync.

A common cause of misaligned sequences is master-slave failover: if the old master still holds WAL entries that were never replicated to the old slave when the role switch happens, and the new master continues to accept writes, then re-establishing replication from the new master can leave the replica requesting a sequence that falls inside an existing WriteBatch. This mismatch forces a full sync, even if only a few records are missing.

This change introduces an optional configuration replication-enable-sequence-padding (default: no). When enabled, the master will send dummy WriteBatch entries to pad the replication stream, advancing the replica's sequence to the next valid position. This allows incremental sync to continue while skipping the missing records, avoiding full sync when only a small number of logs are lost.

Trade-off: skipped records are not applied on the replica, potentially causing minor data inconsistency. This configuration is mainly intended for cache-like use cases, where the application can tolerate partial data loss or temporary inconsistency in favor of maintaining replication continuity and avoiding expensive full syncs.

Additionally, a new INFO metric sync_partial_padding is added to track the number of partial sync operations that succeeded due to sequence padding. This complements existing metrics:

  • sync_partial_ok: successful partial syncs without padding
  • sync_partial_err: failed partial syncs
  • sync_partial_padding: successful partial syncs that relied on padding

This metric helps operators monitor how often padding is used to avoid costly full syncs, and assess potential data inconsistency risk.

Changes:

  • Added replication-enable-sequence-padding config and documentation.
  • Implemented padding logic in replication send path.
  • Added INFO metric sync_partial_padding to record padding-based partial syncs.
  • Added unit test TestReplicationSequencePadding to verify the behavior.

git-hulk (Member) commented:

@sryanyuan Thanks for your PR. I see the value of avoiding unnecessary full syncs in some scenarios, but my concern is that it might cause long-term inconsistency between the master and the replica, including in key metadata and its fields.

For instance, suppose the master has a not-yet-replicated command HSET hash f0 v0, and the padding sequence replaces it. Beyond the field missing from the replica, the HASH metadata and fields also become inconsistent if we continue appending new fields to this hash key.

cc @PragmaTwice @torwig @caipengbo

sryanyuan (Contributor, Author) commented:


Thanks for pointing this out — you’re right, skipping WriteBatch entries via sequence padding can lead to long-term inconsistencies not only in the actual data but also in metadata like hash length 🥲

