
feat(replication): add sequence padding to avoid full sync on misaligned WriteBatch sequence #3340

Open

sryanyuan wants to merge 5 commits into apache:unstable from sryanyuan:feat-replication-sequence-padding

Conversation

sryanyuan (Contributor) commented Jan 13, 2026

In Kvrocks replication, a replica's requested incremental sequence must align with the start sequence of a WriteBatch in the master's WAL. For example, if a WriteBatch starts at sequence 10 and contains 3 records, valid starting points are 10 or 13. Requests starting at 11 or 12 are considered invalid, causing incremental sync to fail and triggering a costly full sync.

A common cause of misaligned sequences is master-slave failover: if the old master still holds WAL entries that were never replicated to the old slave when the role switch happens, and the new master continues to accept writes, then re-establishing replication from the new master can leave the replica requesting a sequence that falls inside an existing WriteBatch. This mismatch forces a full sync, even if only a few records are missing.

This change introduces an optional configuration replication-enable-sequence-padding (default: no). When enabled, the master will send dummy WriteBatch entries to pad the replication stream, advancing the replica's sequence to the next valid position. This allows incremental sync to continue while skipping the missing records, avoiding full sync when only a small number of logs are lost.

Trade-off: skipped records are not applied on the replica, potentially causing minor data inconsistency. This configuration is mainly intended for cache-like use cases, where the application can tolerate partial data loss or temporary inconsistency in favor of maintaining replication continuity and avoiding expensive full syncs.

Additionally, a new INFO metric sync_partial_padding is added to track the number of partial sync operations that succeeded due to sequence padding. This complements existing metrics:

  • sync_partial_ok: successful partial syncs without padding
  • sync_partial_err: failed partial syncs
  • sync_partial_padding: successful partial syncs that relied on padding

This metric helps operators monitor how often padding is used to avoid costly full syncs, and assess potential data inconsistency risk.

Changes:

  • Added replication-enable-sequence-padding config and documentation.
  • Implemented padding logic in replication send path.
  • Added INFO metric sync_partial_padding to record padding-based partial syncs.
  • Added unit test TestReplicationSequencePadding to verify the behavior.

git-hulk (Member) commented:

@sryanyuan Thanks for your PR. I see the value of avoiding unnecessary full syncs in some scenarios, but my concern is that it might cause long-term inconsistency between the master and the replica, including in key metadata and its fields.

For instance, suppose the master has a not-yet-replicated command HSET hash f0 v0, and the padding sequence replaces it. Beyond the field missing from the replica, the HASH metadata and fields also become inconsistent if we continue appending new fields to this hash key.

cc @PragmaTwice @torwig @caipengbo

sryanyuan (Contributor, Author) commented:


Thanks for pointing this out — you’re right, skipping WriteBatch entries via sequence padding can lead to long-term inconsistencies not only in the actual data but also in metadata like hash length 🥲

