feat(replication): add sequence padding to avoid full sync on misaligned WriteBatch sequence #3340
sryanyuan wants to merge 5 commits into apache:unstable
…ned WriteBatch sequence
…uring partial replication
…m/sryanyuan/kvrocks into feat-replication-sequence-padding
@sryanyuan Thanks for your PR. I agree that avoiding unnecessary full syncs is valuable in some scenarios, but my concern is that this might cause long-term inconsistency between the master and the replica, including in the metadata and its fields. For instance, the master has a not-yet-replicated command …
Thanks for pointing this out. You’re right: skipping WriteBatch entries via sequence padding can lead to long-term inconsistencies, not only in the actual data but also in metadata such as hash length 🥲

In Kvrocks replication, a replica's requested incremental sequence must align with the start sequence of a WriteBatch in the master's WAL. For example, if a WriteBatch starts at sequence 10 and contains 3 records, valid starting points are 10 or 13. Requests starting at 11 or 12 are considered invalid, causing incremental sync to fail and triggering a costly full sync.
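The alignment rule can be illustrated with a small toy model. This is only a sketch, not Kvrocks code: the WAL is assumed to be a list of (start_sequence, record_count) pairs, and the helper names valid_start_sequences and is_aligned are hypothetical.

```python
from typing import List, Tuple

# Toy WAL model (an assumption for illustration): each WriteBatch is
# represented as (start_sequence, record_count), in ascending order.
Batch = Tuple[int, int]


def valid_start_sequences(batches: List[Batch]) -> List[int]:
    """Return the sequences at which an incremental sync may begin.

    These are the start of every batch, plus the sequence right after
    the last batch (the position of the next write).
    """
    starts = [start for start, _ in batches]
    if batches:
        last_start, last_count = batches[-1]
        starts.append(last_start + last_count)
    return starts


def is_aligned(requested: int, batches: List[Batch]) -> bool:
    """True if the requested sequence sits on a WriteBatch boundary."""
    return requested in set(valid_start_sequences(batches))


if __name__ == "__main__":
    wal = [(10, 3)]  # one batch covering sequences 10, 11 and 12
    for seq in (10, 11, 12, 13):
        print(seq, "aligned" if is_aligned(seq, wal) else "misaligned")
```

With the batch from the example, 10 and 13 are reported as aligned, while 11 and 12 are misaligned and would fall back to a full sync.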
A common cause of misaligned sequences occurs during master-slave failover: if the old master still has WAL entries that were not yet replicated to the old slave when the role switch happens, and the new master continues to accept writes, then re-establishing replication from this new master may, in certain cases, result in the replica requesting a sequence that falls inside an existing WriteBatch. This mismatch forces a full sync, even if only a few records are missing.
This change introduces an optional configuration, replication-enable-sequence-padding (default: no). When enabled, the master sends dummy WriteBatch entries to pad the replication stream, advancing the replica's sequence to the next valid position. This allows incremental sync to continue while skipping the missing records, avoiding a full sync when only a small number of logs are lost.
Trade-off: skipped records are not applied on the replica, potentially causing minor data inconsistency. This configuration is mainly intended for cache-like use cases, where the application can tolerate partial data loss or temporary inconsistency in favor of maintaining replication continuity and avoiding expensive full syncs.
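To make the padding arithmetic concrete, the following sketch computes how far a misaligned request would have to be advanced to reach the next WriteBatch boundary. It does not reproduce how the PR builds or sends the dummy entries. The (start_sequence, record_count) WAL model and the padding_plan helper are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Toy WAL model (an assumption for illustration): each WriteBatch is
# represented as (start_sequence, record_count).
Batch = Tuple[int, int]


def padding_plan(requested: int, batches: List[Batch]) -> Optional[Tuple[int, int]]:
    """Describe the padding needed for a misaligned request.

    Returns (pad_count, next_valid_sequence) when the requested sequence
    falls strictly inside a batch, or None when it is not inside any
    batch (for example, when it already sits on a batch boundary).
    """
    for start, count in batches:
        end = start + count  # first sequence after this batch
        if start < requested < end:
            # Sequences requested .. end-1 are skipped on the replica.
            return end - requested, end
    return None


if __name__ == "__main__":
    wal = [(10, 3)]  # one batch covering sequences 10, 11 and 12
    print(padding_plan(11, wal))  # request inside the batch
    print(padding_plan(13, wal))  # request on a boundary
```

For a request at 11 against a batch covering 10 through 12, the plan is to pad two sequence numbers and resume at 13. The records at 11 and 12 are the ones the replica never applies, which is the inconsistency the trade-off describes.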
Additionally, a new INFO metric, sync_partial_padding, is added to track the number of partial sync operations that succeeded due to sequence padding. This complements existing metrics:
- sync_partial_ok: successful partial syncs without padding
- sync_partial_err: failed partial syncs
- sync_partial_padding: successful partial syncs that relied on padding
This metric helps operators monitor how often padding is used to avoid costly full syncs, and assess potential data inconsistency risk.
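As a rough monitoring sketch, an operator could derive the share of successful partial syncs that relied on padding from the INFO output. The sample text below is made up, the key:value line format is assumed to follow the usual Redis-style INFO layout, and the helper names parse_info and padding_share are hypothetical.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO-style output ("key:value" lines, "# Section" headers)."""
    metrics: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            metrics[key] = value
    return metrics


def padding_share(metrics: Dict[str, str]) -> float:
    """Fraction of successful partial syncs that needed padding.

    Per the description above, sync_partial_ok counts successes without
    padding, so the total number of successes is ok + padding.
    """
    ok = int(metrics.get("sync_partial_ok", 0))
    padded = int(metrics.get("sync_partial_padding", 0))
    total = ok + padded
    return padded / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = "# Stats\nsync_partial_ok:6\nsync_partial_err:1\nsync_partial_padding:2\n"
    print(padding_share(parse_info(sample)))
```

A rising share would suggest that replicas are frequently resuming from inside a WriteBatch, and therefore that more records are being skipped.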
Changes:
- replication-enable-sequence-padding config and documentation.
- sync_partial_padding to record padding-based partial syncs.
- TestReplicationSequencePadding to verify behavior.