
feat: multi-threaded slave (MTS) parallel DML apply #1692

Open
jackiesre721 wants to merge 3 commits into github:master from jackiesre721:feat/mts-parallel-apply


@jackiesre721

Implements LOGICAL_CLOCK-based parallel binlog event application, mirroring MySQL 5.7 MTS scheduling. With --num-workers=N, gh-ost applies DML events to the ghost table using N concurrent workers, significantly increasing throughput for high-write tables.
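For readers unfamiliar with LOGICAL_CLOCK scheduling, the following is a minimal sketch of the MySQL 5.7-style rule. It assumes each transaction carries the last_committed and sequence_number values from its binlog GTID event. The names trx, splitIntoGroups and canStart are hypothetical and do not come from the PR. A change in last_committed starts a new group. A transaction may start once the low-water mark (LWM) shows that every transaction up to its last_committed has committed.

```go
package main

import "fmt"

// trx is a hypothetical stand-in for one binlog transaction's logical timestamps.
type trx struct {
	lastCommitted  int64
	sequenceNumber int64
}

// splitIntoGroups starts a new group whenever last_committed changes, so each
// group holds consecutive transactions that share a commit parent on the source.
func splitIntoGroups(txs []trx) [][]trx {
	var groups [][]trx
	for i, t := range txs {
		if i == 0 || t.lastCommitted != txs[i-1].lastCommitted {
			groups = append(groups, nil) // new group detected
		}
		groups[len(groups)-1] = append(groups[len(groups)-1], t)
	}
	return groups
}

// canStart reports whether t may be applied, where lwm is the low-water mark:
// every transaction with sequence_number <= lwm is known to have committed.
func canStart(t trx, lwm int64) bool {
	return t.lastCommitted <= lwm
}

func main() {
	txs := []trx{{0, 1}, {0, 2}, {2, 3}, {2, 4}, {4, 5}}
	for _, g := range splitIntoGroups(txs) {
		fmt.Println(g)
	}
	fmt.Println(canStart(trx{2, 3}, 1), canStart(trx{2, 3}, 2))
}
```

Here the five transactions form three groups. Transaction 3 (last_committed 2) may start only once the LWM reaches 2.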

Key components:

  • commitBarrier: dependency tracking via last_committed/sequence_number
  • mtsScheduleState: new-group detection and epoch reset handling
  • dmlCoordinator: transaction grouping and dependency-aware dispatch
  • dmlWorker: per-worker goroutine with independent DB connection
  • Deadlock-aware retry: retries immediately on errno 1213 (ER_LOCK_DEADLOCK) and sleeps 1s before retrying on other errors
  • Monotonic coordinate update: prevents checkpoint regression when workers complete out of order

Backward compatible: --num-workers=1 (default) uses the original single-threaded path with zero behavioral changes.


In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

@meiji163
Contributor

Thanks for the PR @jackiesre721. When we attempted MTS before, we ran into data consistency issues. If possible, could you run some integration tests and tests under high query load to increase confidence in correctness?

If I have time, I will run some tests as well.

jackiesre721 force-pushed the feat/mts-parallel-apply branch 2 times, most recently from 7147d84 to ab784af (May 28, 2026 23:52)
jackiesre721 force-pushed the feat/mts-parallel-apply branch from ab784af to c4060b9 (May 29, 2026 02:17)
Replace naive gap-free LWM with dispatched-subsequence tracking.
gh-ost sees only one table's binlog events, so sequence_numbers are
sparse (5, 9, 14, ...). A gap-free LWM would stall at the first gap.

Key changes:
- Track dispatched sequence numbers; LWM advances over the committed
  prefix of that subsequence
- Detect cross-table dependencies via dispatched set membership
  (replaces explicit parentSeenOnStream bool)
- Fix sync.Cond.Wait() not responding to context cancellation by
  spawning a watcher goroutine that calls Broadcast() on ctx.Done()
- Guard delegatedJobs under mu to prevent lost-wakeup deadlocks
- Add delegatedJobCount() and reset() helpers

Fixes the deadlock seen under 16-thread load, where concurrent deadlock
retries caused all workers to block in waitForDependency indefinitely.

The CI runs localtests/test.sh without the -g flag, so --gtid must be
in the per-test extra_args file for the MTS test to activate GTID
mode and use logical timestamps for dependency tracking.
@jackiesre721
Author

Hi @meiji163, great questions. Here is a detailed summary of the testing we have done and the specific consistency issues we identified and fixed.

PR #1454 Root Cause Analysis

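PR #1454 previously had data consistency issues. We traced the root cause to two bugs (items 1 and 2 below). Item 3 is a separate deadlock that we found and fixed during our own load testing.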

1. Naive gap-free LWM stalls on sparse sequence numbers

gh-ost only streams binlog rows for the single migrated table, but MySQL assigns sequence_number across ALL transactions on ALL tables. Consecutive transactions on our table might be numbered 5, 9, 14, ..., with gaps for other tables' transactions that we never see. A naive gap-free low-water mark (LWM), which advances only when sequence number lwm+1 commits, would stall permanently at the first gap. Those missing sequence numbers belong to other tables and are never committed in our barrier.
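To make the stall concrete, here is a minimal sketch of the naive rule. It assumes committed sequence numbers are tracked in a set. naiveLWM is a hypothetical name and is not code taken from PR #1454.

```go
package main

import "fmt"

// naiveLWM is a hypothetical sketch of the gap-free rule: the low-water mark
// advances only while sequence number lwm+1 has committed.
func naiveLWM(lwm int64, committed map[int64]bool) int64 {
	for committed[lwm+1] {
		lwm++
	}
	return lwm
}

func main() {
	// Only our table's transactions (5, 9, 14) ever commit in the barrier;
	// the numbers in between belong to other tables and never arrive.
	committed := map[int64]bool{5: true, 9: true, 14: true}
	fmt.Println(naiveLWM(0, committed)) // stuck at 0: sequence 1 never commits
	fmt.Println(naiveLWM(4, committed)) // reaches 5, then stuck: 6 never commits
}
```

Any child transaction whose last_committed is 9 or 14 would wait on this LWM forever.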

Fix: Replaced gap-free LWM with dispatched-subsequence tracking. The coordinator records every transaction it dispatches via addDelegatedJob(seq) in binlog order. The LWM advances over the committed prefix of that dispatched subsequence — skipping gaps from unobserved cross-table transactions. This mirrors MySQL 8.0's GAQ scheduling restricted to the transactions gh-ost actually applies.
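Dispatched-subsequence tracking can be sketched as follows. This assumes a single goroutine for brevity; the PR describes guarding this state under a mutex. Only addDelegatedJob is a name taken from the description above. The barrier type, its fields, and commit are hypothetical.

```go
package main

import "fmt"

// barrier is a hypothetical sketch: the LWM only advances over sequence
// numbers that the coordinator actually dispatched, in binlog order.
type barrier struct {
	dispatched []int64        // dispatched sequence numbers, in binlog order
	committed  map[int64]bool // sequence numbers whose transaction committed
	next       int            // index of the first not-yet-committed dispatched entry
	lwm        int64
}

func newBarrier() *barrier { return &barrier{committed: map[int64]bool{}} }

// addDelegatedJob records a dispatched transaction (called in binlog order).
func (b *barrier) addDelegatedJob(seq int64) { b.dispatched = append(b.dispatched, seq) }

// commit marks seq as committed and advances the LWM over the committed
// prefix of the dispatched subsequence, skipping other tables' gaps.
func (b *barrier) commit(seq int64) {
	b.committed[seq] = true
	for b.next < len(b.dispatched) && b.committed[b.dispatched[b.next]] {
		b.lwm = b.dispatched[b.next]
		b.next++
	}
}

func main() {
	b := newBarrier()
	for _, seq := range []int64{5, 9, 14} {
		b.addDelegatedJob(seq)
	}
	b.commit(9) // out of order: 5 is still pending, so the LWM stays at 0
	fmt.Println(b.lwm)
	b.commit(5) // prefix 5, 9 is now committed
	fmt.Println(b.lwm)
	b.commit(14)
	fmt.Println(b.lwm)
}
```

Unlike the gap-free rule, the LWM here moves from 0 to 9 to 14 even though 6 through 8 and 10 through 13 never appear.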

2. Cross-table dependency false blocking

PR #1454 tracked whether a parent transaction was "seen on stream" via an explicit boolean. This had an observe/dispatch ordering hazard.

Fix: Cross-table dependencies are now detected directly from the dispatched set: if lastCommitted was never dispatched on this stream, it is a cross-table parent and we skip waiting. Because the coordinator processes transactions in binlog order and a child's lastCommitted is always < its own sequence_number, any parent on this stream has already been dispatched by the time we check.
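The membership check described above can be sketched as a predicate. The sketch assumes the barrier exposes a set of dispatched sequence numbers and an LWM over the dispatched subsequence. dispatchState and needsWait are hypothetical names. The sketch follows the rule as stated here and is not the PR's implementation.

```go
package main

import "fmt"

// dispatchState is a hypothetical view of the coordinator's barrier state.
type dispatchState struct {
	dispatched map[int64]bool // sequence numbers dispatched on this stream
	lwm        int64          // low-water mark over the dispatched subsequence
}

// needsWait reports whether a child transaction must block before applying.
// A lastCommitted that was never dispatched here is treated as a cross-table
// parent (a transaction gh-ost never sees), so there is nothing to wait for.
func (s *dispatchState) needsWait(lastCommitted int64) bool {
	if lastCommitted == 0 || !s.dispatched[lastCommitted] {
		return false
	}
	return s.lwm < lastCommitted
}

func main() {
	s := &dispatchState{dispatched: map[int64]bool{5: true, 9: true}, lwm: 5}
	fmt.Println(s.needsWait(7)) // 7 belongs to another table
	fmt.Println(s.needsWait(9)) // parent 9 was dispatched but the LWM is still 5
	fmt.Println(s.needsWait(5)) // parent 5 is already covered by the LWM
}
```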

3. sync.Cond deadlock under high concurrency

sync.Cond.Wait() does NOT respond to Go context cancellation. Under 16-thread sysbench load with 4 workers, concurrent deadlock retries caused all workers to block in waitForDependency indefinitely (gh-ost's Applied counter stalled at 82).

Fix: Added waitWhileLocked helper that spawns a short-lived goroutine per Wait() call. The goroutine calls Broadcast() on ctx.Done(), ensuring no waiter blocks past context cancellation.

Tests Run Locally

All tests were run on MySQL 9.6.0 with GTID enabled, binlog_format=ROW.

Unit Tests (28 tests, all passing), including:

TestCommitBarrier_CommitZeroIsNoop
TestCommitBarrier_WaitForDependencyAlreadySatisfied
TestCommitBarrier_WaitForDependencyBlocksUntilLWMAdvances
TestCommitBarrier_WaitForDependencyZeroIsNoop
TestCommitBarrier_WaitForDependencyCrossTableIsNoop
TestCommitBarrier_WaitForDependencyUndispatchedGapIsNoop
TestCommitBarrier_WaitForAllWorkers
TestCommitBarrier_WaitForAllWorkersNoLostWakeup
TestCommitBarrier_ConcurrentCommitsAdvanceLWM
TestCommitBarrier_Reset
TestCommitBarrier_PR1454_OutOfOrderDoesNotShortCircuit
TestCommitBarrier_PR1454_WaitBlocksWhenLWMBelowLastCommitted
TestCommitBarrier_PR1454_BinlogRotationResetsLWM
TestCommitBarrier_PR1454_ErrorDoesNotMarkCommitted

These specifically cover the PR #1454 scenarios: out-of-order commits, LWM blocking, binlog rotation resets, and error handling.

Sysbench Write-Load Integration Tests

8-thread sysbench, 4 workers (50k rows, 60s):

  • ~1,764 TPS, 105,882 transactions, 635,509 DML queries
  • Triple verification passed: row count + CHECKSUM TABLE + md5sum

16-thread sysbench, 4 workers (50k rows, 60s):

  • ~242 TPS, 14,569 transactions, 1 binlog rotation detected during migration
  • Triple verification passed: row count + CHECKSUM TABLE + md5sum

Both tests start sysbench write load BEFORE gh-ost migration to ensure concurrent writes throughout the entire migration lifecycle.

Multi-Iteration Consistency Tests

We also ran the mts-consistency-test.sh script, which runs 5 iterations each with 2 and 4 workers and verifies data consistency after each run. All 10 iterations passed.

CI Integration

The localtests/mts-sysbench/ test is now integrated into the existing replica-tests.yml CI workflow. It runs as a standard localtest across all MySQL versions in the matrix (5.7, 8.0, 8.4, Percona). The test:

  • Starts sysbench write load on the master
  • Runs gh-ost migration with --gtid --num-workers=4 on the replica
  • Verifies MTS mode was activated (checks log for "Starting MTS mode with 4 workers")
  • Triple-verifies data consistency (row count + CHECKSUM TABLE + md5sum)
  • Reports binlog rotation detection

This should give confidence that MTS parallel apply maintains data consistency under concurrent write load across supported MySQL versions.
