I recently chased down a write-heavy performance regression that showed up when running the same workload on MongoDB 4.2.1 versus MongoDB 4.0.13. The punchline: the “slow” version spent a surprising amount of CPU time rebuilding WiredTiger row-store keys during eviction-driven reconciliation. With a targeted fix to key handling, the regression disappeared.
This post walks through the debugging approach end-to-end: how I reproduced the issue, what the flamegraphs revealed, how I used gdb to confirm the execution path, and why the root cause points to a specific key-buffer reset.
TL;DR
- The workload (many secondary indexes + concurrent updates) creates heavy eviction/reconciliation pressure in WiredTiger.
- In MongoDB 4.2.1, CPU time shifts dramatically into __wt_row_leaf_key_copy and __wt_row_leaf_key_work.
- In MongoDB 4.0.13, that cost is essentially absent; the equivalent “key work” shows up elsewhere (e.g. __rec_cell_build_leaf_key) and is much cheaper overall.
- Evidence from stack traces and conditional breakpoints shows the key-copy path is hit repeatedly across threads during eviction/reconcile.
- The likely culprit: an incorrect reset of key->size in __wt_row_leaf_key_work, invalidating cached decompressed keys and forcing repeated key reconstruction.
The Setup
I reproduced this in a clean environment (AWS EC2, Ubuntu 20.04). The key goal: make both versions run the same workload shape, then profile only the “interesting” phase.
The workload has three phases:
- Setup: start a clean mongod with a fixed WiredTiger cache size.
- Insert phase:
  - Create many compound secondary indexes across 5 collections.
  - Bulk insert 1M documents per collection using 5 concurrent threads.
- Update phase (the trigger):
  - Run 5 concurrent threads, each repeatedly calling updateMany with an _id modulo predicate:
    query: { _id: { $mod: [100, i] } }
    update: { $inc: { a: 1 } }
This shape matters: lots of secondary indexes amplify per-update work, and concurrency increases dirty-page production—together pushing WiredTiger into frequent eviction and reconciliation.
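For concreteness, here is a rough shell sketch of the setup and update phases. This is not the actual repro script; the dbpath, database/collection names, cache size, and loop counts are placeholders:

# Setup: start a clean mongod with a fixed WiredTiger cache (size illustrative).
"$MONGO_HOME/bin/mongod" --dbpath /tmp/repro-db --wiredTigerCacheSizeGB 1 \
    --fork --logpath /tmp/repro-mongod.log

# Update phase: 5 concurrent clients, each hitting a different _id residue class
# with updateMany; the $inc fans out into maintenance on the secondary indexes.
for i in 0 1 2 3 4; do
  "$MONGO_HOME/bin/mongo" --quiet test --eval "
    for (var n = 0; n < 1000; n++) {
      db.coll0.updateMany({ _id: { \$mod: [100, $i] } }, { \$inc: { a: 1 } });
    }" &
done
wait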
Reproducing “Fast” vs “Slow”
I ran the same script against two builds by switching MONGO_HOME.
Fast run (MongoDB 4.0.13):
export MONGO_HOME="$HOME/work/mongo-4.0.13"
bash ./repro/repro_perf_v3.sh
Slow run (MongoDB 4.2.1):
export MONGO_HOME="$HOME/work/mongo-4.2.1"
bash ./repro/repro_perf_v3.sh
The script profiles the server process during the update phase using perf, then generates a flamegraph. The output artifacts are:
- results/flamegraph_mongo-4.0.13.svg
- results/flamegraph_mongo-4.2.1.svg
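If you want to assemble the same pipeline yourself, it is the usual perf + FlameGraph recipe; roughly (sampling rate, duration, and paths here are illustrative, not the script’s exact flags):

perf record -F 99 -g -p "$(pgrep -x mongod)" -- sleep 60
perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > results/flamegraph_mongo-4.2.1.svg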
First Clue: Flamegraph Deltas
Flamegraphs are great for answering one question quickly: where did the CPU time go?
What “slow” looks like (4.2.1)
The slow run is dominated by WiredTiger eviction and reconciliation, and—critically—by row-store key copying/materialization:
- __wt_evict_thread_run dominates sampled execution (eviction thread work).
- Within reconciliation: __wt_reconcile and __wt_rec_row_leaf are major contributors.
- Row-key handling is extremely hot: __wt_row_leaf_key_copy is a top contributor (~23% in my capture), and __wt_row_leaf_key_work is also a top contributor (~22%).
At this point, the story becomes: “eviction/reconciliation is expected under write pressure, but why are we spending so much time copying keys?”
What “fast” looks like (4.0.13)
The fast run still spends time in eviction/reconcile (not surprising), but does not burn CPU in __wt_row_leaf_key_copy:
- __wt_row_leaf_key_copy is negligible (~0.08%).
- Key work shows up instead as __rec_cell_build_leaf_key (noticeable, but not pathological).
That difference is the smoking gun: the regression isn’t “MongoDB updates are slower in general”—it’s that 4.2.1 takes a substantially more expensive key path inside WiredTiger reconciliation.
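As an aside, if you keep the intermediate folded stacks from both runs, a differential flamegraph makes this kind of shift jump out in a single image. A sketch using difffolded.pl from the FlameGraph repo (the folded-file names are assumptions):

./FlameGraph/difffolded.pl out_mongo-4.0.13.folded out_mongo-4.2.1.folded \
    | ./FlameGraph/flamegraph.pl > results/flamegraph_diff.svg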
Turning Samples into an Execution Path
Flamegraphs tell you what is hot, but not always why it’s hot. To tie the hotspot to a real request path, I used gdb and captured stack traces during the update phase.
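The capture itself is simple with gdb in batch mode; something along these lines works (the PID lookup and output path are placeholders, and note it briefly pauses the server while stacks are collected):

gdb -batch -p "$(pgrep -x mongod)" \
    -ex 'set pagination off' \
    -ex 'thread apply all bt' > gdb_stacks.log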
A representative call chain from the debug session shows the hotspot sits directly on the update commit path:
MongoDB update execution:
mongo::performUpdates
mongo::UpdateStage::transformAndUpdate
mongo::WiredTigerRecoveryUnit::_txnClose / commitUnitOfWork
WiredTiger commit + eviction/reconcile:
__wt_txn_commit
__wt_cache_eviction_worker
__wt_evict / __evict_page
__wt_reconcile / __wt_rec_row_leaf
__wt_row_leaf_key_copy
This mattered because it ruled out a bunch of red herrings (client-side overhead, query parsing, planner differences, etc.). The slowdown is firmly inside “commit triggers eviction, eviction triggers reconcile, reconcile does expensive key work.”
Confirming It’s Not a One-Off: Conditional Breakpoints
To test whether the key-copying path was rare or pervasive, I set a conditional breakpoint on __wt_row_leaf_key_copy that only triggers when the destination key buffer is empty:
stop only if key->size==0 && key->data==0
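In gdb that translates directly into a conditional breakpoint; a sketch of the attach-and-break incantation (the PID lookup is a placeholder, and it assumes the destination buffer parameter is named key, which is what the condition references):

gdb -p "$(pgrep -x mongod)" \
    -ex 'break __wt_row_leaf_key_copy if key->size == 0 && key->data == 0' \
    -ex 'continue'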
This breakpoint hit repeatedly, across different threads (foreground mongod threads and background threads). That was important evidence that the system is frequently reaching a state where it has to instantiate/copy keys from scratch, rather than reusing an existing representation.
In other words: not a single pathological page, but a systemic behavior under this workload.
What’s Actually Happening (Interpretation)
Here’s the high-level mechanism that fits all the evidence:
- The update workload creates a large amount of dirty data quickly (secondary indexes amplify it).
- WiredTiger starts evicting pages aggressively to keep cache under control.
- Eviction requires reconciliation: turning in-memory state into on-disk page images.
- During reconciliation of row-store leaf pages, WiredTiger needs access to keys.
- In 4.2.1, reconciliation frequently takes a path that materializes and copies row keys (__wt_row_leaf_key_work → __wt_row_leaf_key_copy) far more often than in 4.0.13.
- That additional key work becomes the dominant CPU cost, slowing eviction progress—and because eviction is on the critical path for commits under pressure, overall throughput collapses.
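You can watch that pressure build from the server’s own counters while the update phase runs. A quick sketch (the stat names live under serverStatus().wiredTiger.cache and differ slightly between versions):

"$MONGO_HOME/bin/mongo" --quiet --eval '
  var c = db.serverStatus().wiredTiger.cache;
  print("bytes currently in the cache:     " + c["bytes currently in the cache"]);
  print("tracked dirty bytes in the cache: " + c["tracked dirty bytes in the cache"]);
  print("modified pages evicted:           " + c["modified pages evicted"]);'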
Root Cause (Evidence-Based Hypothesis)
The regression in 4.2.1 appears to be caused by an incorrect reset of key->size in __wt_row_leaf_key_work, which invalidates cached decompressed keys. Once invalidated, WiredTiger has to rebuild full keys repeatedly during eviction/reconciliation, which matches:
- the flamegraph shift (__wt_row_leaf_key_copy / __wt_row_leaf_key_work becoming top hotspots),
- the stack traces showing the path is eviction → reconcile during update commits,
- and the conditional breakpoint repeatedly catching key->size==0 && key->data==0.
The Fix
Remove the incorrect reset so the cached decompressed key stays valid. With key reuse restored, the extra key-reconstruction work during eviction disappears and the regression goes away.
Lessons Learned / Debugging Checklist
A few takeaways I’d reuse for the next performance regression:
- Profile the smallest “interesting” phase. I only profiled the update phase; the insert phase just sets up pressure.
- Compare flamegraphs across versions. The delta often points straight at the subsystem that changed.
- Use gdb to connect hotspots to real request paths. It’s the fastest way to avoid guessing.
- Conditional breakpoints are underrated. They can tell you whether a suspicious state is rare or happening constantly under load.
- Eviction costs can become user-visible. Under high write pressure, eviction/reconcile work is often on the critical path for commit latency and throughput.
Artifacts / Repro Notes
The full reproduction scripts and logs are available here: KaiyueWei/mongo-server-44991.
If you want to reproduce the same analysis style, these are the core artifacts I generated:
- results/flamegraph_mongo-4.0.13.svg
- results/flamegraph_mongo-4.2.1.svg
- gdb.log (stack traces + conditional breakpoint hits)
- repro/repro_perf_v3.sh (workload + perf profiling automation)