Details
- Bug
- Status: Verified
- Major
- Resolution: Unresolved
- BB v1.12
- None
Description
During the MTR phase of amd64-msan-clang-20-debug, the "MTR normal"
step intermittently reports success even when errors are present
in the stdio output.
Observed symptoms:
- truncated MTR logs in stdio
- step completes with exit code 0 despite failures
System logs show evidence of OOM killer activity during these runs.
In particular, addr2line processes spawned while generating backtraces
are being killed:
Mar 18 02:55:49 hz-bbw8 kernel: checkpoint_bg invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Mar 18 02:55:49 hz-bbw8 kernel: oom_kill_process.cold+0xb/0x10
Mar 18 02:55:49 hz-bbw8 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 18 02:55:50 hz-bbw8 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-15b53e20d1e4f7dba6166ebce83a54eaa35e13a2460cd0647ef632639ad6a95b.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-15b53e20d1e4f7dba6166ebce83a54eaa35e13a2460cd0647ef632639ad6a95b.scope,task=addr2line,pid=212899,uid=1000
Mar 18 02:55:50 hz-bbw8 kernel: Out of memory: Killed process 212899 (addr2line) total-vm:14357312kB, anon-rss:14320664kB, file-rss:3840kB, shmem-rss:0kB, UID:1000 pgtables:28136kB oom_score_adj:0
Mar 18 02:55:52 hz-bbw8 kernel: oom_reaper: reaped process 212899 (addr2line), now anon-rss:0kB, file-rss:3840kB, shmem-rss:0kB
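To check whether other runs hit the same condition, kernel log lines like the ones above can be scanned for OOM kills. The following is a minimal Python sketch; the function name find_oom_kills and the regular expression are illustrative assumptions based only on the line format shown in this excerpt, not an existing tool.

```python
import re

# Assumed pattern, derived from the "Out of memory: Killed process ..." line
# in the excerpt above; other kernel versions may format this differently.
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)

def find_oom_kills(lines):
    """Return (pid, process name, anon RSS in kB) for each OOM kill line."""
    kills = []
    for line in lines:
        m = OOM_KILL_RE.search(line)
        if m:
            kills.append((int(m["pid"]), m["name"], int(m["anon_rss_kb"])))
    return kills

sample = [
    "Mar 18 02:55:50 hz-bbw8 kernel: Out of memory: Killed process 212899 (addr2line) "
    "total-vm:14357312kB, anon-rss:14320664kB, file-rss:3840kB, shmem-rss:0kB, "
    "UID:1000 pgtables:28136kB oom_score_adj:0",
    "Mar 18 02:55:52 hz-bbw8 kernel: oom_reaper: reaped process 212899 (addr2line), "
    "now anon-rss:0kB, file-rss:3840kB, shmem-rss:0kB",
]
print(find_oom_kills(sample))  # [(212899, 'addr2line', 14320664)]
```

Only the "Killed process" line is matched; the oom_reaper line is ignored so that each kill is counted once.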
Most affected runs end while attempting a backtrace, which is consistent with addr2line being the process that the OOM killer terminates:
Retrying test innodb_fts.innodb_fts_large_records, attempt(2/3)...
***Warnings generated in error logs during shutdown after running tests: innodb_fts.innodb_fts_large_records
260318 10:24:04 [ERROR] /home/buildbot/bld/sql/mariadbd got signal 6 ;
Attempting backtrace. Include this in the bug report.
This suggests that MTR does not properly propagate failures when
child processes are terminated by the OOM killer, resulting in the
main perl process exiting with code 0.
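As an illustration of the propagation problem described above, here is a minimal Python sketch of how a parent process can tell "child exited normally" apart from "child was killed by a signal" and turn the latter into a non-zero exit code. This is not MTR's code (MTR is written in Perl), and whether MTR loses the failure at this exact point is not confirmed here. The helper run_and_classify and the 128 + signal convention are assumptions chosen for the example; the sketch is POSIX-only.

```python
import signal
import subprocess
import sys

def run_and_classify(cmd):
    """Run cmd; return (exit code to propagate, human-readable reason)."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # On POSIX, subprocess reports death by signal N as returncode -N.
        sig = signal.Signals(-rc).name
        # Shell-style convention (an assumption here): 128 + signal number.
        return 128 + (-rc), f"killed by {sig}"
    return rc, f"exited with {rc}"

# Simulate an OOM kill: the child sends SIGKILL to itself.
child = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
code, why = run_and_classify(child)
print(code, why)  # 137 killed by SIGKILL
```

A wrapper that treated only positive exit codes as failures would report success for this child, which matches the symptom of a step finishing with exit code 0 despite a killed process.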
This change documents the issue and attempts to avoid it by reducing
the number of parallel workers at runtime.
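One way to reduce the worker count at runtime is to cap it by available memory. The sketch below shows that idea only; the actual mechanism used by this change is not described in this ticket. The function names and the 16 GiB per-worker budget are hypothetical, with the budget loosely motivated by the roughly 14 GB anon-rss of addr2line in the log above.

```python
def read_mem_available_kb(path="/proc/meminfo"):
    """Read MemAvailable (in kB) from /proc/meminfo on Linux."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)

def pick_parallel(cpu_count, mem_available_kb, per_worker_kb, minimum=1):
    """Cap the worker count by both CPU count and a per-worker memory budget."""
    by_memory = mem_available_kb // per_worker_kb
    return max(minimum, min(cpu_count, by_memory))

GIB_KB = 1024 * 1024  # kB in one GiB

# Hypothetical host: 16 CPUs, 64 GiB available, 16 GiB budget per worker.
print(pick_parallel(16, 64 * GIB_KB, 16 * GIB_KB))  # 4
```

On a real worker, the result could be computed from os.cpu_count() and read_mem_available_kb() and passed as the value of MTR's --parallel option.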
Issue Links
- relates to MDEV-39113 addr2line stack trace resolver detrimental in MariaDB-backup and MSAN/ASAN builds (In Review)