MariaDB Foundation Development / MDBF-1194

Reduce the memory pressure of hz-bbw8 and hz-bbw9

Details

    • Bug
    • Status: Verified
    • Major
    • Resolution: Unresolved
    • BB v1.12
    • BB V1.13
    • Buildbot
    • None

    Description

      During the MTR phase of amd64-msan-clang-20-debug, the "MTR normal"
      step intermittently reports success even when errors are present
      in the stdio output.

      Observed symptoms:

      • truncated MTR logs in stdio
      • step completes with exit code 0 despite failures

      System logs show evidence of OOM killer activity during these runs.
      In particular, addr2line processes spawned while generating backtraces
      are being killed:

      Mar 18 02:55:49 hz-bbw8 kernel: checkpoint_bg invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
      Mar 18 02:55:49 hz-bbw8 kernel:  oom_kill_process.cold+0xb/0x10
      Mar 18 02:55:49 hz-bbw8 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
      Mar 18 02:55:50 hz-bbw8 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-15b53e20d1e4f7dba6166ebce83a54eaa35e13a2460cd0647ef632639ad6a95b.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-15b53e20d1e4f7dba6166ebce83a54eaa35e13a2460cd0647ef632639ad6a95b.scope,task=addr2line,pid=212899,uid=1000
      Mar 18 02:55:50 hz-bbw8 kernel: Out of memory: Killed process 212899 (addr2line) total-vm:14357312kB, anon-rss:14320664kB, file-rss:3840kB, shmem-rss:0kB, UID:1000 pgtables:28136kB oom_score_adj:0
      Mar 18 02:55:52 hz-bbw8 kernel: oom_reaper: reaped process 212899 (addr2line), now anon-rss:0kB, file-rss:3840kB, shmem-rss:0kB
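
      To check how often this happens on a worker, the kernel log can be
      scanned for kill records. The following is a minimal sketch, assuming
      the "Out of memory: Killed process" line keeps the format shown in
      the excerpt above; the function and field names are illustrative and
      not part of any existing tooling.

```python
import re

# Matches the kernel's "Out of memory: Killed process ..." line as it
# appears in the excerpt above (the format may differ between kernels).
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per OOM kill found in an iterable of kernel log lines."""
    kills = []
    for line in lines:
        m = OOM_KILL_RE.search(line)
        if m:
            kills.append({
                "pid": int(m["pid"]),
                "name": m["name"],
                "total_vm_kb": int(m["total_vm_kb"]),
                "anon_rss_kb": int(m["anon_rss_kb"]),
            })
    return kills


# Two lines copied from the excerpt above; only the first is a kill record.
sample = [
    "Mar 18 02:55:50 hz-bbw8 kernel: Out of memory: Killed process 212899 "
    "(addr2line) total-vm:14357312kB, anon-rss:14320664kB, file-rss:3840kB, "
    "shmem-rss:0kB, UID:1000 pgtables:28136kB oom_score_adj:0",
    "Mar 18 02:55:52 hz-bbw8 kernel: oom_reaper: reaped process 212899 "
    "(addr2line), now anon-rss:0kB, file-rss:3840kB, shmem-rss:0kB",
]

for kill in parse_oom_kills(sample):
    gib = kill["anon_rss_kb"] / 1024 / 1024
    print(f"{kill['name']} (pid {kill['pid']}): {gib:.1f} GiB anon RSS")
```

      On the sample, this reports a single addr2line process holding
      roughly 13.7 GiB of anonymous memory at the time it was killed.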
      

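      It is not a coincidence that most of the affected runs end while
      attempting a backtrace: generating the backtrace is what spawns the
      addr2line processes that the OOM killer then targets. A typical
      truncated log ends like this: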

      Retrying test innodb_fts.innodb_fts_large_records, attempt(2/3)...
      ***Warnings generated in error logs during shutdown after running tests: innodb_fts.innodb_fts_large_records
      260318 10:24:04 [ERROR] /home/buildbot/bld/sql/mariadbd got signal 6 ;
      Attempting backtrace. Include this in the bug report.
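
      Until the exit code can be trusted, a build step could also scan
      the stdio text for crash markers. This is only a sketch: the two
      markers are taken from the excerpt above, the function names are
      hypothetical, and the check is deliberately conservative, so a test
      that crashes once and then passes on retry would still be flagged.

```python
# Markers taken from the log excerpt above; a real check would likely
# need a longer, more carefully chosen list.
CRASH_MARKERS = ("got signal", "Attempting backtrace")


def log_looks_failed(log_text):
    """Return True if the stdio text contains any crash marker."""
    return any(marker in log_text for marker in CRASH_MARKERS)


def step_result(exit_code, log_text):
    """Combine the exit code with a log scan so exit code 0 alone is not trusted."""
    if exit_code != 0 or log_looks_failed(log_text):
        return "FAILURE"
    return "SUCCESS"
```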
      

      This suggests that MTR does not properly propagate failures when
      child processes are terminated by the OOM killer, resulting in the
      main perl process exiting with code 0.
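
      The expected behaviour can be illustrated outside of MTR. The sketch
      below is Python rather than MTR's Perl and is not MTR's actual code;
      it assumes a POSIX system and relies on the documented convention
      that subprocess.Popen.returncode is negative when a child was
      terminated by a signal. The helper names are illustrative.

```python
import signal


def describe_child(returncode):
    """Explain a Popen.returncode value; a negative value means death by signal."""
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def parent_exit_code(child_returncodes):
    """Return 0 only if every child exited cleanly.

    A child killed by a signal (for example SIGKILL from the OOM killer)
    counts as a failure, so the parent never reports success for it.
    """
    return 0 if all(rc == 0 for rc in child_returncodes) else 1


# One of three children was killed with SIGKILL: the parent must fail.
codes = [0, -signal.SIGKILL, 0]
print([describe_child(rc) for rc in codes], "->", parent_exit_code(codes))
```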

      This change documents the issue and attempts to avoid it by
      reducing the number of parallel MTR workers at runtime.
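
      One way to pick the worker count at runtime is to cap it by available
      memory as well as by CPU count. The sketch below is an assumption
      about how this could be done, not the change itself: the 16 GiB
      per-worker budget is a hypothetical figure chosen to leave room for
      an addr2line process of the size seen in the log above, and the
      helper names are illustrative. The result would be passed to
      mysql-test-run.pl through its --parallel option.

```python
def mem_available_kb(meminfo_path="/proc/meminfo"):
    """Read MemAvailable (in kB) from a Linux meminfo file."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


def pick_parallel(available_kb, cpu_count, per_worker_kb):
    """Cap the worker count by both CPU count and memory budget, never below 1."""
    by_memory = available_kb // per_worker_kb
    return max(1, min(cpu_count, by_memory))


# Hypothetical host: 64 GiB available, 16 CPUs, 16 GiB budget per worker.
GIB_IN_KB = 1024 * 1024
workers = pick_parallel(64 * GIB_IN_KB, 16, 16 * GIB_IN_KB)
print(f"--parallel={workers}")  # prints "--parallel=4"
```

      On a real worker, mem_available_kb() and os.cpu_count() would supply
      the first two arguments instead of the fixed values used here.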

            People

              rvarzaru Varzaru Razvan-Liviu
              rvarzaru Varzaru Razvan-Liviu
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0d
                  Logged: 1d