Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Cannot Reproduce
    • 10.5.8
    • N/A
    • Server
    • Fresh alpine linux 3.13 container running on an arm v7 debian 10.7 host

    Description

      Issue has also been filed in alpine packages: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12384

      The following issue appeared with mariadb 10.5.8 (there is no issue with alpine 3.12 and mariadb 10.4.x):

      • On arm v7 (32 bits) mysqld consumes about 100% cpu when idle, but still works.
      • On arm v8 (64 bits) or amd64, it consumes near 0% cpu when idle as expected.

      I see only one difference in start logs (but that may be irrelevant):

      • arm v7 : "Using generic crc32 instructions"
      • arm v8 : "Using ARMv8 crc32 instructions"

      Any idea why this happens and if there is an option in mysqld to circumvent the issue?

      Steps to reproduce:

      ~ # apk --update --upgrade add mariadb sudo
      ~ # mkdir /run/mysqld
      ~ # chown mysql:mysql /run/mysqld
      ~ # sudo -su mysql
      ~ $ mysql_install_db
      ~ $ mysqld --datadir=./data
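
      To put a number on "about 100% cpu when idle" without eyeballing top, a minimal sketch like the one below samples mysqld's accumulated user+system time from /proc/<pid>/stat twice and converts the difference into a percentage of one CPU. It assumes a Linux /proc, a POSIX sh (busybox ash on Alpine should do), a single mysqld process found via pidof, and `getconf CLK_TCK` (falling back to 100 ticks per second if getconf is unavailable). The function names and the INTERVAL variable are illustrative, not part of MariaDB.

```shell
#!/bin/sh
# Sketch: measure how much CPU mysqld uses while idle (Linux /proc only).
# Assumptions: one mysqld process, POSIX sh, fallback of 100 clock ticks/s.

# Print utime+stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks.
proc_cpu_ticks() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Strip "pid (comm) " so a comm containing spaces cannot shift fields.
    rest=${stat##*") "}
    # After the strip, $1 is field 3 (state), so field N is positional N-2.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Turn a tick delta over an interval (seconds) into percent of one CPU.
cpu_percent() {
    echo $(( $1 * 100 / ($2 * $3) ))
}

interval=${INTERVAL:-5}
clk_tck=$(getconf CLK_TCK 2>/dev/null) || clk_tck=100
pid=$(pidof mysqld 2>/dev/null | awk '{print $1}')

if [ -n "$pid" ]; then
    before=$(proc_cpu_ticks "$pid")
    sleep "$interval"
    after=$(proc_cpu_ticks "$pid")
    pct=$(cpu_percent $((after - before)) "$clk_tck" "$interval")
    echo "mysqld (pid $pid): ${pct}% of one CPU over ${interval}s"
else
    echo "mysqld is not running" >&2
fi
```

      On an idle server the expected result is close to 0%; a value near 100% reproduces the symptom. Integer arithmetic is used so the script does not depend on bc.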
      

      Attachments

        1. mariadb_sql_commands.txt
          76 kB
        2. perf.data
          634 kB
        3. perf.data
          46 kB
        4. perf.report
          7 kB

        Activity

          andre_r Andre R created issue -
          andre_r Andre R made changes - (3 edits to Description: added the link to the alpine issue, https://gitlab.alpinelinux.org/alpine/aports/-/issues/12384, and changed the formatting of the steps to reproduce)
          danblack Daniel Black added a comment -

          On arm v7 can you obtain two backtraces (with a few seconds between them):

          https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-backtraces-from-a-running-mysqld-process-with-gdb-on-linux, including 'set print frame-arguments all' in your gdb configuration or on the gdb command line.

          Also, can you run `perf record -g -p $(pidof mysqld) -- sleep 2` / `perf report -g --no-children --stdio` to show where the CPU time is spent?

          Can you execute SQL like 'SHOW PROCESSLIST', 'SHOW ENGINE INNODB STATUS', 'SHOW GLOBAL STATUS'?

          Can you attach these outputs to this JIRA issue please?

          The lack of ARMv7-optimized crc32 instructions isn't likely to cause 100% CPU; the backtraces will show what does.
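
          The diagnostics requested above can be gathered in one pass with a script along these lines. This is a sketch, not an official MariaDB tool: it assumes gdb, perf and the mysql client are installed in the container, that the mysql client can connect without extra options, and that the shell is allowed to ptrace and profile mysqld (containers often need extra privileges for this, for example Docker's --cap-add=SYS_PTRACE, plus a permissive kernel.perf_event_paranoid setting for perf). The OUT directory and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: collect two backtraces, a perf profile and SQL status output for a
# running mysqld. Assumes gdb, perf and the mysql client are installed and
# that this shell may ptrace/profile mysqld. OUT is a hypothetical location.
OUT=${OUT:-./mysqld-diag}

# Full backtrace of every thread, with all frame arguments printed.
collect_backtrace() {   # $1 = pid, $2 = output file
    gdb --batch -p "$1" \
        -ex "set print frame-arguments all" \
        -ex "thread apply all bt full" > "$2" 2>&1
}

# Sample call graphs for 2 seconds, then render a text report.
collect_perf() {        # $1 = pid
    perf record -g -o "$OUT/perf.data" -p "$1" -- sleep 2
    perf report -g --no-children --stdio -i "$OUT/perf.data" \
        > "$OUT/perf.report" 2>&1
}

# Server-side view: threads, InnoDB internals, global counters.
collect_sql() {
    mysql -e 'SHOW PROCESSLIST; SHOW ENGINE INNODB STATUS\G SHOW GLOBAL STATUS;' \
        > "$OUT/sql.txt" 2>&1
}

pid=$(pidof mysqld 2>/dev/null | awk '{print $1}')
if [ -n "$pid" ]; then
    mkdir -p "$OUT"
    collect_backtrace "$pid" "$OUT/bt1.txt"
    sleep 5   # a few seconds between the two backtraces
    collect_backtrace "$pid" "$OUT/bt2.txt"
    collect_perf "$pid"
    collect_sql
    echo "diagnostics written to $OUT"
else
    echo "mysqld is not running" >&2
fi
```

          Without debug symbols in the mysqld and musl binaries, the gdb and perf output will still show unresolved "??" frames, so the script only helps once a symbol-carrying build is available.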

          andre_r Andre R made changes -
          Attachment mariadb_sql_commands.txt [ 55953 ]
          andre_r Andre R added a comment -

          1. Backtraces

          I'm afraid the backtraces won't be very useful:

          [New LWP 69]
          [New LWP 70]
          [New LWP 71]
          [New LWP 72]
          [New LWP 79]
          [New LWP 80]
          [New LWP 81]
          [New LWP 134]
          0xb6f16930 in ?? () from /lib/ld-musl-armhf.so.1
           
          Thread 9 (LWP 134 "mysqld"):
          #0  0xb6f16930 in ?? () from /lib/ld-musl-armhf.so.1
          No symbol table info available.
          #1  0xb6f16c84 in ?? () from /lib/ld-musl-armhf.so.1
          No symbol table info available.
          Backtrace stopped: previous frame identical to this frame (corrupt stack?)
          ...
          

          I see no "debug" package in my Linux distribution. Should I compile mariadb myself to get symbols?

          2. perf report

          I guess this one is not very useful either:

          # ========
          Error:
          failed to process sample
          # captured on    : Wed Feb  3 10:24:23 2021
          # header version : 1
          # data offset    : 816
          # data size      : 0
          # feat offset    : 816
          # hostname : (null)
          # os release : (null)
          # arch : (null)
          # cpudesc : (null)
          # cpuid : (null)
          # total memory : 0 kB
          # cmdline :
          # event desc: not available or unable to read
          # CPU_TOPOLOGY info available, use -I to display
          # pmu mappings: not available
          # time of first sample : 0.000000
          # time of last sample : 0.000000
          # sample duration :      0.000 ms
          # MEM_TOPOLOGY info available, use -I to display
          # cpu pmu capabilities: not available
          # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA
          # ========
          # 

          3. SQL commands

          This one is larger, so I attached the output: mariadb_sql_commands.txt

          andre_r Andre R added a comment -

          Additional info requested by alpine team in the alpine bug

          andre_r Andre R made changes -
          Attachment perf.data [ 55964 ]
          andre_r Andre R added a comment -

          perf report (continued)

          It took me some more effort to produce the perf.data file. I hope it contains useful information. perf.data

          danblack Daniel Black added a comment -

          perf.data is rather raw; it needs the ELF binary to resolve symbols. Can you try `perf report -g --no-children -i ~/perf.data`? Alternatively, `objdump -d /usr/bin/mysqld` should give addresses that we can map to the perf data.

          Thanks for trying strace; it's looking very much like an endless loop. A gdb backtrace of the running process will still be useful if you have a build compiled with debug symbols (CFLAGS=-g).
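
          Whether perf and gdb can turn addresses into function names depends on what the installed binary still carries. A quick check, sketched below, lists the ELF section headers with binutils' readelf and looks for a full symbol table (.symtab) and DWARF debug info (.debug_info); a binary with neither has been stripped, leaving at most the exported dynamic symbols. It assumes readelf is installed (the binutils package on Alpine), defaults to /usr/bin/mysqld, and the function name elf_symbols is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether an ELF binary still carries symbols that perf/gdb
# can use to resolve addresses. Assumes binutils' readelf is installed.
elf_symbols() {   # $1 = path to an ELF binary
    sections=$(readelf -S -W "$1" 2>/dev/null) || { echo "unreadable"; return 1; }
    case $sections in
        *.debug_info*) echo "symtab+debug" ;;   # built with -g, not stripped
        *.symtab*)     echo "symtab" ;;         # function names, no DWARF
        *)             echo "stripped" ;;       # only .dynsym (if any) left
    esac
}

bin=${1:-/usr/bin/mysqld}
echo "$bin: $(elf_symbols "$bin")"
```

          A result of "stripped" for a distribution package would explain the "No symbol table info available" frames in the backtraces, and means a rebuild with -g (or a separate debug-symbols package, where one exists) is needed before perf.data can be resolved.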

          joshbeer Josh Richards made changes -
          Attachment perf.report [ 56540 ]
          Attachment perf.data [ 56541 ]
          joshbeer Josh Richards added a comment -

          Not the OP, but since it looks like further data is being waited on, I thought I'd try to provide it, as I ran across this issue last night as well.

          perf.data and perf.report output from my setup are attached: perf.report perf.data

          andre_r Andre R added a comment -

          (cross-posted in the alpine issue)
          Just discovered that this issue seems fixed when using the edge repository in alpine 3.13.
          A few packages are upgraded in edge, among them:

          mariadb (and co) from 10.5.8 to 10.5.9
          musl from 1.2.2-r0 to 1.2.2-r2

          I'd lean towards (but I can't be sure) musl being the culprit, as mariadb 10.5.8 used to run fine in a debian/buster container instead of alpine.

          elenst Elena Stepanova made changes -
          Epic/Theme server
          danblack Daniel Black added a comment - - edited

          Thanks for getting back to us, andre_r. It appears from your upstream bug that both musl and mariadb have been updated on alpine 3.13/armv7.

          At the moment it's a bit hard to determine the cause.

          We'd need musl (MDBF-244) and armv7/armhf/Arm Cortex support in CI, and maybe even alpine-based testing (MDEV-18462), to find this kind of issue before release.

          Suggestions are welcome on how to obtain armv7 hardware/cloud infrastructure that can be integrated into a CI infrastructure as a runner (preferably our own container-based setup, rather than a new dedicated CI framework like drone.io appears to be).
          Edit: found https://www.worksonarm.com/

          danblack Daniel Black made changes -
          Fix Version/s N/A [ 14700 ]
          Resolution Cannot Reproduce [ 5 ]
          Status Open [ 1 ] Closed [ 6 ]

          marko Marko Mäkelä added a comment -

          andre_r, in MariaDB 10.5.9 there was also a fix for MDEV-24270, where we replaced io_getevents() with a thinner wrapper of the system call to avoid 2 unnecessary wakeups per second. This basically works around a regression that was introduced by an ‘optimization’ in https://pagure.io/libaio/c/7cede5af5adf01ad26155061cc476aad0804d3fc several years ago. That ‘optimization’ would cause a race condition on shutdown in user space, leading to SIGSEGV or similar. But I find it hard to believe that the unnecessary wakeups would keep one CPU 100% busy.

          It would be interesting to try MariaDB 10.5.8 with the newer musl libc and 10.5.9 with the older libc, to narrow down the culprit.
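
          The first half of the experiment suggested above (MariaDB 10.5.8 with the newer musl) could be approximated on Alpine 3.13 by pulling only musl from the edge repository, leaving mariadb at the 3.13 version, then restarting mysqld and re-measuring idle CPU. This is a hedged sketch for a throwaway container only: edge is a moving target and no longer ships musl 1.2.2-r2, apk may pull in other packages to satisfy dependencies, and the reverse combination (10.5.9 with the older musl) may not be installable at all if the newer mariadb package requires the newer musl. The EDGE_MAIN variable and the function names are illustrative.

```shell
#!/bin/sh
# Sketch (throwaway Alpine 3.13 container, run as root): upgrade only musl
# from edge/main so mariadb stays at the 3.13 version, then show versions.
# Nothing is changed unless the script is invoked with --run.
EDGE_MAIN=https://dl-cdn.alpinelinux.org/alpine/edge/main

upgrade_musl_from_edge() {
    apk add --upgrade --repository "$EDGE_MAIN" musl
}

show_versions() {
    apk list --installed musl mariadb
    mysqld --version
}

case "${1:-}" in
    --run)
        upgrade_musl_from_edge && show_versions
        echo "now restart mysqld and re-measure idle CPU"
        ;;
    *)
        echo "usage: $0 --run" >&2
        ;;
esac
```

          Requiring an explicit --run keeps the script from modifying a system by accident; mixing edge and stable packages is not something to leave in place after the test.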
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 118709 ] MariaDB v4 [ 158847 ]

          People

            Unassigned
            andre_r Andre R
            Votes: 0
            Watchers: 4
