Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.5.8
    • Fix Version/s: N/A
    • Component/s: Server
    • Environment: Fresh Alpine Linux 3.13 container running on an ARMv7 Debian 10.7 host

    Description

      This issue has also been filed with Alpine packages.

      The following issue appeared with mariadb 10.5.8 (there is no issue with alpine 3.12 and mariadb 10.4.x):

      • On arm v7 (32-bit), mysqld consumes about 100% CPU when idle, but still works.
      • On arm v8 (64-bit) or amd64, it consumes close to 0% CPU when idle, as expected.

      I see only one difference in start logs (but that may be irrelevant):

      • arm v7 : "Using generic crc32 instructions"
      • arm v8 : "Using ARMv8 crc32 instructions"

      Any idea why this happens and if there is an option in mysqld to circumvent the issue?

      Steps to reproduce:

      ~ # apk --update --upgrade add mariadb sudo
      ~ # mkdir /run/mysqld
      ~ # chown mysql:mysql /run/mysqld
      ~ # sudo -su mysql
      ~ $ mysql_install_db
      ~ $ mysqld --datadir=./data
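
To put a number on "about 100% CPU when idle", the process can be sampled through /proc. The following is a minimal sketch, not something taken from the report: the helper names `cpu_ticks` and `cpu_percent` are hypothetical, it assumes a Linux /proc filesystem, and it falls back to 100 clock ticks per second if `getconf CLK_TCK` is unavailable.

```shell
#!/bin/sh
# cpu_ticks PID: print utime+stime (in clock ticks) read from /proc/PID/stat.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "pid (comm) " up to the last ')' so a process name with spaces
    # cannot shift the remaining fields.
    rest=${stat##*) }
    # In the remainder, field 1 is the state; utime is field 12, stime field 13.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# cpu_percent PID [SECONDS]: average CPU use over the interval, as a
# percentage of one core (100 means one core fully busy).
# SECONDS must be a whole number; it defaults to 5.
cpu_percent() {
    pid=$1
    interval=${2:-5}
    # Assumption: 100 ticks per second when getconf is not installed.
    hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
    t0=$(cpu_ticks "$pid") || return 1
    sleep "$interval"
    t1=$(cpu_ticks "$pid") || return 1
    echo $(( (t1 - t0) * 100 / (hz * interval) ))
}

# Example (assumes a single running mysqld):
#   cpu_percent "$(pidof mysqld)" 10
```

On an affected arm v7 system the result would be expected to sit near 100 while the server is idle, and near 0 on arm v8 or amd64.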
      

      Attachments

        1. mariadb_sql_commands.txt
          76 kB
        2. perf.data
          634 kB
        3. perf.data
          46 kB
        4. perf.report
          7 kB

        Activity

          danblack Daniel Black added a comment -

          perf.data is rather raw; it needs the ELF binary to resolve symbols. Can you try `perf report -g --no-children -i ~/perf.data`? Alternatively, `objdump -d /usr/bin/mysqld` should give addresses that we can map to the perf data.

          Thanks for trying strace; it's looking very much like an endless loop. gdb output from the running process would still be useful if you have it compiled with debug symbols (cflag=-g).
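
The gdb output requested above could be collected roughly as follows. This is only a sketch: the function name `dump_backtraces` and the default output file name are hypothetical, and it assumes gdb is installed and that you have permission to attach to the process.

```shell
#!/bin/sh
# dump_backtraces PID [OUTFILE]: attach gdb to a running process, write a
# backtrace of every thread to OUTFILE, then detach.
dump_backtraces() {
    pid=${1:-}
    out=${2:-mysqld-backtrace.txt}   # hypothetical default file name
    # Reject a missing or non-numeric PID before touching anything.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_backtraces PID [OUTFILE]" >&2
            return 2
            ;;
    esac
    command -v gdb >/dev/null 2>&1 || {
        echo "gdb not found" >&2
        return 127
    }
    # -batch exits after running the commands; 'thread apply all bt'
    # prints a backtrace for each thread of the attached process.
    gdb -p "$pid" -batch -ex 'thread apply all bt' > "$out" 2>&1
}

# Example (assumes a single running mysqld):
#   dump_backtraces "$(pidof mysqld)"
```

Without debug symbols the frames will mostly be raw addresses, which is why a build with -g was asked for.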

          joshbeer Josh Richards added a comment -

          Not the OP, but since it looks like further data is being waited on, I thought I'd try to provide it, as I ran across this issue last night as well.

          perf.data and perf.report output from my setup are attached (perf.report, perf.data).

          andre_r Andre R added a comment -

          (cross posted in alpine issue)
          Just discovered that this issue seems fixed when using the edge repository in alpine 3.13.
          A few packages are upgraded in edge, among which:

          mariadb (and co) from 10.5.8 to 10.5.9
          musl from 1.2.2-r0 to 1.2.2-r2

          I suspect (but I can't be sure) that musl was the culprit, as mariadb 10.5.8 used to run fine in a debian/buster container instead of alpine.

          danblack Daniel Black added a comment - - edited

          Thanks for getting back to us andre_r. It appears from your upstream bug that both musl and mariadb have been updated on alpine 3.13/armv7.

          At the moment it's a bit hard to determine the cause.

          We'd need musl (MDBF-244), armv7/armhf/Arm Cortex support in CI, and maybe even Alpine-based testing (MDEV-18462) to find this kind of issue before release.

          Suggestions are welcome on how to obtain armv7 hardware or cloud infrastructure that can be integrated into a CI infrastructure as a runner (our own, container-based, is preferable, rather than a new dedicated CI framework, as drone.io appears to be).
          Edit: found https://www.worksonarm.com/


          marko Marko Mäkelä added a comment -

          andre_r, in MariaDB 10.5.9 there was also a fix of MDEV-24270 where we replaced io_getevents() with a thinner wrapper of the system call to avoid 2 unnecessary wakeups per second. This basically works around a regression that was introduced by an ‘optimization’ in https://pagure.io/libaio/c/7cede5af5adf01ad26155061cc476aad0804d3fc several years ago. That ‘optimization’ would cause a race condition on shutdown in user space, leading to SIGSEGV or similar. But I would find it hard to believe that the unnecessary wakeups would keep one CPU 100% busy.

          It would be interesting to try MariaDB 10.5.8 with the newer musl libc and 10.5.9 with the older libc, to narrow down the culprit.
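
The narrowing-down suggested above amounts to a 2x2 matrix of MariaDB and musl versions. The sketch below only enumerates the pairings and marks which ones this thread already reports on; the function name `print_matrix` is hypothetical, and how each pairing gets installed is deliberately left out, since that depends on which package versions the Alpine repositories still carry.

```shell
#!/bin/sh
# print_matrix: list every mariadb/musl pairing mentioned in this issue,
# with its status as reported in the comments.
print_matrix() {
    for mariadb in 10.5.8 10.5.9; do
        for musl in 1.2.2-r0 1.2.2-r2; do
            case "$mariadb/$musl" in
                10.5.8/1.2.2-r0) status="reported: about 100% CPU when idle on arm v7" ;;
                10.5.9/1.2.2-r2) status="reported: seems fixed" ;;
                *)               status="untested" ;;
            esac
            printf 'mariadb=%s musl=%s  %s\n' "$mariadb" "$musl" "$status"
        done
    done
}

print_matrix
```

The two "untested" rows are the ones that would separate a musl fix from a MariaDB fix.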

          People

            Assignee: Unassigned
            Reporter: Andre R (andre_r)
            Votes: 0
            Watchers: 4

