MDEV-35886 (MariaDB Server)

MariaDB Server frequently hanging, causing data corruption.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.11.9
    • Fix Version/s: N/A
    • Component/s: Server
    • Labels: None
    • Environment: Debian 12.9 (ARM)

    Description

      We are facing serious issues with MariaDB server hanging on queries, seemingly at random.

      We see this mostly in our production environments; we suspect this is because those servers host larger databases than our other workloads.

      When mariadb-server hangs, it cannot be stopped short of using kill -9. It does not seem to matter whether the server is actively replicating or idle. The point at which the server hangs seems to be random, but since it is usually mid-transaction, we face data loss each time this happens.

      There is nothing obvious to go on in the server logs or system logs. The systems are not resource constrained in any way. All are AWS EC2 instances deployed in various regions.

      2025-01-19 22:08:26 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
      2025-01-19 22:08:46 0 [Warning] /usr/sbin/mariadbd: Thread 8349 (user : 'root') did not exit
      

      The only change I can identify on our systems around the time this issue arose is normal system patching, which updated Debian from 12.8 to 12.9. The upgraded packages are listed below as (old version, new version).

      libsmartcols1:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      udev:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      python3.11:arm64 (3.11.2-6+deb12u4, 3.11.2-6+deb12u5)
      openssh-client:arm64 (1:9.2p1-2+deb12u3, 1:9.2p1-2+deb12u4)
      libnss-myhostname:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      libpam-systemd:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      ucf:arm64 (3.0043+nmu1, 3.0043+nmu1+deb12u1)
      libavahi-common-data:arm64 (0.8-10, 0.8-10+deb12u1)
      libtiff6:arm64 (4.5.0-6+deb12u1, 4.5.0-6+deb12u2)
      libsystemd0:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      libmount1:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      openssh-server:arm64 (1:9.2p1-2+deb12u3, 1:9.2p1-2+deb12u4)
      python3-urllib3:arm64 (1.26.12-1, 1.26.12-1+deb12u1)
      libpython3.11-minimal:arm64 (3.11.2-6+deb12u4, 3.11.2-6+deb12u5)
      util-linux:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      util-linux-extra:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      systemd:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      libudev1:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      fdisk:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      python3-pkg-resources:arm64 (66.1.1-1, 66.1.1-1+deb12u1)
      qemu-utils:arm64 (1:7.2+dfsg-7+deb12u7, 1:7.2+dfsg-7+deb12u12)
      libfdisk1:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      eject:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      libuuid1:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      uuid-runtime:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      systemd-resolved:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      base-files:arm64 (12.4+deb12u8, 12.4+deb12u9)
      python3-jinja2:arm64 (3.1.2-1, 3.1.2-1+deb12u1)
      libpython3.11-stdlib:arm64 (3.11.2-6+deb12u4, 3.11.2-6+deb12u5)
      libavahi-common3:arm64 (0.8-10, 0.8-10+deb12u1)
      mount:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      libglib2.0-0:arm64 (2.74.6-2+deb12u4, 2.74.6-2+deb12u5)
      openssh-sftp-server:arm64 (1:9.2p1-2+deb12u3, 1:9.2p1-2+deb12u4)
      python3.11-minimal:arm64 (3.11.2-6+deb12u4, 3.11.2-6+deb12u5)
      libsystemd-shared:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      systemd-sysv:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
      libblkid1:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      linux-image-cloud-arm64:arm64 (6.1.119-1, 6.1.123-1)
      bsdutils:arm64 (1:2.38.1-5+deb12u2, 1:2.38.1-5+deb12u3)
      libavahi-client3:arm64 (0.8-10, 0.8-10+deb12u1)
      bsdextrautils:arm64 (2.38.1-5+deb12u2, 2.38.1-5+deb12u3)
      linux-libc-dev:arm64 (6.1.119-1, 6.1.123-1)
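
With a list this long, it can help to pick out the kernel upgrade mechanically. The following is a minimal sketch, assuming each entry has the form `name:arch (old_version, new_version)` as in the list above; the function names `parse_upgrades` and `kernel_upgrades` are hypothetical helpers, not part of any Debian tool.

```python
import re

# Assumed entry format, as in the list above:
#   name:arch (old_version, new_version)
ENTRY = re.compile(
    r"^\s*(?P<name>[^:\s]+):(?P<arch>\S+) \((?P<old>[^,]+), (?P<new>[^)]+)\)\s*$"
)


def parse_upgrades(text):
    """Return (name, old_version, new_version) for every line that matches."""
    upgrades = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            upgrades.append((m["name"], m["old"], m["new"]))
    return upgrades


def kernel_upgrades(upgrades):
    """Keep only entries whose package name starts with 'linux-image'."""
    return [u for u in upgrades if u[0].startswith("linux-image")]


# Three entries copied from the list above.
sample = """\
systemd:arm64 (252.31-1~deb12u1, 252.33-1~deb12u1)
linux-image-cloud-arm64:arm64 (6.1.119-1, 6.1.123-1)
bsdutils:arm64 (1:2.38.1-5+deb12u2, 1:2.38.1-5+deb12u3)
"""

for name, old, new in kernel_upgrades(parse_upgrades(sample)):
    print(f"{name}: {old} -> {new}")
```

Filtering on the `linux-image` prefix is a deliberate simplification; it ignores `linux-libc-dev`, which carries the same version numbers but does not change the running kernel.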
      

          Activity

            xan@biblionix.com Xan Charbonnet added a comment - - edited

            stephen.hames, the version number is something like this:

            Linux 6.1.119 – Debian version 6.1.0-28
            Linux 6.1.123 – Debian version 6.1.0-29
            Linux 6.1.124 – Debian version 6.1.0-30

            Six different io_uring fixes were backported into the 6.1 branch between 6.1.119 and 6.1.123. One of the backports was buggy and is causing the issue here in MDEV-35886. marko thinks one of the other fixes may have addressed MDEV-35334.

            I would recommend trying the kernel packages I produced, which fix the buggy backport by adding the call to smp_mb(), except that you're on ARM, so those won't work. The best I can think of is to repeat on your platform what I did, following this guide:
            https://www.dwarmstrong.org/kernel/

            To your question about timing of an updated kernel: the last several 6.1 releases from the kernel project were Jan 2, Jan 9, Jan 17, Jan 19, and Jan 23. Makes it seem like it won't be long before 6.1.128 is released with this fix.

            As for when Debian might release a package containing that new kernel, it's harder to say. They don't release one for every version. But since there's a critical bug open for this issue, they might jump on it quickly.
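
For anyone triaging a fleet, the version information in this thread can be turned into a rough check. This is only a sketch under stated assumptions: the thread says the buggy backport landed somewhere between 6.1.119 and 6.1.123 and that 6.1.128 carries the fix, but it does not say which release first contained the bug, so the code conservatively treats 6.1.120 through 6.1.127 as suspect. The names `upstream_version` and `possibly_affected` are hypothetical. Pass the upstream or package version (for example `6.1.123-1`), not the Debian ABI name such as `6.1.0-29`, which would parse as 6.1.0.

```python
import re

# Assumption: the first release after the last known-good 6.1.119.
FIRST_SUSPECT = (6, 1, 120)
# Per the thread, 6.1.128 includes the fix.
FIRST_FIXED = (6, 1, 128)


def upstream_version(text):
    """Extract the first x.y.z version found in text as a tuple of ints."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in m.groups())


def possibly_affected(text):
    """True if the version falls in the suspect range [6.1.120, 6.1.128)."""
    return FIRST_SUSPECT <= upstream_version(text) < FIRST_FIXED


for candidate in ("6.1.119-1", "6.1.123-1", "6.1.128-1"):
    status = "possibly affected" if possibly_affected(candidate) else "not in suspect range"
    print(f"{candidate}: {status}")
```

A result of "not in suspect range" only reflects the versions discussed here; it says nothing about other kernel branches.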


            xan@biblionix.com Xan Charbonnet added a comment -

            Kernel 6.1.128 has been released and includes the fix.

            stephen.hames Stephen Hames added a comment -

            New kernel is rolling out for us, no hang issues observed so far.

            I think this issue can probably go ahead and be resolved.

            fabide Fabian Geiger added a comment - - edited

            I can confirm what @Stephen Hames reported. I tested linux-image-6.1.0-31-amd64 (6.1.128-1) and there were no more issues.


            knielsen Kristian Nielsen added a comment -

            Closing: this is a kernel bug in io_uring, and it is fixed in the latest stable kernels.

            People

              Assignee: Unassigned
              Reporter: stephen.hames Stephen Hames
