MariaDB Server / MDEV-33820

Deadlock if system time changes repeatedly during concurrent INSERTs

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.11.7, 11.3.2
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Debian 12 on various hardware, on VMware ESX, and on QEMU

    Description

      This issue has been analyzed by Marko Mäkelä, and for now it looks like it might be a kernel bug rather than a MariaDB bug. Nevertheless, here is a formal report of the problem, in case someone else encounters it in the wild.

      Rapidly changing the system time (real time clock) while running a bunch of INSERT queries deadlocks MariaDB.
      If you run the attached script on a fresh install of (in my case) Debian 12 with MariaDB installed, the DB process will sooner or later deadlock, usually after around 10 minutes. You'll see a bunch of hanging INSERT queries in `SHOW PROCESSLIST`, and any other queries touching the involved table will also hang forever.
      I have reproduced this with MariaDB 10.11.6 as shipped by Debian, as well as 10.11.7 and 11.3 from mariadb.org's apt repos for Debian 12. I also installed kernel 6.8 (built by the Ubuntu Kernel team) and was able to reproduce it there. I've also tried setting `innodb_use_native_aio=0`, which didn't help. I haven't tested with `libaio` yet, as this would require building from source.
      This happens on bare-metal hardware as well as in VMware and QEMU VMs, on both Intel and AMD CPUs. So far it reproduces most reliably with at least 4 CPU cores; with one or two cores I had little luck triggering it.

      If you run the attached test script, please make sure to use a disposable test environment, so that the script can be run as root and is able to mess with the system time.
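
For readers without access to the attachment, the general shape of such a reproduction can be sketched as follows. This is not the attached script, only a minimal approximation under stated assumptions: the 65-second step and the roughly 100 changes per second are taken from the numbers mentioned in this report, the table name `clock_test` and its `payload` column are hypothetical, and `execute` is a caller-supplied function that runs one SQL statement over whatever MariaDB connection you prefer. Stepping the clock requires root, so the same warning about disposable environments applies.

```python
import itertools
import threading
import time

JUMP_SECONDS = 65.0  # assumption: size of each clock step, borrowed from the drift in the report
INTERVAL = 0.01      # assumption: about 100 clock changes per second


def clock_offsets(jump=JUMP_SECONDS):
    """Yield an endless alternating sequence: +jump, -jump, +jump, ..."""
    return itertools.cycle((jump, -jump))


def flap_clock(stop, interval=INTERVAL, jump=JUMP_SECONDS):
    """Step CLOCK_REALTIME back and forth until `stop` is set (needs root)."""
    for offset in clock_offsets(jump):
        if stop.is_set():
            break
        now = time.clock_gettime(time.CLOCK_REALTIME)
        time.clock_settime(time.CLOCK_REALTIME, now + offset)
        time.sleep(interval)


def run_inserts(stop, execute, table="clock_test"):
    """Issue INSERTs through the caller-supplied `execute(sql)` until `stop` is set.

    `table` and its `payload` column are hypothetical names.
    Returns the number of statements issued.
    """
    count = 0
    while not stop.is_set():
        execute(f"INSERT INTO {table} (payload) VALUES ('row {count}')")
        count += 1
    return count


def start_reproduction(stop, make_execute, workers=4):
    """Start one clock-flapping thread plus `workers` INSERT threads.

    `make_execute()` must return a fresh `execute(sql)` callable, typically
    bound to its own database connection, for each worker.
    """
    threads = [threading.Thread(target=flap_clock, args=(stop,), daemon=True)]
    for _ in range(workers):
        threads.append(
            threading.Thread(target=run_inserts, args=(stop, make_execute()), daemon=True)
        )
    for thread in threads:
        thread.start()
    return threads
```

To use it, create a `threading.Event`, pass it to `start_reproduction` together with a factory for per-connection `execute` functions, and watch `SHOW PROCESSLIST` from a separate session. Four workers is only a default chosen to match the observation that the problem shows up most reliably with at least 4 CPU cores.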

      For the curious: "Why do you even change your system clock 100 times a second?" This happened to us in production on one server. It turned out that the server was running on an ESX node that accidentally had NTP disabled, while automatic time synchronization with the guest via open-vm-tools was enabled. In addition, systemd-timesyncd was installed in the guest and set to use NTP. The ESX host's clock had drifted by 65 seconds over time, so open-vm-tools and timesyncd were constantly fighting over the guest's system clock.
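
If you suspect a similar tug-of-war over the clock on one of your own machines, one way to check is to compare the wall clock against the monotonic clock, which is not affected by clock steps. The following is a minimal sketch, not something that was used for this report; the function names and the 1-second threshold are assumptions chosen for illustration.

```python
import time


def sample_clocks():
    """Return one (monotonic, wall_clock) sample, both in seconds."""
    return time.monotonic(), time.time()


def find_jumps(samples, threshold=1.0):
    """Find wall-clock steps in a list of (monotonic, wall_clock) samples.

    Between two samples the wall clock should advance by about as much as
    the monotonic clock. Returns (index, delta) for every sample where the
    difference exceeds `threshold` seconds; a positive delta means the wall
    clock jumped forward, a negative one means it jumped back.
    """
    jumps = []
    for i in range(1, len(samples)):
        mono_elapsed = samples[i][0] - samples[i - 1][0]
        wall_elapsed = samples[i][1] - samples[i - 1][1]
        delta = wall_elapsed - mono_elapsed
        if abs(delta) > threshold:
            jumps.append((i, delta))
    return jumps


def watch(duration=10.0, interval=0.1, threshold=1.0):
    """Sample both clocks for `duration` seconds and report detected jumps."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append(sample_clocks())
        time.sleep(interval)
    return find_jumps(samples, threshold)
```

On a healthy system `watch()` should return an empty list. In a situation like the one described above, it would be expected to report alternating forward and backward steps of roughly the size of the host's drift.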

            People

              Assignee: Unassigned
              Reporter: Simon Rettberg (srett)
              Votes: 0
              Watchers: 2
