MariaDB Server / MDEV-36352

main.partition_explicit_prune sporadically crashes on loong64

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 11.8.1

    Description

      After uploading MariaDB 1:11.8.1-2 to Debian I saw this test crashing on arch loong64:

      main.partition_explicit_prune            w14 [ fail ]
              Test ended at 2025-03-21 04:35:53
       
      CURRENT_TEST: main.partition_explicit_prune
      mysqltest: At line 557: query 'TRUNCATE TABLE t2' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction
       
      The result from queries just before the failure was:
      < snip >
      HANDLER_ROLLBACK	1
      HANDLER_TMP_WRITE	24
      HANDLER_WRITE	1
      # 16 locks (2 tables, 1 + 5 subpartitions lock/unlock)
      FLUSH STATUS;
      INSERT IGNORE INTO t2 PARTITION (subp3) SELECT * FROM t1 PARTITION (subp3, `p10-99`, `p100-99999`);
      Warnings:
      Warning	1748	Found a row not matching the given partition set
      Warning	1748	Found a row not matching the given partition set
      Warning	1748	Found a row not matching the given partition set
      SELECT * FROM INFORMATION_SCHEMA.SESSION_STATUS
      WHERE VARIABLE_NAME LIKE 'HANDLER_%' AND VARIABLE_VALUE > 0;
      VARIABLE_NAME	VARIABLE_VALUE
      HANDLER_COMMIT	1
      HANDLER_READ_FIRST	5
      HANDLER_READ_NEXT	5
      HANDLER_TMP_WRITE	24
      HANDLER_WRITE	7
      # 16 locks (2 tables, 1 + 5 subpartitions lock/unlock)
      TRUNCATE TABLE t2;
       
      More results from queries before failure can be found in /<<PKGBUILDDIR>>/builddir/mysql-test/var/14/log/partition_explicit_prune.log
      

      Later down in the log I see the 3rd restart passed:

      main.partition_explicit_prune            w14 [ retry-pass ]  21544
      

      Full log at https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=loong64&ver=1%3A11.8.1-2&stamp=1742531991&raw=0

      In the previous build of 1:11.8.1-1 the test passed right away: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=loong64&ver=1%3A11.8.1-1&stamp=1742370987&raw=0

      Hence this issue seems sporadic.
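
      To check how often the test fails before a retry passes, the status lines in a build log can be tallied. The following is a minimal sketch, not part of mtr: the function name `summarize` and the regular expression are assumptions inferred only from the two status lines quoted in this report, so other mtr line formats may not match.

      ```python
      import re
      from collections import Counter

      # Matches mtr status lines such as:
      #   main.partition_explicit_prune            w14 [ fail ]
      #   main.partition_explicit_prune            w14 [ retry-pass ]  21544
      # The pattern is inferred from the lines quoted in this report only.
      STATUS_LINE = re.compile(
          r"^\s*(?P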

      I also found MDEV-34824, which reports a failure of the same test, but there the failure is a mismatch in a result value and not a crash.

          Activity

            Otto Kekäläinen (otto) created the issue.
            Daniel Black (danblack) added a comment:

            Is the timeout always at the same TRUNCATE TABLE t2 on line 557? On x86_64 the test takes 95 ms locally, so a lock wait timeout is a very long wait by comparison.

            Test case hasn't changed recently.

            I'm assuming loong64 is a newly tested arch?

            It would help if someone with native loong64 hardware could capture a backtrace at the moment of the timeout, using something like:

            mysql-test/mtr --gdb='b get_server_errmsgs if nr == 1205;r;thread apply all bt -frame-arguments all full' main.partition_explicit_prune

            Marko Mäkelä (marko) added a comment:

            I do not see any crash, but I see that a TRUNCATE TABLE on a partitioned table was interrupted due to a timeout.

            Failures of this kind are recorded every now and then, starting with MariaDB Server 10.6, also on https://buildbot.mariadb.org, mostly in tests of the parts suite (./mtr --suite=parts). Unfortunately, my attempts to find an example failed: https://buildbot.mariadb.net/ci/reports/cross_reference returned the infamous "Error Loading Data" (meaning that it found something but could not display the results), and https://buildbot.mariadb.org/cr/ either timed out or did not find anything relevant.

            I believe that MDEV-29566 could be relevant to this. This should be easy to verify by simulating the MDEV-4750 work-around by invoking the test suite with an additional parameter:

            ./mtr --mysqld=--loose-skip-innodb-stats-persistent
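
            Because the failure is sporadic, one run with the extra parameter proves little. Comparing repeated runs with and without it is more informative. The sketch below only builds and prints the two command lines. It assumes mtr's `--repeat` option is used for the repetition; the count of 100 and the helper name `mtr_command` are illustrative choices, not something taken from this report.

            ```python
            import shlex

            TEST = "main.partition_explicit_prune"
            # Option suggested in the comment above to simulate the MDEV-4750 work-around.
            WORKAROUND = "--mysqld=--loose-skip-innodb-stats-persistent"

            def mtr_command(repeat, with_workaround):
                """Build an mtr argument list; the repeat count is an illustrative choice."""
                cmd = ["./mtr", f"--repeat={repeat}"]
                if with_workaround:
                    cmd.append(WORKAROUND)
                cmd.append(TEST)
                return cmd

            # Print the baseline and the work-around invocation for copy and paste.
            for flag in (False, True):
                print(shlex.join(mtr_command(100, flag)))
            ```

            If the timeout disappears only in the runs with the extra parameter, that would support the suspected link to MDEV-29566.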
            


            People

              Assignee: Unassigned
              Reporter: Otto Kekäläinen
              Votes: 0
              Watchers: 3
