  1. MariaDB Server
  2. MDEV-22889

InnoDB occasionally breaks the isolation of a recovered transaction

      Description

      The test case innodb.innodb_force_recovery_rollback that was added for MDEV-21217 should have only two possible outcomes for the locking SELECT statement:

      • The statement is blocked if the fix of MDEV-21217 is missing, and the test eventually fails with a lock wait timeout.
      • The lock conflict ensures that the statement executes only after the rollback has completed, so an empty table is observed.
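
      The two expected outcomes, and the anomalous third one reported here, can be told apart mechanically. The following is a minimal sketch, not part of the test suite: the names Outcome and classify are hypothetical, and it assumes that the caller passes in the rows returned by the locking SELECT together with the server error number, if any. 1205 is the error number of ER_LOCK_WAIT_TIMEOUT.

```python
from enum import Enum
from typing import Optional, Sequence

# Error number of ER_LOCK_WAIT_TIMEOUT.
ER_LOCK_WAIT_TIMEOUT = 1205


class Outcome(Enum):
    """Hypothetical labels for the result of the locking SELECT."""
    BLOCKED = "lock wait timeout (fix of MDEV-21217 missing)"
    ROLLED_BACK = "empty table after rollback (expected)"
    ISOLATION_BROKEN = "rows of the incomplete transaction are visible"


def classify(rows: Sequence[int], errno: Optional[int] = None) -> Outcome:
    """Classify the outcome of SELECT * FROM t0 LOCK IN SHARE MODE."""
    if errno == ER_LOCK_WAIT_TIMEOUT:
        return Outcome.BLOCKED
    if errno is not None:
        # Any other error is outside the scope of this sketch.
        raise ValueError(f"unexpected error number: {errno}")
    if not rows:
        return Outcome.ROLLED_BACK
    return Outcome.ISOLATION_BROKEN
```

      Only the first two labels are legitimate results; any nonempty result set falls into the third.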

      What occasionally happens instead is that the locking SELECT returns nonempty contents for the table. I was not able to reproduce this in my local environment, but it did occur on buildbot for several branches and builders.

      Such breakage of ACID is serious, and we must find and fix the cause.

      I believe that we must execute something like

      ./mtr innodb.innodb_force_recovery_rollback
      

      in such a way that the server runs under rr record after the restart, which would give us an execution trace of the failure for debugging.

      Here is an example of a failure: http://buildbot.askmonty.org/buildbot/builders/kvm-zyp-opensuse150-amd64/builds/3589/steps/mtr/logs/stdio

      10.5-ish 70a3e3ef552c0c65248151daaa45a2e978cfe86c

      CURRENT_TEST: innodb.innodb_force_recovery_rollback
      --- /usr/share/mysql-test/suite/innodb/r/innodb_force_recovery_rollback.result	2020-06-13 16:37:19.000000000 +0000
      +++ /dev/shm/var/4/log/innodb_force_recovery_rollback.reject	2020-06-13 20:35:22.444004488 +0000
      @@ -15,4 +15,1004 @@
       connection default;
       SELECT * FROM t0 LOCK IN SHARE MODE;
       a
      +1
      +2
      +3
      +4
      +997
      +998
      +999
      +1000
       DROP TABLE t0,t1;
       
      mysqltest: Result length mismatch
      

      Apparently, only Isolation is being violated: the locking read returns all 1000 records that the incomplete transaction had inserted.

      It is also theoretically possible that the incomplete transaction that inserted those 1,000 records was unexpectedly committed when the server was killed.
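
      The two hypotheses predict different results for a later read. As a rough sketch, assuming a hypothetical second read of the table that is taken once the rollback of recovered transactions is known to have finished (the current test does not do this; it drops the table right away), the distinction could be expressed as follows. The function name diagnose and its return strings are illustrative only.

```python
def diagnose(rows_at_locking_read: int, rows_after_rollback: int) -> str:
    """Compare row counts from the locking read and a hypothetical later read."""
    if rows_at_locking_read == 0:
        # The locking read saw an empty table: the expected outcome.
        return "no anomaly"
    if rows_after_rollback == 0:
        # The rows disappeared once rollback finished, so they were
        # visible only while the transaction was still incomplete.
        return "isolation violated"
    if rows_after_rollback == rows_at_locking_read:
        # The rows survived, which is consistent with an unexpected commit.
        return "possibly committed"
    return "inconclusive"
```

      For the failure shown above, diagnose(1000, 0) would point to an isolation violation, while diagnose(1000, 1000) would be consistent with an unexpected commit.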

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä