Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15912

InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.0 or 10.1 to 10.3

Details

    Description

      10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
      The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:

      • start the current 10.0 server;
      • create some tables and run some DML on them;
      • kill the server during operation (with SIGKILL);
      • restart the server with innodb-force-recovery=3, no client activity;
      • shutdown the server normally;

      Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

      10.3 b52bb6eb82db8

      2018-06-12 18:15:28 0x7fb1809ea700  InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
      InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit
       
      #5  0x00007fb1a1634448 in __GI_abort () at abort.c:89
      #6  0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
      #7  0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
      #8  0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
      #9  0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
      #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
      #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
      #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
      #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
      #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
      #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
      #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      

      This current test was run with --innodb-page-size=8K --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with --innodb-page-size=8K, other two options don't make a difference; otherwise all defaults.

      ib_logfile-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

      Similar-looking crashes upon upgrade from 10.1 have also been observed before.
      10.2 doesn't crash on the same datadir.

      Attachments

        1. ib_logfile0.bz2
          8.52 MB
        2. ib_logfile1.bz2
          0.1 kB
        3. mdev15912_data.bgz
          4.18 MB

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Attachment mdev15912_data.bgz [ 45740 ]
            Attachment ib_logfile1.bz2 [ 45741 ]
            Attachment ib_logfile0.bz2 [ 45742 ]
            Fix Version/s 10.3 [ 22126 ]
            Description https://api.travis-ci.org/v3/job/367276228/log.txt
            {noformat:title=mariadb-10.3.6 debug}
            2018-04-17 16:12:33 0x7f0f817fa700 InnoDB: Assertion failure in file /home/travis/src/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit
            InnoDB: We intentionally generate a memory trap.
            InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            InnoDB: If you get repeated assertion failures or crashes, even
            InnoDB: immediately after the mysqld startup, there may be
            InnoDB: corruption in the InnoDB tablespace. Please refer to
            InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
            InnoDB: about forcing recovery.
            180417 16:12:33 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
            Server version: 10.3.6-MariaDB-debug-log
            key_buffer_size=134217728
            read_buffer_size=131072
            max_used_connections=0
            max_threads=153
            thread_count=4
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467493 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
            Thread pointer: 0x7f0f68001d50
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f0f817f9e08 thread_stack 0x49000
            /home/travis/server/bin/mysqld(my_print_stacktrace+0x3d)[0x1261350]
            include/ut0mutex.h:186(void mutex_init<PolicyMutex<TTASEventMutex<GenericPolicy> > >(PolicyMutex<TTASEventMutex<GenericPolicy> >*, latch_id_t, char const*, unsigned int))[0xabdec5]
            2018-04-17 16:12:33 0 [Warning] /home/travis/server/bin/mysqld: unknown variable 'loose-innodb-file-format=Barracuda'
            /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0fa6f5ecb0]
            2018-04-17 16:12:33 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH','EMPTY_STRING_IS_NULL','SIMULTANEOUS_ASSIGNMENT'), found type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALU
            2018-04-17 16:12:33 0 [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.
            /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f0fa63b2035]
            /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f0fa63b579b]
            Version: '10.3.6-MariaDB-debug-log' socket: '/home/travis/logs/current1_1/mysql.sock' port: 19300 Source distribution
            Killed
            {noformat}
            {noformat:title=experimental 4feb78e17fbecf7fc7f2847b49b8c66b54879629}
             perl /home/travis/rqg///run-scenario.pl --grammar=conf/mariadb/oltp-transactional.yy --grammar2=conf/mariadb/oltp_and_ddl.yy --gendata=conf/mariadb/innodb_upgrade.zz --gendata-advanced --mysqld=--server-id=111 --scenario=UndoLogUpgrade --duration=200 --basedir1=/home/travis/old --basedir2=/home/travis/server --mysqld=--innodb-page-size=32K --mysqld=--innodb-compression-algorithm=zlib --mysqld=--loose-innodb-file-format=Barracuda --mysqld=--loose-innodb-file-per-table=1 --gendata=conf/mariadb/innodb_upgrade_compression.zz --no-mask --seed=1523981386 --threads=4 --queries=100M --mysqld=--loose-max-statement-time=20 --mysqld=--loose-lock-wait-timeout=20 --mysqld=--loose-innodb-lock-wait-timeout=10 --mysqld=--loose-innodb_log_compressed_pages=on --mtr-build-thread=300 --vardir1=/home/travis/logs/current1_1
            {noformat}

            Not reproducible so far.
            10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
            The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
            - start the current 10.0 server;
            - create some tables and run some DML on them;
            - kill the server during operation (with SIGKILL);
            - restart the server with {{innodb-force-recovery=3}}, no client activity;
            - shutdown the server normally;

            Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

            {noformat:title=10.3 b52bb6eb82db8}
            2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

            #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
            #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
            #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
            #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
            #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
            #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
            #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
            #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
            #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
            #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
            #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
            #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            {noformat}

            This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

            {{ib_logfile}}s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

            10.2 doesn't crash on the same datadir.
            Summary [Draft] InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.1.22 to 10.3.6 InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.0 or 10.1 to 10.3
            elenst Elena Stepanova made changes -
            Description 10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
            The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
            - start the current 10.0 server;
            - create some tables and run some DML on them;
            - kill the server during operation (with SIGKILL);
            - restart the server with {{innodb-force-recovery=3}}, no client activity;
            - shutdown the server normally;

            Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

            {noformat:title=10.3 b52bb6eb82db8}
            2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

            #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
            #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
            #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
            #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
            #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
            #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
            #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
            #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
            #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
            #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
            #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
            #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            {noformat}

            This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

            {{ib_logfile}}s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

            10.2 doesn't crash on the same datadir.
            10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
            The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
            - start the current 10.0 server;
            - create some tables and run some DML on them;
            - kill the server during operation (with SIGKILL);
            - restart the server with {{innodb-force-recovery=3}}, no client activity;
            - shutdown the server normally;

            Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

            {noformat:title=10.3 b52bb6eb82db8}
            2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

            #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
            #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
            #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
            #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
            #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
            #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
            #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
            #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
            #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
            #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
            #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
            #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            {noformat}

            This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

            {{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

            10.2 doesn't crash on the same datadir.
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Marko Mäkelä [ marko ]
            elenst Elena Stepanova made changes -
            Comment [ New occurrence: https://api.travis-ci.org/v3/job/372110074/log.txt
            again, from 10.1.22 to the current 10.3.

            https://api.travis-ci.org/v3/job/372110080/log.txt
            same from 10.0.34 ]
            elenst Elena Stepanova made changes -
            elenst Elena Stepanova made changes -
            Description 10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
            The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
            - start the current 10.0 server;
            - create some tables and run some DML on them;
            - kill the server during operation (with SIGKILL);
            - restart the server with {{innodb-force-recovery=3}}, no client activity;
            - shutdown the server normally;

            Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

            {noformat:title=10.3 b52bb6eb82db8}
            2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

            #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
            #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
            #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
            #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
            #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
            #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
            #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
            #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
            #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
            #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
            #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
            #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            {noformat}

            This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

            {{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

            10.2 doesn't crash on the same datadir.
            10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
            The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
            - start the current 10.0 server;
            - create some tables and run some DML on them;
            - kill the server during operation (with SIGKILL);
            - restart the server with {{innodb-force-recovery=3}}, no client activity;
            - shutdown the server normally;

            Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

            {noformat:title=10.3 b52bb6eb82db8}
            2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
            InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

            #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
            #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
            #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
            #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
            #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
            #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
            #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
            #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
            #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
            #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
            #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
            #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            {noformat}

            This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

            {{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

            Similar-looking crashes upon upgrade from 10.1 have also been observed before.
            10.2 doesn't crash on the same datadir.
            ben.anderson Ben Anderson added a comment -

            We're also seeing this issue under similar circumstances.

            We've got a 10.0 data dir which should have been shutdown cleanly but may not have in certain situations. Installing mariadb-server 10.3 and running mysql_upgrade works fine, and the server will boot. If the server is running enough or a certain action is performed (i'm not sure which, don't know enough), a purge will start. This will cause the crash.

            With some help from `dragonheart` on IRC, I installed debug symbols and gdb'd. Here is some clumsy GDB info that may help.

            (gdb) up
            #1  0x00007ffff5e7402a in __GI_abort () at abort.c:89
            89  abort.c: No such file or directory.
            (gdb) up
            #2  0x00005555559f34e2 in ut_dbg_assertion_failed (
                expr=expr@entry=0x5555563c2098 "purge_sys.tail.commit <= purge_sys.rseg->last_commit",
                file=file@entry=0x5555563c1f68 "/home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/trx/trx0purge.cc",
                line=line@entry=121) at /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc:61
            61  /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc: No such file or directory.
            (gdb) p purge_sys.tail.commit
            $1 = 127992
            (gdb) purge_sys.rseg->last_commit
            Undefined command: "purge_sys".  Try "help".
            (gdb) p purge_sys.rseg->last_commit
            $2 = 127991
            

            I then did some reading around and noticed we had missed a step in the MariaDB upgrade notes that for one of the version (i've lost the page now, but it might be have been 10.0 -> 10.1) it recommends doing a shutdown with innodb_fast_shutdown=0 before doing the upgrade. I tried this, and booted back up with 10.3, ran the mysql_upgrade and then booted 10.3, left it running under some workloads (very minor) and it appears to be stable.

            So - this is a case of user error, but I don't think MariaDB should be crashing quite this badly in this case. I'm keen to help debug any more if it means a better error message or something handled. I've watched this thread, if anyone needs more info.

            ben.anderson Ben Anderson added a comment - We're also seeing this issue under similar circumstances. We've got a 10.0 data dir which should have been shutdown cleanly but may not have in certain situations. Installing mariadb-server 10.3 and running mysql_upgrade works fine, and the server will boot. If the server is running enough or a certain action is performed (i'm not sure which, don't know enough), a purge will start. This will cause the crash. With some help from `dragonheart` on IRC, I installed debug symbols and gdb'd. Here is some clumsy GDB info that may help. (gdb) up #1 0x00007ffff5e7402a in __GI_abort () at abort.c:89 89 abort.c: No such file or directory. (gdb) up #2 0x00005555559f34e2 in ut_dbg_assertion_failed ( expr=expr@entry=0x5555563c2098 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=file@entry=0x5555563c1f68 "/home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/trx/trx0purge.cc", line=line@entry=121) at /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc:61 61 /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc: No such file or directory. (gdb) p purge_sys.tail.commit $1 = 127992 (gdb) purge_sys.rseg->last_commit Undefined command: "purge_sys". Try "help". (gdb) p purge_sys.rseg->last_commit $2 = 127991 I then did some reading around and noticed we had missed a step in the MariaDB upgrade notes that for one of the version (i've lost the page now, but it might be have been 10.0 -> 10.1) it recommends doing a shutdown with innodb_fast_shutdown=0 before doing the upgrade. I tried this, and booted back up with 10.3, ran the mysql_upgrade and then booted 10.3, left it running under some workloads (very minor) and it appears to be stable. So - this is a case of user error, but I don't think MariaDB should be crashing quite this badly in this case. I'm keen to help debug any more if it means a better error message or something handled. I've watched this thread, if anyone needs more info.

            ben.anderson, MariaDB upgrade is supposed to work even if the undo logs are not empty. Tests for that were implemented in MDEV-13269.

            In MariaDB 10.3, there are two major changes to the undo log format.

            1. MDEV-12288 eliminated the separate insert_undo log; the undo log records of a transaction will be written to a single persistent undo log, which was formerly called update_undo. On upgrade, old_insert logs of any incomplete transactions will be handled. There was a fix related to this in MDEV-15370. Because insert_undo logs were never visible to purge, this change should not be causing the issue.
            2. MDEV-15132 and MDEV-15158 stopped writing anything to the TRX_SYS page during transaction commit. This could be what is causing the reported problem. We could be retrieving the maximum transaction ID in the wrong way.

            There was also a minor change to the undo log format, in MDEV-14717 (making RENAME TABLE crash-safe inside InnoDB; the .frm operations still are not crash-safe).

            Note: A shutdown after SET GLOBAL innodb_fast_shutdown=0 may fail to empty some undo logs. This was fixed in MariaDB 10.3.6 only:
            MDEV-13603 innodb_fast_shutdown=0 may fail to purge all history
            It is not feasible to backport that to 10.1 or earlier. It might not be feasible to backport to 10.2 either.

            marko Marko Mäkelä added a comment - ben.anderson , MariaDB upgrade is supposed to work even if the undo logs are not empty. Tests for that were implemented in MDEV-13269 . In MariaDB 10.3, there are two major changes to the undo log format. MDEV-12288 eliminated the separate insert_undo log; the undo log records of a transaction will be written to a single persistent undo log, which was formerly called update_undo . On upgrade, old_insert logs of any incomplete transactions will be handled. There was a fix related to this in MDEV-15370 . Because insert_undo logs were never visible to purge, this change should not be causing the issue. MDEV-15132 and MDEV-15158 stopped writing anything to the TRX_SYS page during transaction commit. This could be what is causing the reported problem. We could be retrieving the maximum transaction ID in the wrong way. There was also a minor change to the undo log format, in MDEV-14717 (making RENAME TABLE  crash-safe inside InnoDB; the .frm operations still are not crash-safe). Note: A shutdown after SET GLOBAL innodb_fast_shutdown=0 may fail to empty some undo logs. This was fixed in MariaDB 10.3.6 only: MDEV-13603 innodb_fast_shutdown=0 may fail to purge all history It is not feasible to backport that to 10.1 or earlier. It might not be feasible to backport to 10.2 either.

            In the mdev15912_data.bgz when the assertion fails, both purge_sys.rseg->last_commit and purge_sys.tail.commit are 340491, and purge_sys.tail.commit is one more, causing the assertion to fail. Here are some interesting messages:

            10.3 a86a02a8447233940cc3b0f40e2b403691997807

            2018-07-25 12:58:45 0 [Note] InnoDB: Trx id counter is 170241
            …
            2018-07-25 12:58:45 0 [Note] InnoDB: Rolled back recovered transaction 169859
            2018-07-25 12:58:45 0 [Note] InnoDB: Rollback of non-prepared transactions completed
            …
            Version: '10.3.9-MariaDB-debug-log'  socket: '/dev/shm/10.3/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
            2018-07-25 13:03:45 0x7fffe0c10700  InnoDB: Assertion failure in file /mariadb/10.3/storage/innobase/trx/trx0purge.cc line 121
            

            If I start with innodb_force_recovery=3, then the server will not crash. But, this will not only prevent the rollback of recovered transactions but also the purge of old history (which is where the assertion would fail). With a source code modification we can only disable the rollback. This will let the server start up successfully:

            diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc
            index c0864e40622..bd2e23716c4 100644
            --- a/storage/innobase/trx/trx0roll.cc
            +++ b/storage/innobase/trx/trx0roll.cc
            @@ -862,7 +862,6 @@ DECLARE_THREAD(trx_rollback_all_recovered)(void*)
             	if (trx_sys.rw_trx_hash.size()) {
             		ib::info() << "Starting in background the rollback of"
             			" recovered transactions";
            -		trx_rollback_recovered(true);
             		ib::info() << "Rollback of non-prepared transactions"
             			" completed";
             	}
            

            So, the failure must be somehow related to the rollback of the recovered transaction.

            marko Marko Mäkelä added a comment - In the mdev15912_data.bgz when the assertion fails, both purge_sys.rseg->last_commit and purge_sys.tail.commit are 340491, and purge_sys.tail.commit is one more, causing the assertion to fail. Here are some interesting messages: 10.3 a86a02a8447233940cc3b0f40e2b403691997807 2018-07-25 12:58:45 0 [Note] InnoDB: Trx id counter is 170241 … 2018-07-25 12:58:45 0 [Note] InnoDB: Rolled back recovered transaction 169859 2018-07-25 12:58:45 0 [Note] InnoDB: Rollback of non-prepared transactions completed … Version: '10.3.9-MariaDB-debug-log' socket: '/dev/shm/10.3/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution 2018-07-25 13:03:45 0x7fffe0c10700 InnoDB: Assertion failure in file /mariadb/10.3/storage/innobase/trx/trx0purge.cc line 121 If I start with innodb_force_recovery=3 , then the server will not crash. But, this will not only prevent the rollback of recovered transactions but also the purge of old history (which is where the assertion would fail). With a source code modification we can only disable the rollback. This will let the server start up successfully: diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc index c0864e40622..bd2e23716c4 100644 --- a/storage/innobase/trx/trx0roll.cc +++ b/storage/innobase/trx/trx0roll.cc @@ -862,7 +862,6 @@ DECLARE_THREAD(trx_rollback_all_recovered)(void*) if (trx_sys.rw_trx_hash.size()) { ib::info() << "Starting in background the rollback of" " recovered transactions"; - trx_rollback_recovered(true); ib::info() << "Rollback of non-prepared transactions" " completed"; } So, the failure must be somehow related to the rollback of the recovered transaction.

            ben.anderson reported:

            purge_sys.tail.commit = 127992
            purge_sys.rseg->last_commit = 127991

            Also in mdev15912_data.bgz, the difference happens to be exactly 1:

            purge_sys.tail.commit = 340492
            purge_sys.rseg.last_commit = 340491

            The two-component last_commit field (trx->no << 1 | !old_insert) was introduced in MDEV-15370, in an attempt to relax this assertion. But, even if we ignore the low-order bit, the difference remains. It looks like the fix in MDEV-15370 was incomplete. Purge should not need to process any old_insert logs; we only have to make sure that the undo log pages are made available for reuse.

            marko Marko Mäkelä added a comment - ben.anderson reported: purge_sys.tail.commit = 127992 purge_sys.rseg->last_commit = 127991 Also in mdev15912_data.bgz , the difference happens to be exactly 1: purge_sys.tail.commit = 340492 purge_sys.rseg.last_commit = 340491 The two-component last_commit field ( trx->no << 1 | !old_insert ) was introduced in MDEV-15370 , in an attempt to relax this assertion. But, even if we ignore the low-order bit, the difference remains. It looks like the fix in MDEV-15370 was incomplete. Purge should not need to process any old_insert logs; we only have to make sure that the undo log pages are made available for reuse.
            marko Marko Mäkelä made changes -

            MDEV-13495 changed the logic so that the recovered old_insert undo logs would be moved to the purge queue. Before MDEV-12288, the TRX_UNDO_TO_FREE state was used. I tried various things, but this turns out to be more difficult to fix than I thought. I would prefer to avoid reintroducing the TRX_UNDO_TO_FREE state.

            diff --git a/storage/innobase/include/trx0rseg.h b/storage/innobase/include/trx0rseg.h
            index dbd80486b71..794f4e66ca3 100644
            --- a/storage/innobase/include/trx0rseg.h
            +++ b/storage/innobase/include/trx0rseg.h
            @@ -174,6 +174,8 @@ struct trx_rseg_t {
             
             	/** @return the commit ID of the last committed transaction */
             	trx_id_t last_trx_no() const { return last_commit >> 1; }
            +	/** @return whether the last committed transaction is an update */
            +	bool last_is_update() const { return bool(last_commit & 1); }
             
             	void set_last_trx_no(trx_id_t trx_no, bool is_update)
             	{
            diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
            index 32e050e9b91..b3c6f41827e 100644
            --- a/storage/innobase/trx/trx0purge.cc
            +++ b/storage/innobase/trx/trx0purge.cc
            @@ -118,9 +118,11 @@ inline bool TrxUndoRsegsIterator::set_next()
             	ut_ad(purge_sys.rseg->space->id == TRX_SYS_SPACE
             	      || srv_is_undo_tablespace(purge_sys.rseg->space->id));
             
            -	ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit);
            +	if (purge_sys.rseg->last_is_update()) {
            +		ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit);
            +		purge_sys.tail.commit = purge_sys.rseg->last_commit;
            +	}
             
            -	purge_sys.tail.commit = purge_sys.rseg->last_commit;
             	purge_sys.hdr_offset = purge_sys.rseg->last_offset;
             	purge_sys.hdr_page_no = purge_sys.rseg->last_page_no;
             
            @@ -307,10 +309,13 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr)
             				 MLOG_2BYTES, mtr);
             	}
             
            -	if (rseg->last_page_no == FIL_NULL) {
            +	const bool is_update = undo == trx->rsegs.m_redo.undo;
            +
            +	if (rseg->last_page_no == FIL_NULL
            +	    || (is_update && !rseg->last_is_update())) {
             		rseg->last_page_no = undo->hdr_page_no;
             		rseg->last_offset = undo->hdr_offset;
            -		rseg->set_last_trx_no(trx->no, undo == trx->rsegs.m_redo.undo);
            +		rseg->set_last_trx_no(trx->no, is_update);
             		rseg->needs_purge = true;
             	}
             
            @@ -1048,9 +1053,11 @@ static void trx_purge_truncate_history()
             	purge_sys_t::iterator& head = purge_sys.head.commit
             		? purge_sys.head : purge_sys.tail;
             
            -	if (head.trx_no() >= purge_sys.view.low_limit_no()) {
            +	const trx_id_t low_limit = purge_sys.view.low_limit_no();
            +	if (head.trx_no() >= low_limit) {
             		/* This is sometimes necessary. TODO: find out why. */
            -		head.reset_trx_no(purge_sys.view.low_limit_no());
            +		ib::info() << "reset " << head.trx_no() << " to " << low_limit;
            +		head.reset_trx_no(low_limit);
             		head.undo_no = 0;
             	}
             
            

            The last hunk is not emitting any message, and the dataset is still crashing. (Note: it is necessary to restore the data directory from mdev15912_data.bgz before every startup attempt, because the rollback of the recovered transaction would change the persistent undo log data structures, triggering the assertion failure on every subsequent startup even if the code for the rollback code path was later fixed.)

            marko Marko Mäkelä added a comment - MDEV-13495 changed the logic so that the recovered old_insert undo logs would be moved to the purge queue. Before MDEV-12288 , the TRX_UNDO_TO_FREE state was used. I tried various things, but this turns out to be more difficult to fix than I thought. I would prefer to avoid reintroducing the TRX_UNDO_TO_FREE state. diff --git a/storage/innobase/include/trx0rseg.h b/storage/innobase/include/trx0rseg.h index dbd80486b71..794f4e66ca3 100644 --- a/storage/innobase/include/trx0rseg.h +++ b/storage/innobase/include/trx0rseg.h @@ -174,6 +174,8 @@ struct trx_rseg_t { /** @return the commit ID of the last committed transaction */ trx_id_t last_trx_no() const { return last_commit >> 1; } + /** @return whether the last committed transaction is an update */ + bool last_is_update() const { return bool(last_commit & 1); } void set_last_trx_no(trx_id_t trx_no, bool is_update) { diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index 32e050e9b91..b3c6f41827e 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -118,9 +118,11 @@ inline bool TrxUndoRsegsIterator::set_next() ut_ad(purge_sys.rseg->space->id == TRX_SYS_SPACE || srv_is_undo_tablespace(purge_sys.rseg->space->id)); - ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit); + if (purge_sys.rseg->last_is_update()) { + ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit); + purge_sys.tail.commit = purge_sys.rseg->last_commit; + } - purge_sys.tail.commit = purge_sys.rseg->last_commit; purge_sys.hdr_offset = purge_sys.rseg->last_offset; purge_sys.hdr_page_no = purge_sys.rseg->last_page_no; @@ -307,10 +309,13 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr) MLOG_2BYTES, mtr); } - if (rseg->last_page_no == FIL_NULL) { + const bool is_update = undo == trx->rsegs.m_redo.undo; + + if (rseg->last_page_no == FIL_NULL + || (is_update && !rseg->last_is_update())) { rseg->last_page_no = undo->hdr_page_no; rseg->last_offset = undo->hdr_offset; - rseg->set_last_trx_no(trx->no, undo == trx->rsegs.m_redo.undo); + rseg->set_last_trx_no(trx->no, is_update); rseg->needs_purge = true; } @@ -1048,9 +1053,11 @@ static void trx_purge_truncate_history() purge_sys_t::iterator& head = purge_sys.head.commit ? purge_sys.head : purge_sys.tail; - if (head.trx_no() >= purge_sys.view.low_limit_no()) { + const trx_id_t low_limit = purge_sys.view.low_limit_no(); + if (head.trx_no() >= low_limit) { /* This is sometimes necessary. TODO: find out why. */ - head.reset_trx_no(purge_sys.view.low_limit_no()); + ib::info() << "reset " << head.trx_no() << " to " << low_limit; + head.reset_trx_no(low_limit); head.undo_no = 0; } The last hunk is not emitting any message, and the dataset is still crashing. (Note: it is necessary to restore the data directory from mdev15912_data.bgz before every startup attempt, because the rollback of the recovered transaction would change the persistent undo log data structures, triggering the assertion failure on every subsequent startup even if the code for the rollback code path was later fixed.)
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            Labels compat56 compat57 regression upgrade

            We may have to introduce trx_rseg_t::old_insert_cached and some separate mechanism that will guarantee that old insert_undo pages from before the upgrade to 10.3 (or later) will eventually be freed, without interfering with the purge of transaction history.

            marko Marko Mäkelä added a comment - We may have to introduce trx_rseg_t::old_insert_cached  and some separate mechanism that will guarantee that old insert_undo pages from before the upgrade to 10.3 (or later) will eventually be freed, without interfering with the purge of transaction history.
            marko Marko Mäkelä made changes -
            elenst Elena Stepanova made changes -
            Labels compat56 compat57 regression upgrade affects-tests compat56 compat57 regression upgrade
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.4 [ 22408 ]
            Affects Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            marko Marko Mäkelä made changes -
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Critical [ 2 ] Minor [ 4 ]

            As described in MDEV-23755, we can prevent these failures by refusing InnoDB startup if any incomplete transactions have undo logs in the pre-MDEV-12288 format, with an error message saying that the older server version must be shut down cleanly, without using innodb_force_recovery=3.

            It might be possible to remove the old_insert data members altogether.

            marko Marko Mäkelä added a comment - As described in MDEV-23755 , we can prevent these failures by refusing InnoDB startup if any incomplete transactions have undo logs in the pre- MDEV-12288 format, with an error message saying that the older server version must be shut down cleanly, without using innodb_force_recovery=3 . It might be possible to remove the old_insert data members altogether.
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Priority Minor [ 4 ] Major [ 3 ]
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            We will refuse an upgrade from older versions than MariaDB 10.3 if a clean shutdown was not performed, so that the undo logs would be emptied:

            2021-06-13 16:42:45 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown
            

            There will be only one persistent undo log per transaction. No main-memory data structures related to the partitioned persistent undo log before MDEV-12288 are retained.

            marko Marko Mäkelä added a comment - We will refuse an upgrade from older versions than MariaDB 10.3 if a clean shutdown was not performed, so that the undo logs would be emptied: 2021-06-13 16:42:45 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown There will be only one persistent undo log per transaction. No main-memory data structures related to the partitioned persistent undo log before MDEV-12288 are retained.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-06-21 13:33:54.0 2021-06-21 13:33:54.993
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.3 [ 25904 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            hborresen Hans Borresen added a comment - - edited

            It looks like this just got pulled into the 10.3 branch, but it breaks upgrades for anyone trying to go from 5.7 -> 10.3.30, or even 5.7 -> 10.3.29 -> 10.3.30

            I started a topic for this in the zulip chat.

            Log snippet, just in case it is helpful:

            Version: '10.3.29-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
            2021-06-21 20:58:29 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
            ...
            2021-06-21 20:58:29 0 [Note] InnoDB: FTS optimize thread exiting.
            2021-06-21 20:58:29 0 [Note] InnoDB: Starting shutdown...
            2021-06-21 20:58:29 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
            2021-06-21 20:58:29 0 [Note] InnoDB: Buffer pool(s) dump completed at 210621 20:58:29
            2021-06-21 20:58:31 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
            2021-06-21 20:58:31 0 [Note] InnoDB: Shutdown completed; log sequence number 3281751; transaction id 3852
            2021-06-21 20:58:31 0 [Note] /usr/sbin/mysqld: Shutdown complete
            ...
            (after updating to 10.3.30 via yum)
            2021-06-21 20:58:31 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2021-06-21 20:58:31 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown
            2021-06-21 20:58:31 0 [Note] InnoDB: Retry with innodb_force_recovery=5
            2021-06-21 20:58:31 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
            2021-06-21 20:58:31 0 [Note] InnoDB: Starting shutdown...
            2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' init function returned error.
            2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2021-06-21 20:58:31 0 [Note] Plugin 'FEEDBACK' is disabled.
            2021-06-21 20:58:31 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
            2021-06-21 20:58:31 0 [ERROR] Unknown/unsupported storage engine: InnoDB
            

            Edit: I went ahead and filed MDEV-25981 for this issue.

            hborresen Hans Borresen added a comment - - edited It looks like this just got pulled into the 10.3 branch, but it breaks upgrades for anyone trying to go from 5.7 -> 10.3.30, or even 5.7 -> 10.3.29 -> 10.3.30 I started a topic for this in the zulip chat. Log snippet, just in case it is helpful: Version: '10.3.29-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server 2021-06-21 20:58:29 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown ... 2021-06-21 20:58:29 0 [Note] InnoDB: FTS optimize thread exiting. 2021-06-21 20:58:29 0 [Note] InnoDB: Starting shutdown... 2021-06-21 20:58:29 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2021-06-21 20:58:29 0 [Note] InnoDB: Buffer pool(s) dump completed at 210621 20:58:29 2021-06-21 20:58:31 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2021-06-21 20:58:31 0 [Note] InnoDB: Shutdown completed; log sequence number 3281751; transaction id 3852 2021-06-21 20:58:31 0 [Note] /usr/sbin/mysqld: Shutdown complete ... (after updating to 10.3.30 via yum) 2021-06-21 20:58:31 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2021-06-21 20:58:31 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown 2021-06-21 20:58:31 0 [Note] InnoDB: Retry with innodb_force_recovery=5 2021-06-21 20:58:31 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption 2021-06-21 20:58:31 0 [Note] InnoDB: Starting shutdown... 2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' init function returned error. 2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2021-06-21 20:58:31 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-06-21 20:58:31 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded 2021-06-21 20:58:31 0 [ERROR] Unknown/unsupported storage engine: InnoDB Edit: I went ahead and filed MDEV-25981 for this issue.
            hborresen Hans Borresen made changes -

            hborresen, thank you. I should have actually tested the upgrade. As I expected, fixing MDEV-25981 involved only minor changes to the validation logic.

            marko Marko Mäkelä added a comment - hborresen , thank you. I should have actually tested the upgrade. As I expected, fixing MDEV-25981 involved only minor changes to the validation logic.
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 86623 ] MariaDB v4 [ 154179 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.