Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3 (EOL), 10.4 (EOL)
Description
The 10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as part of the undo-upgrade scenario:
- start the current 10.0 server;
- create some tables and run some DML on them;
- kill the server during operation (with SIGKILL);
- restart the server with innodb-force-recovery=3, with no client activity;
- shut down the server normally.
Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as shown below.
10.3 b52bb6eb82db8:
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit
#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This test was run with --innodb-page-size=8K --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none; I'm not sure whether any of these options matter. To reproduce the crash on the attached datadir, the server must also be started with --innodb-page-size=8K; the other two options make no difference. Otherwise, all settings are defaults.
The ib_logfile files are compressed and attached separately only to get around the 10M attachment limit in JIRA. I don't know whether they are needed; the crash happens both with and without them.
Similar-looking crashes upon upgrade from 10.1 have also been observed before.
10.2 doesn't crash on the same datadir.
Attachments
Issue Links
- causes
  - MDEV-25981 Upgrade results in InnoDB failures (Closed)
  - MDEV-26465 Race condition in trx_purge_rseg_get_next_history_log() (Closed)
- is caused by
  - MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC (Closed)
  - MDEV-15370 Upgrade fails when both insert_undo and update_undo exist for recovered transactions (Closed)
- is duplicated by
  - MDEV-15464 Assertion `purge_sys.purge_queue.empty() || purge_sys.purge_queue.top() != m_rsegs' failed in TrxUndoRsegsIterator::set_next upon upgrade from 10.0/10.1 (Closed)
- relates to
  - MDEV-16952 Introduce SET GLOBAL innodb_max_purge_lag_wait (Closed)
  - MDEV-27437 Galera snapshot transfer fails to upgrade between some major versions (Closed)
  - MDEV-29475 trx_undo_rseg_free() does not write redo log (Closed)
  - MDEV-18454 Assertion `0' failed in ReadView::check_trx_id_sanity upon crash-upgrade from 10.2.6 (Closed)
  - MDEV-18966 Transaction recovery may be broken after upgrade to 10.3 (Closed)
  - MDEV-27800 upgrade from MariaDB 10.2 to 10.5.13 results in [ERROR] InnoDB: corrupted TRX_NO (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Attachment | mdev15912_data.bgz [ 45740 ] | |
Attachment | ib_logfile1.bz2 [ 45741 ] | |
Attachment | ib_logfile0.bz2 [ 45742 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Description |
https://api.travis-ci.org/v3/job/367276228/log.txt
{noformat:title=mariadb-10.3.6 debug} 2018-04-17 16:12:33 0x7f0f817fa700 InnoDB: Assertion failure in file /home/travis/src/storage/innobase/trx/trx0purge.cc line 121 InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/ InnoDB: about forcing recovery. 180417 16:12:33 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.3.6-MariaDB-debug-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=153 thread_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467493 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f0f68001d50 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... 
stack_bottom = 0x7f0f817f9e08 thread_stack 0x49000 /home/travis/server/bin/mysqld(my_print_stacktrace+0x3d)[0x1261350] include/ut0mutex.h:186(void mutex_init<PolicyMutex<TTASEventMutex<GenericPolicy> > >(PolicyMutex<TTASEventMutex<GenericPolicy> >*, latch_id_t, char const*, unsigned int))[0xabdec5] 2018-04-17 16:12:33 0 [Warning] /home/travis/server/bin/mysqld: unknown variable 'loose-innodb-file-format=Barracuda' /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0fa6f5ecb0] 2018-04-17 16:12:33 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH','EMPTY_STRING_IS_NULL','SIMULTANEOUS_ASSIGNMENT'), found type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALU 2018-04-17 16:12:33 0 [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler. 
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f0fa63b2035] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f0fa63b579b] Version: '10.3.6-MariaDB-debug-log' socket: '/home/travis/logs/current1_1/mysql.sock' port: 19300 Source distribution Killed {noformat} {noformat:title=experimental 4feb78e17fbecf7fc7f2847b49b8c66b54879629} perl /home/travis/rqg///run-scenario.pl --grammar=conf/mariadb/oltp-transactional.yy --grammar2=conf/mariadb/oltp_and_ddl.yy --gendata=conf/mariadb/innodb_upgrade.zz --gendata-advanced --mysqld=--server-id=111 --scenario=UndoLogUpgrade --duration=200 --basedir1=/home/travis/old --basedir2=/home/travis/server --mysqld=--innodb-page-size=32K --mysqld=--innodb-compression-algorithm=zlib --mysqld=--loose-innodb-file-format=Barracuda --mysqld=--loose-innodb-file-per-table=1 --gendata=conf/mariadb/innodb_upgrade_compression.zz --no-mask --seed=1523981386 --threads=4 --queries=100M --mysqld=--loose-max-statement-time=20 --mysqld=--loose-lock-wait-timeout=20 --mysqld=--loose-innodb-lock-wait-timeout=10 --mysqld=--loose-innodb_log_compressed_pages=on --mtr-build-thread=300 --vardir1=/home/travis/logs/current1_1 {noformat} Not reproducible so far. |
Summary | [Draft] InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.1.22 to 10.3.6 | InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.0 or 10.1 to 10.3 |
Assignee | Elena Stepanova [ elenst ] | Marko Mäkelä [ marko ] |
Comment |
[ New occurrence: https://api.travis-ci.org/v3/job/372110074/log.txt
again, from 10.1.22 to the current 10.3. https://api.travis-ci.org/v3/job/372110080/log.txt same from 10.0.34 ] |
Comment |
[ New occurrences:
https://travis-ci.org/elenst/travis-tests/jobs/374961333 from 10.1.22 https://travis-ci.org/elenst/travis-tests/jobs/374961341 from 10.0.35 ] |
Link |
This issue is caused by |
Link |
This issue is caused by |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Labels | compat56 compat57 regression upgrade |
Link |
This issue is duplicated by |
Labels | compat56 compat57 regression upgrade | affects-tests compat56 compat57 regression upgrade |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Fix Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.4 [ 22408 ] |
Link |
This issue relates to |
Assignee | Marko Mäkelä [ marko ] | Eugene Kosov [ kevg ] |
Link |
This issue relates to |
Priority | Major [ 3 ] | Critical [ 2 ] |
Priority | Critical [ 2 ] | Minor [ 4 ] |
Assignee | Eugene Kosov [ kevg ] | Marko Mäkelä [ marko ] |
Priority | Minor [ 4 ] | Major [ 3 ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
issue.field.resolutiondate | 2021-06-21 13:33:54.0 | 2021-06-21 13:33:54.993 |
Fix Version/s | 10.6.3 [ 25904 ] | |
Fix Version/s | 10.3.30 [ 25732 ] | |
Fix Version/s | 10.4.20 [ 25733 ] | |
Fix Version/s | 10.5.11 [ 25734 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue causes |
Link |
This issue causes |
Workflow | MariaDB v3 [ 86623 ] | MariaDB v4 [ 154179 ] |
Link | This issue relates to TODO-3302 [ TODO-3302 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 36003 ] |
We're also seeing this issue under similar circumstances.
We have a 10.0 datadir that should have been shut down cleanly, but may not have been in certain situations. Installing mariadb-server 10.3 and running mysql_upgrade works fine, and the server boots. After the server has been running for a while, or after some action is performed (I'm not sure which; I don't know the internals well enough), a purge starts, and this causes the crash.
With some help from `dragonheart` on IRC, I installed debug symbols and ran gdb. Here is some rough gdb output that may help.
(gdb) up
#1 0x00007ffff5e7402a in __GI_abort () at abort.c:89
89 abort.c: No such file or directory.
(gdb) up
#2 0x00005555559f34e2 in ut_dbg_assertion_failed (
expr=expr@entry=0x5555563c2098 "purge_sys.tail.commit <= purge_sys.rseg->last_commit",
file=file@entry=0x5555563c1f68 "/home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/trx/trx0purge.cc",
line=line@entry=121) at /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc:61
61 /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc: No such file or directory.
(gdb) p purge_sys.tail.commit
$1 = 127992
(gdb) purge_sys.rseg->last_commit
Undefined command: "purge_sys". Try "help".
(gdb) p purge_sys.rseg->last_commit
$2 = 127991
I then did some reading and noticed that we had missed a step in the MariaDB upgrade notes: for one of the versions (I've lost the page now, but it might have been 10.0 -> 10.1), they recommend shutting down with innodb_fast_shutdown=0 before upgrading. I tried this: I booted back up with 10.3, ran mysql_upgrade, restarted 10.3 and left it running under some (very minor) workloads, and it appears to be stable.
So this is a case of user error, but I don't think MariaDB should crash quite this badly in such a case. I'm keen to help debug further if it leads to a better error message or more graceful handling. I'm watching this issue in case anyone needs more information.