[MDEV-15912] InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.0 or 10.1 to 10.3 - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.3.30, 10.4.20, 10.5.11, 10.6.3
Component/s: Storage Engine - InnoDB
Labels:

Description

10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:

start the current 10.0 server;
create some tables and run some DML on them;
kill the server during operation (with SIGKILL);
restart the server with innodb-force-recovery=3, no client activity;
shutdown the server normally;

Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

10.3 b52bb6eb82db8
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This current test was run with --innodb-page-size=8K --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with --innodb-page-size=8K, other two options don't make a difference; otherwise all defaults.

ib_logfile-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

Similar-looking crashes upon upgrade from 10.1 have also been observed before.
10.2 doesn't crash on the same datadir.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

ib_logfile0.bz2
8.52 MB
2018-06-12 15:37
ib_logfile1.bz2
0.1 kB
2018-06-12 15:37
mdev15912_data.bgz
4.18 MB
2018-06-12 15:37

Issue Links

causes

MDEV-25981 Upgrade results in InnoDB failures

Closed

MDEV-26465 Race condition in trx_purge_rseg_get_next_history_log()

Closed

is caused by

MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC

Closed

MDEV-15370 Upgrade fails when both insert_undo and update_undo exist for recovered transactions

Closed

is duplicated by

MDEV-15464 Assertion `purge_sys.purge_queue.empty() || purge_sys.purge_queue.top() != m_rsegs' failed in TrxUndoRsegsIterator::set_next upon upgrade from 10.0/10.1

Closed

relates to

MDEV-16952 Introduce SET GLOBAL innodb_max_purge_lag_wait

Closed

MDEV-27437 Galera snapshot transfer fails to upgrade between some major versions

Closed

MDEV-29475 trx_undo_rseg_free() does not write redo log

Closed

MDEV-18454 Assertion `0' failed in ReadView::check_trx_id_sanity upon crash-upgrade from 10.2.6

Closed

MDEV-18966 Transaction recovery may be broken after upgrade to 10.3

Closed

MDEV-27800 upgrade from MariaDB 10.2 to 10.5.13 results in [ERROR] InnoDB: corrupted TRX_NO

Closed

mentioned in: Page Loading...

(6 relates to, 1 mentioned in)

Activity

Ascending order - Click to sort in descending order

Elena Stepanova created issue - 2018-04-17 23:03

Elena Stepanova made changes - 2018-06-12 15:38

Field	Original Value	New Value
Attachment		mdev15912_data.bgz [ 45740 ]
Attachment		ib_logfile1.bz2 [ 45741 ]
Attachment		ib_logfile0.bz2 [ 45742 ]
Fix Version/s		10.3 [ 22126 ]
Description	https://api.travis-ci.org/v3/job/367276228/log.txt {noformat:title=mariadb-10.3.6 debug} 2018-04-17 16:12:33 0x7f0f817fa700 InnoDB: Assertion failure in file /home/travis/src/storage/innobase/trx/trx0purge.cc line 121 InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/ InnoDB: about forcing recovery. 180417 16:12:33 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.3.6-MariaDB-debug-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=153 thread_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)max_threads = 467493 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f0f68001d50 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f0f817f9e08 thread_stack 0x49000 /home/travis/server/bin/mysqld(my_print_stacktrace+0x3d)[0x1261350] include/ut0mutex.h:186(void mutex_init<PolicyMutex<TTASEventMutex<GenericPolicy> > >(PolicyMutex<TTASEventMutex<GenericPolicy> >, latch_id_t, char const*, unsigned int))[0xabdec5] 2018-04-17 16:12:33 0 [Warning] /home/travis/server/bin/mysqld: unknown variable 'loose-innodb-file-format=Barracuda' /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0fa6f5ecb0] 2018-04-17 16:12:33 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH','EMPTY_STRING_IS_NULL','SIMULTANEOUS_ASSIGNMENT'), found type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALU 2018-04-17 16:12:33 0 [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler. /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f0fa63b2035] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f0fa63b579b] Version: '10.3.6-MariaDB-debug-log' socket: '/home/travis/logs/current1_1/mysql.sock' port: 19300 Source distribution Killed {noformat} {noformat:title=experimental 4feb78e17fbecf7fc7f2847b49b8c66b54879629} perl /home/travis/rqg///run-scenario.pl --grammar=conf/mariadb/oltp-transactional.yy --grammar2=conf/mariadb/oltp_and_ddl.yy --gendata=conf/mariadb/innodb_upgrade.zz --gendata-advanced --mysqld=--server-id=111 --scenario=UndoLogUpgrade --duration=200 --basedir1=/home/travis/old --basedir2=/home/travis/server --mysqld=--innodb-page-size=32K --mysqld=--innodb-compression-algorithm=zlib --mysqld=--loose-innodb-file-format=Barracuda --mysqld=--loose-innodb-file-per-table=1 --gendata=conf/mariadb/innodb_upgrade_compression.zz --no-mask --seed=1523981386 --threads=4 --queries=100M --mysqld=--loose-max-statement-time=20 --mysqld=--loose-lock-wait-timeout=20 --mysqld=--loose-innodb-lock-wait-timeout=10 --mysqld=--loose-innodb_log_compressed_pages=on --mtr-build-thread=300 --vardir1=/home/travis/logs/current1_1 {noformat} Not reproducible so far.	10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir. The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario: - start the current 10.0 server; - create some tables and run some DML on them; - kill the server during operation (with SIGKILL); - restart the server with {{innodb-force-recovery=3}}, no client activity; - shutdown the server normally; Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below. {noformat:title=10.3 b52bb6eb82db8} 2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121 InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit #5 0x00007fb1a1634448 in __GI_abort () at abort.c:89 #6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61 #7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121 #8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213 #9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286 #10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356 #11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429 #12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559 #13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583 #14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714 #15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309 #16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 {noformat} This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults. {{ib_logfile}}s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them. 10.2 doesn't crash on the same datadir.
Summary	[Draft] InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.1.22 to 10.3.6	InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit upon upgrade from 10.0 or 10.1 to 10.3

Elena Stepanova made changes - 2018-06-12 15:39

Description

10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
- start the current 10.0 server;
- create some tables and run some DML on them;
- kill the server during operation (with SIGKILL);
- restart the server with {{innodb-force-recovery=3}}, no client activity;
- shutdown the server normally;

Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

{noformat:title=10.3 b52bb6eb82db8}
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}

This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

{{ib_logfile}}s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

10.2 doesn't crash on the same datadir.

10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
- start the current 10.0 server;
- create some tables and run some DML on them;
- kill the server during operation (with SIGKILL);
- restart the server with {{innodb-force-recovery=3}}, no client activity;
- shutdown the server normally;

Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

{noformat:title=10.3 b52bb6eb82db8}
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}

This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

{{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

10.2 doesn't crash on the same datadir.

Elena Stepanova made changes - 2018-06-12 15:39

Assignee

Elena Stepanova [ elenst ]

Marko Mäkelä [ marko ]

Elena Stepanova made changes - 2018-06-12 15:39

Comment

[ New occurrence: https://api.travis-ci.org/v3/job/372110074/log.txt
again, from 10.1.22 to the current 10.3.

https://api.travis-ci.org/v3/job/372110080/log.txt
same from 10.0.34 ]

Elena Stepanova made changes - 2018-06-12 15:39

Comment

[ New occurrences:
https://travis-ci.org/elenst/travis-tests/jobs/374961333 from 10.1.22
https://travis-ci.org/elenst/travis-tests/jobs/374961341 from 10.0.35
]

Elena Stepanova made changes - 2018-06-12 15:40

Description

10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
- start the current 10.0 server;
- create some tables and run some DML on them;
- kill the server during operation (with SIGKILL);
- restart the server with {{innodb-force-recovery=3}}, no client activity;
- shutdown the server normally;

Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

{noformat:title=10.3 b52bb6eb82db8}
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}

This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

{{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

10.2 doesn't crash on the same datadir.

10.3 server crashes with a non-debug assertion failure when it starts on the attached datadir.
The datadir was created on the current 10.0 server as a part of the undo-upgrade scenario:
- start the current 10.0 server;
- create some tables and run some DML on them;
- kill the server during operation (with SIGKILL);
- restart the server with {{innodb-force-recovery=3}}, no client activity;
- shutdown the server normally;

Then the current 10.3 server is started on the same datadir. It starts, but crashes immediately afterwards as below.

{noformat:title=10.3 b52bb6eb82db8}
2018-06-12 18:15:28 0x7fb1809ea700 InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/trx/trx0purge.cc line 121
InnoDB: Failing assertion: purge_sys.tail.commit <= purge_sys.rseg->last_commit

#5 0x00007fb1a1634448 in __GI_abort () at abort.c:89
#6 0x00007fb1a44f44fb in ut_dbg_assertion_failed (expr=0x7fb1a4b6b0c8 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=0x7fb1a4b6af80 "/data/src/10.3/storage/innobase/trx/trx0purge.cc", line=121) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#7 0x00007fb1a44c2dfb in TrxUndoRsegsIterator::set_next (this=0x7fb1a530c0a0 <purge_sys+416>) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:121
#8 0x00007fb1a44c0d4b in trx_purge_choose_next_log () at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1213
#9 0x00007fb1a44c0fa4 in trx_purge_get_next_rec (n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1286
#10 0x00007fb1a44c11f5 in trx_purge_fetch_next_rec (roll_ptr=0x7fb17400d8f0, n_pages_handled=0x7fb1809e9dd0, heap=0x7fb1a85d2780) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1356
#11 0x00007fb1a44c151c in trx_purge_attach_undo_recs (n_purge_threads=4) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1429
#12 0x00007fb1a44c19ff in trx_purge (n_purge_threads=4, truncate=false) at /data/src/10.3/storage/innobase/trx/trx0purge.cc:1559
#13 0x00007fb1a4492d44 in srv_do_purge (n_total_purged=0x7fb1809e9ed0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2583
#14 0x00007fb1a4493172 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.3/storage/innobase/srv/srv0srv.cc:2714
#15 0x00007fb1a3293064 in start_thread (arg=0x7fb1809ea700) at pthread_create.c:309
#16 0x00007fb1a16e662d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}

This current test was run with {{\-\-innodb-page-size=8K \-\-loose-innodb_log_compressed_pages=on \-\-loose-innodb-change-buffering=none}}, I'm not sure whether any of them important. Naturally, to reproduce the crash on the attached datadir, the server needs to be also started with {{\-\-innodb-page-size=8K}}, other two options don't make a difference; otherwise all defaults.

{{ib_logfile}}-s are compressed and attached separately just to overcome the 10M limitation in JIRA. I don't know if they are needed, the crash happens with and without them.

Similar-looking crashes upon upgrade from 10.1 have also been observed before.
10.2 doesn't crash on the same datadir.

Ben Anderson added a comment - 2018-07-25 07:26

We're also seeing this issue under similar circumstances.

We've got a 10.0 data dir which should have been shutdown cleanly but may not have in certain situations. Installing mariadb-server 10.3 and running mysql_upgrade works fine, and the server will boot. If the server is running enough or a certain action is performed (i'm not sure which, don't know enough), a purge will start. This will cause the crash.

With some help from `dragonheart` on IRC, I installed debug symbols and gdb'd. Here is some clumsy GDB info that may help.

(gdb) up

#1  0x00007ffff5e7402a in __GI_abort () at abort.c:89

89  abort.c: No such file or directory.

(gdb) up

#2  0x00005555559f34e2 in ut_dbg_assertion_failed (

    expr=expr@entry=0x5555563c2098 "purge_sys.tail.commit <= purge_sys.rseg->last_commit",

    file=file@entry=0x5555563c1f68 "/home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/trx/trx0purge.cc",

    line=line@entry=121) at /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc:61

61  /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc: No such file or directory.

(gdb) p purge_sys.tail.commit

$1 = 127992

(gdb) purge_sys.rseg->last_commit

Undefined command: "purge_sys".  Try "help".

(gdb) p purge_sys.rseg->last_commit

$2 = 127991

I then did some reading around and noticed we had missed a step in the MariaDB upgrade notes that for one of the version (i've lost the page now, but it might be have been 10.0 -> 10.1) it recommends doing a shutdown with innodb_fast_shutdown=0 before doing the upgrade. I tried this, and booted back up with 10.3, ran the mysql_upgrade and then booted 10.3, left it running under some workloads (very minor) and it appears to be stable.

So - this is a case of user error, but I don't think MariaDB should be crashing quite this badly in this case. I'm keen to help debug any more if it means a better error message or something handled. I've watched this thread, if anyone needs more info.

Ben Anderson added a comment - 2018-07-25 07:26 We're also seeing this issue under similar circumstances. We've got a 10.0 data dir which should have been shutdown cleanly but may not have in certain situations. Installing mariadb-server 10.3 and running mysql_upgrade works fine, and the server will boot. If the server is running enough or a certain action is performed (i'm not sure which, don't know enough), a purge will start. This will cause the crash. With some help from `dragonheart` on IRC, I installed debug symbols and gdb'd. Here is some clumsy GDB info that may help. (gdb) up #1 0x00007ffff5e7402a in __GI_abort () at abort.c:89 89 abort.c: No such file or directory. (gdb) up #2 0x00005555559f34e2 in ut_dbg_assertion_failed ( expr=expr@entry=0x5555563c2098 "purge_sys.tail.commit <= purge_sys.rseg->last_commit", file=file@entry=0x5555563c1f68 "/home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/trx/trx0purge.cc", line=line@entry=121) at /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc:61 61 /home/buildbot/buildbot/build/mariadb-10.3.8/storage/innobase/ut/ut0dbg.cc: No such file or directory. (gdb) p purge_sys.tail.commit $1 = 127992 (gdb) purge_sys.rseg->last_commit Undefined command: "purge_sys". Try "help". (gdb) p purge_sys.rseg->last_commit $2 = 127991 I then did some reading around and noticed we had missed a step in the MariaDB upgrade notes that for one of the version (i've lost the page now, but it might be have been 10.0 -> 10.1) it recommends doing a shutdown with innodb_fast_shutdown=0 before doing the upgrade. I tried this, and booted back up with 10.3, ran the mysql_upgrade and then booted 10.3, left it running under some workloads (very minor) and it appears to be stable. So - this is a case of user error, but I don't think MariaDB should be crashing quite this badly in this case. I'm keen to help debug any more if it means a better error message or something handled. I've watched this thread, if anyone needs more info.

Marko Mäkelä added a comment - 2018-07-25 07:51

ben.anderson, MariaDB upgrade is supposed to work even if the undo logs are not empty. Tests for that were implemented in ~~MDEV-13269~~.

In MariaDB 10.3, there are two major changes to the undo log format.

~~MDEV-12288~~ eliminated the separate insert_undo log; the undo log records of a transaction will be written to a single persistent undo log, which was formerly called update_undo. On upgrade, old_insert logs of any incomplete transactions will be handled. There was a fix related to this in ~~MDEV-15370~~. Because insert_undo logs were never visible to purge, this change should not be causing the issue.
~~MDEV-15132~~ and ~~MDEV-15158~~ stopped writing anything to the TRX_SYS page during transaction commit. This could be what is causing the reported problem. We could be retrieving the maximum transaction ID in the wrong way.

There was also a minor change to the undo log format, in ~~MDEV-14717~~ (making RENAME TABLE crash-safe inside InnoDB; the .frm operations still are not crash-safe).

Note: A shutdown after SET GLOBAL innodb_fast_shutdown=0 may fail to empty some undo logs. This was fixed in MariaDB 10.3.6 only:
~~MDEV-13603~~ innodb_fast_shutdown=0 may fail to purge all history
It is not feasible to backport that to 10.1 or earlier. It might not be feasible to backport to 10.2 either.

Marko Mäkelä added a comment - 2018-07-25 07:51 ben.anderson , MariaDB upgrade is supposed to work even if the undo logs are not empty. Tests for that were implemented in MDEV-13269 . In MariaDB 10.3, there are two major changes to the undo log format. MDEV-12288 eliminated the separate insert_undo log; the undo log records of a transaction will be written to a single persistent undo log, which was formerly called update_undo . On upgrade, old_insert logs of any incomplete transactions will be handled. There was a fix related to this in MDEV-15370 . Because insert_undo logs were never visible to purge, this change should not be causing the issue. MDEV-15132 and MDEV-15158 stopped writing anything to the TRX_SYS page during transaction commit. This could be what is causing the reported problem. We could be retrieving the maximum transaction ID in the wrong way. There was also a minor change to the undo log format, in MDEV-14717 (making RENAME TABLE crash-safe inside InnoDB; the .frm operations still are not crash-safe). Note: A shutdown after SET GLOBAL innodb_fast_shutdown=0 may fail to empty some undo logs. This was fixed in MariaDB 10.3.6 only: MDEV-13603 innodb_fast_shutdown=0 may fail to purge all history It is not feasible to backport that to 10.1 or earlier. It might not be feasible to backport to 10.2 either.

Marko Mäkelä added a comment - 2018-07-25 10:14

In the mdev15912_data.bgz when the assertion fails, both purge_sys.rseg->last_commit and purge_sys.tail.commit are 340491, and purge_sys.tail.commit is one more, causing the assertion to fail. Here are some interesting messages:

10.3 a86a02a8447233940cc3b0f40e2b403691997807
2018-07-25 12:58:45 0 [Note] InnoDB: Trx id counter is 170241
…
2018-07-25 12:58:45 0 [Note] InnoDB: Rolled back recovered transaction 169859
2018-07-25 12:58:45 0 [Note] InnoDB: Rollback of non-prepared transactions completed
…
Version: '10.3.9-MariaDB-debug-log' socket: '/dev/shm/10.3/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
2018-07-25 13:03:45 0x7fffe0c10700 InnoDB: Assertion failure in file /mariadb/10.3/storage/innobase/trx/trx0purge.cc line 121

If I start with innodb_force_recovery=3, then the server will not crash. But, this will not only prevent the rollback of recovered transactions but also the purge of old history (which is where the assertion would fail). With a source code modification we can only disable the rollback. This will let the server start up successfully:

diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc

index c0864e40622..bd2e23716c4 100644

--- a/storage/innobase/trx/trx0roll.cc

+++ b/storage/innobase/trx/trx0roll.cc

@@ -862,7 +862,6 @@ DECLARE_THREAD(trx_rollback_all_recovered)(void*)

 	if (trx_sys.rw_trx_hash.size()) {

 		ib::info() << "Starting in background the rollback of"

 			" recovered transactions";

-		trx_rollback_recovered(true);

 		ib::info() << "Rollback of non-prepared transactions"

 			" completed";

So, the failure must be somehow related to the rollback of the recovered transaction.

Marko Mäkelä added a comment - 2018-07-25 10:14 In the mdev15912_data.bgz when the assertion fails, both purge_sys.rseg->last_commit and purge_sys.tail.commit are 340491, and purge_sys.tail.commit is one more, causing the assertion to fail. Here are some interesting messages: 10.3 a86a02a8447233940cc3b0f40e2b403691997807 2018-07-25 12:58:45 0 [Note] InnoDB: Trx id counter is 170241 … 2018-07-25 12:58:45 0 [Note] InnoDB: Rolled back recovered transaction 169859 2018-07-25 12:58:45 0 [Note] InnoDB: Rollback of non-prepared transactions completed … Version: '10.3.9-MariaDB-debug-log' socket: '/dev/shm/10.3/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution 2018-07-25 13:03:45 0x7fffe0c10700 InnoDB: Assertion failure in file /mariadb/10.3/storage/innobase/trx/trx0purge.cc line 121 If I start with innodb_force_recovery=3 , then the server will not crash. But, this will not only prevent the rollback of recovered transactions but also the purge of old history (which is where the assertion would fail). With a source code modification we can only disable the rollback. This will let the server start up successfully: diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc index c0864e40622..bd2e23716c4 100644 --- a/storage/innobase/trx/trx0roll.cc +++ b/storage/innobase/trx/trx0roll.cc @@ -862,7 +862,6 @@ DECLARE_THREAD(trx_rollback_all_recovered)(void*) if (trx_sys.rw_trx_hash.size()) { ib::info() << "Starting in background the rollback of" " recovered transactions"; - trx_rollback_recovered(true); ib::info() << "Rollback of non-prepared transactions" " completed"; } So, the failure must be somehow related to the rollback of the recovered transaction.

Marko Mäkelä added a comment - 2018-07-25 11:35

ben.anderson reported:

purge_sys.tail.commit = 127992
purge_sys.rseg->last_commit = 127991

Also in mdev15912_data.bgz, the difference happens to be exactly 1:

purge_sys.tail.commit = 340492
purge_sys.rseg.last_commit = 340491

The two-component last_commit field (trx->no << 1 | !old_insert) was introduced in ~~MDEV-15370~~, in an attempt to relax this assertion. But, even if we ignore the low-order bit, the difference remains. It looks like the fix in ~~MDEV-15370~~ was incomplete. Purge should not need to process any old_insert logs; we only have to make sure that the undo log pages are made available for reuse.

Marko Mäkelä added a comment - 2018-07-25 11:35 ben.anderson reported: purge_sys.tail.commit = 127992 purge_sys.rseg->last_commit = 127991 Also in mdev15912_data.bgz , the difference happens to be exactly 1: purge_sys.tail.commit = 340492 purge_sys.rseg.last_commit = 340491 The two-component last_commit field ( trx->no << 1 | !old_insert ) was introduced in MDEV-15370 , in an attempt to relax this assertion. But, even if we ignore the low-order bit, the difference remains. It looks like the fix in MDEV-15370 was incomplete. Purge should not need to process any old_insert logs; we only have to make sure that the undo log pages are made available for reuse.

Marko Mäkelä made changes - 2018-07-25 11:35

Link

This issue is caused by ~~MDEV-15370~~ [ ~~MDEV-15370~~ ]

Marko Mäkelä added a comment - 2018-07-25 12:31

~~MDEV-13495~~ changed the logic so that the recovered old_insert undo logs would be moved to the purge queue. Before ~~MDEV-12288~~, the TRX_UNDO_TO_FREE state was used. I tried various things, but this turns out to be more difficult to fix than I thought. I would prefer to avoid reintroducing the TRX_UNDO_TO_FREE state.

diff --git a/storage/innobase/include/trx0rseg.h b/storage/innobase/include/trx0rseg.h

index dbd80486b71..794f4e66ca3 100644

--- a/storage/innobase/include/trx0rseg.h

+++ b/storage/innobase/include/trx0rseg.h

@@ -174,6 +174,8 @@ struct trx_rseg_t {

 	/** @return the commit ID of the last committed transaction */

 	trx_id_t last_trx_no() const { return last_commit >> 1; }

+	/** @return whether the last committed transaction is an update */

+	bool last_is_update() const { return bool(last_commit & 1); }

 	void set_last_trx_no(trx_id_t trx_no, bool is_update)

diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc

index 32e050e9b91..b3c6f41827e 100644

--- a/storage/innobase/trx/trx0purge.cc

+++ b/storage/innobase/trx/trx0purge.cc

@@ -118,9 +118,11 @@ inline bool TrxUndoRsegsIterator::set_next()

 	ut_ad(purge_sys.rseg->space->id == TRX_SYS_SPACE

 	      || srv_is_undo_tablespace(purge_sys.rseg->space->id));

-	ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit);

+	if (purge_sys.rseg->last_is_update()) {

+		ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit);

+		purge_sys.tail.commit = purge_sys.rseg->last_commit;

+	}

-	purge_sys.tail.commit = purge_sys.rseg->last_commit;

 	purge_sys.hdr_offset = purge_sys.rseg->last_offset;

 	purge_sys.hdr_page_no = purge_sys.rseg->last_page_no;

@@ -307,10 +309,13 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr)

 				 MLOG_2BYTES, mtr);

-	if (rseg->last_page_no == FIL_NULL) {

+	const bool is_update = undo == trx->rsegs.m_redo.undo;

+	if (rseg->last_page_no == FIL_NULL

+	    || (is_update && !rseg->last_is_update())) {

 		rseg->last_page_no = undo->hdr_page_no;

 		rseg->last_offset = undo->hdr_offset;

-		rseg->set_last_trx_no(trx->no, undo == trx->rsegs.m_redo.undo);

+		rseg->set_last_trx_no(trx->no, is_update);

 		rseg->needs_purge = true;

@@ -1048,9 +1053,11 @@ static void trx_purge_truncate_history()

 	purge_sys_t::iterator& head = purge_sys.head.commit

 		? purge_sys.head : purge_sys.tail;

-	if (head.trx_no() >= purge_sys.view.low_limit_no()) {

+	const trx_id_t low_limit = purge_sys.view.low_limit_no();

+	if (head.trx_no() >= low_limit) {

 		/* This is sometimes necessary. TODO: find out why. */

-		head.reset_trx_no(purge_sys.view.low_limit_no());

+		ib::info() << "reset " << head.trx_no() << " to " << low_limit;

+		head.reset_trx_no(low_limit);

 		head.undo_no = 0;

The last hunk is not emitting any message, and the dataset is still crashing. (Note: it is necessary to restore the data directory from mdev15912_data.bgz before every startup attempt, because the rollback of the recovered transaction would change the persistent undo log data structures, triggering the assertion failure on every subsequent startup even if the code for the rollback code path was later fixed.)

Marko Mäkelä added a comment - 2018-07-25 12:31 MDEV-13495 changed the logic so that the recovered old_insert undo logs would be moved to the purge queue. Before MDEV-12288 , the TRX_UNDO_TO_FREE state was used. I tried various things, but this turns out to be more difficult to fix than I thought. I would prefer to avoid reintroducing the TRX_UNDO_TO_FREE state. diff --git a/storage/innobase/include/trx0rseg.h b/storage/innobase/include/trx0rseg.h index dbd80486b71..794f4e66ca3 100644 --- a/storage/innobase/include/trx0rseg.h +++ b/storage/innobase/include/trx0rseg.h @@ -174,6 +174,8 @@ struct trx_rseg_t { /** @return the commit ID of the last committed transaction */ trx_id_t last_trx_no() const { return last_commit >> 1; } + /** @return whether the last committed transaction is an update */ + bool last_is_update() const { return bool(last_commit & 1); } void set_last_trx_no(trx_id_t trx_no, bool is_update) { diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index 32e050e9b91..b3c6f41827e 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -118,9 +118,11 @@ inline bool TrxUndoRsegsIterator::set_next() ut_ad(purge_sys.rseg->space->id == TRX_SYS_SPACE || srv_is_undo_tablespace(purge_sys.rseg->space->id)); - ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit); + if (purge_sys.rseg->last_is_update()) { + ut_a(purge_sys.tail.commit <= purge_sys.rseg->last_commit); + purge_sys.tail.commit = purge_sys.rseg->last_commit; + } - purge_sys.tail.commit = purge_sys.rseg->last_commit; purge_sys.hdr_offset = purge_sys.rseg->last_offset; purge_sys.hdr_page_no = purge_sys.rseg->last_page_no; @@ -307,10 +309,13 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr) MLOG_2BYTES, mtr); } - if (rseg->last_page_no == FIL_NULL) { + const bool is_update = undo == trx->rsegs.m_redo.undo; + + if (rseg->last_page_no == FIL_NULL + || (is_update && !rseg->last_is_update())) { rseg->last_page_no = undo->hdr_page_no; rseg->last_offset = undo->hdr_offset; - rseg->set_last_trx_no(trx->no, undo == trx->rsegs.m_redo.undo); + rseg->set_last_trx_no(trx->no, is_update); rseg->needs_purge = true; } @@ -1048,9 +1053,11 @@ static void trx_purge_truncate_history() purge_sys_t::iterator& head = purge_sys.head.commit ? purge_sys.head : purge_sys.tail; - if (head.trx_no() >= purge_sys.view.low_limit_no()) { + const trx_id_t low_limit = purge_sys.view.low_limit_no(); + if (head.trx_no() >= low_limit) { /* This is sometimes necessary. TODO: find out why. */ - head.reset_trx_no(purge_sys.view.low_limit_no()); + ib::info() << "reset " << head.trx_no() << " to " << low_limit; + head.reset_trx_no(low_limit); head.undo_no = 0; } The last hunk is not emitting any message, and the dataset is still crashing. (Note: it is necessary to restore the data directory from mdev15912_data.bgz before every startup attempt, because the rollback of the recovered transaction would change the persistent undo log data structures, triggering the assertion failure on every subsequent startup even if the code for the rollback code path was later fixed.)

Marko Mäkelä made changes - 2018-07-25 12:31

Link

This issue is caused by ~~MDEV-12288~~ [ ~~MDEV-12288~~ ]

Marko Mäkelä made changes - 2018-07-25 12:31

Status

Open [ 1 ]

Confirmed [ 10101 ]

Marko Mäkelä made changes - 2018-07-25 12:32

Labels

compat56 compat57 regression upgrade

Marko Mäkelä added a comment - 2018-08-31 08:17

We may have to introduce trx_rseg_t::old_insert_cached and some separate mechanism that will guarantee that old insert_undo pages from before the upgrade to 10.3 (or later) will eventually be freed, without interfering with the purge of transaction history.

Marko Mäkelä added a comment - 2018-08-31 08:17 We may have to introduce trx_rseg_t::old_insert_cached and some separate mechanism that will guarantee that old insert_undo pages from before the upgrade to 10.3 (or later) will eventually be freed, without interfering with the purge of transaction history.

Marko Mäkelä made changes - 2019-01-18 05:00

Link

This issue is duplicated by ~~MDEV-15464~~ [ ~~MDEV-15464~~ ]

Elena Stepanova made changes - 2019-03-05 23:07

Labels

compat56 compat57 regression upgrade

affects-tests compat56 compat57 regression upgrade

Marko Mäkelä made changes - 2019-03-19 07:48

Link

This issue relates to ~~MDEV-18960~~ [ ~~MDEV-18960~~ ]

Marko Mäkelä made changes - 2019-03-19 08:51

Link

This issue relates to ~~MDEV-18966~~ [ ~~MDEV-18966~~ ]

Marko Mäkelä made changes - 2019-03-19 08:52

Link

This issue relates to ~~MDEV-18960~~ [ ~~MDEV-18960~~ ]

Marko Mäkelä made changes - 2019-03-26 15:56

Fix Version/s		10.4 [ 22408 ]
Affects Version/s		10.4 [ 22408 ]

Marko Mäkelä made changes - 2019-03-26 16:29

Link

This issue relates to ~~MDEV-18454~~ [ ~~MDEV-18454~~ ]

Eugene Kosov (Inactive) made changes - 2019-09-17 14:50

Assignee

Marko Mäkelä [ marko ]

Eugene Kosov [ kevg ]

Marko Mäkelä made changes - 2020-10-27 13:14

Link

This issue relates to ~~MDEV-16952~~ [ ~~MDEV-16952~~ ]

Julien Fritsch made changes - 2020-11-18 08:02

Priority

Major [ 3 ]

Critical [ 2 ]

Julien Fritsch made changes - 2020-11-25 08:56

Priority

Critical [ 2 ]

Minor [ 4 ]

Marko Mäkelä added a comment - 2021-05-25 18:37

As described in ~~MDEV-23755~~, we can prevent these failures by refusing InnoDB startup if any incomplete transactions have undo logs in the pre-~~MDEV-12288~~ format, with an error message saying that the older server version must be shut down cleanly, without using innodb_force_recovery=3.

It might be possible to remove the old_insert data members altogether.

Marko Mäkelä added a comment - 2021-05-25 18:37 As described in MDEV-23755 , we can prevent these failures by refusing InnoDB startup if any incomplete transactions have undo logs in the pre- MDEV-12288 format, with an error message saying that the older server version must be shut down cleanly, without using innodb_force_recovery=3 . It might be possible to remove the old_insert data members altogether.

Marko Mäkelä made changes - 2021-05-25 18:37

Assignee	Eugene Kosov [ kevg ]	Marko Mäkelä [ marko ]
Priority	Minor [ 4 ]	Major [ 3 ]

Marko Mäkelä made changes - 2021-06-13 13:52

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Marko Mäkelä added a comment - 2021-06-21 13:33

We will refuse an upgrade from older versions than MariaDB 10.3 if a clean shutdown was not performed, so that the undo logs would be emptied:

2021-06-13 16:42:45 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown

There will be only one persistent undo log per transaction. No main-memory data structures related to the partitioned persistent undo log before ~~MDEV-12288~~ are retained.

Marko Mäkelä added a comment - 2021-06-21 13:33 We will refuse an upgrade from older versions than MariaDB 10.3 if a clean shutdown was not performed, so that the undo logs would be emptied: 2021-06-13 16:42:45 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown There will be only one persistent undo log per transaction. No main-memory data structures related to the partitioned persistent undo log before MDEV-12288 are retained.

Marko Mäkelä made changes - 2021-06-21 13:33

issue.field.resolutiondate

2021-06-21 13:33:54.0

2021-06-21 13:33:54.993

Marko Mäkelä made changes - 2021-06-21 13:34

Fix Version/s		10.6.3 [ 25904 ]
Fix Version/s		10.3.30 [ 25732 ]
Fix Version/s		10.4.20 [ 25733 ]
Fix Version/s		10.5.11 [ 25734 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Hans Borresen added a comment - 2021-06-21 21:03 - edited

It looks like this just got pulled into the 10.3 branch, but it breaks upgrades for anyone trying to go from 5.7 -> 10.3.30, or even 5.7 -> 10.3.29 -> 10.3.30

I started a topic for this in the zulip chat.

Log snippet, just in case it is helpful:

Version: '10.3.29-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server

2021-06-21 20:58:29 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown

...

2021-06-21 20:58:29 0 [Note] InnoDB: FTS optimize thread exiting.

2021-06-21 20:58:29 0 [Note] InnoDB: Starting shutdown...

2021-06-21 20:58:29 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool

2021-06-21 20:58:29 0 [Note] InnoDB: Buffer pool(s) dump completed at 210621 20:58:29

2021-06-21 20:58:31 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

2021-06-21 20:58:31 0 [Note] InnoDB: Shutdown completed; log sequence number 3281751; transaction id 3852

2021-06-21 20:58:31 0 [Note] /usr/sbin/mysqld: Shutdown complete

...

(after updating to 10.3.30 via yum)

2021-06-21 20:58:31 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().

2021-06-21 20:58:31 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown

2021-06-21 20:58:31 0 [Note] InnoDB: Retry with innodb_force_recovery=5

2021-06-21 20:58:31 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption

2021-06-21 20:58:31 0 [Note] InnoDB: Starting shutdown...

2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' init function returned error.

2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

2021-06-21 20:58:31 0 [Note] Plugin 'FEEDBACK' is disabled.

2021-06-21 20:58:31 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded

2021-06-21 20:58:31 0 [ERROR] Unknown/unsupported storage engine: InnoDB

Edit: I went ahead and filed ~~MDEV-25981~~ for this issue.

Hans Borresen added a comment - 2021-06-21 21:03 - edited It looks like this just got pulled into the 10.3 branch, but it breaks upgrades for anyone trying to go from 5.7 -> 10.3.30, or even 5.7 -> 10.3.29 -> 10.3.30 I started a topic for this in the zulip chat. Log snippet, just in case it is helpful: Version: '10.3.29-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server 2021-06-21 20:58:29 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown ... 2021-06-21 20:58:29 0 [Note] InnoDB: FTS optimize thread exiting. 2021-06-21 20:58:29 0 [Note] InnoDB: Starting shutdown... 2021-06-21 20:58:29 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2021-06-21 20:58:29 0 [Note] InnoDB: Buffer pool(s) dump completed at 210621 20:58:29 2021-06-21 20:58:31 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2021-06-21 20:58:31 0 [Note] InnoDB: Shutdown completed; log sequence number 3281751; transaction id 3852 2021-06-21 20:58:31 0 [Note] /usr/sbin/mysqld: Shutdown complete ... (after updating to 10.3.30 via yum) 2021-06-21 20:58:31 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2021-06-21 20:58:31 0 [ERROR] InnoDB: upgrade from older version than MariaDB 10.3 requires clean shutdown 2021-06-21 20:58:31 0 [Note] InnoDB: Retry with innodb_force_recovery=5 2021-06-21 20:58:31 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption 2021-06-21 20:58:31 0 [Note] InnoDB: Starting shutdown... 2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' init function returned error. 2021-06-21 20:58:31 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2021-06-21 20:58:31 0 [Note] Plugin 'FEEDBACK' is disabled. 2021-06-21 20:58:31 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded 2021-06-21 20:58:31 0 [ERROR] Unknown/unsupported storage engine: InnoDB Edit: I went ahead and filed MDEV-25981 for this issue.

Hans Borresen made changes - 2021-06-22 00:32

Link

This issue causes ~~MDEV-25981~~ [ ~~MDEV-25981~~ ]

Marko Mäkelä added a comment - 2021-06-23 06:08

hborresen, thank you. I should have actually tested the upgrade. As I expected, fixing ~~MDEV-25981~~ involved only minor changes to the validation logic.

Marko Mäkelä added a comment - 2021-06-23 06:08 hborresen , thank you. I should have actually tested the upgrade. As I expected, fixing MDEV-25981 involved only minor changes to the validation logic.

Marko Mäkelä made changes - 2021-08-23 13:59

Link

This issue causes ~~MDEV-26465~~ [ ~~MDEV-26465~~ ]

Sergei Golubchik made changes - 2021-12-06 21:47

Workflow

MariaDB v3 [ 86623 ]

MariaDB v4 [ 154179 ]

Marko Mäkelä made changes - 2022-01-14 13:44

Link

This issue relates to TODO-3302 [ TODO-3302 ]

Marko Mäkelä made changes - 2022-01-14 15:04

Link

This issue relates to ~~MDEV-27437~~ [ ~~MDEV-27437~~ ]

Marko Mäkelä made changes - 2022-02-17 15:38

Link

This issue relates to ~~MDEV-27800~~ [ ~~MDEV-27800~~ ]

Marko Mäkelä made changes - 2022-09-06 12:07

Link

This issue relates to ~~MDEV-29475~~ [ ~~MDEV-29475~~ ]

Rob Schwyzer (Inactive) made changes - 2023-09-27 20:17

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 36003 ]

People

Assignee:: Marko Mäkelä

Reporter:: Elena Stepanova

Votes:: 1 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 2018-04-17 23:03

Updated:: 2024-03-23 18:37

Resolved:: 2021-06-21 13:33

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.