[MDEV-13267] At startup with crash recovery: mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed Created: 2017-07-06  Updated: 2017-07-07  Resolved: 2017-07-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.2.2
Fix Version/s: 10.2.7, 10.3.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None


 Description   

http://buildbot.askmonty.org/buildbot/builders/qa-kvm-linux/builds/1658/steps/stable_tests/logs/stdio

tokudb_bugs.xa-3 'innodb'                w2 [ fail ]
        Test ended at 2017-07-05 13:24:51
 
CURRENT_TEST: tokudb_bugs.xa-3
 
 
Server [mysqld.1 - pid: 2997, winpid: 2997, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
SIGKILL myself
2017-07-05 13:24:49 140090411992896 [Note] /home/buildbot/build/sql/mysqld (mysqld 10.2.7-MariaDB-debug-log) starting as process 2998 ...
2017-07-05 13:24:49 140090411992896 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
2017-07-05 13:24:49 140090411992896 [Note] Plugin 'SEQUENCE' is disabled.
Wed Jul  5 13:24:49 2017 PerconaFT recovery starting in env /home/buildbot/build/mysql-test/var/2/mysqld.1/data/
Wed Jul  5 13:24:49 2017 PerconaFT recovery scanning backward from 545
Wed Jul  5 13:24:49 2017 PerconaFT recovery bw_end_checkpoint at 12 timestamp 1499275474701350 xid 8 (bw_newer)
Wed Jul  5 13:24:49 2017 PerconaFT recovery bw_begin_checkpoint at 8 timestamp 1499275474700634 (bw_between)
Wed Jul  5 13:24:49 2017 PerconaFT recovery turning around at begin checkpoint 8 time 716
Wed Jul  5 13:24:49 2017 PerconaFT recovery starts scanning forward to 545 from 8 left 537 (fw_between)
Wed Jul  5 13:24:49 2017 PerconaFT recovery closing 2 dictionaries
Wed Jul  5 13:24:49 2017 PerconaFT recovery making a checkpoint
Wed Jul  5 13:24:49 2017 PerconaFT recovery done
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Uses event mutexes
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Using Linux native AIO
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Number of pools: 1
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Using generic crc32 instructions
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Completed initialization of buffer pool
2017-07-05 13:24:49 140089241872128 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Highest supported file format is Barracuda.
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2785956
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 14, since the redo log references ./test/t1.ibd with space ID 7.
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 14, since the redo log references ./test/t1.ibd with space ID 13.
2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Starting a batch to recover 179 pages from redo log.
mysqld: /home/buildbot/build/storage/innobase/mtr/mtr0mtr.cc:603: void mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed.
170705 13:24:50 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.7-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63068 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/home/buildbot/build/sql/mysqld(my_print_stacktrace+0x38)[0x7f6956b6c3c8]
/home/buildbot/build/sql/mysqld(handle_fatal_signal+0x3a3)[0x7f6956404810]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f69557bd890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6953af3067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6953af4448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f6953aec266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f6953aec312]
/home/buildbot/build/sql/mysqld(+0xac51ae)[0x7f69566b31ae]
/home/buildbot/build/sql/mysqld(+0xd21f62)[0x7f695690ff62]
/home/buildbot/build/sql/mysqld(+0xab8f30)[0x7f69566a6f30]
/home/buildbot/build/sql/mysqld(+0xbcef01)[0x7f69567bcf01]
/home/buildbot/build/sql/mysqld(+0xa0f039)[0x7f69565fd039]
/home/buildbot/build/sql/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0xfc)[0x7f6956406849]
mysys/stacktrace.c:267(my_print_stacktrace)[0x7f69561b14f8]
/home/buildbot/build/sql/mysqld(_Z11plugin_initPiPPci+0x912)[0x7f69561b2101]
sql/signal_handler.cc:168(handle_fatal_signal)[0x7f69560bef35]
/home/buildbot/build/sql/mysqld(_Z11mysqld_mainiPPc+0x5f4)[0x7f69560bffba]
fil/fil0fil.cc:6580(fil_names_clear(unsigned long, bool))[0x7f69560b4ee0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f6953adfb45]
log/log0recv.cc:3351(recv_recovery_from_checkpoint_start(unsigned long))[0x7f69560b4db9]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
----------SERVER LOG END-------------
mysqltest failed but provided no output
The result from queries just before the failure was:
< snip >
drop table if exists t1, t2;
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=innodb;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=tokudb;
BEGIN;
SELECT @@log_bin;
@@log_bin
0
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
SET SESSION debug_dbug="d,crash_commit_before";
COMMIT;
ERROR HY000: Lost connection to MySQL server during query
 
 
 
 - saving '/home/buildbot/build/mysql-test/var/2/log/tokudb_bugs.xa-3-innodb/' to '/home/buildbot/build/mysql-test/var/log/tokudb_bugs.xa-3-innodb/'
worker[2] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: tokudb_bugs.xa-3

I also had the same assertion failure with tokudb_bugs.xa-4 on a local debug build.



 Comments   
Comment by Marko Mäkelä [ 2017-07-07 ]

There were no server logs for the reported failure. Luckily, I ran into this very same assertion failure when testing a merge into 10.3 locally. For me, it failed for two tests that performed InnoDB crash recovery:
innodb.innodb-alter-tempfile
gcol.gcol_rollback

In all the 3 crashes (the first test also with innodb_page_size=4k), the stack trace was like this:

#8  0x000055fa2f7d41f6 in mtr_t::commit_checkpoint (this=0x7ffe265635f0, 
    checkpoint_lsn=732924, write_mlog_checkpoint=true)
    at /mariadb/10.3/storage/innobase/mtr/mtr0mtr.cc:603
#9  0x000055fa2fa5ab92 in fil_names_clear (lsn=732924, do_write=true)
    at /mariadb/10.3/storage/innobase/fil/fil0fil.cc:6580
#10 0x000055fa2f7c6396 in recv_recovery_from_checkpoint_start (
    flush_lsn=732225) at /mariadb/10.3/storage/innobase/log/log0recv.cc:3412
#11 0x000055fa2f8f2d99 in innobase_start_or_create_for_mysql ()
    at /mariadb/10.3/storage/innobase/srv/srv0start.cc:2197

When it comes to the MariaDB server, this is a bogus debug assertion failure. While generating page-change redo log records is strictly out of the question during tat certain parts of crash recovery, the fil_names_clear() is only emitting informational MLOG_FILE_NAME records to guarantee that if the server is killed during or soon after the crash recovery, the subsequent crash recovery will succeed.

The metadata buffer that fil_names_clear() is flushing to the redo log is being filled by recv_init_crash_recovery_spaces(), right before starting to apply redo log, by invoking fil_names_dirty() on every dirty tablespace that was found.

When it comes to Mariabackup (xtrabackup --prepare), it is strictly out of the question to generate any redo log whatsoever, because that could break the restore of incremental backups by causing LSN deviation. So, the fil_names_dirty() call must be skipped when restoring backups.

Generated at Thu Feb 08 08:04:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.