[MDEV-13267] At startup with crash recovery: mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.2
Fix Version/s: 10.2.7, 10.3.1
Component/s: Storage Engine - InnoDB, Tests
Labels:
None

Description

http://buildbot.askmonty.org/buildbot/builders/qa-kvm-linux/builds/1658/steps/stable_tests/logs/stdio

tokudb_bugs.xa-3 'innodb'                w2 [ fail ]

        Test ended at 2017-07-05 13:24:51

CURRENT_TEST: tokudb_bugs.xa-3

Server [mysqld.1 - pid: 2997, winpid: 2997, exit: 256] failed during test run

Server log from this test:

----------SERVER LOG START-----------

SIGKILL myself

2017-07-05 13:24:49 140090411992896 [Note] /home/buildbot/build/sql/mysqld (mysqld 10.2.7-MariaDB-debug-log) starting as process 2998 ...

2017-07-05 13:24:49 140090411992896 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)

2017-07-05 13:24:49 140090411992896 [Note] Plugin 'SEQUENCE' is disabled.

Wed Jul  5 13:24:49 2017 PerconaFT recovery starting in env /home/buildbot/build/mysql-test/var/2/mysqld.1/data/

Wed Jul  5 13:24:49 2017 PerconaFT recovery scanning backward from 545

Wed Jul  5 13:24:49 2017 PerconaFT recovery bw_end_checkpoint at 12 timestamp 1499275474701350 xid 8 (bw_newer)

Wed Jul  5 13:24:49 2017 PerconaFT recovery bw_begin_checkpoint at 8 timestamp 1499275474700634 (bw_between)

Wed Jul  5 13:24:49 2017 PerconaFT recovery turning around at begin checkpoint 8 time 716

Wed Jul  5 13:24:49 2017 PerconaFT recovery starts scanning forward to 545 from 8 left 537 (fw_between)

Wed Jul  5 13:24:49 2017 PerconaFT recovery closing 2 dictionaries

Wed Jul  5 13:24:49 2017 PerconaFT recovery making a checkpoint

Wed Jul  5 13:24:49 2017 PerconaFT recovery done

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Uses event mutexes

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Compressed tables use zlib 1.2.8

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Using Linux native AIO

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Number of pools: 1

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Using generic crc32 instructions

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Completed initialization of buffer pool

2017-07-05 13:24:49 140089241872128 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Highest supported file format is Barracuda.

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2785956

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 14, since the redo log references ./test/t1.ibd with space ID 7.

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 14, since the redo log references ./test/t1.ibd with space ID 13.

2017-07-05 13:24:49 140090411992896 [Note] InnoDB: Starting a batch to recover 179 pages from redo log.

mysqld: /home/buildbot/build/storage/innobase/mtr/mtr0mtr.cc:603: void mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed.

170705 13:24:50 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.2.7-MariaDB-debug-log

key_buffer_size=1048576

read_buffer_size=131072

max_used_connections=0

max_threads=153

thread_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63068 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

/home/buildbot/build/sql/mysqld(my_print_stacktrace+0x38)[0x7f6956b6c3c8]

/home/buildbot/build/sql/mysqld(handle_fatal_signal+0x3a3)[0x7f6956404810]

/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f69557bd890]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6953af3067]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6953af4448]

/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f6953aec266]

/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f6953aec312]

/home/buildbot/build/sql/mysqld(+0xac51ae)[0x7f69566b31ae]

/home/buildbot/build/sql/mysqld(+0xd21f62)[0x7f695690ff62]

/home/buildbot/build/sql/mysqld(+0xab8f30)[0x7f69566a6f30]

/home/buildbot/build/sql/mysqld(+0xbcef01)[0x7f69567bcf01]

/home/buildbot/build/sql/mysqld(+0xa0f039)[0x7f69565fd039]

/home/buildbot/build/sql/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0xfc)[0x7f6956406849]

mysys/stacktrace.c:267(my_print_stacktrace)[0x7f69561b14f8]

/home/buildbot/build/sql/mysqld(_Z11plugin_initPiPPci+0x912)[0x7f69561b2101]

sql/signal_handler.cc:168(handle_fatal_signal)[0x7f69560bef35]

/home/buildbot/build/sql/mysqld(_Z11mysqld_mainiPPc+0x5f4)[0x7f69560bffba]

fil/fil0fil.cc:6580(fil_names_clear(unsigned long, bool))[0x7f69560b4ee0]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f6953adfb45]

log/log0recv.cc:3351(recv_recovery_from_checkpoint_start(unsigned long))[0x7f69560b4db9]

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

Writing a core file

----------SERVER LOG END-------------

mysqltest failed but provided no output

The result from queries just before the failure was:

< snip >

drop table if exists t1, t2;

CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=innodb;

CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=tokudb;

BEGIN;

SELECT @@log_bin;

@@log_bin

INSERT INTO t1 VALUES (1);

INSERT INTO t2 VALUES (2);

SET SESSION debug_dbug="d,crash_commit_before";

COMMIT;

ERROR HY000: Lost connection to MySQL server during query

 - saving '/home/buildbot/build/mysql-test/var/2/log/tokudb_bugs.xa-3-innodb/' to '/home/buildbot/build/mysql-test/var/log/tokudb_bugs.xa-3-innodb/'

worker[2] > Restart  - not started

***Warnings generated in error logs during shutdown after running tests: tokudb_bugs.xa-3

I also had the same assertion failure with tokudb_bugs.xa-4 on a local debug build.

Attachments

Activity

Marko Mäkelä added a comment - 2017-07-07 15:12

There were no server logs for the reported failure. Luckily, I ran into this very same assertion failure when testing a merge into 10.3 locally. For me, it failed for two tests that performed InnoDB crash recovery:
innodb.innodb-alter-tempfile
gcol.gcol_rollback

In all the 3 crashes (the first test also with innodb_page_size=4k), the stack trace was like this:

#8  0x000055fa2f7d41f6 in mtr_t::commit_checkpoint (this=0x7ffe265635f0,

    checkpoint_lsn=732924, write_mlog_checkpoint=true)

    at /mariadb/10.3/storage/innobase/mtr/mtr0mtr.cc:603

#9  0x000055fa2fa5ab92 in fil_names_clear (lsn=732924, do_write=true)

    at /mariadb/10.3/storage/innobase/fil/fil0fil.cc:6580

#10 0x000055fa2f7c6396 in recv_recovery_from_checkpoint_start (

    flush_lsn=732225) at /mariadb/10.3/storage/innobase/log/log0recv.cc:3412

#11 0x000055fa2f8f2d99 in innobase_start_or_create_for_mysql ()

    at /mariadb/10.3/storage/innobase/srv/srv0start.cc:2197

When it comes to the MariaDB server, this is a bogus debug assertion failure. While generating page-change redo log records is strictly out of the question during tat certain parts of crash recovery, the fil_names_clear() is only emitting informational MLOG_FILE_NAME records to guarantee that if the server is killed during or soon after the crash recovery, the subsequent crash recovery will succeed.

The metadata buffer that fil_names_clear() is flushing to the redo log is being filled by recv_init_crash_recovery_spaces(), right before starting to apply redo log, by invoking fil_names_dirty() on every dirty tablespace that was found.

When it comes to Mariabackup (xtrabackup --prepare), it is strictly out of the question to generate any redo log whatsoever, because that could break the restore of incremental backups by causing LSN deviation. So, the fil_names_dirty() call must be skipped when restoring backups.

Marko Mäkelä added a comment - 2017-07-07 15:12 There were no server logs for the reported failure. Luckily, I ran into this very same assertion failure when testing a merge into 10.3 locally. For me, it failed for two tests that performed InnoDB crash recovery: innodb.innodb-alter-tempfile gcol.gcol_rollback In all the 3 crashes (the first test also with innodb_page_size=4k), the stack trace was like this: #8 0x000055fa2f7d41f6 in mtr_t::commit_checkpoint (this=0x7ffe265635f0, checkpoint_lsn=732924, write_mlog_checkpoint=true) at /mariadb/10.3/storage/innobase/mtr/mtr0mtr.cc:603 #9 0x000055fa2fa5ab92 in fil_names_clear (lsn=732924, do_write=true) at /mariadb/10.3/storage/innobase/fil/fil0fil.cc:6580 #10 0x000055fa2f7c6396 in recv_recovery_from_checkpoint_start ( flush_lsn=732225) at /mariadb/10.3/storage/innobase/log/log0recv.cc:3412 #11 0x000055fa2f8f2d99 in innobase_start_or_create_for_mysql () at /mariadb/10.3/storage/innobase/srv/srv0start.cc:2197 When it comes to the MariaDB server, this is a bogus debug assertion failure. While generating page-change redo log records is strictly out of the question during tat certain parts of crash recovery, the fil_names_clear() is only emitting informational MLOG_FILE_NAME records to guarantee that if the server is killed during or soon after the crash recovery, the subsequent crash recovery will succeed. The metadata buffer that fil_names_clear() is flushing to the redo log is being filled by recv_init_crash_recovery_spaces(), right before starting to apply redo log, by invoking fil_names_dirty() on every dirty tablespace that was found. When it comes to Mariabackup (xtrabackup --prepare), it is strictly out of the question to generate any redo log whatsoever, because that could break the restore of incremental backups by causing LSN deviation. So, the fil_names_dirty() call must be skipped when restoring backups.

People

Assignee:: Marko Mäkelä

Reporter:: Elena Stepanova

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2017-07-06 14:24

Updated:: 2017-07-07 15:45

Resolved:: 2017-07-07 15:45

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server