MariaDB Server / MDEV-5863

"Lock wait timeout exceeded", lost events, Assertion `!table || !table->in_use || table->in_use == _current_thd()' failure upon parallel replicating

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Fix Version: 10.0.10

    Description

      The attached binary log was created by a fair concurrent test on the master, without any manual editing.
      In case it is relevant, the master server was running with the following options:

      10.0/sql/mysqld --no-defaults --basedir=10.0 --datadir=/home/elenst/test_results/analyze2/data --lc-messages-dir=10.0/sql/share --character-sets-dir=10.0/sql/share/charsets --tmpdir=/home/elenst/test_results/analyze2/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=/home/elenst/test_results/analyze2/mysql.sock --pid-file=/home/elenst/test_results/analyze2/mysql.pid --general-log --general-log-file=/home/elenst/test_results/analyze2/mysql.log --server_id=1 --report-host=127.0.0.1 --report_port=10730 --innodb --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --ignore-builtin-innodb --plugin-load=ha_innodb.so --slave-skip-errors=1054,1317,1049,1305,1539,1505 --log-output=FILE --slave_parallel_threads=16 --slave_parallel_max_queued=4194304

      Despite the slave-specific options, this server was not configured as a slave and was not replicating from anywhere.

      Now I'm feeding this binary log to a slave running with slave-parallel-threads=16 (no other special options are necessary on either the master or the slave).

      I'm getting several problems.

      Replication failure

      Initially replication breaks with

      140314 18:04:40 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'INSERT INTO `view_A` ( `pk` ) VALUES (NULL)', Internal MariaDB error code: 1205
      140314 18:04:40 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      140314 18:04:40 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 312054

      It happens every time, in the same place, so it does not seem to be a race condition; it is probably caused by some logic specific to parallel replication.

      Data discrepancy

      After the previous failure, I simply issue START SLAVE again, and replication does not fail anymore.
      But when I check the contents of view_A, it turns out that it only has 4 rows, while the master has 30+ (either the master on the original datadir, or after feeding the binlog into a fresh one via the mysql client).
      This also happens every time.

      Assertion failure

      When the slave reports that it has finished replication, I attempt to shut down the server and get an assertion failure upon shutdown:

      mysqld: 10.0/sql/handler.cc:2381: THD* handler::ha_thd() const: Assertion `!table || !table->in_use || table->in_use == _current_thd()' failed.
      140314 18:12:47 [ERROR] mysqld got signal 6 ;

      #5  0x00007f0a9a4b46f0 in *__GI_abort () at abort.c:92
      #6  0x00007f0a9a4aa621 in *__GI___assert_fail (assertion=0xfdbc48 "!table || !table->in_use || table->in_use == _current_thd()", file=<optimized out>, line=2381, function=0xfde5c0 "THD* handler::ha_thd() const") at assert.c:81
      #7  0x000000000085f648 in handler::ha_thd (this=0x7f0a6d62f088) at 10.0/sql/handler.cc:2381
      #8  0x0000000000b74035 in ha_innobase::extra (this=0x7f0a6d62f088, operation=HA_EXTRA_DETACH_CHILDREN) at 10.0/storage/xtradb/handler/ha_innodb.cc:12762
      #9  0x000000000060fc14 in mark_tmp_table_for_reuse (table=0x7f0a7121c070) at 10.0/sql/sql_base.cc:696
      #10 0x000000000060faf2 in mark_temp_tables_as_free_for_reuse (thd=0x7f0a6bd11070) at 10.0/sql/sql_base.cc:663
      #11 0x0000000000610224 in close_thread_tables (thd=0x7f0a6bd11070) at 10.0/sql/sql_base.cc:943
      #12 0x000000000077d1e1 in rpl_group_info::slave_close_thread_tables (this=0x7f0a6bcde870, thd=0x7f0a6bd11070) at 10.0/sql/rpl_rli.cc:1699
      #13 0x000000000077ced0 in rpl_group_info::cleanup_context (this=0x7f0a6bcde870, thd=0x7f0a6bd11070, error=true) at 10.0/sql/rpl_rli.cc:1611
      #14 0x00000000005e3381 in handle_slave_sql (arg=0x7f0a7c3ad070) at 10.0/sql/slave.cc:4594
      #15 0x0000000000a2da58 in pfs_spawn_thread (arg=0x7f0a72887ef0) at 10.0/storage/perfschema/pfs.cc:1853
      #16 0x00007f0a9c04eb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #17 0x00007f0a9a559a7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

      Stack trace from:

      revision-id: monty@mariadb.org-20140313144311-hmphz2gup3qdjlp9
      date: 2014-03-13 16:43:11 +0200
      build-date: 2014-03-14 19:43:28 +0400
      revno: 4051
      branch-nick: 10.0

      Server where the binary log was created

      revision-id: svoj@mariadb.org-20140307074306-blf3begv8ee0tb5j
      date: 2014-03-07 11:43:06 +0400
      build-date: 2014-03-14 17:43:49 +0200
      revno: 4045
      branch-nick: 10.0

    Attachments

    Issue Links

    Activity

            elenst Elena Stepanova added a comment:

            I modified the command line in the description; it was mistakenly copied from the slave startup. All options are the same, and the disclaimer about the unused slave options applies; only server_id, port and datadir location were different – those I fixed to avoid confusion.

            knielsen Kristian Nielsen added a comment:

            I was able to reproduce this with the supplied binlog file in a mysql-test-run test case.

            svoj Sergey Vojtovich added a comment:

            Very likely this has something to do with my table cache patches.

            svoj Sergey Vojtovich added a comment:

            On the other hand, it looks like temporary tables are involved. These don't go through the table cache and must have table->in_use set to 0. Perhaps it is uninitialized?

            svoj Sergey Vojtovich added a comment:

            Hmm... not really, they should have a proper in_use. But anyway, the crash doesn't seem to be related to the table cache, because a temporary table is involved.

            knielsen Kristian Nielsen added a comment:

            There are multiple bugs visible in this report (thanks, btw!).

            The lock wait timeout is a rather serious, and complex, issue with InnoDB row
            locks. Below is a small test case that reproduces it (it is not deterministic
            and may need to be run a couple of times with --repeat to trigger the issue).

            The problem is an INSERT and a DELETE on the same table. The DELETE blocks the
            INSERT until commit, but the INSERT does not similarly block the DELETE. This
            means that the INSERT can run first and group commit with the DELETE on the
            master, but then on the slave, if the DELETE runs ahead, it will block the
            INSERT, and the INSERT in turn will block the DELETE due to commit
            ordering. The result is a deadlock.

            --source include/have_innodb.inc
            --source include/have_debug_sync.inc
            --source include/master-slave.inc

            --connection slave
            --source include/stop_slave.inc

            --connection master
            CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);

            # Create a group commit with INSERT and DELETE, in that order.
            --connect (con1,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
            SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued1 WAIT_FOR master_cont1';
            send INSERT INTO t1 VALUES (7, NULL);
            --connection master
            SET debug_sync='now WAIT_FOR master_queued1';
            --connect (con2,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
            SET debug_sync='commit_after_release_LOCK_prepare_ordered SIGNAL master_queued2';
            send DELETE FROM t1 WHERE b <= 3;

            --connection master
            SET debug_sync='now WAIT_FOR master_queued2';
            SET debug_sync='now SIGNAL master_cont1';

            --connection con1
            reap;
            --connection con2
            reap;
            SET debug_sync='RESET';
            --save_master_pos

            --connection slave
            SET @old_parallel= @@GLOBAL.slave_parallel_threads;
            SET GLOBAL slave_parallel_threads=16;
            --source include/start_slave.inc
            --sync_with_master

            SELECT * FROM t1 ORDER BY a;

            --source include/stop_slave.inc
            SET GLOBAL slave_parallel_threads=@old_parallel;
            --source include/start_slave.inc

            --connection master
            DROP TABLE t1;
            --source include/rpl_end.inc


            knielsen Kristian Nielsen added a comment:

            Ok, so I will close this as fixed.

            There were several important issues found by this test:

            1. A temporary table bug seen during shutdown (the assertion). Monty fixed
            this in an earlier push to 10.0.

            2. MDEV-5921. This was also reported by nanyi607rao on
            maria-developers@lists.launchpad.net; I have pushed a fix for it.

            3. MDEV-5914. This bug is still open; however, I have pushed to 10.0 a
            temporary fix that should be sufficient to make this bug disappear.

            Please re-run the test to check that it is really fixed. At some point I
            saw another error in the logs about temporary tables when I applied the
            attached binlog. I cannot repeat it now, so it may have been another symptom
            of problem (1), but it would be good to get verification. I also did not
            verify that the row differences between the tables are gone, though fixing (2)
            and (3) should hopefully take care of them.

            This was a very good test, really important to get parallel replication ready
            for end-users.


            People

              Assignee: knielsen Kristian Nielsen
              Reporter: elenst Elena Stepanova
              Votes: 0
              Watchers: 3
