[MDEV-5863] "Lock wait timeout exceeded", lost events, Assertion `!table || !table->in_use || table->in_use == _current_thd()' failure upon parallel replicating Created: 2014-03-14 Updated: 2014-03-21 Resolved: 2014-03-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 10.0.10 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Elena Stepanova | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
The attached binary log was created by a fair concurrent test on master, without any manual editing.
Despite slave-specific options, it was not configured as a slave and was not replicating from anywhere.
Now I'm feeding this binary log to a slave running with slave-parallel-threads=16 (no other special options, either on master or on slave, are necessary). I'm getting several problems.
Replication failure
Initially replication breaks with
It happens every time, in the same place; it does not seem to be a race condition, but probably some logic specific to parallel replication.
Data discrepancy
After the previous failure, I simply issue START SLAVE again, and replication does not fail anymore.
Assertion failure
When the slave reports that it has finished replication, I attempt to shut down the server, and get the assertion failure upon shutdown:
Stack trace from:
Server where the binary log was created
|
| Comments |
| Comment by Elena Stepanova [ 2014-03-17 ] |
|
I modified the command line in the description; it was mistakenly copied from the slave startup. All options are the same, and the disclaimer about useless slave options applies; only server_id, port and datadir location were different, and those I fixed to avoid confusion. |
| Comment by Kristian Nielsen [ 2014-03-18 ] |
|
I was able to reproduce with the supplied binlog file in a mysql-test-run test case |
| Comment by Sergey Vojtovich [ 2014-03-18 ] |
|
Very likely has something to do with my table cache patches. |
| Comment by Sergey Vojtovich [ 2014-03-18 ] |
|
OTOH it looks like temporary tables are in question. These don't go through table cache and must have table->in_use set to 0. Probably it is uninitialized? |
| Comment by Sergey Vojtovich [ 2014-03-18 ] |
|
Hmm... not really; they should have a proper in_use. But anyway, the crash doesn't seem to be related to the table cache, because it involves a temporary table. |
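For readers unfamiliar with the assertion in the issue title, the following toy Python model illustrates what the condition `!table || !table->in_use || table->in_use == _current_thd()` checks. It is only a sketch: `Thd`, `Table` and `invariant_holds` are hypothetical names made up for illustration, not MariaDB code, and it does not attempt to reproduce the actual bug.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thd:
    """Hypothetical stand-in for a server thread descriptor."""
    name: str


@dataclass
class Table:
    """Hypothetical stand-in for an open table handle."""
    name: str
    in_use: Optional[Thd] = None  # thread currently using the table, if any


def invariant_holds(table: Optional[Table], current_thd: Thd) -> bool:
    # Mirrors: !table || !table->in_use || table->in_use == _current_thd()
    # Identity comparison is used so two distinct threads never compare equal.
    return table is None or table.in_use is None or table.in_use is current_thd


worker_a = Thd("worker_a")
worker_b = Thd("worker_b")
tmp = Table("tmp1")

print(invariant_holds(None, worker_a))  # no table at all
print(invariant_holds(tmp, worker_a))   # table exists but is not in use
tmp.in_use = worker_a
print(invariant_holds(tmp, worker_a))   # in use by the calling thread
print(invariant_holds(tmp, worker_b))   # in use by a different thread
```

In this model the check fails only in the last case, where a thread touches a table whose in_use still points at a different thread.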
| Comment by Kristian Nielsen [ 2014-03-19 ] |
|
There are multiple bugs visible from this report (thanks, btw!).
The lock wait timeout is a rather serious, and complex, issue with InnoDB row
The problem is an INSERT and a DELETE on the same table. The DELETE blocks the
--source include/have_innodb.inc
--connection slave
--connection master
--connect (con2,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
--connection master
--connection con1
--connection slave
SELECT * FROM t1 ORDER BY a;
--source include/stop_slave.inc
--connection master |
| Comment by Kristian Nielsen [ 2014-03-21 ] |
|
Ok, so I will close this as fixed. There were several important issues found by this test:
1. A temporary table bug seen during shutdown (the assertion). Monty fixed
2. The
3. The
Please re-run the test to check that it is really fixed. I did at some point
This was a very good test, really important to get parallel replication ready |