[MDEV-14822] binlog.binlog_killed fails with wrong result Created: 2017-12-30 Updated: 2018-01-10 Resolved: 2018-01-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Tests |
| Affects Version/s: | 10.3 |
| Fix Version/s: | 10.3.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Vicențiu Ciorbaru | Assignee: | Michael Widenius |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
In Travis, we build the server with the following config line:
Due to how the Travis environment is set up, the test fails when compiling with clang 5.0.1 or GCC 6.3.0. Locally, I was able to reproduce it with the following failure:
However, at times the failure is like this:
The common denominator seems to be that when running a transaction with a transactional storage engine (InnoDB), in some cases we don't immediately flush to the binlog. If the kill query signal arrives earlier, we don't write the transaction at all (including the ROLLBACK event). The problem is that the test doesn't always fail: with --repeat=100, I was able to reproduce it once every 3-4 runs. Ultimately, the binlog contents are valid either way, but the test needs to be stabilized in some way. The sequence of queries is roughly this:
|
| Comments |
| Comment by Vicențiu Ciorbaru [ 2017-12-30 ] | ||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-12-30 ] | ||||||||||||||||||||||
|
Partially reverting c4581735d0210beba0733b30df8dd994786663fe seems to have solved the problem.
| ||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-12-30 ] | ||||||||||||||||||||||
|
While trying to reproduce the problem on GCC 7.2.1, I've gotten the following failure:
| ||||||||||||||||||||||
| Comment by Michael Widenius [ 2018-01-09 ] | ||||||||||||||||||||||
|
Binlog checkpoints can happen at any time; they are more or less random. To get rid of them, one should start scripts with 'reset master' or use the new mtr variable "--let $skip_checkpoint_events=1", which removes them from show_binlog_events.
| ||||||||||||||||||||||
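To illustrate why dropping checkpoint events stabilizes a comparison, here is a minimal Python sketch. It is not the mtr implementation; the function name, the tuple layout, and the sample rows are hypothetical and only mimic the Event_type and Info columns of a binlog event listing.

```python
# Hypothetical (Event_type, Info) rows, loosely shaped like a binlog event
# listing. The contents are made up for illustration only.
run_a = [
    ("Gtid", "BEGIN"),
    ("Binlog_checkpoint", "master-bin.000001"),
    ("Query", "insert into t1 values (1)"),
    ("Xid", "COMMIT"),
]
run_b = [
    ("Binlog_checkpoint", "master-bin.000001"),
    ("Gtid", "BEGIN"),
    ("Query", "insert into t1 values (1)"),
    ("Xid", "COMMIT"),
    ("Binlog_checkpoint", "master-bin.000001"),
]


def drop_checkpoint_events(rows):
    """Return the rows without Binlog_checkpoint events, order preserved."""
    return [row for row in rows if row[0] != "Binlog_checkpoint"]


if __name__ == "__main__":
    # The raw listings differ only in where checkpoints happened to land.
    print(run_a == run_b)
    # After filtering, both runs show the same sequence of events.
    print(drop_checkpoint_events(run_a) == drop_checkpoint_events(run_b))
```

The two sample runs differ only in where the checkpoint events appear, so a result file recorded from one would not match the other unless those rows are filtered out first.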
| Comment by Michael Widenius [ 2018-01-10 ] | ||||||||||||||||||||||
|
Problem was timing between the thread that was killed and the thread that was
Updated the test to wait until the killed thread was properly terminated
To make check safe, I changed "threads_connected" to be updated after |
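The "wait until the killed thread was properly terminated" idea can be sketched as a toy model in Python. This is not MariaDB code: the ToyServer class, its fields, and the wait_until helper are hypothetical, and the point at which the counter is decremented (as the very last step of cleanup) is an assumption of the sketch, not something taken from the server.

```python
import threading
import time


class ToyServer:
    """Hypothetical stand-in for a server that counts connected threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.threads_connected = 0
        self.events = []

    def run_connection(self, killed):
        with self._lock:
            self.threads_connected += 1
        killed.wait()        # block until the "kill" arrives
        time.sleep(0.05)     # cleanup takes a while after the kill
        with self._lock:
            self.events.append("connection cleaned up")
            # Assumption: the counter drops only once cleanup is finished.
            self.threads_connected -= 1


def wait_until(condition, timeout=5.0, interval=0.01):
    """Poll condition() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()


def run_demo():
    server = ToyServer()
    killed = threading.Event()
    worker = threading.Thread(target=server.run_connection, args=(killed,))
    worker.start()
    # Make sure the connection is registered before it is killed.
    assert wait_until(lambda: server.threads_connected == 1)
    killed.set()
    # Inspect the results only after the counter shows the thread is gone.
    assert wait_until(lambda: server.threads_connected == 0)
    snapshot = list(server.events)
    worker.join()
    return snapshot


if __name__ == "__main__":
    print(run_demo())
```

Because the counter is decremented after the final event is recorded, a checker that waits for the counter to drop never observes the state before cleanup has completed. Checking immediately after the kill, without the second wait, would race with the cleanup.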