[MDEV-8133] ALTER TABLE can perform the operation but escape the binary log Created: 2015-05-11 Updated: 2016-02-15 Resolved: 2016-02-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Replication |
| Affects Version/s: | 10.0, 10.1 |
| Fix Version/s: | 10.0.24, 10.1.12 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 10.0.22, 10.0.24 |
| Description |
|
If ALTER TABLE is interrupted, it can happen that the operation is performed (e.g. a column is added) but the statement is not written to the binary log, so the master and anything that replays the binary log diverge. The MTR test below is for 10.1 only because it uses max_statement_time, but the problem is not specific to that variable: the same happens with KILL QUERY (see the RQG test, which can also be used for 10.0). The MTR test depends heavily on machine timing, so it might need some tuning, e.g. more or less data in the table, or a longer or shorter max_statement_time.
For running the RQG test, clone lp:~elenst/randgen/mariadb-patches (the main randgen tree might miss some changes or fixes in the required components).
The failure can come in two flavors: either it complains about a binlog replay error due to an unknown column, or it reports multiple differences in which the only discrepancy is the number of columns in the table. If neither happens, try increasing the number of threads on the command line. |
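The failure described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyServer` and `replay` are hypothetical names, and the code does not mirror MariaDB internals. It only shows how a change that is applied but never logged leaves the binlog replay one column short.

```python
# Toy model only: these names are hypothetical and do not mirror MariaDB internals.

class ToyServer:
    def __init__(self, columns):
        self.columns = list(columns)
        self.binlog = []

    def alter_add_column(self, name, interrupted=False):
        # Step 1: the in-place change is applied and cannot be undone.
        self.columns.append(name)
        # Step 2: an interruption (KILL QUERY, max_statement_time) arrives
        # after the change but before the statement is logged.
        if interrupted:
            raise RuntimeError("query execution was interrupted")
        # Step 3: only an uninterrupted statement reaches the binary log.
        self.binlog.append(("ADD COLUMN", name))


def replay(binlog, columns):
    """Rebuild the column list by applying the logged events in order."""
    replica = list(columns)
    for op, name in binlog:
        if op == "ADD COLUMN":
            replica.append(name)
    return replica


master = ToyServer(["pk", "a"])
master.alter_add_column("b")
try:
    master.alter_add_column("c", interrupted=True)
except RuntimeError:
    pass  # the client sees an error, yet the column already exists

replica_columns = replay(master.binlog, ["pk", "a"])
print("master: ", master.columns)
print("replica:", replica_columns)
```

In this model the master ends up with one more column than the replayed copy, which corresponds to the second failure flavor; a later statement that references the extra column would correspond to the first.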
| Comments |
| Comment by Sergei Golubchik [ 2015-07-27 ] | ||
|
Sorry, but I cannot repeat it with RQG. I tried increasing the number of threads, but it didn't help. I'd rather not use the first (MTR) test case, because I need to repeat it in 10.0. | ||
| Comment by Elena Stepanova [ 2015-07-29 ] | ||
|
I have set it up on perro (the RQG test).
| ||
| Comment by Sergei Golubchik [ 2015-10-22 ] | ||
|
I don't think this can be fixed completely. In-place ALTER cannot be undone, and there are lots of actions (tables are opened, renamed, etc.) that can fail after the in-place table modifications are done. After such a failure we may end up with a half-done ALTER; for example, the table has the old name (in the case of ALTER...RENAME) but the new structure. The only “fix” I can think of is to abort the replication. This can be easily achieved by binlogging the ALTER together with the error code. The slave will execute the ALTER without an error, the error codes will not match, and replication will stop. Not very intuitive behavior, though. | ||
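The stop-on-mismatch idea in the comment above can be sketched as a toy model. It assumes the master logs the ALTER with the error code it finished with, and that the slave compares that code with its own result. `apply_alter` and `ReplicationStopped` are hypothetical names for illustration, not MariaDB functions.

```python
# Toy model only: names are hypothetical and do not mirror MariaDB internals.

# 1317 is ER_QUERY_INTERRUPTED in MySQL/MariaDB; used here for illustration.
ER_QUERY_INTERRUPTED = 1317


class ReplicationStopped(Exception):
    """Raised when the slave's result differs from what the master logged."""


def apply_alter(slave_columns, new_column, expected_error):
    # The slave runs the ALTER undisturbed, so it succeeds (error code 0).
    slave_columns.append(new_column)
    actual_error = 0
    # The slave compares its own result with the code logged by the master.
    if actual_error != expected_error:
        raise ReplicationStopped(
            f"expected error {expected_error}, got {actual_error}"
        )


slave_columns = ["pk", "a"]
apply_alter(slave_columns, "b", expected_error=0)  # ordinary event, applied

try:
    apply_alter(slave_columns, "c", expected_error=ER_QUERY_INTERRUPTED)
    stopped = False
except ReplicationStopped as exc:
    stopped = True
    print("replication stopped:", exc)
```

In this sketch the slave still ends up with the new column, matching the master's half-done state, but replication halts so that the divergence is surfaced to an operator instead of passing silently.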
| Comment by Sergei Golubchik [ 2015-10-22 ] | ||
|
Is it an upstream bug? | ||
| Comment by Elena Stepanova [ 2015-10-22 ] | ||
|
I don't see any notes about it in my description or comments, so apparently I didn't check. I will do it shortly. | ||
| Comment by Elena Stepanova [ 2015-10-28 ] | ||
|
I could not reproduce it on MariaDB 5.5, MySQL 5.5, or MySQL 5.6 (for the latter, I ran 20 attempts of the RQG test; for MariaDB 10.x, 1-2 attempts are usually enough). |