[MDEV-4211] Galera: with binlog-checksum=1 any ALTER TABLE statement results in Error_code: 1064 and not replicated on other nodes Created: 2013-02-27 Updated: 2013-03-08 Resolved: 2013-03-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 5.5.28a-galera |
| Fix Version/s: | 5.5.29-galera |
| Type: | Bug | Priority: | Major |
| Reporter: | Aleksey Sanin (Inactive) | Assignee: | Seppo Jaakola |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | galera | ||
| Environment: |
CentOS 5.x, Ubuntu 12.04 |
||
| Description |
|
We have a setup of 3 servers in a Galera cluster: db01, db02, db03. If we run an ALTER TABLE statement on one of the nodes, the other two nodes log an error and the statement is not replicated. For example, the following query run on db01:

ALTER TABLE `test` ADD INDEX `started_time` (`started_time`);

resulted in the following error on db02 (db03 errors look the same):

130227 7:30:09 [ERROR] Slave SQL: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1' on query. Default database: 'test'. Query: 'ALTER TABLE `test` ADD INDEX `started_time` (`started_time`)', Error_code: 1064

No errors were produced on db01. I've submitted my.cnf for another bug (https://mariadb.atlassian.net/browse/MDEV-4136); the only change is the following addition:

wsrep_log_conflicts=1

Aleksey |
| Comments |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-27 ] |
|
Forgot to add that after checking db02/03 nodes, the alter table commands have not been executed there |
| Comment by Elena Stepanova [ 2013-02-28 ] |
|
Hi Alexey, I can't see a cnf in |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-28 ] |
|
Right, this one. Basically, this is the my.cnf with all wsrep_* options un-disabled |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-28 ] |
|
Also, we've tried to reproduce this problem in our test environment w/o success using same configs (the only difference is memory size/threads number because test machines are smaller than production). We also found these issues that look related: http://www.perconaforum.com/index.php?t=msg&goto=9378& |
| Comment by Elena Stepanova [ 2013-02-28 ] |
|
Okay, thanks, I see – many questions, no answers... Let's see if we can figure it out. Does it happen on any table at all? Thanks. |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-28 ] |
|
1) Yes, we tried a few tables and all have the same issue. Of course, all tables were InnoDB.
2) We didn't try other DDLs, unfortunately.
3) Yes, the same issue regardless of the origin of the DDL. Exactly the same log entries.
4) Unfortunately, we had to roll back to the plain Master-Slave setup last night, so I don't have the error log anymore. And as I said, we can't reproduce it in our test environment.

Sorry, not a lot of data. I was planning to build the latest 5.5.29, test it in the dev environment, and maybe try another production rollout next week if everything goes well. I'll definitely keep an eye on the GRA*log files this time. |
| Comment by Aleksey Sanin (Inactive) [ 2013-03-01 ] |
|
We found it. The issue is 100% reproducible with binlog_checksum=1, and things work as expected if that line is commented out (#binlog_checksum=1). |
| Comment by Aleksey Sanin (Inactive) [ 2013-03-01 ] |
|
clarification: binlog_checksum=1 is enough |
| Comment by Elena Stepanova [ 2013-03-01 ] |
|
Thank you, Aleksey. The problem is still reproducible on current maria-5.5-galera (revno 3386). |
| Comment by Seppo Jaakola [ 2013-03-04 ] |
|
The problem happens because the chosen checksum algorithm is not communicated to the receiving nodes; currently there is no method for it. Note that Galera replication already has its own checksums, so these binlog checksums should not be used. |
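To make the failure mode concrete, here is a minimal Python sketch. It is not the server's code: the function names and the flat "payload plus 4-byte CRC32 trailer" layout are assumptions for illustration, and real binlog events carry headers that are omitted here. It only shows that a receiver unaware of the sender's checksum setting ends up treating the trailer as part of the statement.

```python
import zlib

CHECKSUM_LEN = 4  # assumed size of the CRC32 trailer in this toy model


def encode_event(query: str, with_checksum: bool) -> bytes:
    """Serialize a statement, optionally appending a CRC32 trailer."""
    payload = query.encode("utf-8")
    if with_checksum:
        payload += zlib.crc32(payload).to_bytes(CHECKSUM_LEN, "little")
    return payload


def decode_event(data: bytes, expect_checksum: bool) -> bytes:
    """Return the statement bytes as the receiver would see them."""
    if not expect_checksum:
        # The receiver has no idea a trailer exists, so it keeps every byte.
        return data
    body, trailer = data[:-CHECKSUM_LEN], data[-CHECKSUM_LEN:]
    if zlib.crc32(body).to_bytes(CHECKSUM_LEN, "little") != trailer:
        raise ValueError("checksum mismatch")
    return body


query = "ALTER TABLE `test` ADD INDEX `started_time` (`started_time`)"
wire = encode_event(query, with_checksum=True)

# Sender and receiver agree: the statement is recovered intact.
agreed = decode_event(wire, expect_checksum=True)

# Receiver was never told about the checksum: 4 stray bytes follow the SQL.
mismatched = decode_event(wire, expect_checksum=False)
```

In this model, `agreed` equals the original statement, while `mismatched` is four bytes longer, which is the kind of input a SQL parser would reject.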
| Comment by Elena Stepanova [ 2013-03-04 ] |
|
>> these binlog checksums should not be used

Then the server should disable/ignore them automatically (with a proper warning in the error log). We cannot expect every user to know all the subtle limitations and fight with the issues caused by them. |
| Comment by Seppo Jaakola [ 2013-03-04 ] |
|
A temporary fix has been pushed, which sends a format description (FD) event before the query event; the FD carries the current binlog checksum setting. This fix enables checksums only for DDL statements, and the binlog_checksum option should still not be used. |
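The idea of announcing the checksum setting ahead of the statement can be sketched as a toy model in Python. The class and function names (`FormatDescription`, `QueryEvent`, `Receiver`, `make_query_event`) are hypothetical and the wire layout is an assumption; the actual change is in the revision linked in this ticket.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FormatDescription:
    """Hypothetical stand-in for an FD event carrying the checksum setting."""
    checksum_enabled: bool


@dataclass
class QueryEvent:
    """Hypothetical stand-in for a query event: raw bytes as sent."""
    raw: bytes


def make_query_event(query: str, checksum_enabled: bool) -> QueryEvent:
    payload = query.encode("utf-8")
    if checksum_enabled:
        payload += zlib.crc32(payload).to_bytes(4, "little")
    return QueryEvent(payload)


class Receiver:
    def __init__(self) -> None:
        # Until told otherwise, assume events carry no checksum trailer.
        self.checksum_enabled = False

    def apply(self, event: Union[FormatDescription, QueryEvent]) -> Optional[str]:
        if isinstance(event, FormatDescription):
            # Learn the sender's setting before any query event arrives.
            self.checksum_enabled = event.checksum_enabled
            return None
        data = event.raw
        if self.checksum_enabled:
            body, trailer = data[:-4], data[-4:]
            if zlib.crc32(body).to_bytes(4, "little") != trailer:
                raise ValueError("checksum mismatch")
            data = body
        return data.decode("utf-8")


ddl = "ALTER TABLE `test` ADD INDEX `started_time` (`started_time`)"
receiver = Receiver()
receiver.apply(FormatDescription(checksum_enabled=True))
applied = receiver.apply(make_query_event(ddl, checksum_enabled=True))
```

Because the description event arrives first, the receiver strips and verifies the trailer, and `applied` holds the statement exactly as it was issued.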
| Comment by Seppo Jaakola [ 2013-03-04 ] |
|
Fix pushed in: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3389 |
| Comment by Aleksey Sanin (Inactive) [ 2013-03-05 ] |
|
I disagree that checksums for binlog should not be used. In a mixed setup where a Galera cluster also streams data via normal replication (e.g. for delayed replication), these binlog checksums are useful. |
| Comment by Seppo Jaakola [ 2013-03-08 ] |
|
Aleksey, that's a valid use case, indeed. And binlog checksumming should not hurt with Galera replication in general. It is just that this issue was acknowledged so close to the release deadline that there was no time for a proper fix with full binlog checksum support. The plan is to continue with this development for future releases. |
| Comment by Aleksey Sanin (Inactive) [ 2013-03-08 ] |
|
Hi Seppo. Thanks for your reply. It is perfectly fine to have a temporary hack for the release. |