[MDEV-28946] STOP SLAVE or slave errors (ex 1062, 1032) constantly crash MariaDB server Created: 2022-06-25 Updated: 2022-10-22 Resolved: 2022-07-09 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Compiling, Replication |
| Affects Version/s: | 10.6.1, 10.5.15 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | COUNOTTE CEDRIC | Assignee: | Daniel Black |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu 21.10 / Ubuntu 22.04 |
||
| Issue Links: |
|
||||||||
| Description |
|
I've set up replication from a Galera cluster, connecting to one node, and each time I stop the slave, the slave server crashes immediately. I upgraded to 22.04 and replication stopped working, but the crash still occurred. I reverted to 21.10 and still get the crash, but replication is still not working, so experimenting with settings such as the slave position is painful while the server is constantly crashing on me. I set up replication back in January and the problem was already there, running 10.4 at the time, I believe. Any help would be greatly appreciated.
2022-06-25 9:55:09 35 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.012160', position 19131057; GTID position 300-3-522464027,100-1-102978519
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
Thread pointer: 0x0 |
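The GTID position in the log line above is a comma-separated list of `domain-server-sequence` triples. A minimal Python sketch for splitting it per replication domain, which may help when comparing slave positions by hand. The function name `parse_gtid_pos` is hypothetical and not part of any MariaDB tooling:

```python
from typing import Dict, Tuple


def parse_gtid_pos(pos: str) -> Dict[int, Tuple[int, int]]:
    """Map domain_id -> (server_id, sequence) for a MariaDB GTID position string."""
    result: Dict[int, Tuple[int, int]] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty position or a trailing comma
        domain, server, seq = (int(part) for part in item.split("-"))
        result[domain] = (server, seq)
    return result


if __name__ == "__main__":
    # Position copied from the log line in the description.
    position = parse_gtid_pos("300-3-522464027,100-1-102978519")
    for domain, (server, seq) in sorted(position.items()):
        print(f"domain {domain}: server {server}, sequence {seq}")
```

Keying the result by domain makes it easy to see which domain has advanced between two log excerpts.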
| Comments |
| Comment by Daniel Black [ 2022-07-08 ] |
|
If you have a sample of the binary log error messages causing this, that would be useful, along with the table structure (SHOW CREATE TABLE tbl). What non-default replication configuration options are you using, if any? (I'm assuming binlog_format=ROW because of Galera.) To clarify: is the Galera cluster member the replication slave? Is log_slave_updates on? If a core dump was created, can you get an apport output or a gdb backtrace? |
| Comment by COUNOTTE CEDRIC [ 2022-07-08 ] |
|
Here are the options I use on all servers, including the slave. The slave server used to be a single server; now it's a Galera cluster. It makes no difference: stopping the slave results in a crash either way. Without the following option, the server will also crash, because replication stops at either of these errors:
slave-skip-errors = 1062,1032
The binlog is set this way:
binlog-format = row
There are also some slave options; slave updates is indeed ON now, but used to be OFF when it was a single server:
log_slave_updates = ON
Anyway, I've attached all the options added to /etc/mysql/mariadb.conf.d/50-server.cnf on all my servers. I use a script to configure every server we add in the same way, for consistency. [^mariadb_config.txt]
I used to have a single MariaDB server as a slave of one node of the 4-server cluster. Switching from one server to another required me to remove the master config files from /var/lib/mysql, then issue a STOP SLAVE that crashes the server, wait for the restart and set up the new slave. Now the MariaDB slave server is part of a 2-node Galera cluster. When starting replication, the other node is stopped and I use mariabackup to initiate the replication, then start the second node, which automatically uses mariabackup for its SST. I intend to have replication both ways; however, given those crashes, I'm not moving on with this, as the current 4-node cluster is in production with about 300 customers and 900 mobile apps connected to it from 7am till 2am. |
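The three options quoted above mix dashes and underscores in their names, which the server treats as equivalent in option files. A small Python sketch, under stated assumptions, that reads such a fragment and lists the skipped error codes. The `[mysqld]` section header and the helper names are assumptions for illustration; the attached file may be organised differently, and `configparser` does not understand directives such as `!includedir`:

```python
import configparser
from typing import Dict, List

# Fragment rebuilt from the options quoted in the comment; the section name is assumed.
SAMPLE = """
[mysqld]
slave-skip-errors = 1062,1032
binlog-format = row
log_slave_updates = ON
"""


def load_options(text: str, section: str = "mysqld") -> Dict[str, str]:
    """Return the section's options with '-' normalised to '_' in the names."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return {name.replace("-", "_"): (value or "") for name, value in parser.items(section)}


def skipped_errors(options: Dict[str, str]) -> List[int]:
    """Numeric error codes listed in slave_skip_errors (keywords such as 'all' are ignored)."""
    raw = options.get("slave_skip_errors", "")
    return [int(code) for code in raw.split(",") if code.strip().isdigit()]


if __name__ == "__main__":
    opts = load_options(SAMPLE)
    print(opts)
    print("skipped error codes:", skipped_errors(opts))
```

Normalising the names first means a later check for `log_slave_updates` works regardless of which spelling a given server's file uses.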
| Comment by COUNOTTE CEDRIC [ 2022-07-08 ] |
|
I've just crashed the server (after setting core_file in the config) and here is the log output:
2022-07-08 9:36:29 20 [Note] Slave I/O thread: connected to master 'mdb_control@ovh3.vlan:3306',replication starts at GTID position '100-1-137402868,300-3-522464027'
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
Thread pointer: 0x0
However, I couldn't find any core file under /var/lib/mysql. Since then it crashed on its own and I could find a dump, which I gzipped into the attached file. [^dump.gzip]
Here is what I found in the logs about this last crash; it seems it crashed on slave stop again:
2022-07-08 9:54:29 21 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table 1check_front_www_v2.wp_options; Duplicate entry '_transient_global_styles_bridge' for key 'option_name', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.013999, end_log_pos 798461367, Gtid 100-1-137682169, Internal MariaDB error code: 1062
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
Thread pointer: 0x0 |
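When collecting samples of the replication errors from the log, the `Slave SQL` warning above carries the table, error code, master log coordinates and GTID in one line. A Python sketch that pulls those fields out follows. The regular expression is fitted only to the line quoted in this comment, so other event types or message wordings may need adjustments, and the function name is hypothetical:

```python
import re
from typing import Dict, Optional, Union

# Warning line copied from the comment above.
LINE = (
    "2022-07-08 9:54:29 21 [Warning] Slave SQL: Could not execute Write_rows_v1 "
    "event on table 1check_front_www_v2.wp_options; Duplicate entry "
    "'_transient_global_styles_bridge' for key 'option_name', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.013999, "
    "end_log_pos 798461367, Gtid 100-1-137682169, Internal MariaDB error code: 1062"
)

SLAVE_SQL_ERROR = re.compile(
    r"Slave SQL: Could not execute (?P<event>\w+) event on table (?P<table>[^;]+); "
    r".*?Error_code: (?P<code>\d+); "
    r".*?master log (?P<log>[^,\s]+), end_log_pos (?P<pos>\d+), "
    r"Gtid (?P<gtid>\d+-\d+-\d+)"
)


def parse_slave_sql_error(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract the key fields of a 'Could not execute ... event' warning, or None."""
    match = SLAVE_SQL_ERROR.search(line)
    if match is None:
        return None
    return {
        "event": match.group("event"),
        "table": match.group("table"),
        "code": int(match.group("code")),
        "master_log": match.group("log"),
        "end_log_pos": int(match.group("pos")),
        "gtid": match.group("gtid"),
    }


if __name__ == "__main__":
    print(parse_slave_sql_error(LINE))
```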
| Comment by Daniel Black [ 2022-07-08 ] |
|
Thanks ccounotte for the details. Core files are generated by the kernel according to the kernel.core* settings (see sysctl -a | grep kernel.core). However, with your help there's enough here to try to reproduce it, together with the Ubuntu Launchpad report, which indicates that Galera probably isn't involved. |
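As a rough illustration of why no core file showed up in the data directory: the kernel's `kernel.core_pattern` setting decides where a core goes. A Python sketch that classifies a pattern string follows. The function names are hypothetical, the example patterns in the comments are illustrative rather than taken from the reporter's machine, and resource limits on core size are not checked here:

```python
from pathlib import Path
from typing import Tuple


def classify_core_pattern(pattern: str) -> Tuple[str, str]:
    """Return (kind, target) for a kernel.core_pattern value.

    kind is 'pipe' (core is handed to a helper program), 'absolute'
    (written to a fixed path template) or 'relative' (written relative
    to the crashing process's working directory).
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        parts = pattern[1:].split()
        return ("pipe", parts[0] if parts else "")
    if pattern.startswith("/"):
        return ("absolute", pattern)
    return ("relative", pattern)


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str:
    """Read the current pattern on a Linux host."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    kind, target = classify_core_pattern(read_core_pattern())
    print(f"core_pattern kind: {kind}, target: {target}")
```

With a `pipe` pattern the kernel itself writes nothing into the server's working directory, so looking under /var/lib/mysql would find no core file.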
| Comment by COUNOTTE CEDRIC [ 2022-07-08 ] |
|
Just in case, I got a core dump. It's 70MB compressed, so I split it into 10MB files; hopefully I can upload them here. Hopefully that'll be enough to find a solution that only requires changing some options. |
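For anyone needing to do the same with a large attachment, here is a minimal Python sketch of splitting a file into fixed-size parts and joining them again. The file names and the `.partNNN` suffix are hypothetical; the reporter's actual method is not stated in the thread:

```python
import tempfile
from pathlib import Path
from typing import List

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB parts, matching the size mentioned above


def split_file(src: Path, chunk_size: int = CHUNK_SIZE) -> List[Path]:
    """Write src as numbered parts next to it and return the part paths in order."""
    parts: List[Path] = []
    with src.open("rb") as fh:
        index = 0
        while True:
            data = fh.read(chunk_size)
            if not data:
                break
            part = src.with_name(f"{src.name}.part{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: List[Path], dest: Path) -> None:
    """Concatenate the parts (sorted by name) back into dest."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            out.write(part.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "core.gz"
        source.write_bytes(b"x" * 2500)
        pieces = split_file(source, chunk_size=1000)
        join_files(pieces, Path(tmp) / "core.joined.gz")
        print([p.name for p in pieces])
```

The zero-padded part index keeps lexical and numeric order the same, so the receiver can join the parts after a plain sort.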
| Comment by Daniel Black [ 2022-07-09 ] |
|
So the pthread_exit(0) at the bottom of the handle_slave_sql thread is asserting (between frames #9 and #15, internal to libgcc). In the signal handler, my_read is also segfaulting at https://github.com/MariaDB/server/blob/mariadb-10.5.15/mysys/my_read.c#L63 without a probable explanation.
|
| Comment by Daniel Black [ 2022-07-09 ] |
|
Given that stack frames 7, 8 and 9 are the same as in the bug https://bugs.launchpad.net/ubuntu/+source/mariadb-10.6/+bug/1970634/comments/6, can you try the 10.6 test package at https://bugs.launchpad.net/ubuntu/+source/mariadb-10.6/+bug/1970634/comments/21 for 22.04/jammy? Because 21.10 is end of life this month, I think you'd be hard pushed to get a 10.5 update. |
| Comment by Daniel Black [ 2022-07-09 ] |
|
Thanks for going to the trouble of uploading the split core file. I didn't expect this. FYI, https://mariadb.com/kb/en/meta/mariadb-ftp-server/ is better suited to bulk private files like this, and I should have mentioned it. Note that core files can contain significant passwords and user data, so I don't recommend uploading them in raw form publicly (but they were useful). I can't delete them, but you can. I don't need them any more. |
| Comment by COUNOTTE CEDRIC [ 2022-07-11 ] |
|
Thanks Daniel for the test package! Is it "compatible" with 10.5.15 regarding replication and Galera cluster? I attempted to upgrade my servers to 22.04 a few weeks ago and replication stopped (though maybe an UPGRADE might be enough) because mariabackup would create mysql tables that are not compatible. I've also read that SST using mariabackup would not work for that reason. Any hints on upgrading my 6 servers to MariaDB 10.6? I've got 4 servers in a Galera cluster replicating to 2 others in another Galera cluster in another location. |
| Comment by Daniel Black [ 2022-07-12 ] |
|
The test package was from the Ubuntu maintainer, not me. I don't know enough about the Galera upgrade paths to offer a recommendation, sorry. I just tested the impish 10.5 versions from MariaDB downloads and they don't crash during STOP SLAVE, so I'd recommend those for now. By the time you get to 22.04, Ubuntu should have the LTO-disabled packages there. The next 10.6.7 release is scheduled for the end of the month and contains the referenced fixes (so far). |
| Comment by COUNOTTE CEDRIC [ 2022-07-12 ] |
|
Thanks Daniel for your reply. I was able to upgrade 2 servers to MariaDB 10.5.16 and indeed it no longer crashes. If you don't mind, I have one last question. Reading this: " The current supported versions are: 10.2, 10.3, 10.4, 10.5, 10.6 (supported for 5 years), 10.7 (supported for one year), 10.8 (supported for one year) and the development version is 10.9. " Does it mean 10.5 will be supported for 5 years, or is it only 10.6? I suppose not, and using Ubuntu 21.10 with 10.5 was a mistake, as I've tested upgrading to 10.6 and replication or Galera stopped working alongside 10.5. |
| Comment by Daniel Black [ 2022-07-12 ] |
|
Glad to hear of your successful upgrade. Ref: maintenance policy, so yes, 10.5 is 5 years. 21.10 is EOL at the end of this month, so 10.5.16 was our last release for it on Ubuntu impish. We don't release versions older than what the distro itself packages, so there won't be a 10.5 release for 22.04, as 10.6 is packaged there. We will be doing 10.5 packages for 20.04 focal until the 10.5 EOL date, if you want to stay on that branch longer. Please do report the 10.6 upgrade issues with Galera/replication. These are meant to be supported, and documentation has fallen behind (MDEV-28483). Some constraints on Galera SST upgrades are in |
| Comment by COUNOTTE CEDRIC [ 2022-07-12 ] |
|
Thanks for the heads-up. I'm using mariabackup as the SST method and as the replication bootstrap method (is there any other for replication?), primarily because the servers are live and cannot be taken down. This method is said not to be possible for an upgrade unless IST is used, which in turn requires enough gcache. I already tried migrating the slave server, and it reported differences in the mysql DB tables and would not start anymore. I had to reinstall the server entirely with the old OS and re-apply replication using a new mariabackup backup, which takes hours for our 180GB DB. I just tried to upgrade to Ubuntu 22.04 while keeping MariaDB 10.5.16, but it doesn't seem possible. When it told me some repo had been disabled, I re-enabled the MariaDB repo you gave me and proceeded, but it ended up with this:
Error during update
A problem occurred during the update. This is usually some sort of
W:Updating from such a repository can't be done securely, and is
Anyway to keep 10.5 on Ubuntu 22.04 ? |
| Comment by Daniel Black [ 2022-07-12 ] |
|
I typoed 22.04 focal for 20.04 focal, sorry.
> Anyway to keep 10.5 on Ubuntu 22.04 ?
Options (though not great): building your own packages, or using a tarball from our download page. |
| Comment by COUNOTTE CEDRIC [ 2022-07-12 ] |
|
Thanks for fixing the typo. I'm still unsure which is the safest option: reverting our servers to 20.04, or first upgrading MariaDB to 10.6 and then upgrading Ubuntu to 22.04. Reverting seems the safest option, as it avoids a major version upgrade of MariaDB, but reinstalling 6 servers seems a little overwhelming. |
| Comment by Daniel Black [ 2022-07-13 ] |
|
Alternative: Create a systemd service for a mariadb:10.5 container. A slave server should still start on 10.6, even with reporting table differences. Table differences should be resolved with mariadb-upgrade. If possible, I'd like to see a log in a new bug report of it not starting. |
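One possible shape for the container alternative, sketched in Python as a function that renders a systemd unit file. Everything beyond the `mariadb:10.5` image name is an assumption: the container name, the Docker binary path, the bind-mounted data directory and the published port are illustrative, and Galera ports, configuration mounts and data directory ownership are not handled here:

```python
# Unit template; all paths and names are placeholders to adapt.
UNIT_TEMPLATE = """\
[Unit]
Description=MariaDB container ({image})
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-{docker} rm -f {name}
ExecStart={docker} run --rm --name {name} -v {datadir}:/var/lib/mysql -p {port}:3306 {image}
ExecStop={docker} stop {name}
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def render_unit(
    image: str = "mariadb:10.5",
    name: str = "mariadb-10.5",
    datadir: str = "/var/lib/mysql",
    port: int = 3306,
    docker: str = "/usr/bin/docker",
) -> str:
    """Return the text of a systemd unit that runs the given image in the foreground."""
    return UNIT_TEMPLATE.format(
        image=image, name=name, datadir=datadir, port=port, docker=docker
    )


if __name__ == "__main__":
    # Review the output before saving it under /etc/systemd/system/.
    print(render_unit())
```

Running the container in the foreground under `ExecStart` lets systemd supervise it directly, and the leading `-` on `ExecStartPre` ignores the failure when no stale container exists.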