[MDEV-15607] mysqld crashed few after node is being joined with sst Created: 2018-03-20 Updated: 2022-07-06 Resolved: 2018-06-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera SST |
| Affects Version/s: | 10.1, 10.3.4, 10.2 |
| Fix Version/s: | 10.1.35, 10.2.17, 10.3.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Zdravelina Sokolovska (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.4 |
||
| Description |
|
mysqld crashed shortly after the node was joined with SST. To reproduce: join a node to a Galera cluster loaded with ~12G of data, with wsrep_sst_method=mariabackup set.
|
| Comments |
| Comment by Alexey [ 2018-03-20 ] |
|
This is a typical systemd response: it tries to shut down the joiner (due to "timeout") before the joiner manages to complete SST:
2018-03-20 16:36:15 0 [Note] /usr/sbin/mysqld (unknown): Normal shutdown
The assert happens due to a race between the ongoing startup and the systemd-initiated shutdown. So it is a matter of fixing the systemd script. |
| Comment by Marko Mäkelä [ 2018-03-20 ] |
|
It seems that wsrep_view_handler_cb() could add calls to sd_notifyf("EXTEND_TIMEOUT_USEC=…") to extend the startup timeout. Alternatively, wsrep_SE_init_wait() could call mysql_cond_timedwait() instead of mysql_cond_wait() in the loop and keep extending the timeout. |
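The second alternative in the comment above (a timed wait in a loop that keeps extending the startup timeout) can be sketched roughly as follows. This is an illustration in Python, not the server's C++ code and not the fix that was eventually committed. `wait_with_keepalive`, `notify`, and `demo` are hypothetical names: `notify` stands in for sd_notifyf(), `threading.Condition.wait(timeout=...)` stands in for mysql_cond_timedwait(), and the interval values are assumptions chosen only to make the demo fast.

```python
import threading


def wait_with_keepalive(cond, is_ready, notify, interval_s=1.0):
    """Block until is_ready() returns True, reporting progress on every timeout.

    Hypothetical helper: `notify` stands in for sd_notifyf(); it is not a real
    MariaDB or systemd API.
    """
    # Assumption: ask for twice the polling interval so the deadline does not
    # lapse between two consecutive notifications.
    extend_usec = int(interval_s * 2 * 1_000_000)
    extensions = 0
    with cond:
        while not is_ready():
            # Condition.wait() returns False when the timeout expires.
            if not cond.wait(timeout=interval_s):
                notify(f"EXTEND_TIMEOUT_USEC={extend_usec}")
                extensions += 1
    return extensions


def demo():
    """Simulate a slow state transfer that finishes after about 0.25 s."""
    cond = threading.Condition()
    state = {"ready": False}
    messages = []

    def finish_transfer():
        # Mark the transfer as done and wake the waiting thread.
        with cond:
            state["ready"] = True
            cond.notify_all()

    timer = threading.Timer(0.25, finish_transfer)
    timer.start()
    extensions = wait_with_keepalive(
        cond, lambda: state["ready"], messages.append, interval_s=0.05
    )
    timer.join()
    return state["ready"], extensions, messages
```

Calling `demo()` returns the ready flag, the number of extensions sent, and the list of notification strings. The point of the design is that the waiter wakes up periodically even when nothing has happened, so the service manager keeps hearing from a joiner that is still busy with SST instead of treating it as hung.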
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-03-26 ] |
|
on 10.2 joiner: => Rate:[ 39MiB/s] Avg:[32.9MiB/s] Elapsed:0:01:20 ---- SST failed to complete :Interrupted system call |
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-04-11 ] |
|
Note: on 10.2, with the same setup and data, the problem occurs. |
| Comment by Sergei Golubchik [ 2018-04-17 ] |
|
Why does 10.3 need 4:55 when 10.2 completes SST in 1:20? Why is 10.3 3.5 times slower? |
| Comment by Jan Lindström (Inactive) [ 2018-04-24 ] |
|
https://github.com/MariaDB/server/commit/48e3b4ca5dd6a6cffbee64381dc301d43c66e036 |
| Comment by Hartmut Holzgraefe [ 2018-10-31 ] |
|
It looks as if timeouts still happen on server versions that should have this fixed according to the version info above. Startup timeouts from systemd still happened on 10.2.18 and 10.3.9 servers. See