Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 10.6.1, 10.5.15
- Fix Version/s: None
- Environment: Ubuntu 21.10 / Ubuntu 22.04
Description
I've set up replication from a Galera cluster, connecting to one node, and each time I stop the slave, the slave server crashes immediately.
I upgraded to 22.04 and replication stopped working, but the crash still occurred. I reverted to 21.10; I'm still getting the crash and replication is still not working, so trying to adjust settings, the slave position, etc. is a pain while the server keeps crashing on me.
I set up replication back in January and the problem was already there, running 10.4 at the time, I believe.
Any help would be greatly appreciated.
2022-06-25 9:55:09 35 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.012160', position 19131057; GTID position 300-3-522464027,100-1-102978519
2022-06-25 9:55:09 35 [Note] master was xxxxx:3306
220625 9:55:09 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=2
max_threads=5002
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11273904 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
2022-06-25 9:55:09 0 [Note] InnoDB: FTS optimize thread exiting.
2022-06-25 9:55:09 0 [Note] InnoDB: Buffer pool(s) load aborted due to user instigated abort at 220625 9:55:09
2022-06-25 9:55:09 0 [Note] InnoDB: Dumping of buffer pool not started as load was incomplete
2022-06-25 9:55:09 0 [Note] InnoDB: Starting shutdown...
2022-06-25 9:55:09 0 [Note] InnoDB: Dumping of buffer pool not started as load was incomplete
??:0(my_print_stacktrace)[0x55f980682170]
??:0(handle_fatal_signal)[0x55f980207268]
??:0(__sigaction)[0x7faeb69a6520]
??:0(pthread_kill)[0x7faeb69fa828]
??:0(raise)[0x7faeb69a6476]
??:0(abort)[0x7faeb698c7b7]
??:0(wsrep_write_dummy_event_low(THD*, char const*))[0x55f97fe6754a]
??:0(wsrep::wsrep_provider_v26::status() const)[0x55f98076065f]
??:0(_Unwind_GetTextRelBase)[0x7faeb69595f4]
??:0(_Unwind_ForcedUnwind)[0x7faeb6959ce2]
??:0(sem_trywait)[0x7faeb6a012a6]
??:0(pthread_exit)[0x7faeb69f9afa]
??:0(handle_slave_sql)[0x55f97ff0b9b1]
??:0(aria_get_capabilities)[0x55f98040cc23]
??:0(pthread_condattr_setpshared)[0x7faeb69f8947]
??:0(clone)[0x7faeb6a88a44]
Attachments
Issue Links
- duplicates MDEV-25633 MariaDB crashes when compiled with link time optimizations (Closed)
Activity
Here are the options I use on all servers, including the slave. The slave used to be a single server and is now a Galera cluster; either way, stopping the slave results in a crash.
Without the following option the server will also crash, because replication stops on any of these errors:
slave-skip-errors = 1062,1032
Concerning bin-log those are set this way:
binlog-format = row
expire_logs_days = 1
max_binlog_size = 1024M
log_bin = /var/lib/mysql/mysql-bin
log_bin_index = /var/lib/mysql/mysql-bin.index
Also some slave options; slave updates is indeed ON now, but it used to be OFF when the slave was a single server:
log_slave_updates = ON
slave_parallel_mode = conservative
slave_transaction_retry_errors = 1213,1205
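Incidentally, a quick way to confirm those settings actually took effect on the running slave is to query them (a sketch; it assumes a local mariadb client with sufficient privileges, and lists only the variables mentioned above):

```shell
# Show the replication-related variables discussed above on the live server.
# Variable names are standard MariaDB system variables; connection
# options (host, user, password) are omitted and depend on your setup.
mariadb -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('binlog_format','log_slave_updates','slave_parallel_mode',
   'slave_skip_errors','slave_transaction_retry_errors');"
```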
Anyway, I've attached the complete set of options added to /etc/mysql/mariadb.conf.d/50-server.cnf on all my servers. I use a script to configure every server we add the same way, for consistency.
[^mariadb_config.txt]
I used to have a single MariaDB server as a slave of one node of the 4-server cluster. Switching from one master node to another required me to remove the master info files from /var/lib/mysql, then issue a STOP SLAVE (which crashes the server), wait for the restart, and set up the new slave.
Now the MariaDB slave server is part of a 2-node Galera cluster. When starting replication, the other node is stopped and I use mariabackup to initialize the replication, then start the second node, which automatically uses mariabackup for its SST.
I intend to have replication both ways, but given these crashes I'm not moving forward with that, as the current 4-node cluster is in production with about 300 customers and 900 mobile apps connected to it from 7am till 2am.
I've just crashed the server (after setting core_file in config) and here is the output log:
2022-07-08 9:36:29 20 [Note] Slave I/O thread: connected to master 'mdb_control@ovh3.vlan:3306',replication starts at GTID position '100-1-137402868,300-3-522464027'
2022-07-08 9:37:25 21 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.013990' at position 60565725; GTID position '100-1-137402868,300-3-522464027'
2022-07-08 9:37:25 21 [Note] master was ovh3.vlan:3306
2022-07-08 9:37:46 20 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.014000', position 367171055; GTID position 100-1-137692762,300-3-522464027
2022-07-08 9:37:46 20 [Note] master was ovh3.vlan:3306
220708 9:37:46 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=9
max_threads=5002
thread_count=23
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11273904 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
2022-07-08 9:37:46 0 [Note] InnoDB: Buffer pool(s) load completed at 220708 9:37:46
??:0(my_print_stacktrace)[0x56532aa86170]
??:0(handle_fatal_signal)[0x56532a60b268]
??:0(__sigaction)[0x7f37bdac6520]
??:0(pthread_kill)[0x7f37bdb1a828]
??:0(raise)[0x7f37bdac6476]
??:0(abort)[0x7f37bdaac7b7]
??:0(wsrep_write_dummy_event_low(THD*, char const*))[0x56532a26b54a]
??:0(wsrep::wsrep_provider_v26::status() const)[0x56532ab6465f]
??:0(_Unwind_GetTextRelBase)[0x7f37bda7b5f4]
??:0(_Unwind_ForcedUnwind)[0x7f37bda7bce2]
??:0(sem_trywait)[0x7f37bdb212a6]
??:0(pthread_exit)[0x7f37bdb19afa]
??:0(handle_slave_sql)[0x56532a30f9b1]
??:0(aria_get_capabilities)[0x56532a810c23]
??:0(pthread_condattr_setpshared)[0x7f37bdb18947]
??:0(clone)[0x7f37bdba8a44]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Fatal signal 11 while backtracing
However, I couldn't find any core file under /var/lib/mysql!?
It then crashed on its own, and this time I could find a dump, which I gzipped and attached.
[^dump.gzip]
Here is what I found in the logs about this last crash; it seems it crashed on slave stop again:
2022-07-08 9:54:29 21 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table 1check_front_www_v2.wp_options; Duplicate entry '_transient_global_styles_bridge' for key 'option_name', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.013999, end_log_pos 798461367, Gtid 100-1-137682169, Internal MariaDB error code: 1062
2022-07-08 9:54:32 21 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 100-1-137683032, Internal MariaDB error code: 1047
2022-07-08 9:54:32 21 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.013999' at position 880992554; GTID position '100-1-137683031,300-3-522464027'
2022-07-08 9:54:32 21 [Note] master was ovh3.vlan:3306
220708 9:54:32 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.15-MariaDB-0ubuntu0.21.10.1-log
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=9
max_threads=5002
thread_count=24
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11273904 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
??:0(my_print_stacktrace)[0x55ebef3e0170]
??:0(handle_fatal_signal)[0x55ebeef65268]
??:0(__sigaction)[0x7f6219581520]
??:0(pthread_kill)[0x7f62195d5828]
??:0(raise)[0x7f6219581476]
??:0(abort)[0x7f62195677b7]
??:0(wsrep_write_dummy_event_low(THD*, char const*))[0x55ebeebc554a]
??:0(wsrep::wsrep_provider_v26::status() const)[0x55ebef4be65f]
??:0(_Unwind_GetTextRelBase)[0x7f62195365f4]
??:0(_Unwind_ForcedUnwind)[0x7f6219536ce2]
??:0(sem_trywait)[0x7f62195dc2a6]
??:0(pthread_exit)[0x7f62195d4afa]
??:0(handle_slave_sql)[0x55ebeec699b1]
??:0(aria_get_capabilities)[0x55ebef16ac23]
??:0(pthread_condattr_setpshared)[0x7f62195d3947]
??:0(clone)[0x7f6219663a44]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Fatal signal 11 while backtracing
Thanks ccounotte for the details. Core files are generated by the kernel based on the sysctl -a | grep kernel.core settings. However, between what's here, the Ubuntu LP report (which indicates that Galera probably isn't part of it), and your help, there's enough to try to reproduce it.
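For anyone following along, a sketch of the settings that control whether a core file actually appears (paths and values shown are illustrative, not from this report):

```shell
# Where the kernel writes core files; a bare file name means the process's
# working directory (/var/lib/mysql for mariadbd).
sysctl kernel.core_pattern kernel.core_uses_pid
# Core files are only written if the core size limit is non-zero. For a
# systemd-managed mariadb service, set this via a drop-in override rather
# than ulimit, e.g. in /etc/systemd/system/mariadb.service.d/core.conf:
#   [Service]
#   LimitCORE=infinity
# then: systemctl daemon-reload && systemctl restart mariadb
```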
Just in case, I got a core dump. It's 70MB compressed, so I split it into 10MB files; hopefully I can upload them here.
[^core.zip.001] [^core.zip.002] [^core.zip.003] [^core.zip.004] [^core.zip.005] [^core.zip.006] [^core.zip.007]
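For anyone picking up the attachments, the split parts can be reassembled with plain coreutils (assumes the parts are in the current directory):

```shell
# Concatenate the numbered parts back into the original archive.
cat core.zip.0* > core.zip
# Then extract the raw core file, e.g.: unzip core.zip
```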
Hopefully that'll be enough to find a solution by changing some options only.
resulting backtrace from core files
Core was generated by `/usr/sbin/mariadbd --wsrep_start_position=9ffedd32-fdfe-11ec-bde9-aa7bce0e8d71:'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056532aa85500 in my_read (Filedes=6388, Buffer=0x7f1eac229170 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited "..., Count=4096, MyFlags=0) at ../mysys/./mysys/my_read.c:63
63      ../mysys/./mysys/my_read.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1eac22c640 (LWP 90493))]
(gdb) bt -frame-arguments all full
#0  0x000056532aa85500 in my_read (Filedes=6388,
    Buffer=0x7f1eac229170 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited "..., Count=4096, MyFlags=0)
    at ../mysys/./mysys/my_read.c:63
        got_errno = <optimized out>
        readbytes = 1323
        save_count = 0
#1  0x000056532a60ad8d in output_core_info () at ./sql/signal_handler.cc:74
        buff = "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited", ' ' <repeats 12 times>...
        len = <optimized out>
        fd = 6388
#2  0x000056532a60b1e1 in handle_fatal_signal (sig=<optimized out>) at ./sql/signal_handler.cc:340
        curr_time = 1657265866
        tm = {tm_sec = 46, tm_min = 37, tm_hour = 9, tm_mday = 8, tm_mon = 6, tm_year = 122, tm_wday = 5, tm_yday = 188, tm_isdst = 1, tm_gmtoff = 7200, tm_zone = 0x56532c121130 "CEST"}
        thd = 0x0
        print_invalid_query_pointer = false
#3  <signal handler called>
No locals.
#4  __pthread_kill_implementation (no_tid=0, signo=6, threadid=139769713706560) at pthread_kill.c:44
        tid = 90493
        ret = 0
        pd = 0x7f1eac22c640
        old_mask = {__val = {0, 139766092679528, 0, 0, 139769713700832, 12848595740987219712, 31, 18446744073709551272, 0, 139769713700880, 139769713700896, 94915222159912, 0, 139877382524165, 29, 139766087800944}}
        ret = <optimized out>
        pd = <optimized out>
        old_mask = <optimized out>
        ret = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
        resultvar = <optimized out>
        resultvar = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        resultvar = <optimized out>
        __arg3 = <optimized out>
        __arg2 = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        _a2 = <optimized out>
        _a1 = <optimized out>
        __futex = <optimized out>
        __private = <optimized out>
        __oldval = <optimized out>
        result = <optimized out>
#5  __pthread_kill_internal (signo=6, threadid=139769713706560) at pthread_kill.c:80
No locals.
#6  __GI___pthread_kill (threadid=139769713706560, signo=signo@entry=6) at pthread_kill.c:91
No locals.
#7  0x00007f37bdac6476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#8  0x00007f37bdaac7b7 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7f37bda7cfa0, sa_sigaction = 0x7f37bda7cfa0}, sa_mask = {__val = {139769713701168, 94915183476736, 139877393065928, 94915183476800, 139769713655822, 23, 0, 1, 139769713706504, 0, 12848595740987219712, 139769713702384, 139769713701376, 139769713702248, 139769713702080, 139769713702080}}, sa_flags = 707852720, sa_restorer = 0x0}
        sigs = {__val = {32, 26714, 10712, 201, 473, 139877108757572, 94915221666136, 139877108757752, 0, 139877393064480, 23, 139769713701024, 139769713702080, 0, 0, 139877383369314}}
#9  0x000056532a26b54a in _Unwind_SetGR.cold ()
No symbol table info available.
#10 0x000056532ab6465f in __gcc_personality_v0 ()
No symbol table info available.
#11 0x00007f37bda7b5f4 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#12 0x00007f37bda7bce2 in _Unwind_ForcedUnwind () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#13 0x00007f37bdb212a6 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:132
        ibuf = <optimized out>
        self = <optimized out>
#14 0x00007f37bdb19afa in __do_cancel () at ../sysdeps/nptl/pthreadP.h:281
        self = <optimized out>
#15 __GI___pthread_exit (value=0x0) at pthread_exit.c:37
No locals.
#16 0x000056532a30f9b1 in handle_slave_sql (arg=arg@entry=0x56532c2c7370) at ./sql/slave.cc:5737
        thd = 0x7f1dd4001558
        saved_log_name = '\000' <repeats 511 times>
        saved_master_log_name = '\000' <repeats 511 times>
        saved_log_pos = 0
        saved_master_log_pos = 0
        saved_skip_gtid_pos = {<Charset> = {m_charset = 0x56532b29eb80 <my_charset_bin>}, <Binary_string> = {<Static_binary_string> = {<Sql_alloc> = {<No data fields>}, Ptr = 0x0, str_length = 0}, Alloced_length = 0, extra_alloc = 0, alloced = false, thread_specific = false}, <No data fields>}
        saved_skip = 0
        mi = 0x56532c2c7370
        rli = 0x56532c2c8d38
        wsrep_node_dropped = 0 '\000'
        errmsg = 0x0
        serial_rgi = <optimized out>
        sql_info = {cached_charset = "\000\000\000\000\000", rpl_filter = 0x56532c123f10}
#17 0x000056532a810c23 in pfs_spawn_thread (arg=0x56539da284f8) at ../storage/perfschema/./storage/perfschema/pfs.cc:2201
        typed_arg = 0x56539da284f8
        user_arg = 0x56532c2c7370
        user_start_routine = 0x56532a30e720 <handle_slave_sql(void*)>
        pfs = <optimized out>
        klass = <optimized out>
#18 0x00007f37bdb18947 in start_thread (arg=<optimized out>) at pthread_create.c:435
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139769713706560, 6973988136746809818, 140727742483134, 140727742483135, 0, 139769713401856, -6992457413021074982, -7014376065989019174}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#19 0x00007f37bdba8a44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.
So the pthread_exit(0) at the bottom of the handle_slave_sql thread is aborting during the forced unwind (between frames #9 and #15, internal to libgcc).
In the signal handler, my_read is also segfaulting at https://github.com/MariaDB/server/blob/mariadb-10.5.15/mysys/my_read.c#L63 with no obvious explanation.
#0  0x000056532aa85500 in my_read (Filedes=6388,
    Buffer=0x7f1eac229170 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited "..., Count=4096, MyFlags=0)
    at ../mysys/./mysys/my_read.c:63
63      ../mysys/./mysys/my_read.c: No such file or directory.
(gdb) info locals
got_errno = <optimized out>
readbytes = 1323
save_count = 0
(gdb) p errno
$3 = 0
Given that stack frames 7, 8 and 9 are the same as in the bug https://bugs.launchpad.net/ubuntu/+source/mariadb-10.6/+bug/1970634/comments/6, can you try the 10.6 test package at https://bugs.launchpad.net/ubuntu/+source/mariadb-10.6/+bug/1970634/comments/21 for 22.04/jammy?
Because 21.10 is end of life this month, I think you'd be hard pressed to get a 10.5 update.
Thanks for going to the trouble of uploading the split core file. I didn't expect that.
FYI, https://mariadb.com/kb/en/meta/mariadb-ftp-server/ is better suited to bulk private files like this; I should have mentioned it.
Note that core files can contain significant passwords and user data, so I don't recommend uploading them publicly in raw form (but they were useful). I can't delete them, but you can. I don't need them any more.
Thanks Daniel for the test package! Is it "compatible" with 10.5.15 regarding replication and the Galera cluster?
I attempted to upgrade my servers to 22.04 a few weeks ago and replication stopped (though maybe an upgrade might have been enough), because mariabackup would create mysql system tables that are not compatible!?
And I've read that SST using mariabackup would not work for that reason either.
Any hints on upgrading my 6 servers to MariaDB 10.6? I've got 4 servers in a Galera cluster replicating to 2 others in another Galera cluster in another location.
The test package was from the Ubuntu maintainer, not me.
I don't know enough about the Galera upgrade paths to offer a recommendation, sorry.
I just tested the impish 10.5 versions from the MariaDB downloads and they don't crash during STOP SLAVE, so I'd recommend those for now.
By the time you get to 22.04, Ubuntu should have the LTO-disabled packages there.
The next 10.6.7 release is scheduled for the end of the month and contains the referenced fixes (so far).
Thanks Daniel for your reply. I was able to upgrade 2 servers to MariaDB 10.5.16 and it indeed no longer crashes.
If you don't mind, I have one last question. Reading this: "The current supported versions are: 10.2, 10.3, 10.4, 10.5, 10.6 (supported for 5 years), 10.7 (supported for one year), 10.8 (supported for one year) and the development version is 10.9."
Does it mean 10.5 will be supported for 5 years, or only 10.6?
I suppose not, and using Ubuntu 21.10 with 10.5 was a mistake, as I've tested upgrading to 10.6 and replication or Galera stopped working alongside 10.5.
Glad to hear of your successful upgrade.
ref: maintenance policy; so yes, 10.5 is supported for 5 years.
21.10 is EOL at the end of this month, so 10.5.16 was the last release we did for it on Ubuntu impish. We don't do releases beyond what's supported on the distro, so there won't be a 10.5 release for 22.04, as 10.6 is packaged there. We will be doing 10.5 packages for 20.04 focal until the 10.5 EOL date if you want to stay on that branch longer.
Please do report the 10.6 upgrade issues with Galera/replication. They are meant to be supported, and documentation has fallen behind (MDEV-28483).
Some constraints on Galera SST upgrades are in MDEV-27437, so if it was SST related and not rsync, this may have been one problem, along with aio being on by default.
Thanks for the heads-up.
I'm using mariabackup as the SST method and as the replication bootstrap method (is there any other for replication?), primarily because the servers are live and cannot be taken down. And this method is said not to be possible for an upgrade unless IST is used, which in turn requires enough gcache.
I already tried migrating the slave server; it reported differences in the mysql database tables and would not start anymore!? I had to reinstall the server entirely with the old OS and re-establish replication from a new mariabackup backup, which takes hours for our 180GB database.
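For context, the bootstrap procedure referred to above is roughly the standard one for setting up a replica from a Mariabackup physical backup (a sketch; host names, paths, credentials and the GTID value are illustrative, the GTID shown is taken from the logs earlier in this report):

```shell
# On the master: take and prepare a physical backup.
mariabackup --backup --target-dir=/backup --user=backup_user --password=secret
mariabackup --prepare --target-dir=/backup
# Restore it into the new slave's (empty) datadir while the server is
# stopped, fix ownership, then read the GTID the backup corresponds to:
cat /backup/xtrabackup_binlog_info
# e.g.: mysql-bin.014000  367171055  100-1-137692762
# On the slave, point replication at that GTID and start it:
mariadb -e "SET GLOBAL gtid_slave_pos='100-1-137692762';
  CHANGE MASTER TO master_host='ovh3.vlan', master_user='repl_user',
  master_password='secret', master_use_gtid=slave_pos;
  START SLAVE;"
```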
I just tried to upgrade to Ubuntu 22.04 while keeping MariaDB 10.5.16, but it doesn't seem possible.
When it told me some repo had been disabled, I re-enabled the MariaDB repo you gave me and proceeded, but it ends up with this:
Error during update
A problem occurred during the update. This is usually some sort of
network problem, please check your network connection and retry.
W:Updating from such a repository can't be done securely, and is
therefore disabled by default., W:See apt-secure(8) manpage for
repository creation and user configuration details., E:The repository
'https://mirror.mva-n.net/mariadb/repo/10.5/ubuntu jammy Release'
does not have a Release file.
Any way to keep 10.5 on Ubuntu 22.04?
I typoed: 22.04 focal should have been 20.04 focal, sorry.
> Any way to keep 10.5 on Ubuntu 22.04?
Options (though not great): building your own packages, or using a tarball from our download page.
Thanks for fixing the typo. I'm still unsure which is the safest option: reverting our servers to 20.04, or first upgrading MariaDB to 10.6 and then Ubuntu to 22.04. Reverting seems the safest option, avoiding a major version upgrade of MariaDB, but reinstalling 6 servers seems a little overwhelming.
An alternative: create a systemd service for a mariadb:10.5 container.
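A minimal sketch of that alternative, assuming podman and the official docker.io/library/mariadb:10.5 image (the unit name, port and host datadir path are illustrative):

```shell
# Install a systemd unit that runs MariaDB 10.5 in a container.
cat > /etc/systemd/system/mariadb105.service <<'EOF'
[Unit]
Description=MariaDB 10.5 in a container
After=network-online.target

[Service]
# Remove any leftover container from a previous run, then start fresh.
ExecStartPre=-/usr/bin/podman rm -f mariadb105
ExecStart=/usr/bin/podman run --rm --name mariadb105 \
    -v /var/lib/mysql105:/var/lib/mysql \
    -p 3306:3306 docker.io/library/mariadb:10.5
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now mariadb105
```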
A slave server should still start on 10.6, even if it reports table differences. Table differences should be resolved with mariadb-upgrade. If possible, I'd like to see a log of it not starting in a new bug report.
If you have a sample of the binary log error messages causing this, that would be useful, along with the table structure (SHOW CREATE TABLE tbl).
What non-default replication configuration options are you using (if any; I assume binlog_format=ROW because of Galera)?
To clarify: the Galera cluster member is the replication slave? Is log_slave_updates on?
If a coredump was created, can you get an apport output or a gdb backtrace?
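On the last point, a backtrace can be pulled from a core non-interactively with something like the following (the binary path and core file name are illustrative; install the debug symbol packages first for a useful trace):

```shell
# Dump a full backtrace of all threads from a core file to a text file.
gdb --batch -ex 'set pagination off' \
    -ex 'thread apply all bt full' \
    /usr/sbin/mariadbd /var/lib/mysql/core > backtrace.txt
```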