We have multiple Galera clusters running in a multi-master setup, and we noticed that a "sleeping" system thread can hang the whole cluster.
When this system thread hangs, as shown in the screenshot, the whole Galera cluster comes to a standstill. Nothing can be written to the database.
We have a log that prints wsrep_last_committed, and it shows that one node's wsrep_last_committed is not moving. Did the wsrep plugin in Galera hang?
The h5 server is the one that is stuck. There is nothing in mysql.err showing any stack trace.
2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h4:
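For reference, a minimal sketch of the kind of wsrep_last_committed check described above, assuming the mysql client can reach each node; the host names and loop are illustrative, not taken from the actual galera_alert script:
# Poll wsrep_last_committed on every node; a value that stops advancing
# while writes are in flight points at the stalled node.
for host in xxx-h4 xxx-h5; do
  seqno=$(mysql -h "$host" -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_last_committed'" | awk '{print $2}')
  echo "$(date '+%F %T') $host wsrep_last_committed=$seqno"
done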
Daniel Black added a comment - No, they are information only and used by gdb. A small bit of storage, but no impact on the running server and no replacement of code.
Khai Ping added a comment - @daniel,
Does this mean I do not need to install the debug-info packages, since my binary is not stripped?
/opt/sbin/mariadbd: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=893a0b4698fc39d184df3f3c32df693dfa008884, not stripped
When I tried gdb attach <pid>, I got these lines. Does that mean I need to install the debuginfo?
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
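As an aside, the full all-threads backtrace attached later in this ticket can be captured non-interactively. This is a generic gdb invocation, not something specific to this ticket; the output file name matches the later attachment:
# Attach to the running mariadbd, dump every thread's full backtrace,
# then detach without disturbing the server.
gdb --batch -p "$(pidof mariadbd)" \
    -ex "set pagination off" \
    -ex "thread apply all bt full" > mariadbd_full_bt_all_threads.txt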
Daniel Black added a comment - The binary is not technically stripped; however, a commonly used split-debug technique means the debug info isn't in the binary but in separate files, hence the debuginfo packages are still needed.
Missing debug information for the libraries MariaDB uses isn't a large impediment, as the fault is unlikely to be in those libraries. If in doubt, just include the generated gdb information.
If for some reason you feel uncomfortable with the detail in the gdb output, you can upload it privately to the ftp server.
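A quick way to verify the split-debug situation Daniel describes, using standard binutils; the binary path is taken from the comment above:
# A .gnu_debuglink section names the separate debug file a debuginfo
# package would install; count of .debug_* sections shows how much
# DWARF data is actually left inside the binary itself.
readelf -p .gnu_debuglink /opt/sbin/mariadbd
readelf -S /opt/sbin/mariadbd | grep -c '\.debug_'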
Khai Ping added a comment - @daniel, we are building our own MariaDB using the spec file; however, the debuginfo RPM is not generated for 10.6.5, though it is generated for 10.6.9.
Any idea what could be causing it?
Daniel Black added a comment -
> we are building our own mariadb using the spec file
Why? What is it?
> however the debuginfo rpm is not getting generated for 10.6.5, however it is getting generated for 10.6.9
> any idea what could be causing it?
No. I could guess the cmake version is different, but I can't think of a code change that would make this difference.
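If it helps, two hypothetical things to check when an rpmbuild stops emitting a -debuginfo subpackage; the path SPECS/mariadb.spec is an assumption, substitute your own spec:
# Debuginfo generation can be disabled outright in the spec,
# or lost because the build no longer compiles with debug info (-g).
grep -n 'debug_package' SPECS/mariadb.spec    # '%define debug_package %{nil}' disables it
grep -n 'CMAKE_BUILD_TYPE' SPECS/mariadb.spec # RelWithDebInfo keeps -g in the compile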
Jan Lindström (Inactive) added a comment - khaiping.loh Yes, that output would be more than useful. Please also provide the full error log. Can you try with a more recent version of MariaDB and the Galera library?
Khai Ping added a comment - @daniel, I have attached mariadbd_full_bt_all_threads.txt.
Is this issue resolved in MariaDB 10.6.12? I am referencing https://jira.mariadb.org/browse/MDEV-29684; it seems like it is fixed?
Daniel Black added a comment - Thank you. What analysis have you done that makes you think it is MDEV-29684?
This does have killed threads holding locks, so it is potentially the same, but a more complete look than I have time for now is required to be more definite.
Khai Ping added a comment -
Based on the release notes, https://mariadb.com/kb/en/mariadb-10-6-12-release-notes/, which mention "Fixes for cluster wide write conflict resolving".
In my environment, we only see this problem on multi-master Galera nodes. Our application can send writes to any of the Galera nodes, and under high concurrency there are bound to be Galera write conflicts. MDEV-29684 mentions the line "This requires multi-master testing".
Also, the first sentence of MDEV-29684 says "There are a number of bug reports of cluster wide conflict resolving related crashes or hangs." When this issue happens in our environment, nothing can be written any more; it is as though the cluster hung.
I appreciate your prompt response, and I hope the bt thread logs are helpful. That log was retrieved from the node that hung.
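The write-conflict pressure described above is measurable. These are standard Galera status counters, nothing ticket-specific; rising values under load confirm multi-master conflicts:
# Certification failures and brute-force aborts both count cluster-wide
# write conflicts seen by this node.
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_cert_failures','wsrep_local_bf_aborts')"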
Julien Fritsch added a comment - Automated message:
----------------------------
Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.
JiraAutomate added a comment - Automated message:
----------------------------
Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.
Jan Lindström added a comment - khaiping.loh Can you provide the full unedited error log from the node that hangs, SHOW PROCESSLIST, and SHOW ENGINE INNODB STATUS? Is the issue reproducible? If it is, can you provide steps to reproduce? The MariaDB server and Galera library versions used are quite old; please consider upgrading to more recent ones. More recent versions have fixes for cluster conflict hang cases, and this could be one of them.
Khai Ping added a comment - Jan Lindström, when the issue happens there are no errors in mysql.err. However, when we go into one of the nodes, the processlist shows a system_user thread stuck indefinitely.
Do you know which version has those fixes related to cluster hanging? Is it 10.6.15?
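A hypothetical way to isolate that stuck thread: Galera applier and replication threads report the user name "system user" in the processlist, so one can filter on it directly:
# Long-running 'system user' entries are the applier threads; a TIME
# that keeps growing with an unchanging STATE is the stuck one.
mysql -e "SELECT id, user, time, state, info FROM information_schema.PROCESSLIST WHERE user = 'system user' ORDER BY time DESC"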
Jan Lindström added a comment - khaiping.loh I looked at the stack trace and I can find selects there, but not that update clause. I do not see any real evidence that the server is hung. https://mariadb.com/kb/en/mariadb-10-6-12-release-notes/ contains the fix, but https://mariadb.com/kb/en/mariadb-10-6-17-release-notes/ is the latest and recommended release.
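For anyone repeating Jan's check: with "bt full" output, statement text often survives in frame locals, so a plain text search over the attachment is usually enough. The file name is from this ticket's earlier attachment:
# Look for any UPDATE statement captured in the thread backtraces.
grep -n -i 'update' mariadbd_full_bt_all_threads.txt | head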
Khai Ping added a comment - Jan Lindström, I noticed this changelog entry in 10.6.15 as well. Could this help too?
MariaDB stuck on starting commit state (waiting on commit order critical section) (MDEV-29293)
Looking at the performance regression in MDEV-33508, I do not think upgrading to 10.6.17 should be recommended?
Jan Lindström added a comment - khaiping.loh Yes, it would help, but then I did not see evidence that you are hitting it. I do not know how severe the performance regression is.
Khai Ping added a comment - @jan, I uploaded another set of stack traces from another cluster where one node hung. The logs cover three servers.
[^mariadb stacktrace.zip]
Jan Lindström added a comment - khaiping.loh I can't read those macOS files, but I strongly suspect https://jira.mariadb.org/browse/MDEV-29293, for which you would need to upgrade.
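Likely the archive was created with Finder's "Compress", which adds __MACOSX/ and ._* metadata entries that confuse other tools. A hypothetical repack with the command-line zip avoids them; the archive name is illustrative:
# Plain command-line zip adds no macOS resource-fork entries.
zip mariadb_stacktrace.zip mariadbd_full_bt_all_threads-*.log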
Khai Ping added a comment - @jan, I uploaded the non-zip files:
mariadbd_full_bt_all_threads-h12_1712676357.log
mariadbd_full_bt_all_threads-h15_1712676357.log
mariadbd_full_bt_all_threads-h14_1712676357.log
Khai Ping added a comment - @jan, thanks! We will proceed with the upgrade.
How about seeing this in the processlist? It seems to have caused the hang too. In this example, unfortunately, I do not have the stack trace.
ID,QUERY_ID,USER,DB,TIME,STATE,MEMORY_USED,MAX_MEMORY_USED,EXAMINED_ROWS,TID,INFO
276168,3416992,flask_user,None,4517,acquiring total order isolation,75568,75568,0,2831340,KILL CONNECTION ?
276167,3416991,flask_user,None,4517,acquiring total order isolation,74712,74712,0,2831239,KILL CONNECTION ?
276152,3416920,flask_user,None,4545,acquiring total order isolation,74712,74712,0,2831209,KILL CONNECTION ?
276141,3416909,flask_user,None,4566,acquiring total order isolation,74712,74712,0,2831188,KILL CONNECTION ?
When that happened, we noticed a lot of commit transactions were stuck:
277541,3422826,app_user,database_1,601,starting,83152,1033792,0,313531,COMMIT
277149,3421496,app_user,database_1,1501,starting,82080,1032720,0,2835445,COMMIT
276707,3420193,app_user,database_1,2401,starting,82080,1032720,0,2834972,COMMIT
276639,3420300,app_user,replication,2323,starting,82080,1032720,0,2834556,COMMIT
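A hypothetical watch query for the two symptoms shown above (sessions stuck acquiring total order isolation, and COMMITs stuck in 'starting'); the 60-second threshold is arbitrary:
# Surface long-stuck sessions in either of the two observed states.
mysql -e "SELECT id, user, time, state, LEFT(info, 60) AS info FROM information_schema.PROCESSLIST WHERE state IN ('acquiring total order isolation', 'starting') AND time > 60 ORDER BY time DESC"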
Seems to be related to MDEV-29293 as well.
Can you: