[MDEV-12344] Crash after kill of first hanged SHOW STATUS query Created: 2017-03-23  Updated: 2023-03-07  Resolved: 2019-05-20

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication
Affects Version/s: 10.1.22
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Ján Regeš Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 1
Labels: Galera, crash, status
Environment:

Gentoo, kernel 4.4.6, 8x Intel Xeon E5-2643 0 @ 3.30GHz, 8GB RAM, 3 Galera nodes on the same stable network (no geo-replication)


Attachments: PNG File Screenshot 2023-03-07 at 16.34.41.png    
Issue Links:
Relates
relates to MDEV-11312 One Galera node hangs on "show global... Closed

 Description   

Hi,

we have Galera cluster based on 3 nodes, each with latest MariaDB 10.1.22.

Today morning, one node hangs/locked on query `SHOW STATUS LIKE 'wsrep_local_state'. The same issue I've already reported 5 months ago (still unresolved issue): https://jira.mariadb.org/browse/MDEV-11312

I attach full processlist from hanged time. In this time, i killed first/oldest hanged process with ID 109921.

After that, mysqld crashed immediately (i attached error log). There was no query present in an error log. At the end, there is few rows logged after my manual MySQL start.

Maybe, it is related to libjemalloc. Server has 8 GB RAM and only 2 GB are allocated, so free memory is not a problem.

Thank you for your help. It's really big issue for us. Galera is not stable already for 5 months and there is only 1 relatively small database with only 20-100 req/s. Nodes are on the same stable network with latency about ~0.3ms.

Processlist

+--------+-----------------+--------------------+------------+---------+--------+-----------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id     | User            | Host               | db         | Command | Time   | State                       | Info                                                                                                 | Progress |
+--------+-----------------+--------------------+------------+---------+--------+-----------------------------+------------------------------------------------------------------------------------------------------+----------+
| 1      | system user     |                    |            | Sleep   | 129115 | wsrep aborter idle          |                                                                                                      | 0.000    |
| 2      | system user     |                    |            | Sleep   | 217    | committed 15228746          |                                                                                                      | 0.000    |
| 4      | event_scheduler | localhost          |            | Daemon  | 129112 | Waiting for next activation |                                                                                                      | 0.000    |
| 5      | system user     |                    |            | Sleep   | 217    | committed 15228745          |                                                                                                      | 0.000    |
| 6      | system user     |                    |            | Sleep   | 217    | committed 15228748          |                                                                                                      | 0.000    |
| 7      | system user     |                    |            | Sleep   | 217    | committed 15228743          |                                                                                                      | 0.000    |
| 8      | system user     |                    |            | Sleep   | 217    | committed 15228747          |                                                                                                      | 0.000    |
| 10     | system user     |                    |            | Sleep   | 217    | committed 15228744          |                                                                                                      | 0.000    |
| 109921 | clustercheck    | localhost          |            | Query   | 214    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109924 | project_cz      | 10.234.4.35:19083  | project_cz | Query   | 213    | query end                   | UPDATE `geo_ip` SET `datetime_last_analysis`='2017-03-23 11:33:23' WHERE (`id` = 484297)             | 0.000    |
| 109925 | clustercheck    | localhost          |            | Query   | 212    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109926 | monitoring      | 10.234.3.18:45170  |            | Query   | 212    | Filling schema table        | show global status                                                                                   | 0.000    |
| 109928 | clustercheck    | localhost          |            | Query   | 210    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109929 | clustercheck    | localhost          |            | Query   | 209    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109930 | clustercheck    | localhost          |            | Query   | 203    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109932 | clustercheck    | localhost          |            | Query   | 199    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109933 | clustercheck    | localhost          |            | Query   | 197    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109935 | clustercheck    | localhost          |            | Query   | 195    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109936 | clustercheck    | localhost          |            | Query   | 191    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109938 | clustercheck    | localhost          |            | Query   | 185    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109939 | clustercheck    | localhost          |            | Query   | 185    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109941 | clustercheck    | localhost          |            | Query   | 180    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109943 | project_cz      | 10.234.4.35:19409  | project_cz | Query   | 174    | query end                   | INSERT INTO `document__download` (`id_document`, `unique_id`, `ip`, `datetime_inserted`, `downloaded | 0.000    |
| 109944 | clustercheck    | localhost          |            | Query   | 171    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109946 | clustercheck    | localhost          |            | Query   | 165    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109948 | clustercheck    | localhost          |            | Query   | 157    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109952 | clustercheck    | localhost          |            | Query   | 150    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109954 | clustercheck    | localhost          |            | Query   | 143    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109956 | clustercheck    | localhost          |            | Query   | 136    | Filling schema table        | SHOW STATUS LIKE 'wsrep_local_state'                                                                 | 0.000    |
| 109961 | monitoring      | localhost          |            | Query   | 116    | Filling schema table        | show /*!50002 GLOBAL */ status                                                                       | 0.000    |
| 109971 | root            | 192.168.2.48:50709 |            | Sleep   | 2      |                             |                                                                                                      | 0.000    |
| 109981 | monitoring      | 10.234.3.18:48126  |            | Query   | 30     | Filling schema table        | show global status                                                                                   | 0.000    |
| 109986 | monitoring      | localhost          |            | Query   | 0      | init                        | show processlist                                                                                     | 0.000    |
+--------+-----------------+--------------------+------------+---------+--------+-----------------------------+------------------------------------------------------------------------------------------------------+----------+

Error log

170323 11:37:45 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.22-MariaDB
key_buffer_size=33554432
read_buffer_size=1048576
max_used_connections=26
max_threads=152
thread_count=34
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 347165 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fa4a54d7008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa4d33ece58 thread_stack 0x80000
/usr/sbin/mysqld(my_print_stacktrace+0x29)[0x560cd61e43bf]
/usr/sbin/mysqld(handle_fatal_signal+0x327)[0x560cd5d690b5]
/lib64/libpthread.so.0(+0x103f0)[0x7fa59783e3f0]
/usr/lib64/libjemalloc.so.1(+0xc691)[0x7fa598531691]
/usr/lib64/libjemalloc.so.1(+0x21a0d)[0x7fa598546a0d]
/usr/lib64/libjemalloc.so.1(malloc+0x1f2)[0x7fa59852ac94]
/usr/sbin/mysqld(my_malloc+0x39)[0x560cd61e0ba0]
/usr/sbin/mysqld(alloc_root+0x9c)[0x560cd61da25b]
/usr/sbin/mysqld(memdup_root+0x14)[0x560cd61da629]
/usr/sbin/mysqld(+0x3d1a7c)[0x560cd5bafa7c]
/usr/sbin/mysqld(+0x3d21a4)[0x560cd5bb01a4]
/usr/sbin/mysqld(+0x3d1bce)[0x560cd5bafbce]
/usr/sbin/mysqld(+0x3d0703)[0x560cd5bae703]
/usr/sbin/mysqld(_Z16acl_authenticateP3THDj+0x1f9)[0x560cd5bc74ed]
/usr/sbin/mysqld(+0x4e7be7)[0x560cd5cc5be7]
/usr/sbin/mysqld(_Z16login_connectionP3THD+0x42)[0x560cd5cc6f6d]
/usr/sbin/mysqld(_Z22thd_prepare_connectionP3THD+0x11)[0x560cd5cc78e6]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x177)[0x560cd5cc7a9e]
/usr/sbin/mysqld(handle_one_connection+0x37)[0x560cd5cc7c63]
/lib64/libpthread.so.0(+0x734c)[0x7fa59783534c]
/lib64/libc.so.6(clone+0x6d)[0x7fa596d1f52d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 109996
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loos
escan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_c
ache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway.
Query:
 
2017-03-23 11:38:18 140200819652480 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-03-23 11:38:18 140200819652480 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2017-03-23 11:38:18 140200819652480 [Note] WSREP: wsrep_load(): Galera 3.17(r447d194) by Codership Oy <info@codership.com> loaded successfully.
2017-03-23 11:38:18 140200819652480 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2017-03-23 11:38:18 140200819652480 [Note] WSREP: Found saved state: cb18d9d7-dc8f-11e6-9b71-37b745222498:-1
2017-03-23 11:38:18 140200819652480 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.234.4.35; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30
S; evs.inactive_check_period = PT3S; evs.inactive_timeout = PT20S; evs.install_timeout = PT20S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT2S; evs.max_install_timeouts = 3; evs.send_window = 8; evs.stats_report_period = PT1M; evs.suspect_time
out = PT10S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 512M; gcache.size = 1G; gcomm.thread_prio
= ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0
; gmcast.version = 0; pc.announce_timeout
2017-03-23 11:38:18 140199509595904 [Note] WSREP: Service thread queue flushed.



 Comments   
Comment by Steve Ruby [ 2017-07-03 ]

I am seeing something similar with these hanged queries that are in the state of "filling schema table" I'm 10.2.6 on Windows server 2008 R2. This issue only started after an upgrade from 10.1.22 TO 10.2.6

I'm not running Galera .

the server hangs on several queries like "SHOW STATUS" killing the server has twice resulted in data innodb corruption and having rebuild the server with > 350GB of database is VERY time consuming.

I am seeing similar error log entries which include the message "This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware."

Comment by Vladislav Vaintroub [ 2017-07-03 ]

stever, maybe you can attach the error log, and the mysqld.dmp from the crash. It maybe something else mostly likely, but Windows does not have any galera

Comment by Jan Lindström (Inactive) [ 2019-05-20 ]

No feedback provided.

Comment by Gordan Bobic [ 2023-03-07 ]

I just hit this on 10.5.17, not Galera.

Generated at Thu Feb 08 07:56:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.