[MDEV-30549] MariaDB 10.5.16 hang for a long time after shutdown Created: 2023-02-02  Updated: 2023-03-07

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.5.16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: dongjian Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Environment:

centos 7.8


Attachments: Text File pstack.txt    

 Description   

My production server was suddenly hanging after shutdown. I just executed some normal SQL statements, and turned on semi-synchronous replication.

2022-11-19 16:26:47 0 [Note] /home/sql/bin/mysqld: ready for connections.
Version: '10.5.16-MariaDB-log'  socket: '/data/mysql.sock'  port: 3306  MairaDB Community Server - (GPL)
2022-11-19 16:26:50 5 [Note] Semi-sync replication switched OFF.
2022-11-19 16:26:50 5 [Note] Semi-sync replication disabled on the master.
2022-11-19 16:26:50 5 [Note] Enable Semi-sync Master after reset master
2022-11-19 16:26:50 5 [Note] Semi-sync replication initialized for transactions.
2022-11-19 16:26:50 5 [Note] Semi-sync replication enabled on the master.
2022-11-19 16:27:00 6 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000001, pos: 512), semi-sync up to file , position 0.
2022-11-19 16:27:00 6 [Note] Semi-sync replication switched OFF.
2022-11-19 16:27:00 6 [Warning] 'user' entry '@cr1' ignored in --skip-name-resolve mode.
2022-11-19 16:27:00 6 [Warning] 'proxies_priv' entry '@% root@cr1' ignored in --skip-name-resolve mode.
2022-11-19 16:27:07 7 [Note] Start binlog_dump to slave_server(4640), pos(, 4), using_gtid(1), gtid('')
2022-11-19 16:27:07 7 [Note] Start semi-sync binlog_dump to slave (server_id: 4640), pos(./mysql-bin.000001, 4)
2022-11-19 16:27:07 1 [Note] Semi-sync replication switched ON with slave (server_id: 4640) at (mysql-bin.000001, 1807)
2022-11-19 16:27:22 8 [Note] Start binlog_dump to slave_server(8101), pos(, 4), using_gtid(1), gtid('')
2022-11-19 16:28:09 0 [Note] /home/sql/bin/mysqld (initiated by: root[root] @  [192.168.1.101]): Normal shutdown
2022-11-19 16:28:09 0 [Note] Event Scheduler: Purging the queue. 0 events
2022-11-19 16:28:11 1 [Note] Stopping ack receiver thread



 Comments   
Comment by Angelique Sklavounos (Inactive) [ 2023-02-28 ]

Hi dongjian,

Thanks for the report. Could you please provide my.cnf files for the servers, as well as the SQL statements (I suspect they include RESET MASTER and FLUSH LOGS)? Also, can I confirm there is one master and two slaves? Do both slaves have semisync replication enabled, or just slave server 4640? Thank you.

Comment by dongjian [ 2023-03-01 ]

Hi Angelique Sklavounos,
Thanks for your help. You are right, there is indeed a RESET MASTER statement.
There is one master and two slaves, one of the slaves is semi-synchronous replication.

Will this cause the hang?

[client]
port = 3306
socket = /data/mysql.sock
default-character-set = utf8mb4
 
[mysqld]
loose_rpl_semi_sync_master_enabled=on
loose_rpl_semi_sync_slave_enabled=on
max_prepared_stmt_count=1048576
basedir = /home/sql
safe-user-create
back_log = 3000
bind-address = 0.0.0.0
binlog_format = ROW
innodb_stats_include_delete_marked = ON
datadir = /data/
default_storage_engine = InnoDB
innodb_buffer_pool_size = 2147483648
innodb_data_file_path = ibdata1:128M:autoextend
innodb_file_per_table = ON
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 4
innodb_log_file_size = 2000M
key_buffer_size = 16M
log-bin = mysql-bin
log-error = /data/error.log
log_bin_trust_function_creators = ON
log_slave_updates = true
master_info_repository = TABLE
max_binlog_size = 536870912
max_connections = 2000
open_files_limit = 500000
pid-file = /data/mysql.pid
port = 3306
relay-log = mysql-relay-bin
server-id = 3630784761
socket = /data/mysql.sock
max_allowed_packet = 1073741824
relay_log_recovery = ON
slave_net_timeout = 60
slave_parallel_workers = 64
innodb_strict_mode = OFF

Comment by Angelique Sklavounos (Inactive) [ 2023-03-01 ]

Thanks, dongjian. Ideally, the RESET MASTER should not cause a hang - I was asking based on what was in the error log.

I am trying to reproduce the hang but have not been able to so far, so I have some more questions:

  1. What was the exact command(s) for shutdown?
  2. Do you remember the queries (apart from RESET MASTER)? Is it possible to recreate this, or did this only happen once and you have not seen it since?
  3. How were the executables obtained? I am mainly wondering because of the typo MairaDB in the error log.

Thank you.

Comment by dongjian [ 2023-03-02 ]

Hi Angelique Sklavounos,
Thanks for your reply.
I executed shutdown through mysqldadmin shutdown command. I mostly performed sysbench tests,and i was doing some failure tests with killing. This phenomenon has happened 3 times and has not been reproduced recently.
This executable was compiled by myself. I didn't make many changes, just added some debugging information based on 10.5.16.

I guess whether this hang is related to the modification of MDEV-11853.

Thank you.

Comment by Angelique Sklavounos (Inactive) [ 2023-03-02 ]

Thanks, dongjian. MDEV-11853 might be relevant.

Comment by Brandon Nesterenko [ 2023-03-03 ]

Hi dongjian

Looking at your stack trace, it looks like the cause of your hang may be the same that caused MDEV-24660, which was fixed in 10.5.18. Can you try upgrading and see if your server still hangs on shutdown?

Comment by dongjian [ 2023-03-06 ]

Thanks, Brandon Nesterenko
I'm going to try MariaDB 10.5.18.
The reason for the hang of MDEV-24660 and MDEV-30549 seems to be different. I guess the reason for MDEV-3054 seems to be waiting for the semi-synchronous slave to return an ack, an unknown problem occurred during the waiting period of the master.

Generated at Thu Feb 08 10:17:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.