[MDEV-28700] MariaDB will produce binlog even though I canceled with `control-c` signal when the master with rpl_semi_sync_master_enabled=1 and has no slave nodes Created: 2022-05-30  Updated: 2023-08-08

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.7
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Kuijun Cui Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
Kernel version: 3.10.0-693.5.2.el7.x86_64
##################################

        1. DB configuration:
          [mysqld]
                    1. Server Level ##########
                      port=3306
                      basedir=/opt/maria10.1
                      datadir=/opt/maria10.1/data/db
                      tmpdir=/opt/maria10.1/data/tmp
                      socket=/opt/maria10.1/data/mysql.sock
                      pid-file=/opt/maria10.1/data/mysqld.pid
                      lc-messages-dir=/opt/maria10.1/share
                      lc-messages=en_US
                      lower-case-table-names=1
          1. Character #####
            character-set-client-handshake=FALSE
            character-set-server=utf8mb4
            collation-server=utf8mb4_unicode_ci
          1. Security/Mode #####
            #skip_name_resolve
            bind_address=0.0.0.0
            explicit_defaults_for_timestamp=1
            sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
            skip-ssl
          2. Connection #####
            max_connections=1000
            max_connect_errors=65536
            max_user_connections=2900
            max_allowed_packet=128M
            connect_timeout=20
            back_log=65535
            net_read_timeout=30
            net_write_timeout=60
            net_buffer_length=2K # default 16K
          1. per-thread Buffer #####
            #read_buffer_size=128K
            #read_rnd_buffer_size=256K
            sort_buffer_size=32M
            #join_buffer_size=2M
            optimizer_switch="index_merge_sort_intersection=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,optimize_join_buffer_size=on"
            join_cache_level=8
            join_buffer_size=4M
            join_buffer_space_limit=250M
            mrr_buffer_size=4M
            binlog_cache_size=16M
            thread_stack=292K
          1. Global Buffer #####
            key_buffer_size=10M
            query_cache_type = 0
            query_cache_size = 0
          1. THREAD POOL #####
            thread_handling=pool-of-threads
            thread_pool_max_threads=30000
          1. TABLE CACHE #####
            table_definition_cache=16500
            table_open_cache=2000
            table_open_cache_instances=8
          1. IO #####
            open_files_limit=65000
          1. OTHERS #####
            tmp_table_size=500M
            max_heap_table_size=16M # Memory Engine
            max_prepared_stmt_count=1310560
                    1. Server LOG ##########
          1. general log #####
            general_log=0
          1. err log #####
            log_error=/opt/maria10.1/log/alert_TMTEST.log
            log_warnings=1
          1. bin log #####
            server_id=14285
            log_bin=/opt/maria10.1/binlog/BINLOG
            binlog_format=ROW
            binlog_row_image=FULL
            sync_binlog=1
            log_slave_updates=1
            expire_logs_days=2
            max_binlog_size=128M
            max_binlog_cache_size=1G
          1. GTID #####
            gtid_domain_id=14282
            gtid_strict_mode=1
            gtid_ignore_duplicates=on
          1. slow log #####
            slow_query_log=1
            slow_query_log_file=/opt/maria10.1/log/slow.log
            log_output='FILE'
            long_query_time=1
            log_slow_admin_statements=0
            log_slow_rate_limit=1
            log_slow_verbosity='query_plan,explain'
          1. audit log #####
            plugin-load=server_audit=server_audit.so
            server_audit=FORCE_PLUS_PERMANENT
            server_audit_logging=1
            server_audit_file_path=/opt/maria10.1/log/server_audit.log
            server_audit_events='CONNECT,QUERY'
            server_audit_file_rotate_size=524288000 # 500M
            server_audit_file_rotations=9
            server_audit_output_type='file'
            server_audit_excl_users='event_scheduler'
            server_audit_query_log_limit=10240
                    1. INNODB ##########
                      default_storage_engine=InnoDB
          1. Tablespace #####
            innodb_file_per_table=1
          1. Buffer Pool #####
            #innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
            innodb_buffer_pool_size=6G # innodb data cache. 70% of total RAM
            #innodb_buffer_pool_instances=2
            innodb_max_dirty_pages_pct=60
          1. Redo #####
            innodb_log_buffer_size=200M
            innodb_log_file_size=1G
            #innodb_log_files_in_group=2
          1. Undo #####
            innodb_undo_directory=/opt/maria10.1/data/db
            #innodb_undo_logs=128
            innodb_undo_tablespaces=4
          1. Locking #####
            innodb_print_all_deadlocks=1
            innodb_lock_wait_timeout=50 # 50s
            innodb_autoinc_lock_mode=2
            innodb_rollback_on_timeout=1
          1. Data Security #####
            innodb_doublewrite=1
            innodb_strict_mode=1
            innodb_flush_log_at_trx_commit=1
            transaction-isolation=READ-COMMITTED
          1. IO #####
            innodb_flush_method=O_DIRECT
            innodb_purge_threads=1
            innodb_io_capacity=1000 # if SSD, set 4000
            innodb_flush_neighbors=1 # if SSD, set 0
            innodb_read_io_threads=14
            innodb_write_io_threads=14
            innodb_open_files=60000
            innodb_use_native_aio=1
                    1. Performance Schema ##########
                      performance_schema=ON
                      performance_schema_instrument='stage/%=ON'
                      performance_schema_consumer_events_stages_current=ON
                      performance_schema_consumer_events_statements_current=ON
                      performance_schema_consumer_events_waits_current=ON
                      performance_schema_consumer_events_statements_history_long=ON
                    1. Rpl_Slave ##########
          1. only on slave node #####
            relay_log=/opt/maria10.1/binlog/RELAY
            max_relay_log_size=1024M
            relay_log_recovery=1
            relay_log_purge=1
            replicate_wild_ignore_table=sys.%
            replicate_wild_ignore_table=test.%
            replicate_wild_ignore_table=jnxdba.%
            replicate_wild_ignore_table=mysql.slow_log
            read_only=1
            skip_slave_start=1
                    1. Rpl_Semi_Sync ##########
                      rpl_semi_sync_master_wait_point=AFTER_SYNC
          1. master #####
            rpl_semi_sync_master_enabled=1
            rpl_semi_sync_master_timeout=18446744073709551615 # 1s
          2. slave #####
            rpl_semi_sync_slave_enabled=1

Issue Links:
Relates
relates to MDEV-31596 Client connection loss makes MariaDB ... Open

 Description   

I'm doing test about MDEV-21117, deployed master to master with semi-sync replication enabled, and found that:

stop all slaves; -- doing this in one of the two master-master node;

-- do the steps below in the left master node
start transaction:
insert into t1(1);
commit; -- due to slave stopped, will be hanged here. And issue `control + c` signal.

Repeatedly doing the above operation, and found that this will keep the gtid increasing every time.

Is it right logic for semi-sync produce this behavior?



 Comments   
Comment by Daniel Black [ 2022-05-31 ]

It seems quite normal for rolled back transactions to discard the gtid. There can be gaps in the gtid because of this.

The important bit is because its semisync there shouldn't be a committed insert and nor should the transaction be in the binary log.

As a final check, start all slaves, do a transaction that succeeds, and check the binary log to ensure that despite the gaps, the new transaction is there and the rolled back transaction isn't.

Comment by Kuijun Cui [ 2022-06-01 ]

I tested with go mysql connector driver, need to do more logic to handle this scenario for long transaction operation, and timeout logic to cancel this operation, something else to retry the failed transaction.

Comment by Daniel Black [ 2022-06-01 ]

Let the server cancel it for you:

SET STATEMENT max_statement_time=10 FOR INSERT INTO ....

Please report if this doesn't work.

So the title here indicates that it will produce binlog. Is that correct or are you making the assumption based on gtid increments?

Comment by Kuijun Cui [ 2022-06-01 ]

(dbausr@localhost) [(none)] 18:27:02> show processlist;
+----+-------------+-----------+--------+----------+--------+--------------------------------------+-----------------------------------------------------------------------------------+----------+
| Id | User        | Host      | db     | Command  | Time   | State                                | Info                                                                              | Progress |
+----+-------------+-----------+--------+----------+--------+--------------------------------------+-----------------------------------------------------------------------------------+----------+
|  6 | system user |           | NULL   | Slave_IO | 174550 | Waiting for master to send event     | NULL                                                                              |    0.000 |
*| 41 | dbausr      | localhost | sbtest | Query    |     55 | Waiting for semi-sync ACK from slave | SET STATEMENT max_statement_time = 5 FOR INSERT INTO sbtest.t1 VALUES(1654075999) |    0.000 |*
| 44 | dbausr      | localhost | NULL   | Query    |      0 | starting                             | show processlist                                                                  |    0.000 |
+----+-------------+-----------+--------+----------+--------+--------------------------------------+-----------------------------------------------------------------------------------+----------+
3 rows in set (0.00 sec)

did not take effect with this way

Comment by Daniel Black [ 2022-07-28 ]

bnestere do you think max_statement_time should be respected by semisync?

It might be harder on a COMMIT to return with an error, however galera would.

Comment by Brandon Nesterenko [ 2022-08-09 ]

Hi danblack and cui3093!

Sorry for the late reply, I was on vacation last week.

I think this situation is captured by rpl_semi_sync_master_wait_point. When set to AFTER_COMMIT (the default), I think max_statement_time would not be respected. If set to AFTER_SYNC, I think max_statement_time should be respected. Note too, however, rpl_semi_sync_master_timeout is considered independently from max_statement_time. That is, if rpl_semi_sync_master_timeout > max_statement_time, the server would await an ACK for rpl_semi_sync_master_timeout, even if it is greater than max_statement_time.

With that said, it would make sense to me that if AFTER_SYNC is used, the ACK wait would be overridden with max_statement_time if it is less than rpl_semi_sync_master_timeout. In AFTER_COMMIT mode; however, they make sense to be independent to me.

Elkin, what are your thoughts?

Generated at Thu Feb 08 10:02:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.