Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-38429

Atomic DDL can binlog DDL during ddl log crash recovery without "ddl" mark on the GTID

    XMLWordPrintable

Details

    Description

      knielsen_binlog_in_engine_12.3 6c9155f48bd796cfdf6c1b2f59a03d6205b53ba4

      2025-12-24 20:31:27 7 [ERROR] Slave I/O: error reconnecting to master 'replication@127.0.0.1:14000' - retry-time: 1  maximum-retries: 100000  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
      2025-12-24 20:32:00 7 [Note] Slave: connected to master 'replication@127.0.0.1:14000',replication resumed in log 'binlog-000000.ibb' at position 37865734
      2025-12-24 20:32:10 9 [ERROR] Slave SQL: Error 'An attempt was made to binlog GTID 0-1001-400 which would create an out-of-order sequence number with existing GTID 0-1001-403, and gtid strict mode is enabled' on query. Default database: 'partition_db'. Query: '/* WRK-2 QNO 226 */  ALTER TABLE `tlist_6_2` ORDER BY `col_int_nokey`', Gtid 0-1001-400, Internal MariaDB error code: 1950
      2025-12-24 20:32:10 9 [Warning] Slave: An attempt was made to binlog GTID 0-1001-400 which would create an out-of-order sequence number with existing GTID 0-1001-403, and gtid strict mode is enabled Error_code: 1950
      2025-12-24 20:32:10 9 [Warning] Slave: Error writing file '(null)' (errno: 1950 "Unknown error 1950") Error_code: 1026
      2025-12-24 20:32:10 9 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog-000000.ibb' position 4; GTID position '0-1001-403'
      mysqld: /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/transaction.cc:508: bool trans_commit_stmt(THD*): Assertion `!((&(mysql_bin_log.get_log_lock())->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(mysql_bin_log.get_log_lock())->m_mutex)->thread))' failed.
      251224 20:32:10 [ERROR] /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/mysqld got signal 6 ;
       
      (rr) bt
      #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=11365294569024) at ./nptl/pthread_kill.c:44
      #1  __pthread_kill_internal (signo=6, threadid=11365294569024) at ./nptl/pthread_kill.c:78
      #2  __GI___pthread_kill (threadid=11365294569024, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
      #3  0x000064ba6b4d8476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
      #4  0x000064ba6b4be7f3 in __GI_abort () at ./stdlib/abort.c:79
      #5  0x000064ba6b4be71b in __assert_fail_base (fmt=0x64ba6b673130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
          assertion=0x56372cce14a8 "!((&(mysql_bin_log.get_log_lock())->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(mysql_bin_log.get_log_lock())->m_mutex)->thread))", 
          file=0x56372cce11d0 "/home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/transaction.cc", line=508, function=<optimized out>) at ./assert/assert.c:94
      #6  0x000064ba6b4cfe96 in __GI___assert_fail (
          assertion=0x56372cce14a8 "!((&(mysql_bin_log.get_log_lock())->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(mysql_bin_log.get_log_lock())->m_mutex)->thread))", 
          file=0x56372cce11d0 "/home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/transaction.cc", line=508, function=0x56372cce1879 "bool trans_commit_stmt(THD*)")
          at ./assert/assert.c:103
      #7  0x000056372bdef113 in trans_commit_stmt (thd=0x65da1c001008) at /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/transaction.cc:508
      #8  0x000056372bdba66c in rpl_group_info::slave_close_thread_tables (this=0x15884d88a220, thd=0x65da1c001008)
          at /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/rpl_rli.cc:2419
      #9  0x000056372bdba15b in rpl_group_info::cleanup_context (this=0x15884d88a220, thd=0x65da1c001008, error=true, keep_domain_owner=false)
          at /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/rpl_rli.cc:2305
      #10 0x000056372beaea96 in signal_error_to_sql_driver_thread (thd=0x65da1c001008, rgi=0x15884d88a220, err=1)
          at /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/rpl_parallel.cc:348
      #11 0x000056372beb26a8 in handle_rpl_parallel_thread (arg=0x15884c00d8b0) at /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/rpl_parallel.cc:1632
      #12 0x000056372c40da2f in pfs_spawn_thread (arg=0x15884c00ed58) at /home/binlog/knielsen_binlog_in_engine_12.3-debug/storage/perfschema/pfs.cc:2198
      #13 0x000064ba6b52aac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #14 0x000064ba6b5bba74 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
      

      The concurrent test is a mix of different features and a a variety of server parameters, and performs the crash recovery in replication context – the primary server gets SIGKILL-ed and restarted to do crash recovery, while the replica is waiting it to come back, and if it does, the test flow on the primary and the replication continues. The failure happens on the replica shortly after recovery of the primary server.

      To reproduce with the concurrent test:

      https://github.com/MariaDB/randgen 85500ee499b172c1a158e8e632c6b503a3a7ced4

      perl ./run.pl  --compatibility=999999 --duration=180 --engine=InnoDB --filter=conf/ff/replication.ff --filter=conf/preview/new_binlog.ff --gendata=advanced --gendata=conf/zz/blobs.zz --gendata=data/dbt3/dbt3-s0.0001.dump --gendata=data/dbt3/dbt3-s0.0001.dump --gendata=data/sql/misc.sql --gendata=simple --grammar=conf/yy/admin.yy  --grammar=conf/yy/dbt3-dml.yy --grammar=conf/yy/dbt3-joins.yy --grammar=conf/yy/dml.yy --grammar=conf/yy/engine-spider.yy --grammar=conf/yy/indexes_and_constraints.yy --grammar=conf/yy/information_schema.yy --grammar=conf/yy/many_indexes.yy --grammar=conf/yy/partition_by_list.yy  --mysqld=--binlog_format=ROW   --mysqld=--binlog_storage_engine=innodb --mysqld=--character_set_server=utf8mb3 --mysqld=--collation_server=utf8mb3_general_ci  --mysqld=--explicit_defaults_for_timestamp=ON --mysqld=--gtid_strict_mode=ON  --mysqld=--innodb-lock-wait-timeout=5 --mysqld=--innodb_autoinc_lock_mode=2 --mysqld=--innodb_buffer_pool_size=2G --mysqld=--innodb_flush_neighbors=0 --mysqld=--innodb_io_capacity=400 --mysqld=--innodb_io_capacity_max=4000 --mysqld=--innodb_lock_wait_timeout=180 --mysqld=--innodb_log_buffer_size=67108864 --mysqld=--innodb_log_file_size=4294967296 --mysqld=--innodb_max_dirty_pages_pct=50.0 --mysqld=--innodb_open_files=2048 --mysqld=--innodb_print_all_deadlocks=ON --mysqld=--innodb_read_io_threads=8 --mysqld=--innodb_sort_buffer_size=12582912 --mysqld=--innodb_write_io_threads=8 --mysqld=--join_buffer_size=4194304 --mysqld=--join_buffer_space_limit=4194304 --mysqld=--join_cache_level=6 --mysqld=--key_buffer_size=67108864 --mysqld=--lock-wait-timeout=10 --mysqld=--lock_wait_timeout=10800 --mysqld=--log_bin --mysqld=--log_bin_compress=ON --mysqld=--log_bin_trust_function_creators=ON --mysqld=--log_slave_updates=ON --mysqld=--log_slow_verbosity=innodb,query_plan --mysqld=--long_query_time=30 --mysqld=--loose-shutdown_wait_for_slaves=ON --mysqld=--loose-spider-same-server-link=on --mysqld=--loose-spider_table_crd_thread_count=1 --mysqld=--loose-spider_table_sts_thread_count=1 --mysqld=--loose-sql_error_log_rate=1 --mysqld=--loose-sql_error_log_rotate=OFF --mysqld=--loose-sql_error_log_rotations=9 --mysqld=--loose-sql_error_log_size_limit=268435456 --mysqld=--loose-sql_error_log_with_db_and_thread_info=OFF --mysqld=--lower_case_table_names=1 --mysqld=--master_verify_checksum=ON --mysqld=--max-statement-time=20 --mysqld=--max_allowed_packet=67108864 --mysqld=--max_binlog_size=268435456 --mysqld=--max_connect_errors=10000 --mysqld=--max_connections=10000 --mysqld=--max_heap_table_size=33554432 --mysqld=--max_password_errors=5 --mysqld=--max_relay_log_size=268435456 --mysqld=--myisam_recover_options=BACKUP,FORCE --mysqld=--myisam_sort_buffer_size=67108864 --mysqld=--open_files_limit=278528 --mysqld=--optimizer_search_depth=0 --mysqld=--optimizer_switch=mrr=on,mrr_cost_based=on --mysqld=--plugin-load-add=ha_spider --mysqld=--plugin-maturity=experimental --mysqld=--plugin_maturity=stable --mysqld=--proxy_protocol_networks=* --mysqld=--query_cache_size=0 --mysqld=--relay_log_recovery=ON --mysqld=--replicate_events_marked_for_skip=FILTER_ON_MASTER --mysqld=--slave_net_timeout=10 --mysqld=--slave_parallel_max_queued=67108864 --mysqld=--slave_parallel_threads=4 --mysqld=--slave_parallel_workers=4 --mysqld=--slow_query_log=ON --mysqld=--sql_mode=PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ORACLE,NO_KEY_OPTIONS,NO_TABLE_OPTIONS,NO_FIELD_OPTIONS,STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,SIMULTANEOUS_ASSIGNMENT --mysqld=--sync_binlog=1 --mysqld=--sync_master_info=1 --mysqld=--table_definition_cache=16384 --mysqld=--table_open_cache=16384 --mysqld=--thread_cache_size=256 --mysqld=--thread_pool_size=16 --mysqld=--thread_stack=262144 --mysqld=--tmp_memory_table_size=33554432 --mysqld=--tmp_table_size=33554432 --mysqld=--transaction_isolation=READ-COMMITTED --mysqld=--use_stat_tables=PREFERABLY --mysqld=--userstat=ON --mysqld=--wait_timeout=3600 --queries=1000000 --reporters=Backtrace,Deadlock,FeatureUsage,UniqueConstraintValidity --scenario-primary-crash-recovery --scenario-use-gtid --scenario=Replication   --server1-mysqld=--log_bin --server1-mysqld=--read_only=0 --threads=4 --unique-hash-keys --seed=1765895546 --trials=10 --output="mysql_bin_log.get_log_lock" --basedir=/home/binlog/knielsen_binlog_in_engine_12.3-debug --vardir=/home/binlog/var-binlog
      

      The command line above attempts 10 runs (--trials option) and stops earlier if it finds "mysql_bin_log.get_log_lock" in the output (--output option). Note also --basedir and --vardir options at the end of the command line. It usually fails for me within 5-10 attempts, but there is no guarantee. I couldn't reproduce it without binlog-storage-engine, but it's no guarantee either.
      The rr profile will be provided. To run with rr, add --rr option to the command line above. It is remarkably well rr-able, or maybe I just got lucky.

      Sometimes it fails on two assertions, one after another:

      2025-12-24 21:11:00 7 [Note] Slave: connected to master 'replication@127.0.0.1:21000',replication resumed in log 'binlog-000001.ibb' at position 2100122
      2025-12-24 21:11:03 12 [ERROR] Slave SQL: Error 'An attempt was made to binlog GTID 0-1001-460 which would create an out-of-order sequence number with existing GTID 0-1001-462, and gtid strict mode is enabled' on query. Def
      ault database: 'simple_db'. Query: '/* WRK-2 QNO 431 */  CREATE OR REPLACE INDEX indcnstr8 ON `partition_db`.`tlist_4_1` ( `col_int_key` DESC ) WAIT 6', Gtid 0-1001-460, Internal MariaDB error code: 1950
      mysqld: /home/binlog/knielsen_binlog_in_engine_12.3-debug/sql/rpl_parallel.cc:285: void finish_event_group(rpl_parallel_thread*, uint64, rpl_parallel_entry*, rpl_group_info*): Assertion `!tmp_gco->next_gco || tmp_gco->last_
      sub_id > sub_id' failed.
      ...
      Assertion `!((&(mysql_bin_log.get_log_lock())->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(mysql_bin_log.get_log_lock())->m_mutex)->thread))' failed.
      

      Attachments

        Activity

          People

            knielsen Kristian Nielsen
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.