Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35054

Replication failure followed by Mariadb crash

    XMLWordPrintable

Details

    Description

      Hello Team,

      We observed a replication failure between the database servers db1 and db2, followed by a crash of the MariaDB service on db1 due to signal 11 (segmentation fault). Could you please investigate the root cause for both the replication failure and the subsequent crash?

      Replication Failure (Sept 30, 2024 - 22:47:31):
      Error Overview: The replication failure was triggered due to a checksum verification error while reading from the relay log.

      Corruption in the relay log or the master’s binary log.

      The replication SQL thread encountered an issue with parsing the relay log event, which eventually led to an I/O error and the termination of the slave SQL thread.

      mysql logs:

      *2024-09-30 22:47:31 27 [ERROR] Replication event checksum verification failed while reading from a log file
      2024-09-30 22:47:31 27 [ERROR] Error in *Log_event::read_log_event(): 'Replication event checksum verification failed while reading from a log file', data_len: 680, event_type: 160
      2024-09-30 22:47:31 27 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
      2024-09-30 22:47:31 27 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MariaDB code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Internal MariaDB error code: 1594
      2024-09-30 22:47:31 27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db2-binary.231673' position 125252441
      2024-09-30 22:47:31 27 [Note] Slave SQL thread exiting, replication stopped in log 'db2-binary.231673' at position 125272617, master: 10.76.24.77:3306
      2024-09-30 22:48:18 26 [ERROR] Could not use ./db1-relay.272126 for logging (error 11). Turning logging off for the whole duration of the MariaDB server process. To turn it on again: fix the cause, shutdown the MariaDB server and restart it.
      2024-09-30 22:48:18 26 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
      2024-09-30 22:48:18 26 [Note] Slave I/O thread exiting, read up to log 'db2-binary.231674', position 4, master 10.76.24.77:3306
      2024-10-01 0:14:32 35483020 [Note] Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.24.77:3306' in log 'db2-binary.231674' at position 4
      2024-10-01 0:14:32 35483020 [Note] Slave I/O thread: connected to master 'gtid@10.76.24.77:3306',replication started in log 'db2-binary.231674' at position 4
      241001 0:14:32 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.11.5-MariaDB-log source revision: 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c
      key_buffer_size=67108864
      read_buffer_size=131072
      max_used_connections=8
      max_threads=802
      thread_count=27
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831964 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f9be8001598
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f9f495c64b8 thread_stack 0x49000
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x558946a8a12e]
      mysys/stacktrace.c:216(my_print_stacktrace)[0x5589465835b5]
      /lib64/libpthread.so.0(+0x12cf0)[0x7fc54860ecf0]
      bits/string_fortified.h:34(check_binlog_magic(st_io_cache*, char const**))[0x5589466795e8]
      sql/rpl_rli.cc:717(init_relay_log_pos(Relay_log_info*, char const*, unsigned long long, bool, char const**, bool))[0x55894644786a]
      sql/slave.cc:5493(handle_slave_sql)[0x55894628ee7a]
      perfschema/pfs.cc:2204(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55894678ac9d]
      /lib64/libpthread.so.0(+0x81ca)[0x7fc5486041ca]
      /lib64/libc.so.6(clone+0x43)[0x7fc547954e73]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 35483021
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off

      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.

      We think the query pointer is invalid, but we will try to print it anyway.
      Query:

      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit Soft Limit Hard Limit Units
      Max cpu time unlimited unlimited seconds
      Max file size unlimited unlimited bytes
      Max data size unlimited unlimited bytes
      Max stack size 8388608 unlimited bytes
      Max core file size unlimited unlimited bytes
      Max resident set unlimited unlimited bytes
      Max processes 2061710 2061710 processes
      Max open files 32768 32768 files
      Max locked memory 65536 65536 bytes
      Max address space unlimited unlimited bytes
      Max file locks unlimited unlimited locks
      Max pending signals 2061710 2061710 signals
      Max msgqueue size 819200 819200 bytes
      Max nice priority 0 0
      Max realtime priority 0 0
      Max realtime timeout unlimited unlimited us
      Core pattern: /var/coredumps/core.%e.%p

      Kernel version: Linux version 4.18.0-477.13.1.el8_8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)) #1 SMP Thu May 18 10:27:05 EDT 2023

      2024-10-01 0:15:21 0 [Note] Starting MariaDB 10.11.5-MariaDB-log source revision 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c as process 194175
      2024-10-01 0:15:21 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2024-10-01 0:15:21 0 [Note] InnoDB: Number of transaction pools: 1
      2024-10-01 0:15:21 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2024-10-01 0:15:22 0 [Note] InnoDB: Using Linux native AIO
      2024-10-01 0:15:22 0 [Note] InnoDB: Initializing buffer pool, total size = 150.000GiB, chunk size = 2.344GiB
      2024-10-01 0:15:22 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-10-01 0:15:22 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      2024-10-01 0:15:23 0 [Note] InnoDB: End of log at LSN=48180725989630
      2024-10-01 0:15:23 0 [Note] InnoDB: 128 rollback segments are active.
      2024-10-01 0:15:24 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2024-10-01 0:15:24 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2024-10-01 0:15:24 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      2024-10-01 0:15:24 0 [Note] InnoDB: log sequence number 48180725989630; transaction id 151460736751
      2024-10-01 0:15:24 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
      2024-10-01 0:15:24 0 [Note] Plugin 'FEEDBACK' is disabled.
      2024-10-01 0:15:24 0 [Warning] 'thread-concurrency' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      2024-10-01 0:15:24 0 [Warning] 'innodb-thread-concurrency' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      2024-10-01 0:15:24 0 [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      2024-10-01 0:15:24 0 [Note] Recovering after a crash using db1-binary
      2024-10-01 0:15:24 0 [Note] Starting table crash recovery...
      2024-10-01 0:15:24 0 [Note] Crash table recovery finished.
      2024-10-01 0:15:24 0 [Note] Server socket created on IP: '0.0.0.0'.
      2024-10-01 0:15:24 0 [Note] Server socket created on IP: '::'.
      2024-10-01 0:15:24 0 [Warning] 'user' entry '@alepo-db2-ilb' ignored in --skip-name-resolve mode.
      2024-10-01 0:15:24 0 [Warning] 'user' entry 'root@localhost.localdomain' ignored in --skip-name-resolve mode.
      2024-10-01 0:15:24 0 [Warning] 'proxies_priv' entry '@% root@alepo-db2-ilb' ignored in --skip-name-resolve mode.
      2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb3.info' Relay_info:'relay-log-s@002ddb3.info'
      2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb3.info'
      2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
      2024-10-01 0:15:24 4 [ERROR] Master 's-db3': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
      2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread killed while connecting to master
      2024-10-01 0:15:24 4 [Note] Master 's-db3': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
      2024-10-01 0:15:24 3 [Note] Started replication for 's-db3'
      2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb4.info' Relay_info:'relay-log-s@002ddb4.info'
      2024-10-01 0:15:24 5 [Note] Master 's-db3': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb3.000040' position: 4
      2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb4.info'
      2024-10-01 0:15:24 18 [Note] Master 's-db4': Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.25.11:3306' in log 'db4-binary.208330' at position 44444885
      2024-10-01 0:15:24 3 [Note] Started replication for 's-db4'
      2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb2.info' Relay_info:'relay-log-s@002ddb2.info'
      2024-10-01 0:15:24 19 [Note] Master 's-db4': Slave SQL thread initialized, starting replication in log 'db4-binary.208330' at position 44444885, relay log './db1-relay-s@002ddb4.115156' position: 783
      2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb2.info'
      2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
      2024-10-01 0:15:24 20 [ERROR] Master 's-db2': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
      2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread killed while connecting to master
      2024-10-01 0:15:24 20 [Note] Master 's-db2': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
      2024-10-01 0:15:24 3 [Note] Started replication for 's-db2'
      2024-10-01 0:15:24 3 [Note] Reading Master_info: 'master-s@002ddb5.info' Relay_info:'relay-log-s@002ddb5.info'
      2024-10-01 0:15:24 21 [Note] Master 's-db2': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb2.000039' position: 4
      2024-10-01 0:15:24 3 [Note] Initialized Master_info from 'master-s@002ddb5.info'
      2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread: Start asynchronous replication to master '@:3306' in log '' at position 4
      2024-10-01 0:15:24 22 [ERROR] Master 's-db5': Slave I/O: Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated. Internal MariaDB error code: 1593
      2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread killed while connecting to master
      2024-10-01 0:15:24 22 [Note] Master 's-db5': Slave I/O thread exiting, read up to log 'FIRST', position 4, master :3306
      2024-10-01 0:15:24 3 [Note] Started replication for 's-db5'
      2024-10-01 0:15:24 23 [Note] Master 's-db5': Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './db1-relay-s@002ddb5.000031' position: 4
      2024-10-01 0:15:24 25 [Note] Slave I/O thread: Start asynchronous replication to master 'gtid@10.76.24.77:3306' in log 'db2-binary.231673' at position 209716353
      2024-10-01 0:15:24 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.11.5-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
      2024-10-01 0:15:24 26 [Note] Slave SQL thread initialized, starting replication in log 'db2-binary.231673' at position 125272617, relay log './db1-relay.272125' position: 58163882
      2024-10-01 0:15:24 25 [Note] Slave I/O thread: connected to master 'gtid@10.76.24.77:3306',replication started in log 'db2-binary.231673' at position 209716353
      2024-10-01 0:15:24 18 [Note] Master 's-db4': Slave I/O thread: connected to master 'gtid@10.76.25.11:3306',replication started in log 'db4-binary.208330' at position 44444885
      2024-10-01 0:16:16 330 [Note] Start binlog_dump to slave_server(2), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
      2024-10-01 0:16:16 331 [Note] Start binlog_dump to slave_server(3), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
      2024-10-01 0:16:16 333 [Note] Start binlog_dump to slave_server(4), pos(db1-binary.231004, 82365201), using_gtid(0), gtid('')
      2024-10-01 0:17:44 0 [Note] InnoDB: Buffer pool(s) load completed at 241001 0:17:44
      2024-10-01 2:00:02 330 [Warning] Aborted connection 330 to db: 'unconnected' user: 'gtid' host: '10.76.24.77' (Got an error writing communication packets)
      2024-10-01 2:00:02 352786 [Note] Start binlog_dump to slave_server(2), pos(db1-binary.231093, 148119556), using_gtid(0), gtid('')

      [root@ALEPO-DB1-ILB coredumps]# systemctl status mysqld
      ● mariadb.service - MariaDB 10.11.5 database server
      Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/mariadb.service.d
      └─migrated-from-my.cnf-settings.conf

      • Active: active (running) since Tue 2024-10-01 00:15:24 PKT; 11h ago*
        Docs: man:mariadbd(8)
        https://mariadb.com/kb/en/library/systemd/
        Process: 194226 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCC>
        Process: 193951 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/g>
        Process: 193949 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCE>
        Main PID: 194175 (mariadbd)
        Status: "Taking your SQL requests now..."
        Tasks: 38 (limit: 3298736)
        Memory: 165.9G
        CGroup: /system.slice/mariadb.service
        └─194175 /usr/sbin/mariadbd

      Oct 01 00:15:21 ALEPO-DB1-ILB systemd[1]: Starting MariaDB 10.11.5 database server...
      Oct 01 00:15:21 ALEPO-DB1-ILB mariadbd[194175]: 2024-10-01 0:15:21 0 [Warning] Could not increase number of max_open_fi>
      Oct 01 00:15:24 ALEPO-DB1-ILB systemd[1]: Started MariaDB 10.11.5 database server.
      lines 1-20/20 (END)

      MYSQL VERSION:
      mariadb Ver 15.1 Distrib 10.11.5-MariaDB, for Linux (x86_64) using readline 5.1

      Kindly find attached mysqlbinlog file, core dump file, coredump analysis, show slave status output for your reference:

      Core file :
      https://easyupload.io/bxx7f9
      password : 12345678

      mysqlbin log file:
      https://easyupload.io/d7ogcw
      password:12345678

      Attachments

        1. gdb.txt
          2.63 MB
          Priya Kunte
        2. messages_01102024
          140 kB
          Priya Kunte
        3. mysqld_logs_Oct_1
          14 kB
          Priya Kunte
        4. sar30
          1.18 MB
          Priya Kunte
        5. slave status.txt
          3 kB
          Priya Kunte

        Activity

          People

            bnestere Brandon Nesterenko
            Priya Kunte Priya Kunte
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.