Details

    Description

      This happened yesterday:

      mariadbd: 2022-03-21 13:15:30 13094 [ERROR] [FATAL] InnoDB: Rec offset 99, cur1 offset 8485, cur2 offset 15780
      mariadbd: 220321 13:15:30 [ERROR] mysqld got signal 6 ;
      mariadbd: This could be because you hit a bug. It is also possible that this binary
      mariadbd: or one of the libraries it was linked against is corrupt, improperly built,
      mariadbd: or misconfigured. This error can also be caused by malfunctioning hardware.
      mariadbd: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      mariadbd: We will try our best to scrape up some info that will hopefully help
      mariadbd: diagnose the problem, but since we have already crashed,
      mariadbd: something is definitely wrong and this may fail.
      mariadbd: Server version: 10.5.15-MariaDB-log
      mariadbd: key_buffer_size=134217728
      mariadbd: read_buffer_size=131072
      mariadbd: max_used_connections=4
      mariadbd: max_threads=153
      mariadbd: thread_count=3
      mariadbd: It is possible that mysqld could use up to
      mariadbd: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467872 K  bytes of memory
      mariadbd: Hope that's ok; if not, decrease some variables in the equation.
      mariadbd: Thread pointer: 0x7f42980029a8
      mariadbd: Attempting backtrace. You can use the following information to find out
      mariadbd: where mysqld died. If you see no messages after this, something went
      mariadbd: terribly wrong...
      mariadbd: stack_bottom = 0x7f446861a3e0 thread_stack 0x49000
      mariadbd: ??:0(my_print_stacktrace)[0x55a751fffe3e]
      mariadbd: ??:0(handle_fatal_signal)[0x55a7519f19c7]
      mariadbd: sigaction.c:0(__restore_rt)[0x7f448224d630]
      mariadbd: :0(__GI_raise)[0x7f4481698387]
      mariadbd: :0(__GI_abort)[0x7f4481699a78]
      mariadbd: ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x55a751e568b0]
      mariadbd: ??:0(std::unique_lock<std::mutex>::unlock())[0x55a751d71579]
      mariadbd: ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x55a751e5ce86]
      mariadbd: ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x55a751e5dab7]
      mariadbd: ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x55a751e7c63c]
      mariadbd: ??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55a751db5c46]
      mariadbd: ??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55a751db6932]
      mariadbd: ??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55a751db6e2a]
      mariadbd: ??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x55a751dc9a14]
      mariadbd: ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55a751cf6d94]
      mariadbd: ??:0(handler::ha_write_row(unsigned char const*))[0x55a7519feb4f]
      mariadbd: ??:0(Rows_log_event::write_row(rpl_group_info*, bool))[0x55a751b0c2f5]
      mariadbd: ??:0(Write_rows_log_event::do_exec_row(rpl_group_info*))[0x55a751b0c75d]
      mariadbd: ??:0(Rows_log_event::do_apply_event(rpl_group_info*))[0x55a751b01b84]
      mariadbd: ??:0(non-virtual thunk to Item_string_sys::~Item_string_sys())[0x55a75172f65f]
      mariadbd: ??:0(handle_slave_sql)[0x55a75173984c]
      mariadbd: ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55a751c471b7]
      mariadbd: pthread_create.c:0(start_thread)[0x7f4482245ea5]
      mariadbd: ??:0(__clone)[0x7f4481760b0d]
      mariadbd: Trying to get some variables.
      mariadbd: Some pointers may be invalid and cause the dump to abort.
      mariadbd: Query (0x7f42980a540b): INSERT INTO ...
      mariadbd: Status: NOT_KILLED
      mariadbd: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=of
      f,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
      mariadbd: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      mariadbd: information that should help you find out what is causing the crash.
      mariadbd: Writing a core file...
      mariadbd: Working directory at /var/lib/mysql
      mariadbd: Resource Limits:
      mariadbd: Limit                     Soft Limit           Hard Limit           Units
      mariadbd: Max cpu time              unlimited            unlimited            seconds
      mariadbd: Max file size             unlimited            unlimited            bytes
      mariadbd: Max data size             unlimited            unlimited            bytes
      mariadbd: Max stack size            8388608              unlimited            bytes
      mariadbd: Max core file size        0                    unlimited            bytes
      mariadbd: Max resident set          unlimited            unlimited            bytes
      mariadbd: Max processes             31150                31150                processes
      mariadbd: Max open files            32768                32768                files
      mariadbd: Max locked memory         65536                65536                bytes
      mariadbd: Max address space         unlimited            unlimited            bytes
      mariadbd: Max file locks            unlimited            unlimited            locks
      mariadbd: Max pending signals       31150                31150                signals
      mariadbd: Max msgqueue size         819200               819200               bytes
      mariadbd: Max nice priority         0                    0
      mariadbd: Max realtime priority     0                    0
      mariadbd: Max realtime timeout      unlimited            unlimited            us
      mariadbd: Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h
      abrt-hook-ccpp: Process 9433 (mariadbd) of user 987 killed by SIGABRT - dumping core
      abrtd: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'xorg-2019-01-19-08:12:24-6238-1'
      systemd: mariadb.service: main process exited, code=killed, status=6/ABRT
      systemd: Unit mariadb.service entered failed state.
      systemd: mariadb.service failed.
      abrtd: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'xorg-2019-02-23-08:17:59-6276-1'
      abrtd: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2017-09-13-18:37:14-4564-0'
      abrtd: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2017-09-13-18:36:34-4511-0'
      abrtd: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'Python-2017-09-13-17:53:58-1093'
      abrt-server: Package 'MariaDB-server' isn't signed with proper key
      abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2022-03-21-13:15:30-9433' exited with 1
      abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2022-03-21-13:15:30-9433'
      systemd: mariadb.service holdoff time over, scheduling restart.
      systemd: Cannot add dependency job for unit firewalld.service, ignoring: Unit is masked.
      systemd: Stopped MariaDB 10.5.15 database server.
      

      I was able to get server back running by starting it with innodb_force_recovery = 1, dropping databases and finally recreating them from another slave.

      Attachments

        Issue Links

          Activity

            Can you please try to produce stack traces from the core dump? The built-in stack trace generator for the current thread often produces garbage, like in this case.

            The crash message matches MDEV-26977. In MDEV-27734 we disabled the InnoDB change buffer by default because of such bug reports and because of questionable performance benefit.

            Since we have not been able to reproduce the corruption ourselves, we are unable to fix it, and the best course of action seems to be to disable the change buffer.

            marko Marko Mäkelä added a comment - Can you please try to produce stack traces from the core dump? The built-in stack trace generator for the current thread often produces garbage, like in this case. The crash message matches MDEV-26977 . In MDEV-27734 we disabled the InnoDB change buffer by default because of such bug reports and because of questionable performance benefit. Since we have not been able to reproduce the corruption ourselves, we are unable to fix it, and the best course of action seems to be to disable the change buffer.

            MariaDB Server 10.5.15 does disable the change buffering by default (MDEV-27734), but it is possible that some buffered changes existed in the change buffer when the server was started. The corruption would not be noticed until a change buffer merge occurs.

            The change buffer covers changes to secondary indexes. Therefore, the statement

            ALTER TABLE tablename FORCE;
            

            should be able to fix this type of corruption by rebuilding the table.

            If you restore the server from data files, any corruption in the persistent change buffer would be copied along. Is the crash reproducible after restoring from a logical dump, with the change buffer disabled? Can you provide a proper stack trace? Does it include ibuf_merge_or_delete_for_page()?

            marko Marko Mäkelä added a comment - MariaDB Server 10.5.15 does disable the change buffering by default ( MDEV-27734 ), but it is possible that some buffered changes existed in the change buffer when the server was started. The corruption would not be noticed until a change buffer merge occurs. The change buffer covers changes to secondary indexes. Therefore, the statement ALTER TABLE tablename FORCE ; should be able to fix this type of corruption by rebuilding the table. If you restore the server from data files, any corruption in the persistent change buffer would be copied along. Is the crash reproducible after restoring from a logical dump, with the change buffer disabled? Can you provide a proper stack trace? Does it include ibuf_merge_or_delete_for_page() ?

            Unfortunately it seems that I am not able to produce stack traces from the core dump because the databases were recreated and there is no server snapshot taken from the crash.

            But here are some details what I was able to gather:

            • MariaDB had been running constantly since Mar 1 2022 (update from 10.5.13 to 10.5.15)
            • After the crash when I tried to start the server without innodb_force_recovery=1 it always crashed
            • The server is originally started in 2017 with MariaDB 10.1 or 10.2 and it has been updated every year to new major version (10.1 -> 10.2, 10.2 -> 10.3 etc.)
            • server.cnf file is down below

            [mysqld]
            log_bin                   = mysql-bin
            binlog_do_db              = db1
            binlog_do_db              = db2
            binlog_format             = ROW
            expire_logs_days          = 5
            server_id                 = 2
             
            innodb_buffer_pool_size   = 5GB
            innodb_file_per_table     = 1
             
            ssl
            ssl-ca                    = /path/to/file
            ssl-cert                  = /path/to/file
            ssl-key                   = /path/to/file
             
            local_infile              = 0
            sql_mode                  = ''
            symbolic_links            = 0
            innodb_strict_mode        = 0
            max_password_errors       = 10
             
            relay-log                 = mysql-relay-bin.log
            read-only                 = 1
            replicate-do-db           = db1
            replicate-do-db           = db2
            log_slave_updates         = 1
             
            plugin_load                  = server_audit=server_audit.so
            server_audit_logging         = ON
            server_audit_events          = 'CONNECT,TABLE,QUERY_DDL'
            server_audit_file_rotate_now = ON
            server_audit_file_rotations  = 30
            

            We had simiral style crash in February to another slave (using MariaDB 10.5). In that case the error was "Page old data size xyz new data size xyz, page old max ins size xyz new max ins size xyz" and the server didn't start without innodb_force_recovery=4.

            jarkko.hyvarinen Jarkko Hyvärinen added a comment - Unfortunately it seems that I am not able to produce stack traces from the core dump because the databases were recreated and there is no server snapshot taken from the crash. But here are some details what I was able to gather: MariaDB had been running constantly since Mar 1 2022 (update from 10.5.13 to 10.5.15) After the crash when I tried to start the server without innodb_force_recovery=1 it always crashed The server is originally started in 2017 with MariaDB 10.1 or 10.2 and it has been updated every year to new major version (10.1 -> 10.2, 10.2 -> 10.3 etc.) server.cnf file is down below [mysqld] log_bin = mysql-bin binlog_do_db = db1 binlog_do_db = db2 binlog_format = ROW expire_logs_days = 5 server_id = 2   innodb_buffer_pool_size = 5GB innodb_file_per_table = 1   ssl ssl-ca = /path/to/file ssl-cert = /path/to/file ssl-key = /path/to/file   local_infile = 0 sql_mode = '' symbolic_links = 0 innodb_strict_mode = 0 max_password_errors = 10   relay-log = mysql-relay-bin.log read-only = 1 replicate- do -db = db1 replicate- do -db = db2 log_slave_updates = 1   plugin_load = server_audit=server_audit.so server_audit_logging = ON server_audit_events = 'CONNECT,TABLE,QUERY_DDL' server_audit_file_rotate_now = ON server_audit_file_rotations = 30 We had simiral style crash in February to another slave (using MariaDB 10.5). In that case the error was "Page old data size xyz new data size xyz, page old max ins size xyz new max ins size xyz" and the server didn't start without innodb_force_recovery=4.

            jarkko.hyvarinen, thank you. While I do not know the exact stack trace, the amount of stack frames could match InnoDB change buffer merge, which may invoke btr_page_reorganize(), which would output the fatal error message if the page is corrupted.

            You wrote that you were able to break a crash loop by setting innodb_force_recovery=4. Actually, in MariaDB Server 10.5, thanks to MDEV-19514, that setting is equivalent to innodb_force_recovery=3. Such a crash loop was reported earlier in MDEV-26977.

            I would need a repeatable test case in order to analyze and fix this bug. We do not hit this in our internal testing, possibly because we typically run rather short tests with rather small amounts of data, and by design, some data flow patterns of the change buffer would only be exercised with larger amounts of data over a longer period. The fact that the change buffer is persistent and that corruption may remain unnoticed for a long time makes this hard.

            In MDEV-27734, the change buffer was disabled by default to prevent further corruption due to the change buffer.

            marko Marko Mäkelä added a comment - jarkko.hyvarinen , thank you. While I do not know the exact stack trace, the amount of stack frames could match InnoDB change buffer merge, which may invoke btr_page_reorganize() , which would output the fatal error message if the page is corrupted. You wrote that you were able to break a crash loop by setting innodb_force_recovery=4 . Actually, in MariaDB Server 10.5, thanks to MDEV-19514 , that setting is equivalent to innodb_force_recovery=3 . Such a crash loop was reported earlier in MDEV-26977 . I would need a repeatable test case in order to analyze and fix this bug. We do not hit this in our internal testing, possibly because we typically run rather short tests with rather small amounts of data, and by design, some data flow patterns of the change buffer would only be exercised with larger amounts of data over a longer period. The fact that the change buffer is persistent and that corruption may remain unnoticed for a long time makes this hard. In MDEV-27734 , the change buffer was disabled by default to prevent further corruption due to the change buffer.

            The reported code that would intentionally cause a crash was removed in MDEV-13542.

            marko Marko Mäkelä added a comment - The reported code that would intentionally cause a crash was removed in MDEV-13542 .

            People

              marko Marko Mäkelä
              jarkko.hyvarinen Jarkko Hyvärinen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.