Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13542

Crashing on a corrupted page is unhelpful

Details

    Description

      Since the very first version, InnoDB was aborting the whole MySQL or MariaDB server when trying to access a corrupted page.

      Earlier, small steps have been taken to improve MariaDB stability, such as MDEV-12253.

      We must enable error handling for corrupted pages instead of allowing the server to crash. This is not only about detecting a checksum failure when reading a corrupted page (like suggested in MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful, filed on April 25, 2005), but also about performing some operations on corrupted pages whose checksum appears valid.

      In some background operations, such as the purge of committed transaction history, errors may be reported to the server error log only, or silently ignored. While executing SQL, errors should be reported back to the client, and the transaction could be rolled back.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -

            As part of fixing this, we should also ensure that when fil_space_verify_crypt_checksum() has reported corruption, access to the corrupted page will not be allowed, because that could cause various assertion failures inside InnoDB, like this one somewhere inside a B-tree:

            10.2.12

            2018-09-15 22:01:17 139673797834496 [ERROR] InnoDB:  Page 39432:264336 may be corrupted. Post encryption checksum 2055092086 stored [1704382167:1704382167] key_version 1
            2018-09-15 22:01:17 0x7f08571ab700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580
            

            marko Marko Mäkelä added a comment - As part of fixing this, we should also ensure that when fil_space_verify_crypt_checksum() has reported corruption, access to the corrupted page will not be allowed, because that could cause various assertion failures inside InnoDB, like this one somewhere inside a B-tree: 10.2.12 2018-09-15 22:01:17 139673797834496 [ERROR] InnoDB: Page 39432:264336 may be corrupted. Post encryption checksum 2055092086 stored [1704382167:1704382167] key_version 1 2018-09-15 22:01:17 0x7f08571ab700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä added a comment - - edited

            Also for ROW_FORMAT=COMPRESSED tables, it looks like corrupted pages will be wrongly added to the buffer pool. I was under the impression that MySQL would remove the corrupted page from the buffer pool (or not add it), but that definitely is not the case in MariaDB.

            Adding a corrupted page to the buffer pool would cause a crash anywhere when the page contents is accessed. Here is an example from CHECK TABLE:

            10.3.10

             Version: '10.3.10-MariaDB-log' socket: '' port: 3306 mariadb.org binary distribution
            2018-11-03 16:40:21 2467 [ERROR] InnoDB: Unable to decompress XXX.ibd[page id: space=1922, page number=865]
            2018-11-03 16:40:21 2470 [ERROR] Got error 126 when reading table 'XXX'
            2018-11-03 16:40:21 2473 [ERROR] Got error 126 when reading table 'XXX'
            2018-11-03 16:40:21 2475 [ERROR] Got error 126 when reading table 'XXX'
            2018-11-03 16:40:21 2467 [ERROR] Got error 192 when reading table 'XXX'
            ...
            2018-11-03 16:40:23 2489 [ERROR] Got error 1877 when reading table 'XXX'
            2018-11-03 16:46:47 29 [ERROR] InnoDB: Wrong owned count 8, 1, rec 129
            2018-11-03 16:46:47 29 [ERROR] InnoDB: Apparent corruption in space 1922 page 865 index `PRIMARY`
            2018-11-03 16:46:47 29 [ERROR] InnoDB: In page 865 of index `PRIMARY` of table `XXX`.`XXX`
            …
            Thread pointer: 0x1649b2d4d98
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            mysqld.exe!page_dir_find_owner_slot()[page0page.cc:99]
            mysqld.exe!page_rec_get_prev_const()[page0page.ic:762]
            mysqld.exe!btr_validate_level()[btr0btr.cc:5119]
            mysqld.exe!btr_validate_index()[btr0btr.cc:5474]
            mysqld.exe!ha_innobase::check()[ha_innodb.cc:14404]
            mysqld.exe!handler::ha_check()[handler.cc:4199]
            mysqld.exe!mysql_admin_table()[sql_admin.cc:803]
            mysqld.exe!Sql_cmd_check_table::execute()[sql_admin.cc:1348]
            mysqld.exe!mysql_execute_command()[sql_parse.cc:6295]
            mysqld.exe!mysql_parse()[sql_parse.cc:8096]
            

            marko Marko Mäkelä added a comment - - edited Also for ROW_FORMAT=COMPRESSED tables, it looks like corrupted pages will be wrongly added to the buffer pool. I was under the impression that MySQL would remove the corrupted page from the buffer pool (or not add it), but that definitely is not the case in MariaDB. Adding a corrupted page to the buffer pool would cause a crash anywhere when the page contents is accessed. Here is an example from CHECK TABLE : 10.3.10 Version: '10.3.10-MariaDB-log' socket: '' port: 3306 mariadb.org binary distribution 2018-11-03 16:40:21 2467 [ERROR] InnoDB: Unable to decompress XXX.ibd[page id: space=1922, page number=865] 2018-11-03 16:40:21 2470 [ERROR] Got error 126 when reading table 'XXX' 2018-11-03 16:40:21 2473 [ERROR] Got error 126 when reading table 'XXX' 2018-11-03 16:40:21 2475 [ERROR] Got error 126 when reading table 'XXX' 2018-11-03 16:40:21 2467 [ERROR] Got error 192 when reading table 'XXX' ... 2018-11-03 16:40:23 2489 [ERROR] Got error 1877 when reading table 'XXX' 2018-11-03 16:46:47 29 [ERROR] InnoDB: Wrong owned count 8, 1, rec 129 2018-11-03 16:46:47 29 [ERROR] InnoDB: Apparent corruption in space 1922 page 865 index `PRIMARY` 2018-11-03 16:46:47 29 [ERROR] InnoDB: In page 865 of index `PRIMARY` of table `XXX`.`XXX` … Thread pointer: 0x1649b2d4d98 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... mysqld.exe!page_dir_find_owner_slot()[page0page.cc:99] mysqld.exe!page_rec_get_prev_const()[page0page.ic:762] mysqld.exe!btr_validate_level()[btr0btr.cc:5119] mysqld.exe!btr_validate_index()[btr0btr.cc:5474] mysqld.exe!ha_innobase::check()[ha_innodb.cc:14404] mysqld.exe!handler::ha_check()[handler.cc:4199] mysqld.exe!mysql_admin_table()[sql_admin.cc:803] mysqld.exe!Sql_cmd_check_table::execute()[sql_admin.cc:1348] mysqld.exe!mysql_execute_command()[sql_parse.cc:6295] mysqld.exe!mysql_parse()[sql_parse.cc:8096]
            marko Marko Mäkelä made changes -
            Labels corruption recovery corruption encryption page_compression recovery
            marko Marko Mäkelä made changes -

            InnoDB also currently crashes when a tablespace is missing or inconsistent with the data dictionary. For example, see the crash from MDEV-17947:

            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes
            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped.
            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: tablespace id is 1355633 in the data dictionary but in file ./DB1/TAB_TEMP.ibd it is 1354713!
             
            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes
            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped.
            2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Unable to read tablespace 1355633 page no 0 into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. You can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.6/en/ for more details. Aborting...
            2018-12-08 02:19:46 7f4a51ae9b00  InnoDB: Assertion failure in thread 139957174704896 in file ha_innodb.cc line 22061
            InnoDB: We intentionally generate a memory trap.
            InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            InnoDB: If you get repeated assertion failures or crashes, even
            InnoDB: immediately after the mysqld startup, there may be
            InnoDB: corruption in the InnoDB tablespace. Please refer to
            InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
            InnoDB: about forcing recovery.
            181208  2:19:46 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed, 
            something is definitely wrong and this may fail.
             
            Server version: 10.1.35-MariaDB
            key_buffer_size=16777216
            read_buffer_size=262144
            max_used_connections=5
            max_threads=2002
            thread_count=4
            It is possible that mysqld could use up to 
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1595581 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x7f41af9ed008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f4a51ae9140 thread_stack 0x48400
            mysys/stacktrace.c:268(my_print_stacktrace)[0x55a87a7089bb]
            sql/signal_handler.cc:168(handle_fatal_signal)[0x55a87a264d85]
            /lib64/libpthread.so.0(+0x36aca0f7e0)[0x7f4a518167e0]
            /lib64/libc.so.6(gsignal+0x35)[0x7f4a50086495]
            /lib64/libc.so.6(abort+0x175)[0x7f4a50087c75]
            handler/ha_innodb.cc:22034(ib_logf(ib_log_level_t, char const*, ...))[0x55a87a48fe53]
            buf/buf0buf.cc:3268(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*))[0x55a87a5d90eb]
            fil/fil0fil.cc:3921(fsp_flags_try_adjust(unsigned long, unsigned long))[0x55a87a6352b3]
            fil/fil0fil.cc:5492(fil_space_for_table_exists_in_mem(unsigned long, char const*, bool, bool, bool, mem_block_info_t*, unsigned long, unsigned long))[0x55a87a635bc5]
            dict/dict0load.cc:2409(dict_load_table(char const*, unsigned long, dict_err_ignore_t))[0x55a87a618fb4]
            dict/dict0dict.cc:1155(dict_table_open_on_name(char const*, unsigned long, unsigned long, dict_err_ignore_t))[0x55a87a606e30]
            handler/ha_innodb.cc:6364(ha_innobase::open(char const*, int, unsigned int))[0x55a87a494753]
            sql/handler.cc:2533(handler::ha_open(TABLE*, char const*, int, unsigned int))[0x55a87a26928e]
            sql/table.cc:2975(open_table_from_share(THD*, TABLE_SHARE*, char const*, unsigned int, unsigned int, unsigned int, TABLE*, bool))[0x55a87a190023]
            sql/sql_base.cc:2565(open_table(THD*, TABLE_LIST*, Open_table_context*))[0x55a87a0a61dd]
            sql/sql_base.cc:4091(open_and_process_table)[0x55a87a0aa732]
            sql/sql_show.cc:1166(mysqld_show_create_get_fields(THD*, TABLE_LIST*, List<Item>*, String*))[0x55a87a14b762]
            sql/sql_show.cc:1260(mysqld_show_create(THD*, TABLE_LIST*))[0x55a87a14c0a7]
            sql/sql_parse.cc:3740(mysql_execute_command(THD*))[0x55a87a0ed485]
            sql/sql_parse.cc:7463(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55a87a0f22e5]
            sql/sql_parse.cc:7268(wsrep_mysql_parse)[0x55a87a0f23f4]
            sql/sql_parse.cc:1493(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x55a87a0f4807]
            sql/sql_parse.cc:1126(do_command(THD*))[0x55a87a0f52b7]
            sql/sql_connect.cc:1330(do_handle_one_connection(THD*))[0x55a87a1b699f]
            sql/sql_connect.cc:1244(handle_one_connection)[0x55a87a1b6ad7]
            /lib64/libpthread.so.0(+0x36aca07aa1)[0x7f4a5180eaa1]
            /lib64/libc.so.6(clone+0x6d)[0x7f4a5013cbdd]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x7f41b68d0020): show create table `TAB_TEMP`
            Connection ID (thread ID): 108197
            Status: NOT_KILLED
             
            Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
            

            Would you also want to prevent crashes in situations like that?

            GeoffMontee Geoff Montee (Inactive) added a comment - InnoDB also currently crashes when a tablespace is missing or inconsistent with the data dictionary. For example, see the crash from MDEV-17947 : 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped. 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: tablespace id is 1355633 in the data dictionary but in file ./DB1/TAB_TEMP.ibd it is 1354713! 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped. 2018-12-08 2:19:46 139957174704896 [ERROR] InnoDB: Unable to read tablespace 1355633 page no 0 into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. You can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.6/en/ for more details. Aborting... 2018-12-08 02:19:46 7f4a51ae9b00 InnoDB: Assertion failure in thread 139957174704896 in file ha_innodb.cc line 22061 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 181208 2:19:46 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.35-MariaDB key_buffer_size=16777216 read_buffer_size=262144 max_used_connections=5 max_threads=2002 thread_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1595581 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f41af9ed008 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f4a51ae9140 thread_stack 0x48400 mysys/stacktrace.c:268(my_print_stacktrace)[0x55a87a7089bb] sql/signal_handler.cc:168(handle_fatal_signal)[0x55a87a264d85] /lib64/libpthread.so.0(+0x36aca0f7e0)[0x7f4a518167e0] /lib64/libc.so.6(gsignal+0x35)[0x7f4a50086495] /lib64/libc.so.6(abort+0x175)[0x7f4a50087c75] handler/ha_innodb.cc:22034(ib_logf(ib_log_level_t, char const*, ...))[0x55a87a48fe53] buf/buf0buf.cc:3268(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*))[0x55a87a5d90eb] fil/fil0fil.cc:3921(fsp_flags_try_adjust(unsigned long, unsigned long))[0x55a87a6352b3] fil/fil0fil.cc:5492(fil_space_for_table_exists_in_mem(unsigned long, char const*, bool, bool, bool, mem_block_info_t*, unsigned long, unsigned long))[0x55a87a635bc5] dict/dict0load.cc:2409(dict_load_table(char const*, unsigned long, dict_err_ignore_t))[0x55a87a618fb4] dict/dict0dict.cc:1155(dict_table_open_on_name(char const*, unsigned long, unsigned long, dict_err_ignore_t))[0x55a87a606e30] handler/ha_innodb.cc:6364(ha_innobase::open(char const*, int, unsigned int))[0x55a87a494753] sql/handler.cc:2533(handler::ha_open(TABLE*, char const*, int, unsigned int))[0x55a87a26928e] sql/table.cc:2975(open_table_from_share(THD*, TABLE_SHARE*, char const*, unsigned int, unsigned int, unsigned int, TABLE*, bool))[0x55a87a190023] sql/sql_base.cc:2565(open_table(THD*, TABLE_LIST*, Open_table_context*))[0x55a87a0a61dd] sql/sql_base.cc:4091(open_and_process_table)[0x55a87a0aa732] sql/sql_show.cc:1166(mysqld_show_create_get_fields(THD*, TABLE_LIST*, List<Item>*, String*))[0x55a87a14b762] sql/sql_show.cc:1260(mysqld_show_create(THD*, TABLE_LIST*))[0x55a87a14c0a7] sql/sql_parse.cc:3740(mysql_execute_command(THD*))[0x55a87a0ed485] sql/sql_parse.cc:7463(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55a87a0f22e5] sql/sql_parse.cc:7268(wsrep_mysql_parse)[0x55a87a0f23f4] sql/sql_parse.cc:1493(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x55a87a0f4807] sql/sql_parse.cc:1126(do_command(THD*))[0x55a87a0f52b7] sql/sql_connect.cc:1330(do_handle_one_connection(THD*))[0x55a87a1b699f] sql/sql_connect.cc:1244(handle_one_connection)[0x55a87a1b6ad7] /lib64/libpthread.so.0(+0x36aca07aa1)[0x7f4a5180eaa1] /lib64/libc.so.6(clone+0x6d)[0x7f4a5013cbdd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x7f41b68d0020): show create table `TAB_TEMP` Connection ID (thread ID): 108197 Status: NOT_KILLED Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off Would you also want to prevent crashes in situations like that?
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            julien.fritsch Julien Fritsch made changes -
            Support case ID 23523 24286 not-21898 not-22741 not-23404 not-23578 24286 not-21898 not-22741 not-23404 not-23578 not-23523
            julien.fritsch Julien Fritsch made changes -
            Support case ID 24286 not-21898 not-22741 not-23404 not-23523 not-23578 24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286
            julien.fritsch Julien Fritsch made changes -
            Support case ID 24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286
            julien.fritsch Julien Fritsch made changes -
            Support case ID not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 test
            julien.fritsch Julien Fritsch made changes -
            Support case ID not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 test not-21898 not-22741 not-23404 not-23523 not-23578 not-24286
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Vladislav Lesin [ vlad.lesin ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            marko Marko Mäkelä made changes -
            Richard Richard Stracke made changes -
            Support case ID not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 27019
            marko Marko Mäkelä made changes -
            jira-update-service Jira Update Service made changes -
            Support case ID 27019 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 not-21898 not-22741 not-23404 not-23523 not-23578 not-24286 not-27019
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            There is another crash because of error case mishandling:

             
            #6  0x00007f7afa8dabd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x564c4d8a7512 "bpage->buf_fix_count == 0", file=file@entry=0x564c4d8a5f08 "/home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc", line=line@entry=5811, function=function@entry=0x564c4d8abfa0 <buf_mark_space_corrupt(buf_page_t*, fil_space_t const*)::__PRETTY_FUNCTION__> "void buf_mark_space_corrupt(buf_page_t*, const fil_space_t*)") at assert.c:92
            #7  0x00007f7afa8dac82 in __GI___assert_fail (assertion=0x564c4d8a7512 "bpage->buf_fix_count == 0", file=0x564c4d8a5f08 "/home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc", line=5811, function=0x564c4d8abfa0 <buf_mark_space_corrupt(buf_page_t*, fil_space_t const*)::__PRETTY_FUNCTION__> "void buf_mark_space_corrupt(buf_page_t*, const fil_space_t*)") at assert.c:101
            #8  0x0000564c4d2cddf3 in buf_mark_space_corrupt (bpage=0x7f7aef82ec50, space=0x564c504c3bd0) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:5811
            #9  0x0000564c4d2ced38 in buf_page_io_complete (bpage=0x7f7aef82ec50, dblwr=false, evict=false) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:6107
            #10 0x0000564c4d2f62ea in buf_read_page_low (err=0x7f7ad6ffaef4, sync=true, type=0, mode=132, page_id=..., zip_size=0, unzip=false, ignore_missing_space=false) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0rea.cc:202
            #11 0x0000564c4d2f6bb7 in buf_read_page (page_id=..., zip_size=0) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0rea.cc:410
            #12 0x0000564c4d2c95bc in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, mtr=0x7f7ad6ffc890, err=0x7f7ad6ffb388) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:4384
            #13 0x0000564c4d28adb5 in btr_cur_search_to_nth_level_func (index=0x7f7ac8022318, level=0, tuple=0x7f7ac000c688, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x564c5054d3a0, ahi_latch=0x0, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, mtr=0x7f7ad6ffc890, autoinc=0) at /home/thiru/mariarepo/10.4/storage/innobase/btr/btr0cur.cc:1544
            #14 0x0000564c4d1c8d4a in btr_pcur_open_low (index=0x7f7ac8022318, level=0, tuple=0x7f7ac000c688, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x564c5054d3a0, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, autoinc=0, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/include/btr0pcur.ic:459
            #15 0x0000564c4d1cbf3b in row_search_on_row_ref (pcur=0x564c5054d3a0, mode=2, table=0x7f7ac8020cd8, ref=0x7f7ac000c688, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc:1219
            #16 0x0000564c4d1c079c in row_purge_reposition_pcur (mode=2, node=0x564c5054d2f8, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:78
            #17 0x0000564c4d1c26f2 in row_purge_reset_trx_id (node=0x564c5054d2f8, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:798
            #18 0x0000564c4d1c3db0 in row_purge_record_func (node=0x564c5054d2f8, undo_rec=0x7f7ac80110c8 "", thr=0x564c5054d128, updated_extern=false) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1205
            #19 0x0000564c4d1c3f80 in row_purge (node=0x564c5054d2f8, undo_rec=0x7f7ac80110c8 "", thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1254
            #20 0x0000564c4d1c41a9 in row_purge_step (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1315
            #21 0x0000564c4d146e89 in que_thr_step (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1042
            #22 0x0000564c4d1470bf in que_run_threads_low (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1104
            #23 0x0000564c4d1472b1 in que_run_threads (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1144
            #24 0x0000564c4d201ee4 in srv_task_execute () at /home/thiru/mariarepo/10.4/storage/innobase/srv/srv0srv.cc:2437
            #25 0x0000564c4d20209a in srv_worker_thread (arg=0x0) at /home/thiru/mariarepo/10.4/storage/innobase/srv/srv0srv.cc:2485
            
            

            thiru Thirunarayanan Balathandayuthapani added a comment - There is another crash because of error case mishandling:   #6 0x00007f7afa8dabd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x564c4d8a7512 "bpage->buf_fix_count == 0", file=file@entry=0x564c4d8a5f08 "/home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc", line=line@entry=5811, function=function@entry=0x564c4d8abfa0 <buf_mark_space_corrupt(buf_page_t*, fil_space_t const*)::__PRETTY_FUNCTION__> "void buf_mark_space_corrupt(buf_page_t*, const fil_space_t*)") at assert.c:92 #7 0x00007f7afa8dac82 in __GI___assert_fail (assertion=0x564c4d8a7512 "bpage->buf_fix_count == 0", file=0x564c4d8a5f08 "/home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc", line=5811, function=0x564c4d8abfa0 <buf_mark_space_corrupt(buf_page_t*, fil_space_t const*)::__PRETTY_FUNCTION__> "void buf_mark_space_corrupt(buf_page_t*, const fil_space_t*)") at assert.c:101 #8 0x0000564c4d2cddf3 in buf_mark_space_corrupt (bpage=0x7f7aef82ec50, space=0x564c504c3bd0) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:5811 #9 0x0000564c4d2ced38 in buf_page_io_complete (bpage=0x7f7aef82ec50, dblwr=false, evict=false) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:6107 #10 0x0000564c4d2f62ea in buf_read_page_low (err=0x7f7ad6ffaef4, sync=true, type=0, mode=132, page_id=..., zip_size=0, unzip=false, ignore_missing_space=false) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0rea.cc:202 #11 0x0000564c4d2f6bb7 in buf_read_page (page_id=..., zip_size=0) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0rea.cc:410 #12 0x0000564c4d2c95bc in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, mtr=0x7f7ad6ffc890, err=0x7f7ad6ffb388) at /home/thiru/mariarepo/10.4/storage/innobase/buf/buf0buf.cc:4384 #13 0x0000564c4d28adb5 in btr_cur_search_to_nth_level_func (index=0x7f7ac8022318, level=0, tuple=0x7f7ac000c688, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x564c5054d3a0, ahi_latch=0x0, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, mtr=0x7f7ad6ffc890, autoinc=0) at /home/thiru/mariarepo/10.4/storage/innobase/btr/btr0cur.cc:1544 #14 0x0000564c4d1c8d4a in btr_pcur_open_low (index=0x7f7ac8022318, level=0, tuple=0x7f7ac000c688, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x564c5054d3a0, file=0x564c4d845878 "/home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc", line=1219, autoinc=0, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/include/btr0pcur.ic:459 #15 0x0000564c4d1cbf3b in row_search_on_row_ref (pcur=0x564c5054d3a0, mode=2, table=0x7f7ac8020cd8, ref=0x7f7ac000c688, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0row.cc:1219 #16 0x0000564c4d1c079c in row_purge_reposition_pcur (mode=2, node=0x564c5054d2f8, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:78 #17 0x0000564c4d1c26f2 in row_purge_reset_trx_id (node=0x564c5054d2f8, mtr=0x7f7ad6ffc890) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:798 #18 0x0000564c4d1c3db0 in row_purge_record_func (node=0x564c5054d2f8, undo_rec=0x7f7ac80110c8 "", thr=0x564c5054d128, updated_extern=false) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1205 #19 0x0000564c4d1c3f80 in row_purge (node=0x564c5054d2f8, undo_rec=0x7f7ac80110c8 "", thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1254 #20 0x0000564c4d1c41a9 in row_purge_step (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/row/row0purge.cc:1315 #21 0x0000564c4d146e89 in que_thr_step (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1042 #22 0x0000564c4d1470bf in que_run_threads_low (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1104 #23 0x0000564c4d1472b1 in que_run_threads (thr=0x564c5054d128) at /home/thiru/mariarepo/10.4/storage/innobase/que/que0que.cc:1144 #24 0x0000564c4d201ee4 in srv_task_execute () at /home/thiru/mariarepo/10.4/storage/innobase/srv/srv0srv.cc:2437 #25 0x0000564c4d20209a in srv_worker_thread (arg=0x0) at /home/thiru/mariarepo/10.4/storage/innobase/srv/srv0srv.cc:2485
            marko Marko Mäkelä added a comment - - edited

            MDEV-19541 reports a crash during crash recovery, related to a slightly modified test of MDEV-12699.

            marko Marko Mäkelä added a comment - - edited MDEV-19541 reports a crash during crash recovery, related to a slightly modified test of MDEV-12699 .
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            For testing on Linux, we can create a block device with injected bad blocks, like this:

            dd if=/dev/zero of=/var/lib/virtualblock.img bs=512 count=1048576
            losetup /dev/loop0 /var/lib/virtualblock.img
             
            dmsetup create errdev0 << EOF
            0 261144 linear /dev/loop0 0
            261144 5 error
            261149 787427 linear /dev/loop0 261139
            EOF
            

            The device with the ‘hole’ in it will typically be created in /dev/mapper.

            marko Marko Mäkelä added a comment - For testing on Linux, we can create a block device with injected bad blocks, like this: dd if=/dev/zero of=/var/lib/virtualblock.img bs=512 count=1048576 losetup /dev/loop0 /var/lib/virtualblock.img   dmsetup create errdev0 << EOF 0 261144 linear /dev/loop0 0 261144 5 error 261149 787427 linear /dev/loop0 261139 EOF The device with the ‘hole’ in it will typically be created in /dev/mapper .
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            marko Marko Mäkelä made changes -
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.6 [ 24028 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.5 [ 23123 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            MDEV-23344 reports a crash when trying to access a page that does not exist in the tablespace.

            marko Marko Mäkelä added a comment - MDEV-23344 reports a crash when trying to access a page that does not exist in the tablespace.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels corruption encryption page_compression recovery ServiceNow corruption encryption page_compression recovery
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow corruption encryption page_compression recovery 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption encryption page_compression recovery
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption encryption page_compression recovery corruption encryption page_compression recovery
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 82110 ] MariaDB v4 [ 143964 ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            When it comes to reading corrupted pages into the buffer pool, we seem to already do the right thing with regard to buf_page_is_corrupted() and buf_page_t::read_complete(). One relevant change was MDEV-19541.

            It looks like the main part of the fix would be to check that error results from buf_page_get_gen() will be propagated from the caller.

            I would exclude that in change buffer merge from this. We have seen crashes in btr_page_reorganize() that were invoked from ibuf_insert_to_index_page_low() during ibuf_merge_or_delete_for_page(). Due to the design of the change buffer, such crashes can also occur during CHECK TABLE, where some extra attention has been spent to avoid crashes due to corruption. We already disabled the change buffer by default (MDEV-27734) and deprecated it (MDEV-27735) due to corruption that we are unable to reproduce in our internal testing.

            marko Marko Mäkelä added a comment - When it comes to reading corrupted pages into the buffer pool, we seem to already do the right thing with regard to buf_page_is_corrupted() and buf_page_t::read_complete() . One relevant change was MDEV-19541 . It looks like the main part of the fix would be to check that error results from buf_page_get_gen() will be propagated from the caller. I would exclude that in change buffer merge from this. We have seen crashes in btr_page_reorganize() that were invoked from ibuf_insert_to_index_page_low() during ibuf_merge_or_delete_for_page() . Due to the design of the change buffer, such crashes can also occur during CHECK TABLE , where some extra attention has been spent to avoid crashes due to corruption. We already disabled the change buffer by default ( MDEV-27734 ) and deprecated it ( MDEV-27735 ) due to corruption that we are unable to reproduce in our internal testing.

            It looks like we already have a large part of these robustness fixes present in 10.5 (added by MDEV-15528 and related changes) or 10.6 (added in MDEV-25506 and changes related to atomic DDL). I already went through all direct callers of buf_page_get_gen(), and there is not that much to be fixed. I still have to check all callers of some 12 functions and then implement some fault injection in buf_page_get_gen() so that we can run some stress tests.

            I would declare that those linked open bugs that are related to some other forms of corruption (say, a garbage BLOB pointer or DB_ROLL_PTR, or an incorrect key specified for decrypting an encrypted page) can be out of the scope of this fix. This fix should mainly be about handling failures to read a page.

            marko Marko Mäkelä added a comment - It looks like we already have a large part of these robustness fixes present in 10.5 (added by MDEV-15528 and related changes) or 10.6 (added in MDEV-25506 and changes related to atomic DDL). I already went through all direct callers of buf_page_get_gen() , and there is not that much to be fixed. I still have to check all callers of some 12 functions and then implement some fault injection in buf_page_get_gen() so that we can run some stress tests. I would declare that those linked open bugs that are related to some other forms of corruption (say, a garbage BLOB pointer or DB_ROLL_PTR , or an incorrect key specified for decrypting an encrypted page) can be out of the scope of this fix. This fix should mainly be about handling failures to read a page.
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            The RQG testing on the development tree
                 origin/bb-10.6-MDEV-13542 dca6937a30eeee56f46fc47ac85656243227e4d2 2022-05-16T16:31:15+03:00
            showed frequent
            - SEGV's with a backtrace like in the description on top of MDEV-25257
            - cases where the server error log contains the message '[ERROR] mysqld got signal 11' but the server process
               did not disappear even after more than 300s waiting --> no core file nor rr trace
            After applying Thiru's initial patch for MDEV-25257
                diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc
            index 20916c2c96e..2c3d48b9573 100644
            --- a/storage/innobase/dict/dict0load.cc
            +++ b/storage/innobase/dict/dict0load.cc
            @@ -1383,6 +1383,7 @@ dict_load_columns(
                                    the flag is set before the table is created. */
                                    if (table->fts == NULL) {
                                            table->fts = fts_create(table);
            +                               table->fts->cache = fts_cache_create(table);
                                    }
                                    ut_a(table->fts->doc_col == ULINT_UNDEFINED);
             to the tree bb-10.6-MDEV-13542 both bad effects were no more seen again in testing.
             
            Now some new dominant problem showed up.
            [rr 3532891 209362]mysqld: /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682: buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool): Assertion `mode == 11 || mode == 12 || block->zip_size() == zip_size' failed.
            (rr) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1  0x00007fcf37482859 in __GI_abort () at abort.c:79
            #2  0x00007fcf37482729 in __assert_fail_base (fmt=0x7fcf37618588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", 
                file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", line=2682, function=<optimized out>) at assert.c:92
            #3  0x00007fcf37493f36 in __GI___assert_fail (assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", 
                line=2682, function=0x55996a841880 "buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool)") at assert.c:101
            #4  0x00005599696fa35f in buf_page_get_low (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false)
                at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682
            #5  0x00005599696fbaad in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false)
                at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2984
            #6  0x00005599696ae711 in btr_cur_pess_upd_restore_supremum (block=0x7fcf2aea57e0, rec=0x7fcf2b49e781 "", mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:4782
            #7  0x00005599696b1f88 in btr_cur_pessimistic_update (flags=10, cursor=0x61100005f180, offsets=0x7fcf1a61aac0, offsets_heap=0x7fcf1a61abc0, entry_heap=0x619000bf9080, big_rec=0x7fcf1a61aaa0, update=0x62100026f3f0, cmpl_info=0, 
                thr=0x62100026f750, trx_id=1719, mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:5243
            #8  0x000055996956ff1b in row_upd_clust_rec (flags=0, node=0x62100026f2c0, index=0x616002f91a08, offsets=0x619000bf9850, offsets_heap=0x7fcf1a61abc0, thr=0x62100026f750, mtr=0x7fcf1a61aec0)
                at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2456
            #9  0x0000559969571812 in row_upd_clust_step (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2729
            #10 0x0000559969571f69 in row_upd (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2792
            #11 0x0000559969572a6b in row_upd_step (thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2934
            #12 0x00005599694d46e5 in row_update_for_mysql (prebuilt=0x62100026e988) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0mysql.cc:1691
            #13 0x000055996914b046 in ha_innobase::update_row (this=0x61d0007f94b8, old_row=0x61a0003c80f8 <incomplete sequence \375>, new_row=0x61a0003c7eb8 "\265")
                at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/handler/ha_innodb.cc:8668
            #14 0x00005599688687dd in handler::ha_update_row (this=0x61d0007f94b8, old_data=0x61a0003c80f8 <incomplete sequence \375>, new_data=0x61a0003c7eb8 "\265") at /data/Server/bb-10.6-MDEV-13542G/sql/handler.cc:7602
            #15 0x00005599683484c9 in mysql_update (thd=0x62b00009a218, table_list=0x62b0000a13f8, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551602, ignore=false, found_return=0x7fcf1a61c2f0, 
                updated_return=0x7fcf1a61c310) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_update.cc:1087
            #16 0x000055996806794f in mysql_execute_command (thd=0x62b00009a218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:4423
            #17 0x000055996807fd40 in mysql_parse (thd=0x62b00009a218, rawbuf=0x62b0000a1238 "UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount)  /* E_R Thread2 QNO 1139 CON_ID 16 */", length=116, 
                parser_state=0x7fcf1a61cb20) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:8045
            #18 0x0000559968058133 in dispatch_command (command=COM_QUERY, thd=0x62b00009a218, packet=0x629000c53219 " UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount)  /* E_R Thread2 QNO 1139 CON_ID 16 */ ", 
                packet_length=118, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1912
            #19 0x0000559968055369 in do_command (thd=0x62b00009a218, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1409
            #20 0x0000559968458c53 in do_handle_one_connection (connect=0x608000002eb8, put_in_cache=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1418
            #21 0x00005599684584df in handle_one_connection (arg=0x608000002838) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1312
            #22 0x00007fcf379ac609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #23 0x00007fcf3757f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr)
            pluto:/data/results/1652901598/TBR-1503/dev/shm/rqg/1652901598/57/1/rr
             
            This problem disappeared after applying Thiru's cursor_restore_supremum.patch.
            

            mleich Matthias Leich added a comment - The RQG testing on the development tree origin/bb-10.6-MDEV-13542 dca6937a30eeee56f46fc47ac85656243227e4d2 2022-05-16T16:31:15+03:00 showed frequent - SEGV's with a backtrace like in the description on top of MDEV-25257 - cases where the server error log contains the message '[ERROR] mysqld got signal 11' but the server process did not disappear even after more than 300s waiting --> no core file nor rr trace After applying Thiru's initial patch for MDEV-25257 diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc index 20916c2c96e..2c3d48b9573 100644 --- a/storage/innobase/dict/dict0load.cc +++ b/storage/innobase/dict/dict0load.cc @@ -1383,6 +1383,7 @@ dict_load_columns( the flag is set before the table is created. */ if (table->fts == NULL) { table->fts = fts_create(table); + table->fts->cache = fts_cache_create(table); } ut_a(table->fts->doc_col == ULINT_UNDEFINED); to the tree bb-10.6-MDEV-13542 both bad effects were no more seen again in testing.   Now some new dominant problem showed up. [rr 3532891 209362]mysqld: /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682: buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool): Assertion `mode == 11 || mode == 12 || block->zip_size() == zip_size' failed. (rr) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007fcf37482859 in __GI_abort () at abort.c:79 #2 0x00007fcf37482729 in __assert_fail_base (fmt=0x7fcf37618588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", line=2682, function=<optimized out>) at assert.c:92 #3 0x00007fcf37493f36 in __GI___assert_fail (assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", line=2682, function=0x55996a841880 "buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool)") at assert.c:101 #4 0x00005599696fa35f in buf_page_get_low (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682 #5 0x00005599696fbaad in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2984 #6 0x00005599696ae711 in btr_cur_pess_upd_restore_supremum (block=0x7fcf2aea57e0, rec=0x7fcf2b49e781 "", mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:4782 #7 0x00005599696b1f88 in btr_cur_pessimistic_update (flags=10, cursor=0x61100005f180, offsets=0x7fcf1a61aac0, offsets_heap=0x7fcf1a61abc0, entry_heap=0x619000bf9080, big_rec=0x7fcf1a61aaa0, update=0x62100026f3f0, cmpl_info=0, thr=0x62100026f750, trx_id=1719, mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:5243 #8 0x000055996956ff1b in row_upd_clust_rec (flags=0, node=0x62100026f2c0, index=0x616002f91a08, offsets=0x619000bf9850, offsets_heap=0x7fcf1a61abc0, thr=0x62100026f750, mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2456 #9 0x0000559969571812 in row_upd_clust_step (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2729 #10 0x0000559969571f69 in row_upd (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2792 #11 0x0000559969572a6b in row_upd_step (thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2934 #12 0x00005599694d46e5 in row_update_for_mysql (prebuilt=0x62100026e988) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0mysql.cc:1691 #13 0x000055996914b046 in ha_innobase::update_row (this=0x61d0007f94b8, old_row=0x61a0003c80f8 <incomplete sequence \375>, new_row=0x61a0003c7eb8 "\265") at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/handler/ha_innodb.cc:8668 #14 0x00005599688687dd in handler::ha_update_row (this=0x61d0007f94b8, old_data=0x61a0003c80f8 <incomplete sequence \375>, new_data=0x61a0003c7eb8 "\265") at /data/Server/bb-10.6-MDEV-13542G/sql/handler.cc:7602 #15 0x00005599683484c9 in mysql_update (thd=0x62b00009a218, table_list=0x62b0000a13f8, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551602, ignore=false, found_return=0x7fcf1a61c2f0, updated_return=0x7fcf1a61c310) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_update.cc:1087 #16 0x000055996806794f in mysql_execute_command (thd=0x62b00009a218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:4423 #17 0x000055996807fd40 in mysql_parse (thd=0x62b00009a218, rawbuf=0x62b0000a1238 "UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount) /* E_R Thread2 QNO 1139 CON_ID 16 */", length=116, parser_state=0x7fcf1a61cb20) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:8045 #18 0x0000559968058133 in dispatch_command (command=COM_QUERY, thd=0x62b00009a218, packet=0x629000c53219 " UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount) /* E_R Thread2 QNO 1139 CON_ID 16 */ ", packet_length=118, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1912 #19 0x0000559968055369 in do_command (thd=0x62b00009a218, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1409 #20 0x0000559968458c53 in do_handle_one_connection (connect=0x608000002eb8, put_in_cache=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1418 #21 0x00005599684584df in handle_one_connection (arg=0x608000002838) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1312 #22 0x00007fcf379ac609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #23 0x00007fcf3757f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (rr) pluto:/data/results/1652901598/TBR-1503/dev/shm/rqg/1652901598/57/1/rr   This problem disappeared after applying Thiru's cursor_restore_supremum.patch.
            mleich Matthias Leich made changes -
            Attachment cursor_restore_supremum.patch [ 63815 ]

            thiru, thank you for cursor_restore_supremum.patch. I hope that my simpler solution works as well.

            Even today, the fault injection exposed a couple new crashes, which I fixed:

            ./mtr --parallel=auto --big-test --force --max-test-fail=0 --mysqld=--debug=d,intermittent_read_failure
            

            My main effort today was to prevent crashes on change buffer merge failure (motivated by MDEV-28349) as well as on page split or merge. Such corruption should not be a direct consequence of a page checksum mismatch, but something else (say, MDEV-28312, a bug in Galera snapshot transfer).

            marko Marko Mäkelä added a comment - thiru , thank you for cursor_restore_supremum.patch . I hope that my simpler solution works as well. Even today, the fault injection exposed a couple new crashes, which I fixed: ./mtr --parallel=auto --big-test --force --max-test-fail=0 --mysqld=--debug=d,intermittent_read_failure My main effort today was to prevent crashes on change buffer merge failure (motivated by MDEV-28349 ) as well as on page split or merge. Such corruption should not be a direct consequence of a page checksum mismatch, but something else (say, MDEV-28312 , a bug in Galera snapshot transfer).
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-06-06 14:56:03.0 2022-06-06 14:56:03.901
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.9 [ 27507 ]
            Fix Version/s 10.7.5 [ 27505 ]
            Fix Version/s 10.8.4 [ 27503 ]
            Fix Version/s 10.9.2 [ 27115 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Description InnoDB is notoriously aborting the whole MariaDB server if it is trying to access a corrupted page.
            MDEV-12253 alleviated the issue somewhat, but not for background processing, such as the purge of no-longer-needed history.
            This bug exists in all InnoDB and XtraDB versions.
            MariaDB essentially must fix [MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful|https://bugs.mysql.com/bug.php?id=10132] that was filed on April 25, 2005.

            In 10.3 with MDEV-12288 and MDEV-13536, the InnoDB purge threads will access pages more often, causing all sorts of debug assertion failures in the B-tree code. To work around this problem, tests that intentionally cause B-tree pages to be unreadable are amended with the following parameters, so that everything will be purged before the server is restarted to deal with the corruption:
            {noformat}
            --innodb-purge-rseg-truncate-frequency=1
            --skip-innodb-fast-shutdown
            {noformat}
            Since the very first version, InnoDB was aborting the whole MySQL or MariaDB server when trying to access a corrupted page.


            MDEV-12253 alleviated the issue somewhat, but not for background processing, such as the purge of no-longer-needed history.
            This bug exists in all InnoDB and XtraDB versions.
            MariaDB essentially must fix [MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful|https://bugs.mysql.com/bug.php?id=10132] that was filed on April 25, 2005.

            In 10.3 with MDEV-12288 and MDEV-13536, the InnoDB purge threads will access pages more often, causing all sorts of debug assertion failures in the B-tree code. To work around this problem, tests that intentionally cause B-tree pages to be unreadable are amended with the following parameters, so that everything will be purged before the server is restarted to deal with the corruption:
            {noformat}
            --innodb-purge-rseg-truncate-frequency=1
            --skip-innodb-fast-shutdown
            {noformat}
            marko Marko Mäkelä made changes -
            Description Since the very first version, InnoDB was aborting the whole MySQL or MariaDB server when trying to access a corrupted page.


            MDEV-12253 alleviated the issue somewhat, but not for background processing, such as the purge of no-longer-needed history.
            This bug exists in all InnoDB and XtraDB versions.
            MariaDB essentially must fix [MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful|https://bugs.mysql.com/bug.php?id=10132] that was filed on April 25, 2005.

            In 10.3 with MDEV-12288 and MDEV-13536, the InnoDB purge threads will access pages more often, causing all sorts of debug assertion failures in the B-tree code. To work around this problem, tests that intentionally cause B-tree pages to be unreadable are amended with the following parameters, so that everything will be purged before the server is restarted to deal with the corruption:
            {noformat}
            --innodb-purge-rseg-truncate-frequency=1
            --skip-innodb-fast-shutdown
            {noformat}
            Since the very first version, InnoDB was aborting the whole MySQL or MariaDB server when trying to access a corrupted page.

            Earlier, small steps have been taken to improve MariaDB stability, such as MDEV-12253.

            We must enable error handling for corrupted pages instead of allowing the server to crash. This is not only about detecting a checksum failure when reading a corrupted page (like suggested in [MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful|https://bugs.mysql.com/bug.php?id=10132], filed on April 25, 2005), but also about performing some operations on corrupted pages whose checksum appears valid.

            In some background operations, such as the purge of committed transaction history, errors may be reported to the server error log only, or silently ignored. While executing SQL, errors should be reported back to the client, and the transaction could be rolled back.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            marko Marko Mäkelä made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 110821 114635 110277 135951 119185 118441 132996 109663 161967 179756 175774
            Zendesk active tickets 201658
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ParadoxV5 Jimmy Hú made changes -
            marko Marko Mäkelä made changes -
            alessandro.vetere Alessandro Vetere made changes -
            thiru Thirunarayanan Balathandayuthapani made changes -

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              7 Vote for this issue
              Watchers:
              25 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.