Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26979

heap-use-after-free or SIGSEGV when accessing INNODB_SYS_TABLESTATS during DDL

Details

    Description

      Bad effect detected by [~elenst] and easy reproduced after hint of [~marko].
      origin/10.6 3ab358f0f1c852c71a539bb2b18b3cf828cc0007 2021-11-04T10:22:06+01:00
       
      In case something like
         SELECT * FROM information_schema.innodb_sys_tablestats ;
      gets frequent executed within some average test running a lot DDL
      than the following might show up
       
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 55483][rr 1428002 55487]==1428002==ERROR: AddressSanitizer: heap-use-after-free on address 0x618000078955 at pc 0x55df0e4f1f7a bp 0x7f882ec63b30 sp 0x7f882ec63b20
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 55490][rr 1428002 55492]READ of size 1 at 0x618000078955 thread T30
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 55996]2021-11-05  6:52:55 32 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 61951]    #0 0x55df0e4f1f79 in dict_sys_t::find(dict_table_t const*) /data/Server/10.6I/storage/innobase/include/dict0dict.h:1513
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 61965]    #1 0x55df0e4f2160 in dict_sys_t::allow_eviction(dict_table_t*) (/data/Server_bin/10.6I_asan/bin/mariadbd+0x28e3160)
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 61975]    #2 0x55df0e4d0eef in i_s_sys_tables_fill_table_stats /data/Server/10.6I/storage/innobase/handler/i_s.cc:5129
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 61993]    #3 0x55df0d4d1d99 in get_schema_tables_result(JOIN*, enum_schema_table_state) /data/Server/10.6I/sql/sql_show.cc:8842
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62032]    #4 0x55df0d3b25e0 in JOIN::exec_inner() /data/Server/10.6I/sql/sql_select.cc:4694
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62034]    #5 0x55df0d3b05c5 in JOIN::exec() /data/Server/10.6I/sql/sql_select.cc:4515
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62042]    #6 0x55df0d3b41c6 in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/Server/10.6I/sql/sql_select.cc:4994
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62044]    #7 0x55df0d38928e in handle_select(THD*, LEX*, select_result*, unsigned long) /data/Server/10.6I/sql/sql_select.cc:545
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62078]    #8 0x55df0d303e25 in execute_sqlcom_select /data/Server/10.6I/sql/sql_parse.cc:6256
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62080]    #9 0x55df0d2f2b38 in mysql_execute_command(THD*, bool) /data/Server/10.6I/sql/sql_parse.cc:3946
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62082]    #10 0x55df0d30de8b in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6I/sql/sql_parse.cc:8030
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62084]    #11 0x55df0d2e6108 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.6I/sql/sql_parse.cc:1896
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62086]    #12 0x55df0d2e34e0 in do_command(THD*, bool) /data/Server/10.6I/sql/sql_parse.cc:1404
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62092]    #13 0x55df0d6e2f32 in do_handle_one_connection(CONNECT*, bool) /data/Server/10.6I/sql/sql_connect.cc:1418
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62102]    #14 0x55df0d6e27c3 in handle_one_connection /data/Server/10.6I/sql/sql_connect.cc:1312
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62110]    #15 0x7f8863a0f608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62112]    #16 0x7f88635e2292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
      ....
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 62554]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6I/storage/innobase/include/dict0dict.h:1513 in dict_sys_t::find(dict_table_t const*)
      # 2021-11-05T06:55:57 [1426338] | Query (0x62b000214238): SELECT * FROM information_schema.innodb_sys_tablestats
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 78184]
      # 2021-11-05T06:55:57 [1426338] | Connection ID (thread ID): 33
      # 2021-11-05T06:55:57 [1426338] | [rr 1428002 78186]Status: KILL_TIMEOUT
       
      sdp:/data/Results/1636120301/TBR-1218/dev/shm/vardir/1636120301/56/1/rr
       
      RQG
      -------
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental fe8ba343b097bcb54fe31445e5d5699546b53e66 2021-11-03T20:19:53+01:00
      # rqg.pl  : Version 3.4 (2021-09)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/runtime/alter_online.yy \
      # --gendata=conf/runtime/alter_online.zz \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --redefine=ElenaMarko.yy \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--loose-innodb-sync-debug \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=on \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --rr=Extended \
      # --rr_options=--wait \
      # --mysqld=--innodb_undo_tablespaces=3 \
      # --mysqld=--innodb_undo_log_truncate=ON \
      # --mysqld=--innodb_page_size=8K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      # --no_mask \
      # <certain local settings>
       
      ElenaMarko.yy contains
      query_add:
          SELECT * FROM information_schema.innodb_sys_tablestats ;
      
      

      Attachments

        Activity

          alice Alice Sherepa added a comment -

          I am getting probably the same, but without ASAN - there is the assertion `find(table)' in dict_sys_t::allow_eviction

          mysqld: /git/10.9/storage/innobase/include/dict0dict.h:1531: void dict_sys_t::allow_eviction(dict_table_t*): Assertion `find(table)' failed.
          220405 10:25:19 [ERROR] mysqld got signal 6 ;
           
          Server version: 10.9.0-MariaDB-debug-log
           
          ??:0(__assert_fail)[0x7fd4b5646102]
          include/dict0dict.h:1532(dict_sys_t::allow_eviction(dict_table_t*))[0x55976087c994]
          handler/i_s.cc:5145(i_s_sys_tables_fill_table_stats(THD*, TABLE_LIST*, Item*))[0x5597608716c9]
          sql/sql_show.cc:8883(get_schema_tables_result(JOIN*, enum_schema_table_state))[0x5597600ffb06]
          sql/sql_select.cc:4732(JOIN::exec_inner())[0x55976008e76c]
          sql/sql_select.cc:4555(JOIN::exec())[0x55976008dad3]
          sql/sql_select.cc:5036(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55976008f419]
          sql/sql_select.cc:570(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55976007e6ae]
          sql/sql_parse.cc:6260(execute_sqlcom_select(THD*, TABLE_LIST*))[0x5597600279d4]
          sql/sql_parse.cc:3950(mysql_execute_command(THD*, bool))[0x55976001ec3f]
          sql/sp_head.cc:3835(sp_instr_stmt::exec_core(THD*, unsigned int*))[0x55975ff216f1]
          sql/sp_head.cc:3561(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x55975ff20a6a]
          sql/sp_head.cc:3741(sp_instr_stmt::execute(THD*, unsigned int*))[0x55975ff212ad]
          sql/sp_head.cc:1438(sp_head::execute(THD*, bool))[0x55975ff1a7a9]
          sql/sp_head.cc:2425(sp_head::execute_procedure(THD*, List<Item>*))[0x55975ff1d3a5]
          sql/sql_parse.cc:3029(do_execute_sp(THD*, sp_head*))[0x55976001be94]
          sql/sql_parse.cc:3275(Sql_cmd_call::execute(THD*))[0x55976001cac8]
          sql/sql_parse.cc:5995(mysql_execute_command(THD*, bool))[0x55976002669c]
          sql/sql_parse.cc:8035(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55976002c78f]
          sql/sql_parse.cc:1897(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x559760018ced]
          sql/sql_parse.cc:1403(do_command(THD*, bool))[0x5597600175ba]
          sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x5597601e956c]
          sql/sql_connect.cc:1314(handle_one_connection)[0x5597601e9202]
          perfschema/pfs.cc:2203(pfs_spawn_thread)[0x559760719bcf]
          nptl/pthread_create.c:487(start_thread)[0x7fd4b5b03fa3]
          x86_64/clone.S:97(clone)[0x7fd4b570eeff]
          
          

          alice Alice Sherepa added a comment - I am getting probably the same, but without ASAN - there is the assertion `find(table)' in dict_sys_t::allow_eviction mysqld: /git/10.9/storage/innobase/include/dict0dict.h:1531: void dict_sys_t::allow_eviction(dict_table_t*): Assertion `find(table)' failed. 220405 10:25:19 [ERROR] mysqld got signal 6 ;   Server version: 10.9.0-MariaDB-debug-log   ??:0(__assert_fail)[0x7fd4b5646102] include/dict0dict.h:1532(dict_sys_t::allow_eviction(dict_table_t*))[0x55976087c994] handler/i_s.cc:5145(i_s_sys_tables_fill_table_stats(THD*, TABLE_LIST*, Item*))[0x5597608716c9] sql/sql_show.cc:8883(get_schema_tables_result(JOIN*, enum_schema_table_state))[0x5597600ffb06] sql/sql_select.cc:4732(JOIN::exec_inner())[0x55976008e76c] sql/sql_select.cc:4555(JOIN::exec())[0x55976008dad3] sql/sql_select.cc:5036(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55976008f419] sql/sql_select.cc:570(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55976007e6ae] sql/sql_parse.cc:6260(execute_sqlcom_select(THD*, TABLE_LIST*))[0x5597600279d4] sql/sql_parse.cc:3950(mysql_execute_command(THD*, bool))[0x55976001ec3f] sql/sp_head.cc:3835(sp_instr_stmt::exec_core(THD*, unsigned int*))[0x55975ff216f1] sql/sp_head.cc:3561(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x55975ff20a6a] sql/sp_head.cc:3741(sp_instr_stmt::execute(THD*, unsigned int*))[0x55975ff212ad] sql/sp_head.cc:1438(sp_head::execute(THD*, bool))[0x55975ff1a7a9] sql/sp_head.cc:2425(sp_head::execute_procedure(THD*, List<Item>*))[0x55975ff1d3a5] sql/sql_parse.cc:3029(do_execute_sp(THD*, sp_head*))[0x55976001be94] sql/sql_parse.cc:3275(Sql_cmd_call::execute(THD*))[0x55976001cac8] sql/sql_parse.cc:5995(mysql_execute_command(THD*, bool))[0x55976002669c] sql/sql_parse.cc:8035(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55976002c78f] sql/sql_parse.cc:1897(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x559760018ced] sql/sql_parse.cc:1403(do_command(THD*, bool))[0x5597600175ba] sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x5597601e956c] sql/sql_connect.cc:1314(handle_one_connection)[0x5597601e9202] perfschema/pfs.cc:2203(pfs_spawn_thread)[0x559760719bcf] nptl/pthread_create.c:487(start_thread)[0x7fd4b5b03fa3] x86_64/clone.S:97(clone)[0x7fd4b570eeff]

          Sorry, I forgot to check that trace before it was removed. Could a new trace be produced?

          marko Marko Mäkelä added a comment - Sorry, I forgot to check that trace before it was removed. Could a new trace be produced?

          pluto:/data/results/1656332839/TBR-1218-MDEV-26979$ _RR_TRACE_DIR=./1/rr/ rr replay
           
          Maybe a sibling of the problem
          SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6Y/storage/innobase/handler/i_s.cc:5034 in i_s_dict_fill_sys_tablestats
          pluto:/data/results/1656332839/TBR-1557$ _RR_TRACE_DIR=./1/rr/ rr replay
          

          mleich Matthias Leich added a comment - pluto:/data/results/1656332839/TBR-1218-MDEV-26979$ _RR_TRACE_DIR=./1/rr/ rr replay   Maybe a sibling of the problem SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6Y/storage/innobase/handler/i_s.cc:5034 in i_s_dict_fill_sys_tablestats pluto:/data/results/1656332839/TBR-1557$ _RR_TRACE_DIR=./1/rr/ rr replay

          This is a race condition between a DDL statement that is rebuilding or removing a table, and access to INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS. The call to dict_sys_t::prevent_eviction() will only detach a table from dict_sys.table_LRU and attach it to dict_sys.table_non_LRU. It will not prevent the table from being removed from the cache as part of DDL transaction commit.

          The fix is to copy all data from table_rec to table->to_fill->field before releasing dict_sys.latch. In that way, we can also remove the hack to prevent table eviction and acquire dict_sys.latch in shared mode.

          marko Marko Mäkelä added a comment - This is a race condition between a DDL statement that is rebuilding or removing a table, and access to INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS . The call to dict_sys_t::prevent_eviction() will only detach a table from dict_sys.table_LRU and attach it to dict_sys.table_non_LRU . It will not prevent the table from being removed from the cache as part of DDL transaction commit. The fix is to copy all data from table_rec to table->to_fill->field before releasing dict_sys.latch . In that way, we can also remove the hack to prevent table eviction and acquire dict_sys.latch in shared mode.

          People

            marko Marko Mäkelä
            mleich Matthias Leich
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.