MariaDB Server / MDEV-27616

heap-use-after-free storage/innobase/handler/handler0alter.cc:6787 in prepare_inplace_alter_table_dict

Details

    Description

      The ASAN failure was found on
      origin/10.8, 10.8 5595ed9d9f20aeb463ac26e075c9cba41927e85e 2022-01-24T19:26:09+09:00
      + patch for MDEV-19555 + preliminary patch for MDEV-14481.
      Per Marko, it is unlikely that the problem is caused by these patches.
       
      ==3771038==ERROR: AddressSanitizer: heap-use-after-free on address 0x61f0000492c4 at pc 0x55ea1c35b1c7 bp 0x7f09e985f380 sp 0x7f09e985f370
      READ of size 4 at 0x61f0000492c4 thread T39
      [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
      #0 0x55ea1c35b1c6 in prepare_inplace_alter_table_dict /data/Server/10.8-MDEV-14481/storage/innobase/handler/handler0alter.cc:6787
      #1 0x55ea1c36b433 in ha_innobase::prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/storage/innobase/handler/handler0alter.cc:8276
      #2 0x55ea1ba3825f in handler::ha_prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/handler.cc:5172
      #3 0x55ea1c1db64b in ha_partition::prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/ha_partition.cc:10401
      #4 0x55ea1ba3825f in handler::ha_prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/handler.cc:5172
      #5 0x55ea1b4ab27b in mysql_inplace_alter_table /data/Server/10.8-MDEV-14481/sql/sql_table.cc:7450
      #6 0x55ea1b4bf5c8 in mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool) /data/Server/10.8-MDEV-14481/sql/sql_table.cc:10347
      #7 0x55ea1b64af5f in Sql_cmd_alter_table::execute(THD*) /data/Server/10.8-MDEV-14481/sql/sql_alter.cc:544
      #8 0x55ea1b219b50 in mysql_execute_command(THD*, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:5988
      #9 0x55ea1b225f8e in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:8027
      #10 0x55ea1b1fe37f in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:1894
      #11 0x55ea1b1fb757 in do_command(THD*, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:1402
      ...
      Query (0x6290012e8238): ALTER TABLE a ENGINE = INNODB
      Status: KILL_TIMEOUT
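       
      For illustration only: a hypothetical sketch of the workload class exercised here
      (a partitioned InnoDB table rebuilt via inplace ALTER while another session runs
      partition DDL). The table definition, database name and client invocation below are
      assumptions, not a confirmed reproducer.
       
      # Assumes a locally running server with a 'test' database and the mariadb client in PATH.
      mariadb -e "
        CREATE TABLE a (pk INT PRIMARY KEY, col1 INT) ENGINE=InnoDB
        PARTITION BY HASH (pk) PARTITIONS 4;" test
       
      # Session 1: the statement from the ASAN report (an inplace table rebuild).
      mariadb -e "ALTER TABLE a ENGINE=INNODB;" test &
       
      # Session 2: concurrent partition maintenance, similar to what the
      # partitions_innodb.yy grammar generates.
      mariadb -e "ALTER TABLE a ADD PARTITION PARTITIONS 2;" test &
      wait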
       
      MDEV-27616.log and also pluto:/data/results/1643042626/TBR-1333/rqg.log
      == RQG log containing the ASAN output etc.
      pluto:/data/results/1643042626/TBR-1333
      == unpacked
      pluto:/data/results/1643042626/TBR-1333/dev/shm/rqg/1643042626/19/1/rr
      == location of the rr trace
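       
      A possible way to inspect the rr trace (a sketch: it assumes the trace directory has
      been copied from pluto to a local box with a matching rr version installed, and the
      name of the trace subdirectory below .../rr is an assumption):
       
      # Replay the recorded server process; rr attaches a gdb session to it.
      rr replay /data/results/1643042626/TBR-1333/dev/shm/rqg/1643042626/19/1/rr/latest-trace
      # Inside the spawned gdb: run forward to the ASAN report, set a watchpoint on the
      # freed address from the report, then execute backwards to the site of the free:
      #   (gdb) continue
      #   (gdb) watch -l *(int *) 0x61f0000492c4
      #   (gdb) reverse-continue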
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental 17c90b97a931f3b395258aec37903273627de6d7 2022-01-21T11:19:46+01:00
      # rqg.pl  : Version 4.0.4 (2021-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/partitions_innodb.yy \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma \
      # --mysqld=--plugin-load-add=provider_snappy \
      # --mysqld=--plugin-load-add=provider_lz4 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=fast \
      # --mysqld=--innodb_page_size=16K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      # --no_mask \
      # <some local settings>
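       
      Since the failure is rare and non-deterministic, a sketch of how such a run could be
      repeated (the clone/checkout commands come from the GIT_SHOW line above; the loop
      count, the bash array and the omission of the full --mysqld option list are
      assumptions):
       
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      cd RQG
      git checkout 17c90b97a931f3b395258aec37903273627de6d7
      export RQG_HOME=$PWD
       
      # Paste the remaining options from the call above (all --mysqld=... settings plus
      # the local settings, e.g. the basedir of the ASAN build) into this array.
      extra_opts=()
       
      for i in $(seq 1 50); do
        "$RQG_HOME/rqg.pl" \
          --grammar=conf/mariadb/partitions_innodb.yy \
          --engine=InnoDB \
          --threads=33 \
          --duration=300 \
          --queries=10000000 \
          --seed=random \
          --rr=Extended \
          "${extra_opts[@]}"
      done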
       
      
      

      Attachments

        Activity

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - edited

            mleich Could you share the entire test case to reproduce the ASAN error?
          mleich Matthias Leich added a comment - edited

            Hi Nayuta, it's a bit complicated.
            The description shows how to clone my version of RQG and what the invocation of
            the test that hit the problem looked like.
            But that is only half of the truth, because hitting the problem is extremely rare
            and definitely not deterministic.
            I had roughly one hit per 10000 more or less differing RQG tests executed:
            ~ 273  fiddled excessively with partitioning DDL
            ~ 1000 fiddled a bit with partitioning DDL
            Most of the time there was also a CPU load (user+system) > 97% on the testing box
            (nproc=104, 384 GB RAM). This was achieved by having around 220 RQG runs in
            parallel, managed by my RQG batch tool.
            Please try some analysis based on the rr trace first.
            If it looks too complicated or hopeless, then I can try to simplify the test.

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment

            I understood the situation. OK. I will check the rr trace soon.

          mleich Matthias Leich added a comment

            I tried to reproduce the problem and to simplify the test by using the automatic
            RQG Simplifier. There was no replay and also no other unknown bad effect, even
            though I experimented with different setups.

          People

            Assignee: holyfoot Alexey Botchkov
            Reporter: mleich Matthias Leich
            Votes: 0
            Watchers: 3

