MariaDB Server / MDEV-22782

SUMMARY: AddressSanitizer: unknown-crash storage/innobase/trx/trx0trx.cc:566 in trx_t::commit_state()

Details

    Description

      origin/10.2 50641db2d11ad8a2228f7938d851e52decb71a9b 2020-06-01T15:38:04+02:00
      ASAN build
       
      # 2020-06-02T13:57:33 [21795] | [rr 22403 330405][rr 22403 330417]==22403==ERROR: AddressSanitizer: unknown-crash on address 0x3cd14f7066a4 at pc 0x55e7af5e0b92 bp 0x3b2a2785b8c0 sp 0x3b2a2785b8b0
      # 2020-06-02T13:57:33 [21795] | [rr 22403 330420][rr 22403 330422]WRITE of size 1 at 0x3cd14f7066a4 thread T65
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335028]    #0 0x55e7af5e0b91 in trx_t::commit_state() /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:566
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335030]    #1 0x55e7af5d70e7 in trx_commit_in_memory /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:1708
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335032]    #2 0x55e7af5d8dce in trx_commit_low(trx_t*, mtr_t*) /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:1936
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335034]    #3 0x55e7af5d8efa in trx_commit(trx_t*) /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:1960
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335036]    #4 0x55e7af5da53f in trx_commit_for_mysql(trx_t*) /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:2169
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335042]    #5 0x55e7af7dc90e in dict_stats_exec_sql /home/mleich/10.2/storage/innobase/dict/dict0stats.cc:318
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335044]    #6 0x55e7af7ebb75 in dict_stats_delete_from_table_stats /home/mleich/10.2/storage/innobase/dict/dict0stats.cc:3469
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335046]    #7 0x55e7af7ec028 in dict_stats_drop_table(char const*, char*, unsigned long) /home/mleich/10.2/storage/innobase/dict/dict0stats.cc:3554
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335064]    #8 0x55e7af41ee87 in row_drop_table_for_mysql(char const*, trx_t*, enum_sql_command, bool, bool) /home/mleich/10.2/storage/innobase/row/row0mysql.cc:3399
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335066]    #9 0x55e7af1718b2 in ha_innobase::delete_table(char const*, enum_sql_command) (/home/mleich/Server_bin/10.2_asan/bin/mysqld+0x1afb8b2)
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335103]    #10 0x55e7af13d980 in ha_innobase::delete_table(char const*) /home/mleich/10.2/storage/innobase/handler/ha_innodb.cc:13483
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335113]    #11 0x55e7aebe3054 in handler::ha_delete_table(char const*) /home/mleich/10.2/sql/handler.cc:4473
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335115]    #12 0x55e7aebd259e in ha_delete_table(THD*, handlerton*, char const*, char const*, char const*, bool) /home/mleich/10.2/sql/handler.cc:2442
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335125]    #13 0x55e7ae78a4e6 in mysql_rm_table_no_locks(THD*, TABLE_LIST*, bool, bool, bool, bool, bool) /home/mleich/10.2/sql/sql_table.cc:2447
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335127]    #14 0x55e7ae79d5bc in create_table_impl /home/mleich/10.2/sql/sql_table.cc:4867
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335137]    #15 0x55e7ae79eb7c in mysql_create_table_no_lock(THD*, char const*, char const*, Table_specification_st*, Alter_info*, bool*, int) /home/mleich/10.2/sql/sql_table.cc:5070
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335139]    #16 0x55e7ae79f409 in mysql_create_table(THD*, TABLE_LIST*, Table_specification_st*, Alter_info*) /home/mleich/10.2/sql/sql_table.cc:5135
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335141]    #17 0x55e7ae7ca5a6 in Sql_cmd_create_table::execute(THD*) /home/mleich/10.2/sql/sql_table.cc:10969
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335151]    #18 0x55e7ae57ac05 in mysql_execute_command(THD*) /home/mleich/10.2/sql/sql_parse.cc:5972
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335153]    #19 0x55e7ae58665c in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /home/mleich/10.2/sql/sql_parse.cc:7741
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335155]    #20 0x55e7ae55d308 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /home/mleich/10.2/sql/sql_parse.cc:1831
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335157]    #21 0x55e7ae559d2f in do_command(THD*) /home/mleich/10.2/sql/sql_parse.cc:1385
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335161]    #22 0x55e7ae901f75 in do_handle_one_connection(CONNECT*) /home/mleich/10.2/sql/sql_connect.cc:1336
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335163]    #23 0x55e7ae901832 in handle_one_connection /home/mleich/10.2/sql/sql_connect.cc:1241
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335165]    #24 0x7f6b169576da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335183]    #25 0x69f50968688e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)
      ...
      # 2020-06-02T13:57:33 [21795] | [rr 22403 335339]SUMMARY: AddressSanitizer: unknown-crash /home/mleich/10.2/storage/innobase/trx/trx0trx.cc:566 in trx_t::commit_state()
      ...
      Query (0x62b0002df228): CREATE OR REPLACE TABLE t9 (ecol10 SET('foo','bar') ) ROW_FORMAT=COMPACT
      Connection ID (thread ID): 49
      Status: NOT_KILLED
       
      RQG
      -------
      git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich
      origin/experimental 5c63068c24fa6d687422f4d26490b067ff6535e4 2020-05-28T13:50:30+02:00
       
      perl rqg.pl \
      --gendata \
      --vcols \
      --views \
      --grammar=conf/mariadb/instant_add.yy \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-max-statement-time=30 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --threads=33 \
      --mysqld=--innodb_page_size=8K \
      --mysqld=--innodb-buffer-pool-size=8M \
      --duration=300 \
      --no_mask \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --basedir2=<local settings> \
      --script_debug=_nix_ \
      --rr=Server \
      --rr_options=--chaos
      

          Activity

            I suspect that this could be fixed by modifying trx_free() so that we will not first poison the entire trx_t in Pool and then unpoison the ‘safe’ parts. We should instead implement a member function of trx_t that poisons all the ‘unsafe’ parts, and remove the poisoning code from Pool. In that way, AddressSanitizer should not get confused.

            I have a vague memory of analyzing a similar trace in the past. To analyze this, I suggest the following commands in rr replay:

            continue
            awatch ((char*)0x7fff8000)[0x3cd14f7066a4 / 8]
            reverse-continue
            reverse-continue

            Usually, two reverse-continue commands are needed to get back past the signal that terminated the process. The access watchpoint covers both reads and writes of the shadow byte for the address of interest, 0x3cd14f7066a4, which was reported in the AddressSanitizer output.

            If this bug is due to a poison-unpoison cycle in trx_free(), as I suspect, we should see multiple read accesses to the shadow byte in the thread that triggered the crash, interleaved with modifications of the shadow bytes during the execution of trx_free() in another thread.

            Comment added by Marko Mäkelä (marko).
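The awatch expression in the comment above indexes the AddressSanitizer shadow memory directly. The following is a minimal sketch of that arithmetic, assuming the default x86-64 Linux mapping (shadow = (address >> 3) + 0x7fff8000) and the usual shadow-byte encoding (0 means all 8 bytes of the granule are addressable, 1 to 7 means only that many leading bytes are, negative means poisoned). The names shadow_address and access_allowed are hypothetical helpers for illustration, not part of AddressSanitizer or MariaDB, and the check only models accesses that stay within one 8-byte granule.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: default AddressSanitizer shadow mapping on x86-64 Linux.
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;
constexpr unsigned kShadowScale = 3;  // one shadow byte covers 8 bytes

// Address of the shadow byte that describes `addr`.
// Same value as &((char*)0x7fff8000)[addr / 8] in the awatch expression.
constexpr std::uint64_t shadow_address(std::uint64_t addr) {
  return (addr >> kShadowScale) + kShadowOffset;
}

// Hypothetical helper: would an access of `size` bytes at `addr` pass the
// check, given the value of its shadow byte? Only valid when the access
// does not cross an 8-byte granule boundary.
constexpr bool access_allowed(std::uint64_t addr, unsigned size,
                              std::int8_t shadow) {
  return shadow == 0 ||
         (shadow > 0 &&
          (addr & 7) + size - 1 < static_cast<std::uint64_t>(shadow));
}

// The faulting address from the report sits at offset 4 of its granule.
static_assert((0x3cd14f7066a4ULL & 7) == 4, "offset within granule");
```

Under these assumptions, the 1-byte WRITE at 0x3cd14f7066a4 passes only if its shadow byte is 0 or at least 5, so any transient poisoning of that granule by another thread is enough to trigger a report.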

            Even though we lost access to the original rr replay trace, in MDEV-23472 there is a similar case that clearly is caused by the poison-unpoison cycle. I will take care of this.

            Comment added by Marko Mäkelä (marko).

            I fixed the poison-access-unpoison race by replacing the poison-all/unpoison-some with poison-all-but-some. In each major version from 10.2 to 10.5, some special conflict resolution was needed. The 10.5 version also avoids unnecessary MSAN unpoisoning when canceling the MEM_NOACCESS for Valgrind.

            Comment added by Marko Mäkelä (marko).
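The difference between the two poisoning orders can be illustrated with a small model. This is only a sketch: it simulates shadow state with a plain array instead of calling the real AddressSanitizer interface, and the object size, the offsets of the "safe" range, and all names (ShadowModel, poison_all_then_unpoison_some, poison_all_but_some) are hypothetical and not taken from trx_t or Pool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical layout: a 64-byte pooled object in which bytes [16, 24)
// may still be accessed by other threads while the object is in the pool.
constexpr std::size_t kObjectSize = 64;
constexpr std::size_t kSafeBegin = 16;
constexpr std::size_t kSafeEnd = 24;

// Simulated per-byte poison state; stands in for the real shadow memory.
struct ShadowModel {
  std::array<bool, kObjectSize> poisoned{};
  bool safe_range_was_ever_poisoned = false;

  void poison(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      poisoned[i] = true;
      if (i >= kSafeBegin && i < kSafeEnd)
        safe_range_was_ever_poisoned = true;
    }
  }

  void unpoison(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) poisoned[i] = false;
  }
};

// Old order: between the two calls the safe range is poisoned, so a
// concurrent access to it in that window would be reported.
inline void poison_all_then_unpoison_some(ShadowModel& m) {
  m.poison(0, kObjectSize);
  m.unpoison(kSafeBegin, kSafeEnd);
}

// New order: only the unsafe parts are poisoned; the safe range is
// never poisoned, so there is no window for a false report.
inline void poison_all_but_some(ShadowModel& m) {
  m.poison(0, kSafeBegin);
  m.poison(kSafeEnd, kObjectSize);
}
```

Both functions leave the model in the same final state; only the first one passes through an intermediate state in which the safe range is poisoned, which is the window the fix described above closes.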

            People

              marko Marko Mäkelä
              mleich Matthias Leich