Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17005

ASAN heap-use-after-free in innobase_get_computed_value

Details

    Description

      10.2 05153a670de6ee

      ==10706==ERROR: AddressSanitizer: heap-use-after-free on address 0x6230001712ec at pc 0x5568429cf673 bp 0x7f55d3e2fc30 sp 0x7f55d3e2fc28
      READ of size 5 at 0x6230001712ec thread T33
          #0 0x5568429cf672 in innobase_get_computed_value(dtuple_t const*, dict_v_col_t const*, dict_index_t const*, mem_block_info_t**, mem_block_info_t*, dict_field_t const*, THD*, TABLE*, unsigned char*, dict_table_t const*, upd_t*, dict_foreign_t*) /data/src/10.2/storage/innobase/handler/ha_innodb.cc:22078
          #1 0x556842c946ad in row_upd_store_v_row /data/src/10.2/storage/innobase/row/row0upd.cc:2188
          #2 0x556842c94c84 in row_upd_store_row(upd_node_t*, THD*, TABLE*) /data/src/10.2/storage/innobase/row/row0upd.cc:2252
          #3 0x556842c98727 in row_upd_del_mark_clust_rec /data/src/10.2/storage/innobase/row/row0upd.cc:2975
          #4 0x556842c99574 in row_upd_clust_step /data/src/10.2/storage/innobase/row/row0upd.cc:3158
          #5 0x556842c9a015 in row_upd /data/src/10.2/storage/innobase/row/row0upd.cc:3279
          #6 0x556842c9ad2e in row_upd_step(que_thr_t*) /data/src/10.2/storage/innobase/row/row0upd.cc:3425
          #7 0x556842bf493d in row_update_for_mysql(row_prebuilt_t*) /data/src/10.2/storage/innobase/row/row0mysql.cc:1829
          #8 0x55684299ba67 in ha_innobase::delete_row(unsigned char const*) /data/src/10.2/storage/innobase/handler/ha_innodb.cc:9228
          #9 0x5568422110e4 in handler::ha_delete_row(unsigned char const*) /data/src/10.2/sql/handler.cc:6019
          #10 0x556841bb0f2e in write_record(THD*, TABLE*, st_copy_info*) /data/src/10.2/sql/sql_insert.cc:1893
          #11 0x5568425f5664 in read_sep_field /data/src/10.2/sql/sql_load.cc:1256
          #12 0x5568425f1458 in mysql_load(THD*, sql_exchange*, TABLE_LIST*, List<Item>&, List<Item>&, List<Item>&, enum_duplicates, bool, bool) /data/src/10.2/sql/sql_load.cc:649
          #13 0x556841c1113f in mysql_execute_command(THD*) /data/src/10.2/sql/sql_parse.cc:4833
          #14 0x556841c25139 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.2/sql/sql_parse.cc:8009
          #15 0x556841bffd20 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.2/sql/sql_parse.cc:1824
          #16 0x556841bfcdc4 in do_command(THD*) /data/src/10.2/sql/sql_parse.cc:1378
          #17 0x556841f3f825 in do_handle_one_connection(CONNECT*) /data/src/10.2/sql/sql_connect.cc:1335
          #18 0x556841f3f23a in handle_one_connection /data/src/10.2/sql/sql_connect.cc:1241
          #19 0x7f55f4bc9493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
          #20 0x7f55f2faf93e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe893e)
       
      ASAN:SIGSEGV
      ==10706==AddressSanitizer: while reporting a bug found another one.Ignoring.
      

      To reproduce:

      git clone https://github.com/elenst/rqg --branch mdev17005 rqg-mdev17005
      cd rqg-mdev17005
      perl ./runall-new.pl --basedir=/data/bld/10.2-asan --vardir=/dev/shm/vardir --duration=350 --threads=4  --grammar=mdev17005.yy --skip-gendata --gendata-advanced --vcols
      

      Notes:

      • Don't forget to change the basedir value.
      • The grammar is quite big and could definitely use some automatic simplification, but I'm not sure it's necessary, possibly the problem will be obvious right away.
      • Reproducible on 10.2 and 10.3. On 10.0 and 10.1 I hit MDEV-17004 instead.
      • The test reproduces the problem for me almost every time, but sometimes it misses the mark. Just run it again. You can also use this command line to run the same test a desired number of times:

        perl ./runall-trials.pl --trials=10 --basedir=/data/bld/10.2-asan --vardir=/dev/shm/vardir --duration=350 --threads=4  --grammar=mdev17005.yy --skip-gendata --gendata-advanced --vcols
        

        runall-new.pl is replaced by runall-trials.pl --trials=N, all other options are the same. It will stop when the test fails for the first time.

      Attachments

        Issue Links

          Activity

            marko, according to the last test case provided by Elena, it is not "the MDL that could cause a race condition between LOAD DATA and ALTER TABLE" (because DELETE and INSERT and not LOAD DATA and ALTER TABLE are being executed concurrently, which are compatible from the MDL view).

            svoj Sergey Vojtovich added a comment - marko , according to the last test case provided by Elena, it is not "the MDL that could cause a race condition between LOAD DATA and ALTER TABLE" (because DELETE and INSERT and not LOAD DATA and ALTER TABLE are being executed concurrently, which are compatible from the MDL view).

            marko, I confirm that it is the race between DELETE and INSERT (or other any two operations accessing to the table).
            What should happen in good case:
            1. ALTER TABLE is issued. vc_templ->default_rec is initialized with temporary share's default_fields
            2. temporary share is freed, but datadict is still there, with garbage in vc_templ->default_rec
            3. DELETE is issued. It is first after ALTER TABLE finished.
            4. ha_innobase::open() is called, ib_table->get_ref_count() should be one
            5. we reinitialize vc_templ, so no garbage anymore

            What actually happens:
            3. DELETE is issued.
            4. ha_innobase::open() is called and ib_table->get_ref_count() is 1
            5. INSERT (or SELECT etc.) is issued in parallel
            6. ha_innobase::open() is called and ib_table->get_ref_count() is 1
            7. we check ib_table->get_ref_count() and it is 2 in both threads when we want reinitialize vc_templ
            8. garbage is there

            Please, review the fix: https://github.com/MariaDB/server/commits/bb-10.2-MDEV-17005
            there are three commits for now for your convenience.

            nikitamalyavin Nikita Malyavin added a comment - marko , I confirm that it is the race between DELETE and INSERT (or other any two operations accessing to the table). What should happen in good case: 1. ALTER TABLE is issued. vc_templ->default_rec is initialized with temporary share's default_fields 2. temporary share is freed, but datadict is still there, with garbage in vc_templ->default_rec 3. DELETE is issued. It is first after ALTER TABLE finished. 4. ha_innobase::open() is called, ib_table->get_ref_count() should be one 5. we reinitialize vc_templ, so no garbage anymore What actually happens: 3. DELETE is issued. 4. ha_innobase::open() is called and ib_table->get_ref_count() is 1 5. INSERT (or SELECT etc.) is issued in parallel 6. ha_innobase::open() is called and ib_table->get_ref_count() is 1 7. we check ib_table->get_ref_count() and it is 2 in both threads when we want reinitialize vc_templ 8. garbage is there Please, review the fix: https://github.com/MariaDB/server/commits/bb-10.2-MDEV-17005 there are three commits for now for your convenience.

            Great work! I only have a minor suggestion that should reduce the probability of memory fragmentation: Make dict_vcol_templ_t::default_rec a variable-length array that is at the end of the structure. In this way, the memory could be allocated upfront.

            But, now I realize that we do not know s_templ->rec_len = table->s->reclength before the table has been looked up, and thus we cannot allocate memory for default_rec when creating s_templ.

            Your fix is OK to push, but I feel that this area of code needs to be redesigned and rewritten at some point.

            marko Marko Mäkelä added a comment - Great work! I only have a minor suggestion that should reduce the probability of memory fragmentation: Make dict_vcol_templ_t::default_rec a variable-length array that is at the end of the structure. In this way, the memory could be allocated upfront. But, now I realize that we do not know s_templ->rec_len = table->s->reclength before the table has been looked up, and thus we cannot allocate memory for default_rec when creating s_templ . Your fix is OK to push, but I feel that this area of code needs to be redesigned and rewritten at some point.

            marko I think that we shouldn't access TABLE_SHARE on each open. We can just store all the vcol info similar to usual columns in system innodb table, with additional flag, and fill out dict_vcol_templ_t once the dict_table_t has been created.

            For now we can't do that because of architecture: real dict_table_t creation is deep inside innodb, and it shouldn't know anything about TABLE_SHARE there.

            nikitamalyavin Nikita Malyavin added a comment - marko I think that we shouldn't access TABLE_SHARE on each open. We can just store all the vcol info similar to usual columns in system innodb table, with additional flag, and fill out dict_vcol_templ_t once the dict_table_t has been created. For now we can't do that because of architecture: real dict_table_t creation is deep inside innodb, and it shouldn't know anything about TABLE_SHARE there.

            I would like to get rid of the InnoDB system tables (MDEV-11655), not extend the use of them. To my knowledge, InnoDB never stored the expressions of virtual columns, which are needed for evaluation. Those are only available via TABLE_SHARE (and .frm files).

            I think that we should look at changing the undo logging. If the virtual column values are written in the undo log records in an appropriate way (MDEV-11657, MDEV-17466), then rollback and purge should not need to evaluate them, and the access to TABLE_SHARE could be removed altogether. (The processing of FOREIGN KEY constraints that involve ON…CASCADE or ON…SET NULL would still require the evaluation of indexed virtual columns, but that would happen as part of SQL statement execution, where we can assume the TABLE_SHARE to be available.)

            marko Marko Mäkelä added a comment - I would like to get rid of the InnoDB system tables ( MDEV-11655 ), not extend the use of them. To my knowledge, InnoDB never stored the expressions of virtual columns, which are needed for evaluation. Those are only available via TABLE_SHARE (and .frm files). I think that we should look at changing the undo logging. If the virtual column values are written in the undo log records in an appropriate way ( MDEV-11657 , MDEV-17466 ), then rollback and purge should not need to evaluate them, and the access to TABLE_SHARE could be removed altogether. (The processing of FOREIGN KEY  constraints that involve ON…CASCADE or ON…SET NULL would still require the evaluation of indexed virtual columns, but that would happen as part of SQL statement execution, where we can assume the TABLE_SHARE to be available.)

            People

              nikitamalyavin Nikita Malyavin
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.