Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21239

ASAN use-after-poison in a server shutdown in innodb.innodb_buffer_pool_resize

Details

    Description

      ERROR: AddressSanitizer: use-after-poison on address 0x7f78f3c90170 at pc 0x000003aad231 bp 0x7f78f3c8f2f0 sp 0x7f78f3c8f2e8
      READ of size 4 at 0x7f78f3c90170 thread T28
          #0 0x3aad230 in _db_return_ /home/kevg/work/m/bb-10.3-kevgs/build_asan/../dbug/dbug.c:1176:18
          #1 0x3a0d57f in my_free /home/kevg/work/m/bb-10.3-kevgs/build_asan/../mysys/my_malloc.c:226:1
          #2 0x3807c87 in pfs_spawn_thread /home/kevg/work/m/bb-10.3-kevgs/build_asan/../storage/perfschema/pfs.cc:1859:3
          #3 0x7f7909f33668 in start_thread /build/glibc-4WA41p/glibc-2.30/nptl/pthread_create.c:479:8
          #4 0x7f79093fc322 in clone /build/glibc-4WA41p/glibc-2.30/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Address 0x7f78f3c90170 is a wild pointer.
      SUMMARY: AddressSanitizer: use-after-poison /home/kevg/work/m/bb-10.3-kevgs/build_asan/../dbug/dbug.c:1176:18 in _db_return_
      Shadow bytes around the buggy address:
        0x0fef9e789fd0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e789fe0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e789ff0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a000: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a010: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
      =>0x0fef9e78a020: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7[f7]f7
        0x0fef9e78a030: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a040: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a050: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a060: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0fef9e78a070: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
      Shadow byte legend (one shadow byte represents 8 application bytes):
        Addressable:           00
        Partially addressable: 01 02 03 04 05 06 07 
        Heap left redzone:       fa
        Freed heap region:       fd
        Stack left redzone:      f1
        Stack mid redzone:       f2
        Stack right redzone:     f3
        Stack after return:      f5
        Stack use after scope:   f8
        Global redzone:          f9
        Global init order:       f6
        Poisoned by user:        f7
        Container overflow:      fc
        Array cookie:            ac
        Intra object redzone:    bb
        ASan internal:           fe
        Left alloca redzone:     ca
        Right alloca redzone:    cb
        Shadow gap:              cc
      
      

      This happens when buf_LRU_block_free_non_file_page() calls UNIV_MEM_FREE(block->frame, srv_page_size); after buffer pool was resized. Memory was allocated in os_mem_alloc_large() via shmget(). Then it was freed at buffer pool resize. And that block->frame became a dangling pointer. Sure, program should not touch freed memory. Looks like an ownership + use-after-free error.

      This simple patch silences a problem:

      diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc
      index cfe3f9a6bcb..17632bf9c9b 100644
      --- a/storage/innobase/buf/buf0lru.cc
      +++ b/storage/innobase/buf/buf0lru.cc
      @@ -1880,7 +1880,9 @@ buf_LRU_block_free_non_file_page(
                      ut_d(block->page.in_free_list = TRUE);
              }
       
      -       UNIV_MEM_FREE(block->frame, srv_page_size);
      +       ut_d(if (!block->in_withdraw_list) {
      +               UNIV_MEM_FREE(block->frame, srv_page_size);
      +       });
       }
       
       /******************************************************************//**
      
      

      Attachments

        Issue Links

          Activity

            Memory is freed here:

            #0  os_mem_free_large (ptr=0x288bc0000001c, size=713428427603996) at os0proc.cc:157
            #1  0x0000000002847b95 in ut_allocator<unsigned char, true>::deallocate_large (this=0x61b000003280, ptr=0x7fffdffad000 "\377\377\377\377\377\377\377\377@\031\a\340\377\177", size=9895936) at ut0new.h:681
            #2  0x00000000028420a7 in ut_allocator<unsigned char, true>::deallocate_large_dodump (this=0x61b000003280, ptr=0x7fffdffad000 "\377\377\377\377\377\377\377\377@\031\a\340\377\177", size=9895936) at ut0new.h:695
            #3  0x00000000027fc6b9 in buf_pool_resize () at buf0buf.cc:2966
            #4  0x00000000027f8e32 in buf_resize_thread () at buf0buf.cc:3242
            #5  0x00007ffff7f87669 in start_thread (arg=<optimized out>) at pthread_create.c:479
            #6  0x00007ffff7450323 in clone () at clone.S:95
            

            Right after that operation some memory blocks are still alive. Their block->frame points to a freed memory. And buf_LRU_block_free_non_file_page() touches this memory here:

            UNIV_MEM_ALLOC(block->frame, srv_page_size);
            ...
            memset(block->frame, '\0', srv_page_size);
            ...
            memset(block->frame + FIL_PAGE_OFFSET, 0xfe, 4);
            memset(block->frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, 0xfe, 4);
            ...
            UNIV_MEM_FREE(block->frame, srv_page_size);
            

            Memory is touched more in debug mode and less in release mode.

            kevg Eugene Kosov (Inactive) added a comment - Memory is freed here: #0 os_mem_free_large (ptr=0x288bc0000001c, size=713428427603996) at os0proc.cc:157 #1 0x0000000002847b95 in ut_allocator<unsigned char, true>::deallocate_large (this=0x61b000003280, ptr=0x7fffdffad000 "\377\377\377\377\377\377\377\377@\031\a\340\377\177", size=9895936) at ut0new.h:681 #2 0x00000000028420a7 in ut_allocator<unsigned char, true>::deallocate_large_dodump (this=0x61b000003280, ptr=0x7fffdffad000 "\377\377\377\377\377\377\377\377@\031\a\340\377\177", size=9895936) at ut0new.h:695 #3 0x00000000027fc6b9 in buf_pool_resize () at buf0buf.cc:2966 #4 0x00000000027f8e32 in buf_resize_thread () at buf0buf.cc:3242 #5 0x00007ffff7f87669 in start_thread (arg=<optimized out>) at pthread_create.c:479 #6 0x00007ffff7450323 in clone () at clone.S:95 Right after that operation some memory blocks are still alive. Their block->frame points to a freed memory. And buf_LRU_block_free_non_file_page() touches this memory here: UNIV_MEM_ALLOC(block->frame, srv_page_size); ... memset(block->frame, '\0', srv_page_size); ... memset(block->frame + FIL_PAGE_OFFSET, 0xfe, 4); memset(block->frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID, 0xfe, 4); ... UNIV_MEM_FREE(block->frame, srv_page_size); Memory is touched more in debug mode and less in release mode.

            If there is a real problem, maybe that could explain MDEV-16964. On 10.5, I just got this on a non-ASAN build:

            10.5 ce41a9075ad63e5388045943b80f69b7af0048d2 with some changes

            CURRENT_TEST: innodb.innodb_buffer_pool_resize
            mysqltest: In included file "./include/wait_condition.inc": 
            included from /mariadb/10.5m/mysql-test/suite/innodb/t/innodb_buffer_pool_resize.test at line 58:
            At line 54: query 'let $success= `$wait_condition`' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...
            …
            2019-12-13 16:00:38 0 [Note] InnoDB: Resizing also other hash tables.
            mysqld: /mariadb/10.5m/storage/innobase/include/buf0buf.ic:972: buf_page_t *buf_page_hash_get_low(buf_pool_t *, const page_id_t): Assertion `rw_lock_own(hash_lock, RW_LOCK_X) || rw_lock_own(hash_lock, RW_LOCK_S)' failed.
            191213 16:00:38 [ERROR] mysqld got signal 6 ;
            …
            #6  0x00007f40ae1f0b92 in __GI___assert_fail (assertion=0x55db313684f3 "rw_lock_own(hash_lock, RW_LOCK_X) || rw_lock_own(hash_lock, RW_LOCK_S)", file=0x55db313551ab "/mariadb/10.5m/storage/innobase/include/buf0buf.ic", line=972, function=0x55db3136853a "buf_page_t *buf_page_hash_get_low(buf_pool_t *, const page_id_t)") at assert.c:101
            #7  0x000055db30f14dee in buf_page_hash_get_low (buf_pool=0x55db32c7d080, page_id=...) at /mariadb/10.5m/storage/innobase/include/buf0buf.ic:971
            #8  0x000055db30f1f180 in buf_page_init_for_read (err=<optimized out>, mode=132, page_id=..., zip_size=2048, unzip=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:5303
            #9  0x000055db30f53f73 in buf_read_page_low (err=0x7f403a7f9bbc, sync=false, mode=132, page_id=..., zip_size=2048, unzip=<optimized out>, ignore=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0rea.cc:150
            #10 0x000055db30f55c1f in buf_read_ahead_linear (page_id=..., zip_size=<optimized out>, ibuf=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0rea.cc:700
            #11 0x000055db30f1b247 in buf_page_get_gen (page_id=..., zip_size=2048, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, file=0x55db313a13c5 "/mariadb/10.5m/storage/innobase/row/row0row.cc", line=1225, mtr=0x7f403a7fb740, err=0x7f403a7fa13c, allow_ibuf_merge=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:4842
            #12 0x000055db30ecb2ea in btr_cur_search_to_nth_level_func (index=<optimized out>, level=<optimized out>, tuple=<optimized out>, mode=<optimized out>, latch_mode=<optimized out>, cursor=<optimized out>, ahi_latch=0x0, file=0x55db313a13c5 "/mariadb/10.5m/storage/innobase/row/row0row.cc", line=1225, mtr=0x7f403a7fb740, autoinc=0) at /mariadb/10.5m/storage/innobase/btr/btr0cur.cc:1584
            #13 0x000055db30e05486 in btr_pcur_open_low (index=<optimized out>, level=0, tuple=0x7f3fbc234ee8, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x55db32eaaae8, file=0x0, line=1225, autoinc=0, mtr=0x8) at /mariadb/10.5m/storage/innobase/include/btr0pcur.ic:441
            #14 0x000055db30e03f63 in row_search_on_row_ref (pcur=0x55db32eaaae8, mode=2, table=0x7f407411f248, ref=0x7f3fbc234ee8, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0row.cc:1225
            #15 0x000055db30df3e98 in row_purge_reposition_pcur (mode=2, node=0x55db32eaaa48, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0purge.cc:79
            #16 0x000055db30df6980 in row_purge_reset_trx_id (node=0x55db32eaaa48, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0purge.cc:660
            

            marko Marko Mäkelä added a comment - If there is a real problem, maybe that could explain MDEV-16964 . On 10.5, I just got this on a non-ASAN build: 10.5 ce41a9075ad63e5388045943b80f69b7af0048d2 with some changes CURRENT_TEST: innodb.innodb_buffer_pool_resize mysqltest: In included file "./include/wait_condition.inc": included from /mariadb/10.5m/mysql-test/suite/innodb/t/innodb_buffer_pool_resize.test at line 58: At line 54: query 'let $success= `$wait_condition`' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0... … 2019-12-13 16:00:38 0 [Note] InnoDB: Resizing also other hash tables. mysqld: /mariadb/10.5m/storage/innobase/include/buf0buf.ic:972: buf_page_t *buf_page_hash_get_low(buf_pool_t *, const page_id_t): Assertion `rw_lock_own(hash_lock, RW_LOCK_X) || rw_lock_own(hash_lock, RW_LOCK_S)' failed. 191213 16:00:38 [ERROR] mysqld got signal 6 ; … #6 0x00007f40ae1f0b92 in __GI___assert_fail (assertion=0x55db313684f3 "rw_lock_own(hash_lock, RW_LOCK_X) || rw_lock_own(hash_lock, RW_LOCK_S)", file=0x55db313551ab "/mariadb/10.5m/storage/innobase/include/buf0buf.ic", line=972, function=0x55db3136853a "buf_page_t *buf_page_hash_get_low(buf_pool_t *, const page_id_t)") at assert.c:101 #7 0x000055db30f14dee in buf_page_hash_get_low (buf_pool=0x55db32c7d080, page_id=...) at /mariadb/10.5m/storage/innobase/include/buf0buf.ic:971 #8 0x000055db30f1f180 in buf_page_init_for_read (err=<optimized out>, mode=132, page_id=..., zip_size=2048, unzip=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:5303 #9 0x000055db30f53f73 in buf_read_page_low (err=0x7f403a7f9bbc, sync=false, mode=132, page_id=..., zip_size=2048, unzip=<optimized out>, ignore=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0rea.cc:150 #10 0x000055db30f55c1f in buf_read_ahead_linear (page_id=..., zip_size=<optimized out>, ibuf=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0rea.cc:700 #11 0x000055db30f1b247 in buf_page_get_gen (page_id=..., zip_size=2048, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, file=0x55db313a13c5 "/mariadb/10.5m/storage/innobase/row/row0row.cc", line=1225, mtr=0x7f403a7fb740, err=0x7f403a7fa13c, allow_ibuf_merge=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:4842 #12 0x000055db30ecb2ea in btr_cur_search_to_nth_level_func (index=<optimized out>, level=<optimized out>, tuple=<optimized out>, mode=<optimized out>, latch_mode=<optimized out>, cursor=<optimized out>, ahi_latch=0x0, file=0x55db313a13c5 "/mariadb/10.5m/storage/innobase/row/row0row.cc", line=1225, mtr=0x7f403a7fb740, autoinc=0) at /mariadb/10.5m/storage/innobase/btr/btr0cur.cc:1584 #13 0x000055db30e05486 in btr_pcur_open_low (index=<optimized out>, level=0, tuple=0x7f3fbc234ee8, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x55db32eaaae8, file=0x0, line=1225, autoinc=0, mtr=0x8) at /mariadb/10.5m/storage/innobase/include/btr0pcur.ic:441 #14 0x000055db30e03f63 in row_search_on_row_ref (pcur=0x55db32eaaae8, mode=2, table=0x7f407411f248, ref=0x7f3fbc234ee8, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0row.cc:1225 #15 0x000055db30df3e98 in row_purge_reposition_pcur (mode=2, node=0x55db32eaaa48, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0purge.cc:79 #16 0x000055db30df6980 in row_purge_reset_trx_id (node=0x55db32eaaa48, mtr=0x7f403a7fb740) at /mariadb/10.5m/storage/innobase/row/row0purge.cc:660

            In MDEV-18851 and PR#1221 there is a proposal to replace os_mem_alloc_large() with my_large_malloc(). Those should be kept in sync with what we are going to do here.

            marko Marko Mäkelä added a comment - In MDEV-18851 and PR#1221 there is a proposal to replace os_mem_alloc_large() with my_large_malloc() . Those should be kept in sync with what we are going to do here.

            ASAN documentation requires us to manually unpoison when we manually poisoned https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/sanitizer/asan_interface.h#L21 And we didn't, which is our failure.

            kevg Eugene Kosov (Inactive) added a comment - ASAN documentation requires us to manually unpoison when we manually poisoned https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/sanitizer/asan_interface.h#L21 And we didn't, which is our failure.

            marko sorry, I can't explain your failure here. It seems unrelated to ASAN. And I didn't find any issues with buffer pool while debugging this MDEV.

            kevg Eugene Kosov (Inactive) added a comment - marko sorry, I can't explain your failure here. It seems unrelated to ASAN. And I didn't find any issues with buffer pool while debugging this MDEV.

            Thanks, this is a nice, clear fix. Luckily seems to only be a bug in our instrumentation, not a real problem.

            marko Marko Mäkelä added a comment - Thanks, this is a nice, clear fix. Luckily seems to only be a bug in our instrumentation, not a real problem.

            People

              kevg Eugene Kosov (Inactive)
              kevg Eugene Kosov (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.