Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24391

heap-use-after-free in fil_space_t::flush_low()

    XMLWordPrintable

    Details

      Description

      As part of MDEV-23855, when we reduced fil_system.mutex contention, we introduced a race condition between two fil_space_t::flush_low(). I analyzed this in an rr record session of an AddressSanitizer build.

      We have two threads executing fil_space_t::flush_low() on the same tablespace, like this:

      10.5 5eb539555b36a60944eefeb84d5d6d436ba61e63

      Thread 153 hit Hardware watchpoint 8: -location fil_system.unflushed_spaces.size_
       
      Old value = 1
      New value = 0
      sized_ilist<fil_space_t, unflushed_spaces_tag_t>::erase (pos=..., 
          this=0x565394181298 <fil_system+216>)
          at /Server/bb-10.5-MDEV-24369/include/ilist.h:252
      252	    return BASE::erase(pos);
      4: /x $space.n_pending._M_i = 0x20000001
      5: $space.is_in_unflushed_spaces = false
      6: fil_system.unflushed_spaces = {<ilist<fil_space_t, unflushed_spaces_tag_t>> = {sentinel_ = {next = 0x615000032318, prev = 0x615000032318}}, size_ = 0}
      (rr) when
      Current event: 782346
      (rr) bt
      #0  sized_ilist<fil_space_t, unflushed_spaces_tag_t>::erase (pos=..., 
          this=0x565394181298 <fil_system+216>)
          at /Server/bb-10.5-MDEV-24369/include/ilist.h:252
      #1  sized_ilist<fil_space_t, unflushed_spaces_tag_t>::remove (value=..., 
          this=0x565394181298 <fil_system+216>)
          at /Server/bb-10.5-MDEV-24369/include/ilist.h:261
      #2  fil_space_t::flush_low (this=this@entry=0x615000032318)
          at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:532
      #3  0x00005653926e01cd in fil_flush_file_spaces ()
          at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3520
      #4  0x0000565392563f23 in buf_dblwr_t::write_completed (
          this=0x565394c02020 <buf_dblwr>)
          at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0dblwr.cc:486
      #5  0x000056539259066b in buf_page_write_complete (request=...)
          at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0flu.cc:382
      #6  0x00005653926f87cb in fil_aio_callback (request=...)
          at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3474
      #7  0x00005653920279b7 in io_callback (cb=<optimized out>)
          at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0file.cc:3895
      #8  0x00005653928a43ab in tpool::simulated_aio::simulated_aio_callback (
          param=0x64000179ef80)
          at /Server/bb-10.5-MDEV-24369/tpool/aio_simulated.cc:162
      (rr) thr a 5 bt
       
      Thread 5 (Thread 1662862.1663094):
      #0  __pthread_mutex_unlock_usercnt (decr=1, mutex=0x60d000005010) at pthread_mutex_unlock.c:58
      #1  __GI___pthread_mutex_unlock (mutex=0x60d000005010) at pthread_mutex_unlock.c:357
      #2  0x0000565392029012 in OSMutex::exit (this=0x60d000005008) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/sync0types.h:384
      #3  os_event::set (this=0x60d000004ff8) at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0event.cc:68
      #4  os_event_set (event=0x60d000004ff8) at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0event.cc:451
      #5  0x0000565391dc9d7e in TTASEventMutex<GenericPolicy>::exit (this=0x5653941811e0 <fil_system+32>) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/sync0types.h:385
      #6  PolicyMutex<TTASEventMutex<GenericPolicy> >::exit (this=0x5653941811e0 <fil_system+32>) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/ib0mutex.h:564
      #7  0x00005653926e01c5 in fil_flush_file_spaces () at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3519
      #8  0x0000565392583f13 in log_checkpoint () at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0flu.cc:1692
      #9  0x00005653925a0337 in buf_flush_page_cleaner ()
      

      When Thread 5 was checking, the flag was still set:

      rescan:
        mutex_enter(&fil_system.mutex);
       
        for (fil_space_t &space : fil_system.unflushed_spaces)
        {
          if (space.needs_flush_not_stopping())
          {
            mutex_exit(&fil_system.mutex);
            space.flush_low();
            goto rescan;
          }
        }
      

      But, while that thread is releasing the mutex, Thread 153 had cleared the flag, giving permission for fil_delete_tablespace() to proceed a little later. Then, once the fil_space_t object had been freed, ASAN would complain about heap-use-after-free when Thread 5 gets a chance to invoke space.flush_low().

      A possible fix might be to ensure that as long as the page cleaner thread is running, it will be the only caller of fil_flush_file_spaces().

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: