[MDEV-24391] heap-use-after-free in fil_space_t::flush_low() - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.5.7
Fix Version/s: 10.5.9
Component/s: Storage Engine - InnoDB
Labels:

Description

As part of ~~MDEV-23855~~, when we reduced fil_system.mutex contention, we introduced a race condition between two fil_space_t::flush_low(). I analyzed this in an rr record session of an AddressSanitizer build.

We have two threads executing fil_space_t::flush_low() on the same tablespace, like this:

10.5 5eb539555b36a60944eefeb84d5d6d436ba61e63
Thread 153 hit Hardware watchpoint 8: -location fil_system.unflushed_spaces.size_

Old value = 1
New value = 0
sized_ilist<fil_space_t, unflushed_spaces_tag_t>::erase (pos=...,
this=0x565394181298 <fil_system+216>)
at /Server/bb-10.5-MDEV-24369/include/ilist.h:252
252 return BASE::erase(pos);
4: /x $space.n_pending._M_i = 0x20000001
5: $space.is_in_unflushed_spaces = false
6: fil_system.unflushed_spaces = {<ilist<fil_space_t, unflushed_spaces_tag_t>> = {sentinel_ = {next = 0x615000032318, prev = 0x615000032318}}, size_ = 0}
(rr) when
Current event: 782346
(rr) bt
#0 sized_ilist<fil_space_t, unflushed_spaces_tag_t>::erase (pos=...,
this=0x565394181298 <fil_system+216>)
at /Server/bb-10.5-MDEV-24369/include/ilist.h:252
#1 sized_ilist<fil_space_t, unflushed_spaces_tag_t>::remove (value=...,
this=0x565394181298 <fil_system+216>)
at /Server/bb-10.5-MDEV-24369/include/ilist.h:261
#2 fil_space_t::flush_low (this=this@entry=0x615000032318)
at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:532
#3 0x00005653926e01cd in fil_flush_file_spaces ()
at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3520
#4 0x0000565392563f23 in buf_dblwr_t::write_completed (
this=0x565394c02020 <buf_dblwr>)
at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0dblwr.cc:486
#5 0x000056539259066b in buf_page_write_complete (request=...)
at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0flu.cc:382
#6 0x00005653926f87cb in fil_aio_callback (request=...)
at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3474
#7 0x00005653920279b7 in io_callback (cb=<optimized out>)
at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0file.cc:3895
#8 0x00005653928a43ab in tpool::simulated_aio::simulated_aio_callback (
param=0x64000179ef80)
at /Server/bb-10.5-MDEV-24369/tpool/aio_simulated.cc:162
…
(rr) thr a 5 bt

Thread 5 (Thread 1662862.1663094):
#0 __pthread_mutex_unlock_usercnt (decr=1, mutex=0x60d000005010) at pthread_mutex_unlock.c:58
#1 __GI___pthread_mutex_unlock (mutex=0x60d000005010) at pthread_mutex_unlock.c:357
#2 0x0000565392029012 in OSMutex::exit (this=0x60d000005008) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/sync0types.h:384
#3 os_event::set (this=0x60d000004ff8) at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0event.cc:68
#4 os_event_set (event=0x60d000004ff8) at /Server/bb-10.5-MDEV-24369/storage/innobase/os/os0event.cc:451
#5 0x0000565391dc9d7e in TTASEventMutex<GenericPolicy>::exit (this=0x5653941811e0 <fil_system+32>) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/sync0types.h:385
#6 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit (this=0x5653941811e0 <fil_system+32>) at /Server/bb-10.5-MDEV-24369/storage/innobase/include/ib0mutex.h:564
#7 0x00005653926e01c5 in fil_flush_file_spaces () at /Server/bb-10.5-MDEV-24369/storage/innobase/fil/fil0fil.cc:3519
#8 0x0000565392583f13 in log_checkpoint () at /Server/bb-10.5-MDEV-24369/storage/innobase/buf/buf0flu.cc:1692
#9 0x00005653925a0337 in buf_flush_page_cleaner ()

When Thread 5 was checking, the flag was still set:

rescan:

  mutex_enter(&fil_system.mutex);

  for (fil_space_t &space : fil_system.unflushed_spaces)

    if (space.needs_flush_not_stopping())

      mutex_exit(&fil_system.mutex);

      space.flush_low();

      goto rescan;

But, while that thread is releasing the mutex, Thread 153 had cleared the flag, giving permission for fil_delete_tablespace() to proceed a little later. Then, once the fil_space_t object had been freed, ASAN would complain about heap-use-after-free when Thread 5 gets a chance to invoke space.flush_low().

A possible fix might be to ensure that as long as the page cleaner thread is running, it will be the only caller of fil_flush_file_spaces().

Attachments

Issue Links

is caused by

MDEV-23855 InnoDB log checkpointing causes regression for write-heavy OLTP

Closed

relates to

MDEV-23416 innodb.innodb_mutexes failed in buildbot with extra warning

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2020-12-10 16:42

Updated:: 2020-12-20 12:05

Resolved:: 2020-12-11 14:16

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.