MariaDB Server / MDEV-15983

Reduce fil_system.mutex contention further

    Details

      Description

      The test encryption.innodb-missing-key occasionally fails due to an apparent race condition when fil_space_release() is being invoked:

      ==5551==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f0a0658ca10 at pc 0x557a04108229 bp 0x7f0a0658c970 sp 0x7f0a0658c968
      WRITE of size 8 at 0x7f0a0658ca10 thread T21
          #0 0x557a04108228 in latch_t::latch_t(latch_id_t) /mariadb/10.3/storage/innobase/include/sync0types.h:995
          #1 0x557a0411f328 in MutexDebug<TTASEventMutex<GenericPolicy> >::Context::Context(latch_id_t) /mariadb/10.3/storage/innobase/include/sync0policy.h:62
          #2 0x557a0411b35d in MutexDebug<TTASEventMutex<GenericPolicy> >::enter(TTASEventMutex<GenericPolicy> const*, char const*, unsigned int) /mariadb/10.3/storage/innobase/include/sync0policy.ic:66
          #3 0x557a0411800f in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter(TTASEventMutex<GenericPolicy> const&, char const*, unsigned int) /mariadb/10.3/storage/innobase/include/sync0policy.h:348
          #4 0x557a0411402d in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) /mariadb/10.3/storage/innobase/include/ib0mutex.h:635
          #5 0x557a048879a7 in fil_space_release(fil_space_t*) /mariadb/10.3/storage/innobase/fil/fil0fil.cc:2102
          #6 0x557a047507e9 in buf_load /mariadb/10.3/storage/innobase/buf/buf0dump.cc:701
          #7 0x557a04750e4e in buf_dump_thread /mariadb/10.3/storage/innobase/buf/buf0dump.cc:829
      

      This looks like a false alarm: the flagged write lands inside the stack memory that was allocated for the object being constructed by Context::Context(). The class Context derives from latch_t, whose constructor (frame #0) performs the write.

      The failure looks similar to the one that was reported in MDEV-9359:

      CURRENT_TEST: encryption.innodb-missing-key
      mysqltest: At line 44: query 'SELECT SLEEP(5)' failed: 2013: Lost connection to MySQL server during query
      

      In any case, there is no need to acquire fil_system.mutex for decrementing a reference count. Let us use atomic memory access for fil_space_t::n_pending_ops and fil_space_t::n_pending_ios.
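
      The idea of decrementing a reference count without holding a mutex can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: space_sketch_t, space_acquire() and space_release() are hypothetical names, std::atomic stands in for whatever atomic primitives the server code uses, and only the n_pending_ops counter is exercised.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for fil_space_t. Only the two counters named in
// the report are modeled; everything else is omitted.
struct space_sketch_t {
  std::atomic<uint32_t> n_pending_ops{0};
  std::atomic<uint32_t> n_pending_ios{0};
};

// Take a reference. Assumption: the caller has already looked up the
// tablespace safely (that lookup may still need its own protection).
inline void space_acquire(space_sketch_t& space) {
  space.n_pending_ops.fetch_add(1, std::memory_order_relaxed);
}

// Drop a reference without acquiring any mutex. fetch_sub() returns the
// value held before the decrement, so an underflow can be detected in
// debug builds.
inline void space_release(space_sketch_t& space) {
  const uint32_t prev =
      space.n_pending_ops.fetch_sub(1, std::memory_order_release);
  assert(prev > 0);
  (void) prev;  // avoid an unused-variable warning when NDEBUG is defined
}
```

      Because the decrement is a single atomic read-modify-write, concurrent callers cannot lose updates, and the release path no longer has to serialize on a global mutex just to adjust a counter.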

              People

              • Assignee: Marko Mäkelä (marko)
              • Reporter: Marko Mäkelä (marko)