Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5089

possible deadlocks between rwlocks and mutexes

Details

    • Task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5.40
    • OTHER
    • None

    Description

      safe_mutex has built-in deadlock detection, but it only works for mutexes.
      Still, compiling with -DUSE_MUTEX_INSTEAD_OF_RW_LOCKS we can magically turn all rwlocks into mutexes.

      When done, safe_mutex reports new locking order violations, the server doesn't even start.

      Attachments

        Issue Links

          Activity

            svoj Sergey Vojtovich added a comment - - edited

            SHOW VARIABLES may acquire LOCK_system_variables_hash read lock twice: in fill_variables() and then in intern_sys_var_ptr(). According to pthread_rwlock_rdlock() manual it seems to be acceptable: "A thread may hold multiple concurrent read locks on rwlock (that is, successfully call the pthread_rwlock_rdlock() function n times)."

            The same functions also have mixed lock order:
            fill_variables(): mysql_rwlock_rdlock(&LOCK_system_variables_hash);
            show_status_array(): mysql_mutex_lock(&LOCK_global_system_variables);
            intern_sys_var_ptr(): mysql_rwlock_rdlock(&LOCK_system_variables_hash);

            The above should also be acceptable because we don't acquire any locks while we're under wrlock(&LOCK_system_variables_hash).

            Will keep this code intact.

            svoj Sergey Vojtovich added a comment - - edited SHOW VARIABLES may acquire LOCK_system_variables_hash read lock twice: in fill_variables() and then in intern_sys_var_ptr(). According to pthread_rwlock_rdlock() manual it seems to be acceptable: "A thread may hold multiple concurrent read locks on rwlock (that is, successfully call the pthread_rwlock_rdlock() function n times)." The same functions also have mixed lock order: fill_variables(): mysql_rwlock_rdlock(&LOCK_system_variables_hash); show_status_array(): mysql_mutex_lock(&LOCK_global_system_variables); intern_sys_var_ptr(): mysql_rwlock_rdlock(&LOCK_system_variables_hash); The above should also be acceptable because we don't acquire any locks while we're under wrlock(&LOCK_system_variables_hash). Will keep this code intact.

            Pushed one minor revision to 5.5.37:
            revno: 4109
            revision-id: svoj@mariadb.org-20140213074049-wo2l3qdtgi0s2mjd
            parent: psergey@askmonty.org-20140311180702-1pntx903p1df1fyn
            committer: Sergey Vojtovich <svoj@mariadb.org>
            branch nick: 5.5
            timestamp: Thu 2014-02-13 11:40:49 +0400
            message:
            MDEV-5089 - possible deadlocks between rwlocks and mutexes

            Pre-MDL versions had direct relationship between LOCK_open and
            LOCK_global_system_variables, e.g.:
            intern_sys_var_ptr // locks LOCK_global_system_variable
            mysql_sys_var_char
            create_options_are_valid
            ha_innobase::create
            handler::ha_create
            ha_create_table
            rea_create_table
            mysql_create_table_no_lock // locks LOCK_open
            mysql_create_table

            With MDL this relationship was removed, but mutex order was still
            recorded. In fact there is indirect relationship between LOCK_open
            and LOCK_global_system_variables via rwlocks in reverse order.

            Removed LOCK_open and LOCK_global_system_variables order recording,
            instead assert that LOCK_open is never held in intern_sys_var_ptr().

            This solves only one of many problems detected with MDEV-5089.

            svoj Sergey Vojtovich added a comment - Pushed one minor revision to 5.5.37: revno: 4109 revision-id: svoj@mariadb.org-20140213074049-wo2l3qdtgi0s2mjd parent: psergey@askmonty.org-20140311180702-1pntx903p1df1fyn committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Thu 2014-02-13 11:40:49 +0400 message: MDEV-5089 - possible deadlocks between rwlocks and mutexes Pre-MDL versions had direct relationship between LOCK_open and LOCK_global_system_variables, e.g.: intern_sys_var_ptr // locks LOCK_global_system_variable mysql_sys_var_char create_options_are_valid ha_innobase::create handler::ha_create ha_create_table rea_create_table mysql_create_table_no_lock // locks LOCK_open mysql_create_table With MDL this relationship was removed, but mutex order was still recorded. In fact there is indirect relationship between LOCK_open and LOCK_global_system_variables via rwlocks in reverse order. Removed LOCK_open and LOCK_global_system_variables order recording, instead assert that LOCK_open is never held in intern_sys_var_ptr(). This solves only one of many problems detected with MDEV-5089 .

            There is read-lock in reverse order in Aria:

            ha_maria::repair();
              lock(share->intern_lock);
              _ma_update_auto_increment_key()/maria_rlast()/maria_rprev();
                rdlock(keyinfo->root_lock);
                unlock(keyinfo->root_lock);
              unlock(share->intern_lock);

            Which may conflict with write-lock e.g. in maria_write():

            maria_write();
              wrlock(keyinfo->root_lock);
              _ma_ck_write_btree()/.../_ma_new();
                lock(share->intern_lock);
                unlock(share->intern_lock);
              unlock(keyinfo->root_lock);

            But since repair code is executed with protection of exclusive lock (TL_WRITE) deadlock doesn't seem to be possible.

            svoj Sergey Vojtovich added a comment - There is read-lock in reverse order in Aria: ha_maria::repair(); lock(share->intern_lock); _ma_update_auto_increment_key()/maria_rlast()/maria_rprev(); rdlock(keyinfo->root_lock); unlock(keyinfo->root_lock); unlock(share->intern_lock); Which may conflict with write-lock e.g. in maria_write(): maria_write(); wrlock(keyinfo->root_lock); _ma_ck_write_btree()/.../_ma_new(); lock(share->intern_lock); unlock(share->intern_lock); unlock(keyinfo->root_lock); But since repair code is executed with protection of exclusive lock (TL_WRITE) deadlock doesn't seem to be possible.

            In 5.5 there is possible deadlock between 3 mutexes and 2 rwlocks:

            lock(LOCK_open)                    -> rdlock(LOCK_grant)                  SELECT * FROM INFORMATION_SCHEMA.COLUMNS
            wrlock(LOCK_grant)                 -> lock(acl_cache->lock)               GRANT/REVOKE CREATE/DROP USER
            lock(acl_cache->lock)              -> lock(LOCK_global_system_variables)  FLUSH PRIVILEGES
            lock(LOCK_global_system_variables) -> wrlock(LOCK_logger)                 SET @@global.log_output="TABLE"
            rdlock(LOCK_logger)                -> lock(LOCK_open)                     SELECT 1

            But threads are serialized by table-level locks. E.g. GRANT/etc acquires write-lock and FLUSH PRIVILEGES acquires read-lock. No actual deadlock possible.

            In 10.0 things are even better: there is no LOCK_open -> LOCK_grant order.

            svoj Sergey Vojtovich added a comment - In 5.5 there is possible deadlock between 3 mutexes and 2 rwlocks: lock(LOCK_open) -> rdlock(LOCK_grant) SELECT * FROM INFORMATION_SCHEMA.COLUMNS wrlock(LOCK_grant) -> lock(acl_cache->lock) GRANT/REVOKE CREATE/DROP USER lock(acl_cache->lock) -> lock(LOCK_global_system_variables) FLUSH PRIVILEGES lock(LOCK_global_system_variables) -> wrlock(LOCK_logger) SET @@global.log_output="TABLE" rdlock(LOCK_logger) -> lock(LOCK_open) SELECT 1 But threads are serialized by table-level locks. E.g. GRANT/etc acquires write-lock and FLUSH PRIVILEGES acquires read-lock. No actual deadlock possible. In 10.0 things are even better: there is no LOCK_open -> LOCK_grant order.

            All issues found in 5.5 are reported/fixed. No extra problems were detected in 10.0 and 10.1.

            svoj Sergey Vojtovich added a comment - All issues found in 5.5 are reported/fixed. No extra problems were detected in 10.0 and 10.1.

            People

              svoj Sergey Vojtovich
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.