Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21330

Lock monitor doesn't print a semaphore's last reserved thread in non-debug builds and INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is totally broken

Details

    Description

      When a semaphore wait is in progress, InnoDB prints information about the wait in sync_array_cell_print():

      https://github.com/MariaDB/server/blob/mariadb-10.2.30/storage/innobase/sync/sync0arr.cc#L478

      This function prints the following information:

      • The thread ID of the thread waiting for the semaphore.
      • The type of semaphore. (i.e. mutex or RW-latch)
      • The source file and line where the thread is waiting for the lock.
      • The source file and line where the semaphore was last reserved.

      The information about where the semaphore was last reserved is some of the most useful information available, since it can often give clues about what kind of statement may be holding the semaphore. Unfortunately, for most kinds of semaphores, it appears that this information is only printed in debug builds. We can see that it is dependent on UNIV_DEBUG. i.e.:

      	if (type == SYNC_MUTEX) {
      		WaitMutex*	mutex = cell->latch.mutex;
      		const WaitMutex::MutexPolicy&	policy = mutex->policy();
      #ifdef UNIV_DEBUG
      		const char*	name = policy.get_enter_filename();
      		if (name == NULL) {
      			/* The mutex might have been released. */
      			name = "NULL";
      		}
      #endif /* UNIV_DEBUG */
       
      		if (mutex) {
      		fprintf(file,
      			"Mutex at %p, %s, lock var %x\n"
      #ifdef UNIV_DEBUG
      			"Last time reserved in file %s line %u"
      #endif /* UNIV_DEBUG */
      			"\n",
      			(void*) mutex,
      			policy.to_string().c_str(),
      			mutex->state()
      #ifdef UNIV_DEBUG
      			,name,
      			policy.get_enter_line()
      #endif /* UNIV_DEBUG */
      			);
      		}
      	} else if (type == SYNC_BUF_BLOCK) {
      		BlockWaitMutex*	mutex = cell->latch.bpmutex;
       
      		const BlockWaitMutex::MutexPolicy&	policy =
      			mutex->policy();
      #ifdef UNIV_DEBUG
      		const char*	name = policy.get_enter_filename();
      		if (name == NULL) {
      			/* The mutex might have been released. */
      			name = "NULL";
      		}
      #endif /* UNIV_DEBUG */
       
      		fprintf(file,
      			"Mutex at %p, %s, lock var %lu\n"
      #ifdef UNIV_DEBUG
      			"Last time reserved in file %s line %lu"
      #endif /* UNIV_DEBUG */
      			"\n",
      			(void*) mutex,
      			policy.to_string().c_str(),
      			(ulong) mutex->state()
      #ifdef UNIV_DEBUG
      			,name,
      			(ulong) policy.get_enter_line()
      #endif /* UNIV_DEBUG */
      		       );
      	}
      

      For the INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS table, the rows are built in the sync_arr_fill_sys_semphore_waits_table() function:

      https://github.com/MariaDB/server/blob/mariadb-10.2.30/storage/innobase/sync/sync0arr.cc#L1292

      In that function, a lot of the fields are not filled out in any build, because some code is commented out. i.e.:

      			if (type == SYNC_MUTEX) {
      				mutex = static_cast<WaitMutex*>(cell->latch.mutex);
       
      				if (mutex) {
      					// JAN: FIXME
      					// OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_OBJECT_NAME], mutex->cmutex_name));
      					OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAIT_OBJECT], (longlong)mutex));
      					OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "MUTEX"));
      					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_HOLDER_THREAD_ID], (longlong)mutex->thread_id));
      					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_HOLDER_FILE], innobase_basename(mutex->file_name)));
      					//OK(fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->store(mutex->line, true));
      					//fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->set_notnull();
      					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_CREATED_FILE], innobase_basename(mutex->cfile_name)));
      					//OK(fields[SYS_SEMAPHORE_WAITS_CREATED_LINE]->store(mutex->cline, true));
      					//fields[SYS_SEMAPHORE_WAITS_CREATED_LINE]->set_notnull();
      					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAITERS_FLAG], (longlong)mutex->waiters));
      					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_LOCK_WORD], (longlong)mutex->lock_word));
      					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_FILE], innobase_basename(mutex->file_name)));
      					//OK(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->store(mutex->line, true));
      					//fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->set_notnull();
      					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_OS_WAIT_COUNT], mutex->count_os_wait));
      				}
      			} else if (type == RW_LOCK_X_WAIT
      				|| type == RW_LOCK_X
      				|| type == RW_LOCK_SX
      			        || type == RW_LOCK_S) {
      				rw_lock_t* rwlock=NULL;
       
      				rwlock = static_cast<rw_lock_t *> (cell->latch.lock);
       
      				if (rwlock) {
      					ulint writer = rw_lock_get_writer(rwlock);
       
      					OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAIT_OBJECT], (longlong)rwlock));
      					if (type == RW_LOCK_X) {
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_X"));
      					} else if (type == RW_LOCK_X_WAIT) {
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_X_WAIT"));
      					} else if (type == RW_LOCK_S) {
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_S"));
      					} else if (type == RW_LOCK_SX) {
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_SX"));
      					}
       
      					if (writer != RW_LOCK_NOT_LOCKED) {
      						// JAN: FIXME
      						// OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_OBJECT_NAME], rwlock->lock_name));
      						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WRITER_THREAD], (longlong)os_thread_pf(rwlock->writer_thread)));
       
      						if (writer == RW_LOCK_X) {
      							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_X"));
      						} else if (writer == RW_LOCK_X_WAIT) {
      							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_X_WAIT"));
      						} else if (type == RW_LOCK_SX) {
      							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_SX"));
      						}
       
      						//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_HOLDER_THREAD_ID], (longlong)rwlock->thread_id));
      						//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_HOLDER_FILE], innobase_basename(rwlock->file_name)));
      						//OK(fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->store(rwlock->line, true));
      						//fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->set_notnull();
      						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_READERS], rw_lock_get_reader_count(rwlock)));
      						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAITERS_FLAG], (longlong)rwlock->waiters));
      						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_LOCK_WORD], (longlong)rwlock->lock_word));
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_READER_FILE], innobase_basename(rwlock->last_s_file_name)));
      						OK(fields[SYS_SEMAPHORE_WAITS_LAST_READER_LINE]->store(rwlock->last_s_line, true));
      						fields[SYS_SEMAPHORE_WAITS_LAST_READER_LINE]->set_notnull();
      						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_FILE], innobase_basename(rwlock->last_x_file_name)));
      						OK(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->store(rwlock->last_x_line, true));
      						fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->set_notnull();
      						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_OS_WAIT_COUNT], rwlock->count_os_wait));
      					}
      				}
      			}
      

      Attachments

        Issue Links

          Activity

            It is possible that the underlying information is not available in release builds, to reduce the memory footprint. Perhaps we should handle buf_block_t as a special case? Also, perhaps the latches could store a short numeric index into an array of all potential locations where they can be acquired?

            marko Marko Mäkelä added a comment - It is possible that the underlying information is not available in release builds, to reduce the memory footprint. Perhaps we should handle buf_block_t as a special case? Also, perhaps the latches could store a short numeric index into an array of all potential locations where they can be acquired?

            MDEV-15053 would remove buf_block_t::mutex and buf_pool_t::zip_mutex. But there are still numerous buf_block_t::lock, which I suppose are a more visible problem.

            marko Marko Mäkelä added a comment - MDEV-15053 would remove buf_block_t::mutex and buf_pool_t::zip_mutex . But there are still numerous buf_block_t::lock , which I suppose are a more visible problem.

            I've added filename and line number variables to mutexes. Every thread updates when taking a lock and this is a possible scalability issue. It was measured. For SSD max measured slowdown is 0.68% and for RAM filesystem it is 0.9%. Is it a lot or a little? We have tickets like https://jira.mariadb.org/browse/MDEV-7962 wsrep_on() takes 0.14% in OLTP RO and https://jira.mariadb.org/browse/MDEV-21212 buf_page_get_gen > buf_pool>stat.n_page_gets++ is a cpu waste (0.5-1%) So, 0.68% is a lot for us.

            kevg Eugene Kosov (Inactive) added a comment - I've added filename and line number variables to mutexes. Every thread updates when taking a lock and this is a possible scalability issue. It was measured. For SSD max measured slowdown is 0.68% and for RAM filesystem it is 0.9%. Is it a lot or a little? We have tickets like https://jira.mariadb.org/browse/MDEV-7962 wsrep_on() takes 0.14% in OLTP RO and https://jira.mariadb.org/browse/MDEV-21212 buf_page_get_gen > buf_pool >stat.n_page_gets++ is a cpu waste (0.5-1%) So, 0.68% is a lot for us.

            Even if there was no observable performance regression, I would rather not add any instrumentation to the InnoDB homebrew mutexes, because in MDEV-21452 we would like to switch to using normal mutexes and condition variables instead of InnoDB’s special mutexes adn events. It looks like in MDEV-23399 we would have to do that already for some mutexes and events.

            marko Marko Mäkelä added a comment - Even if there was no observable performance regression, I would rather not add any instrumentation to the InnoDB homebrew mutexes, because in MDEV-21452 we would like to switch to using normal mutexes and condition variables instead of InnoDB’s special mutexes adn events. It looks like in MDEV-23399 we would have to do that already for some mutexes and events.

            The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS that was introduced in MDEV-7109 was removed in MDEV-21452 (MariaDB Server 10.6.0).

            Since MariaDB Server 10.2.2, the view only covered InnoDB rw-locks. The rw-lock implementation was replaced in MariaDB 10.6.0 in MDEV-24142 and MDEV-24167. The last remains of the sync_array were removed (or replaced with a single std::atomic timestamp that allows a long wait of dict_sys.mutex to be caught) in MDEV-24152. Only PERFORMANCE_SCHEMA can provide any information about InnoDB mutexes or rw-locks. Buffer block latches are not instrumented at all.

            marko Marko Mäkelä added a comment - The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS that was introduced in MDEV-7109 was removed in MDEV-21452 (MariaDB Server 10.6.0). Since MariaDB Server 10.2.2, the view only covered InnoDB rw-locks. The rw-lock implementation was replaced in MariaDB 10.6.0 in MDEV-24142 and MDEV-24167 . The last remains of the sync_array were removed (or replaced with a single std::atomic timestamp that allows a long wait of dict_sys.mutex to be caught) in MDEV-24152 . Only PERFORMANCE_SCHEMA can provide any information about InnoDB mutexes or rw-locks. Buffer block latches are not instrumented at all.

            What is the use case for which this is needed in MariaDB Server 10.4 and 10.5?

            For debugging hangs like MDEV-27983 (a 10.6 regression) or MDEV-29835 (a 10.2.2 regression similar to MDEV-14637), the only way was to check the output of thread apply all backtrace full. Starting with MariaDB Server 10.6.12, the contents of mtr_t::m_memo is typically rather readable, because it uses a std::vector like format. That takes away much of the guessing.

            marko Marko Mäkelä added a comment - What is the use case for which this is needed in MariaDB Server 10.4 and 10.5? For debugging hangs like MDEV-27983 (a 10.6 regression) or MDEV-29835 (a 10.2.2 regression similar to MDEV-14637 ), the only way was to check the output of thread apply all backtrace full . Starting with MariaDB Server 10.6.12, the contents of mtr_t::m_memo is typically rather readable, because it uses a std::vector like format. That takes away much of the guessing.

            People

              julien.fritsch Julien Fritsch
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.