[MDEV-21330] Lock monitor doesn't print a semaphore's last reserved thread in non-debug builds and INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is totally broken Created: 2019-12-17  Updated: 2023-05-30  Resolved: 2023-05-30

Status: Closed
Project: MariaDB Server
Component/s: Information Schema, Storage Engine - InnoDB
Affects Version/s: 10.2.30, 10.3.21, 10.4.11
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Julien Fritsch
Resolution: Won't Fix Votes: 1
Labels: None

Issue Links:
Blocks
blocks MDEV-19988 Document Information Schema INNODB_SY... Open
Duplicate
is duplicated by MDEV-22527 Add thread info for innodb semaphore ... Closed
Relates
relates to MDEV-7109 Add support for INFORMATION_SCHEMA.IN... Closed
relates to MDEV-15053 Reduce buf_pool_t::mutex contention Closed
relates to MDEV-17237 thread IDs are printed in different f... Open
relates to MDEV-17238 Document special thread IDs used in S... Open
relates to MDEV-18391 Print ENGINE INNODB STATUS in machine... Open
relates to MDEV-18429 Consistent non-locking reads do not a... Closed
relates to MDEV-18572 Thread executing DROP TABLE listed tw... Open
relates to MDEV-18582 Port status variables related to SHOW... Closed
relates to MDEV-18698 Show InnoDB's internal background thr... Open
relates to MDEV-21390 lock_print_info_summary() should work... Closed
relates to MDEV-21452 Use condition variables and normal mu... Closed
relates to MDEV-21566 Lock monitor doesn't print a name for... Closed
relates to MDEV-22087 Increase buffer size for query in SHO... Open
relates to MDEV-23399 10.5 performance regression with IO-b... Closed
relates to MDEV-24142 rw_lock_t has unnecessarily complex w... Closed
relates to MDEV-24167 InnoDB unnecessarily uses complex rw-... Closed

 Description   

When a semaphore wait is in progress, InnoDB prints information about the wait in sync_array_cell_print():

https://github.com/MariaDB/server/blob/mariadb-10.2.30/storage/innobase/sync/sync0arr.cc#L478

This function prints the following information:

  • The thread ID of the thread waiting for the semaphore.
  • The type of semaphore. (i.e. mutex or RW-latch)
  • The source file and line where the thread is waiting for the lock.
  • The source file and line where the semaphore was last reserved.

The information about where the semaphore was last reserved is some of the most useful information available, since it can often give clues about what kind of statement may be holding the semaphore. Unfortunately, for most kinds of semaphores, it appears that this information is only printed in debug builds. We can see that it is dependent on UNIV_DEBUG. i.e.:

	if (type == SYNC_MUTEX) {
		WaitMutex*	mutex = cell->latch.mutex;
		const WaitMutex::MutexPolicy&	policy = mutex->policy();
#ifdef UNIV_DEBUG
		const char*	name = policy.get_enter_filename();
		if (name == NULL) {
			/* The mutex might have been released. */
			name = "NULL";
		}
#endif /* UNIV_DEBUG */
 
		if (mutex) {
		fprintf(file,
			"Mutex at %p, %s, lock var %x\n"
#ifdef UNIV_DEBUG
			"Last time reserved in file %s line %u"
#endif /* UNIV_DEBUG */
			"\n",
			(void*) mutex,
			policy.to_string().c_str(),
			mutex->state()
#ifdef UNIV_DEBUG
			,name,
			policy.get_enter_line()
#endif /* UNIV_DEBUG */
			);
		}
	} else if (type == SYNC_BUF_BLOCK) {
		BlockWaitMutex*	mutex = cell->latch.bpmutex;
 
		const BlockWaitMutex::MutexPolicy&	policy =
			mutex->policy();
#ifdef UNIV_DEBUG
		const char*	name = policy.get_enter_filename();
		if (name == NULL) {
			/* The mutex might have been released. */
			name = "NULL";
		}
#endif /* UNIV_DEBUG */
 
		fprintf(file,
			"Mutex at %p, %s, lock var %lu\n"
#ifdef UNIV_DEBUG
			"Last time reserved in file %s line %lu"
#endif /* UNIV_DEBUG */
			"\n",
			(void*) mutex,
			policy.to_string().c_str(),
			(ulong) mutex->state()
#ifdef UNIV_DEBUG
			,name,
			(ulong) policy.get_enter_line()
#endif /* UNIV_DEBUG */
		       );
	}

For the INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS table, the rows are built in the sync_arr_fill_sys_semphore_waits_table() function:

https://github.com/MariaDB/server/blob/mariadb-10.2.30/storage/innobase/sync/sync0arr.cc#L1292

In that function, a lot of the fields are not filled out in any build, because some code is commented out. i.e.:

			if (type == SYNC_MUTEX) {
				mutex = static_cast<WaitMutex*>(cell->latch.mutex);
 
				if (mutex) {
					// JAN: FIXME
					// OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_OBJECT_NAME], mutex->cmutex_name));
					OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAIT_OBJECT], (longlong)mutex));
					OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "MUTEX"));
					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_HOLDER_THREAD_ID], (longlong)mutex->thread_id));
					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_HOLDER_FILE], innobase_basename(mutex->file_name)));
					//OK(fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->store(mutex->line, true));
					//fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->set_notnull();
					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_CREATED_FILE], innobase_basename(mutex->cfile_name)));
					//OK(fields[SYS_SEMAPHORE_WAITS_CREATED_LINE]->store(mutex->cline, true));
					//fields[SYS_SEMAPHORE_WAITS_CREATED_LINE]->set_notnull();
					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAITERS_FLAG], (longlong)mutex->waiters));
					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_LOCK_WORD], (longlong)mutex->lock_word));
					//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_FILE], innobase_basename(mutex->file_name)));
					//OK(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->store(mutex->line, true));
					//fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->set_notnull();
					//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_OS_WAIT_COUNT], mutex->count_os_wait));
				}
			} else if (type == RW_LOCK_X_WAIT
				|| type == RW_LOCK_X
				|| type == RW_LOCK_SX
			        || type == RW_LOCK_S) {
				rw_lock_t* rwlock=NULL;
 
				rwlock = static_cast<rw_lock_t *> (cell->latch.lock);
 
				if (rwlock) {
					ulint writer = rw_lock_get_writer(rwlock);
 
					OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAIT_OBJECT], (longlong)rwlock));
					if (type == RW_LOCK_X) {
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_X"));
					} else if (type == RW_LOCK_X_WAIT) {
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_X_WAIT"));
					} else if (type == RW_LOCK_S) {
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_S"));
					} else if (type == RW_LOCK_SX) {
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_WAIT_TYPE], "RW_LOCK_SX"));
					}
 
					if (writer != RW_LOCK_NOT_LOCKED) {
						// JAN: FIXME
						// OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_OBJECT_NAME], rwlock->lock_name));
						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WRITER_THREAD], (longlong)os_thread_pf(rwlock->writer_thread)));
 
						if (writer == RW_LOCK_X) {
							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_X"));
						} else if (writer == RW_LOCK_X_WAIT) {
							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_X_WAIT"));
						} else if (type == RW_LOCK_SX) {
							OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_RESERVATION_MODE], "RW_LOCK_SX"));
						}
 
						//OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_HOLDER_THREAD_ID], (longlong)rwlock->thread_id));
						//OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_HOLDER_FILE], innobase_basename(rwlock->file_name)));
						//OK(fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->store(rwlock->line, true));
						//fields[SYS_SEMAPHORE_WAITS_HOLDER_LINE]->set_notnull();
						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_READERS], rw_lock_get_reader_count(rwlock)));
						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_WAITERS_FLAG], (longlong)rwlock->waiters));
						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_LOCK_WORD], (longlong)rwlock->lock_word));
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_READER_FILE], innobase_basename(rwlock->last_s_file_name)));
						OK(fields[SYS_SEMAPHORE_WAITS_LAST_READER_LINE]->store(rwlock->last_s_line, true));
						fields[SYS_SEMAPHORE_WAITS_LAST_READER_LINE]->set_notnull();
						OK(field_store_string(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_FILE], innobase_basename(rwlock->last_x_file_name)));
						OK(fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->store(rwlock->last_x_line, true));
						fields[SYS_SEMAPHORE_WAITS_LAST_WRITER_LINE]->set_notnull();
						OK(field_store_ulint(fields[SYS_SEMAPHORE_WAITS_OS_WAIT_COUNT], rwlock->count_os_wait));
					}
				}
			}



 Comments   
Comment by Marko Mäkelä [ 2019-12-17 ]

It is possible that the underlying information is not available in release builds, to reduce the memory footprint. Perhaps we should handle buf_block_t as a special case? Also, perhaps the latches could store a short numeric index into an array of all potential locations where they can be acquired?

Comment by Marko Mäkelä [ 2020-05-27 ]

MDEV-15053 would remove buf_block_t::mutex and buf_pool_t::zip_mutex. But there are still numerous buf_block_t::lock, which I suppose are a more visible problem.

Comment by Eugene Kosov (Inactive) [ 2020-07-30 ]

I've added filename and line number variables to mutexes. Every thread updates when taking a lock and this is a possible scalability issue. It was measured. For SSD max measured slowdown is 0.68% and for RAM filesystem it is 0.9%. Is it a lot or a little? We have tickets like https://jira.mariadb.org/browse/MDEV-7962 wsrep_on() takes 0.14% in OLTP RO and https://jira.mariadb.org/browse/MDEV-21212 buf_page_get_gen > buf_pool>stat.n_page_gets++ is a cpu waste (0.5-1%) So, 0.68% is a lot for us.

Comment by Marko Mäkelä [ 2020-09-01 ]

Even if there was no observable performance regression, I would rather not add any instrumentation to the InnoDB homebrew mutexes, because in MDEV-21452 we would like to switch to using normal mutexes and condition variables instead of InnoDB’s special mutexes adn events. It looks like in MDEV-23399 we would have to do that already for some mutexes and events.

Comment by Marko Mäkelä [ 2020-12-17 ]

The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS that was introduced in MDEV-7109 was removed in MDEV-21452 (MariaDB Server 10.6.0).

Since MariaDB Server 10.2.2, the view only covered InnoDB rw-locks. The rw-lock implementation was replaced in MariaDB 10.6.0 in MDEV-24142 and MDEV-24167. The last remains of the sync_array were removed (or replaced with a single std::atomic timestamp that allows a long wait of dict_sys.mutex to be caught) in MDEV-24152. Only PERFORMANCE_SCHEMA can provide any information about InnoDB mutexes or rw-locks. Buffer block latches are not instrumented at all.

Comment by Marko Mäkelä [ 2023-05-16 ]

What is the use case for which this is needed in MariaDB Server 10.4 and 10.5?

For debugging hangs like MDEV-27983 (a 10.6 regression) or MDEV-29835 (a 10.2.2 regression similar to MDEV-14637), the only way was to check the output of thread apply all backtrace full. Starting with MariaDB Server 10.6.12, the contents of mtr_t::m_memo is typically rather readable, because it uses a std::vector like format. That takes away much of the guessing.

Generated at Thu Feb 08 09:06:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.