Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Description
MySQL 5.6.45 contains a change that refers to Oracle Bug #25289359 DML/DDL ON LARGE FULLTEXT TABLES CAUSE SEMAPHORE TIMEOUTS AND ASSERTION/SUICIDE.
There is no test case, but there is debug instrumentation. The change continues the bogus assumption that time(NULL) is a monotonically increasing sequence. That assumption is demonstrably broken in MDEV-14154. The change might also assume fair scheduling among the threads, which might not hold on a heavily loaded system.
I think that we should study what the problem is and whether it affects MariaDB, and then come up with a better fix.
Issue Links
- relates to
  - MDEV-14154 Failing assertion: slot->last_run <= current_time in fts0opt.cc (Closed)
  - MDEV-16264 Implement a common work queue for InnoDB background tasks (Closed)
  - MDEV-20127 Merge new release of InnoDB 5.6.45 to 10.1 (Closed)
The main idea of the Oracle fix is related to limiting the atomicity of fts_sync_write_words().
This part of the fix wrongly assumes that the system clock is monotonic (never moving backwards):
ulint cache_lock_time = ut_time() - sync_start_time;
Similar to our MDEV-14154 changes, in particular the one that removed bogus assertions, we should use something like this:
That is, we will time out if the time moved backwards.
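A minimal sketch of a check with that behaviour, in C. The helper name sync_timed_out() and its parameters are hypothetical and are not taken from the MariaDB or MySQL source; the point is only that comparing before subtracting keeps the unsigned difference from wrapping around when time(NULL) returns a value earlier than the start time.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (not from the InnoDB source).
   Returns true when an operation that started at start_time should be
   treated as having run for more than threshold_sec seconds.
   If the clock moved backwards (now < start_time), we report a timeout
   instead of computing a wrapped-around unsigned difference. */
static bool sync_timed_out(time_t start_time, time_t now,
                           unsigned long threshold_sec)
{
    if (now < start_time) {
        /* The system clock was adjusted backwards: time out. */
        return true;
    }
    return (unsigned long) (now - start_time) > threshold_sec;
}
```

A caller would presumably pass something like sync_timed_out(sync_start_time, time(NULL), lock_threshold), where lock_threshold stands for whatever limit the surrounding code uses.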
Because time(NULL) may have a much lower overhead than my_interval_timer() or other monotonic clock sources and because the precision of one second suffices here, I think that we should stick to time(NULL).
Anyway, the main idea of the Oracle change is to extend the innodb_fatal_semaphore_wait_threshold (srv_fatal_semaphore_wait_threshold) if an fts_sync_table() operation from outside the optimizer thread is taking longer. That is, it will prevent the operation of the InnoDB built-in watchdog, instead of actually fixing the root cause of the problem.
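To illustrate why extending the threshold only silences the watchdog, here is a simplified model in C. All names and values (fatal_wait_threshold, WAIT_EXTENSION, the 600 and 7200 second figures, and the three functions) are assumptions made up for this sketch; it does not reproduce the Oracle patch, and real code would need atomic updates of the shared threshold.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the watchdog state; illustrative values only. */
static unsigned long fatal_wait_threshold = 600;  /* seconds */
#define WAIT_EXTENSION 7200UL                     /* seconds */

/* Watchdog rule: a wait longer than the current threshold is fatal. */
static bool watchdog_would_abort(unsigned long waited_sec)
{
    return waited_sec > fatal_wait_threshold;
}

/* A slow sync raises the threshold while it runs... */
static void begin_long_sync(void)
{
    fatal_wait_threshold += WAIT_EXTENSION;
}

/* ...and restores it afterwards. */
static void end_long_sync(void)
{
    fatal_wait_threshold -= WAIT_EXTENSION;
}
```

In this model, a 700-second wait that would normally be fatal goes unnoticed between begin_long_sync() and end_long_sync(), even though nothing about the wait itself has changed.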