Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.0(EOL)
-
None
-
Linux
-
10.0.23
Description
Sometimes at nightly cron-jobs doing "mysqloptimize" one of the tables get marked "crashed" and automatic repair fails.
The last one is caused by related bug.
I tried to force the error by stressing the database with inserts/reads during mysqloptimize, but I was unable to get it crashed at the moment. But it happens sometimes at night (when the cron job runs).
The script which consists of
mysqlcheck -s -A
mysqloptimize -s -A
mysqlanalyze -s -A
is running for years without any problem. The problem first occured after update from MariaDB 5.5 to 10.0.19.
If this is relevant: all tables have Aria engine.
I still expect some racing condition - as a "user" I'm not familiar when a table is marked "crashed" - but there must be some condition when some lock, or gathering the lock interfears with this state after lock is removed.
Attachments
Issue Links
- relates to
-
MDEV-8475 stale .TMM file causes Aria engine to stop serving the table
-
- Closed
-
- links to
Alessandro, mokraemer, thanks, it seems I am now able to reproduce the problem with a concurrent test.
svoj (or whoever ends up fixing it),
below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it.
RQG grammar mdev8571.yy
query_init:
CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1;
query:
REPLACE INTO my_table () VALUES (),();
thread1:
OPTIMIZE TABLE my_table;
my_table:
t1 | t2;
RQG command line
perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir>
When the problem is hit, the output looks like this:
# 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test
# 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds.
# 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started
# 2015-09-17T03:24:40 [25179] Started periodic reporting process...
# 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
# 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
...
It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line).
The printed error is not the problem in itself, it's just a side effect (see
MDEV-8475). The real problem is that at some point a TMM file does not get removed.There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated.
I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because
MDEV-8475has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.