[MDEV-8571] After mysqloptimize sometimes one of the tables is marked as crashed - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.0(EOL)
Fix Version/s: 10.0.23, 10.1.10
Component/s: Admin statements
Labels:
None
Environment:
Linux

Sprint:
10.0.23

Description

Sometimes at nightly cron-jobs doing "mysqloptimize" one of the tables get marked "crashed" and automatic repair fails.
The last one is caused by related bug.

I tried to force the error by stressing the database with inserts/reads during mysqloptimize, but I was unable to get it crashed at the moment. But it happens sometimes at night (when the cron job runs).
The script which consists of
mysqlcheck -s -A
mysqloptimize -s -A
mysqlanalyze -s -A

is running for years without any problem. The problem first occured after update from MariaDB 5.5 to 10.0.19.
If this is relevant: all tables have Aria engine.

I still expect some racing condition - as a "user" I'm not familiar when a table is marked "crashed" - but there must be some condition when some lock, or gathering the lock interfears with this state after lock is removed.

Attachments

Issue Links

relates to

MDEV-8475 stale .TMM file causes Aria engine to stop serving the table

Closed

links to

Github revision

Activity

Ascending order - Click to sort in descending order

View 18 older comments

Elena Stepanova added a comment - 2015-09-17 03:36

Alessandro, mokraemer, thanks, it seems I am now able to reproduce the problem with a concurrent test.

svoj (or whoever ends up fixing it),
below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it.

RQG grammar mdev8571.yy
query_init:
CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1;

query:
REPLACE INTO my_table () VALUES (),();

thread1:
OPTIMIZE TABLE my_table;

my_table:
t1 \| t2;

RQG command line
perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir>

When the problem is hit, the output looks like this:

# 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test

# 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds.

# 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started

# 2015-09-17T03:24:40 [25179] Started periodic reporting process...

# 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed

# 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed

...

It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line).
The printed error is not the problem in itself, it's just a side effect (see ~~MDEV-8475~~). The real problem is that at some point a TMM file does not get removed.
There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated.

I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because ~~MDEV-8475~~ has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.

Elena Stepanova added a comment - 2015-09-17 03:36 Alessandro , mokraemer , thanks, it seems I am now able to reproduce the problem with a concurrent test. svoj (or whoever ends up fixing it), below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it. RQG grammar mdev8571.yy query_init: CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1; query: REPLACE INTO my_table () VALUES (),(); thread1: OPTIMIZE TABLE my_table; my_table: t1 | t2; RQG command line perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir> When the problem is hit, the output looks like this: # 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test # 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds. # 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started # 2015-09-17T03:24:40 [25179] Started periodic reporting process... # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed ... It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line). The printed error is not the problem in itself, it's just a side effect (see MDEV-8475 ). The real problem is that at some point a TMM file does not get removed. There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated. I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because MDEV-8475 has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.

Marc added a comment - 2015-09-17 12:35

wow finally we got a test-case - thanks Elena!

Marc added a comment - 2015-09-17 12:35 wow finally we got a test-case - thanks Elena!

Sergey Vojtovich added a comment - 2015-12-10 12:36

elenst, I spent half of the day attempting to run it with --basedir=.

Sergey Vojtovich added a comment - 2015-12-10 12:36 elenst , I spent half of the day attempting to run it with --basedir=.

Sergey Vojtovich added a comment - 2015-12-10 13:47

serg, please review fix for this bug.

Sergey Vojtovich added a comment - 2015-12-10 13:47 serg , please review fix for this bug.

Sergei Golubchik added a comment - 2015-12-10 19:18

ok to push, obviously

Sergei Golubchik added a comment - 2015-12-10 19:18 ok to push, obviously

People

Assignee:: Sergey Vojtovich

Reporter:: Marc

Votes:: 2 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2015-08-05 14:55

Updated:: 2017-03-13 15:08

Resolved:: 2015-12-11 09:48

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server