Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8571

After mysqloptimize sometimes one of the tables is marked as crashed

Details

    • 10.0.23

    Description

      Sometimes at nightly cron-jobs doing "mysqloptimize" one of the tables get marked "crashed" and automatic repair fails.
      The last one is caused by related bug.

      I tried to force the error by stressing the database with inserts/reads during mysqloptimize, but I was unable to get it crashed at the moment. But it happens sometimes at night (when the cron job runs).
      The script which consists of
      mysqlcheck -s -A
      mysqloptimize -s -A
      mysqlanalyze -s -A

      is running for years without any problem. The problem first occured after update from MariaDB 5.5 to 10.0.19.
      If this is relevant: all tables have Aria engine.

      I still expect some racing condition - as a "user" I'm not familiar when a table is marked "crashed" - but there must be some condition when some lock, or gathering the lock interfears with this state after lock is removed.

      Attachments

        Issue Links

          Activity

            Alessandro, mokraemer, thanks, it seems I am now able to reproduce the problem with a concurrent test.


            svoj (or whoever ends up fixing it),
            below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it.

            RQG grammar mdev8571.yy

            query_init:
            	CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1;
             
            query:
            	REPLACE INTO my_table () VALUES (),();
             
            thread1:
            	OPTIMIZE TABLE my_table;
             
            my_table:
            	t1 | t2;

            RQG command line

            perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir> 

            When the problem is hit, the output looks like this:

            # 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test
            # 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds.
            # 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started
            # 2015-09-17T03:24:40 [25179] Started periodic reporting process...
            # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
            # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
            ...

            It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line).
            The printed error is not the problem in itself, it's just a side effect (see MDEV-8475). The real problem is that at some point a TMM file does not get removed.
            There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated.

            I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because MDEV-8475 has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.

            elenst Elena Stepanova added a comment - Alessandro , mokraemer , thanks, it seems I am now able to reproduce the problem with a concurrent test. svoj (or whoever ends up fixing it), below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it. RQG grammar mdev8571.yy query_init: CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1;   query: REPLACE INTO my_table () VALUES (),();   thread1: OPTIMIZE TABLE my_table;   my_table: t1 | t2; RQG command line perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir> When the problem is hit, the output looks like this: # 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test # 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds. # 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started # 2015-09-17T03:24:40 [25179] Started periodic reporting process... # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed ... It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line). The printed error is not the problem in itself, it's just a side effect (see MDEV-8475 ). The real problem is that at some point a TMM file does not get removed. There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated. I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because MDEV-8475 has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.
            mokraemer Marc added a comment -

            wow finally we got a test-case - thanks Elena!

            mokraemer Marc added a comment - wow finally we got a test-case - thanks Elena!

            elenst, I spent half of the day attempting to run it with --basedir=.

            svoj Sergey Vojtovich added a comment - elenst , I spent half of the day attempting to run it with --basedir=.

            serg, please review fix for this bug.

            svoj Sergey Vojtovich added a comment - serg , please review fix for this bug.

            ok to push, obviously

            serg Sergei Golubchik added a comment - ok to push, obviously

            People

              svoj Sergey Vojtovich
              mokraemer Marc
              Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.