Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10682

Race condition between ANALYZE TABLE and STATS_AUTO_RECALC

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-amd64/builds/6502/steps/test_5/logs/stdio

      innodb.innodb_stats 'innodb_plugin'      w3 [ fail ]
              Test ended at 2016-06-28 07:48:15
       
      CURRENT_TEST: innodb.innodb_stats
      --- /usr/share/mysql/mysql-test/suite/innodb/r/innodb_stats.result	2016-06-28 04:58:03.000000000 +0300
      +++ /run/shm/var/3/log/innodb_stats.reject	2016-06-28 07:48:14.987555677 +0300
      @@ -174,11 +174,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	1
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	3
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -226,11 +226,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	1
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	10
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -278,11 +278,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	2
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	2
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -330,11 +330,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	2
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	3
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -382,11 +382,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	3
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	3
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -434,11 +434,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	3
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	5
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -461,7 +461,7 @@
       SEQ_IN_INDEX	1
       COLUMN_NAME	a
       COLLATION	A
      -CARDINALITY	5
      +CARDINALITY	2
       SUB_PART	NULL
       PACKED	NULL
       NULLABLE	YES
      @@ -486,11 +486,11 @@
       stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
       ORDER BY stat_name;
       stat_name	n_diff_pfx01
      -stat_value	5
      +stat_value	0
       sample_size	1
       stat_description	a
       stat_name	n_diff_pfx02
      -stat_value	10
      +stat_value	0
       sample_size	1
       stat_description	a,DB_ROW_ID
       stat_name	n_leaf_pages
      @@ -513,7 +513,7 @@
       SEQ_IN_INDEX	1
       COLUMN_NAME	a
       COLLATION	A
      -CARDINALITY	10
      +CARDINALITY	2
       SUB_PART	NULL
       PACKED	NULL
       NULLABLE	YES
       
      mysqltest: Result length mismatch
       
       - skipping '/run/shm/var/3/log/innodb.innodb_stats-innodb_plugin/'
       
      Retrying test innodb.innodb_stats, attempt(2/3)...
      

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos73-ppc64/builds/15294/steps/mtr/logs/stdio

            innodb.innodb_stats 'innodb'             w4 [ fail ]
                    Test ended at 2020-12-11 17:08:22
             
            CURRENT_TEST: innodb.innodb_stats
            --- /usr/share/mysql-test/suite/innodb/r/innodb_stats.result	2020-12-11 14:20:22.000000000 +0000
            +++ /dev/shm/var/4/log/innodb_stats.reject	2020-12-11 17:08:22.400141447 +0000
            @@ -443,11 +443,11 @@
             stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
             ORDER BY stat_name;
             stat_name	n_diff_pfx01
            -stat_value	3
            +stat_value	5
             sample_size	1
             stat_description	a
             stat_name	n_diff_pfx02
            -stat_value	5
            +stat_value	9
             sample_size	1
             stat_description	a,DB_ROW_ID
             stat_name	n_leaf_pages
             
            mysqltest: Result content mismatch
            

            10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64/builds/17058/steps/mtr/logs/stdio

            innodb.innodb_stats 'innodb'             w3 [ fail ]
                    Test ended at 2020-12-09 17:04:23
             
            CURRENT_TEST: innodb.innodb_stats
            --- /usr/share/mysql-test/suite/innodb/r/innodb_stats.result	2020-12-09 14:58:26.000000000 +0000
            +++ /dev/shm/var/3/log/innodb_stats.reject	2020-12-09 17:04:22.902801315 +0000
            @@ -205,7 +205,7 @@
             SEQ_IN_INDEX	1
             COLUMN_NAME	a
             COLLATION	A
            -CARDINALITY	3
            +CARDINALITY	0
             SUB_PART	NULL
             PACKED	NULL
             NULLABLE	YES
            @@ -337,11 +337,11 @@
             stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
             ORDER BY stat_name;
             stat_name	n_diff_pfx01
            -stat_value	2
            +stat_value	0
             sample_size	1
             stat_description	a
             stat_name	n_diff_pfx02
            -stat_value	3
            +stat_value	0
             sample_size	1
             stat_description	a,DB_ROW_ID
             stat_name	n_leaf_pages
             
            mysqltest: Result content mismatch
            

            10.2 http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-trusty-x86/builds/20366/steps/mtr/logs/stdio

            innodb.innodb_stats 'innodb'             w2 [ fail ]
                    Test ended at 2020-11-21 03:20:42
             
            CURRENT_TEST: innodb.innodb_stats
            --- /usr/local/mariadb-10.2.37-linux-i686/mysql-test/suite/innodb/r/innodb_stats.result	2020-11-21 02:30:48.000000000 +0000
            +++ /usr/local/mariadb-10.2.37-linux-i686/mysql-test/suite/innodb/r/innodb_stats.reject	2020-11-21 03:20:41.943562528 +0000
            @@ -330,11 +330,11 @@
             stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
             ORDER BY stat_name;
             stat_name	n_diff_pfx01
            -stat_value	2
            +stat_value	0
             sample_size	1
             stat_description	a
             stat_name	n_diff_pfx02
            -stat_value	3
            +stat_value	0
             sample_size	1
             stat_description	a,DB_ROW_ID
             stat_name	n_leaf_pages
             
            mysqltest: Result content mismatch
            

            alice Alice Sherepa added a comment - 10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos73-ppc64/builds/15294/steps/mtr/logs/stdio innodb.innodb_stats 'innodb' w4 [ fail ] Test ended at 2020-12-11 17:08:22   CURRENT_TEST: innodb.innodb_stats --- /usr/share/mysql-test/suite/innodb/r/innodb_stats.result 2020-12-11 14:20:22.000000000 +0000 +++ /dev/shm/var/4/log/innodb_stats.reject 2020-12-11 17:08:22.400141447 +0000 @@ -443,11 +443,11 @@ stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size') ORDER BY stat_name; stat_name n_diff_pfx01 -stat_value 3 +stat_value 5 sample_size 1 stat_description a stat_name n_diff_pfx02 -stat_value 5 +stat_value 9 sample_size 1 stat_description a,DB_ROW_ID stat_name n_leaf_pages   mysqltest: Result content mismatch 10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64/builds/17058/steps/mtr/logs/stdio innodb.innodb_stats 'innodb' w3 [ fail ] Test ended at 2020-12-09 17:04:23   CURRENT_TEST: innodb.innodb_stats --- /usr/share/mysql-test/suite/innodb/r/innodb_stats.result 2020-12-09 14:58:26.000000000 +0000 +++ /dev/shm/var/3/log/innodb_stats.reject 2020-12-09 17:04:22.902801315 +0000 @@ -205,7 +205,7 @@ SEQ_IN_INDEX 1 COLUMN_NAME a COLLATION A -CARDINALITY 3 +CARDINALITY 0 SUB_PART NULL PACKED NULL NULLABLE YES @@ -337,11 +337,11 @@ stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size') ORDER BY stat_name; stat_name n_diff_pfx01 -stat_value 2 +stat_value 0 sample_size 1 stat_description a stat_name n_diff_pfx02 -stat_value 3 +stat_value 0 sample_size 1 stat_description a,DB_ROW_ID stat_name n_leaf_pages   mysqltest: Result content mismatch 10.2 http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-trusty-x86/builds/20366/steps/mtr/logs/stdio innodb.innodb_stats 'innodb' w2 [ fail ] Test ended at 2020-11-21 03:20:42   CURRENT_TEST: innodb.innodb_stats --- /usr/local/mariadb-10.2.37-linux-i686/mysql-test/suite/innodb/r/innodb_stats.result 2020-11-21 02:30:48.000000000 +0000 +++ /usr/local/mariadb-10.2.37-linux-i686/mysql-test/suite/innodb/r/innodb_stats.reject 2020-11-21 03:20:41.943562528 +0000 @@ -330,11 +330,11 @@ stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size') ORDER BY stat_name; stat_name n_diff_pfx01 -stat_value 2 +stat_value 0 sample_size 1 stat_description a stat_name n_diff_pfx02 -stat_value 3 +stat_value 0 sample_size 1 stat_description a,DB_ROW_ID stat_name n_leaf_pages   mysqltest: Result content mismatch

            I analyzed this in 10.6, using a patch:

            diff --git a/mysql-test/lib/My/SafeProcess/safe_process.cc b/mysql-test/lib/My/SafeProcess/safe_process.cc
            index 4d0d1e2a3a0..abc167a4300 100644
            --- a/mysql-test/lib/My/SafeProcess/safe_process.cc
            +++ b/mysql-test/lib/My/SafeProcess/safe_process.cc
            @@ -144,7 +144,7 @@ static int kill_child(bool was_killed)
               message("Killing child: %d", child_pid);
               // Terminate whole process group
               if (! was_killed)
            -    kill(-child_pid, SIGKILL);
            +    kill(-child_pid, SIGABRT);
             
               pid_t ret_pid= waitpid(child_pid, &status, 0);
               if (ret_pid == child_pid)
            

            while ./mtr --mysqld=--skip-innodb-use-native-aio --rr=-h --parallel=auto innodb.innodb_stats{,,,,,,,,,,,,,,,,,,,,,}; do :; done
            

            Finally, it failed like this:

            CURRENT_TEST: innodb.innodb_stats
            --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result	2021-03-10 13:48:23.463589473 +0200
            +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject	2021-03-18 09:03:05.692473203 +0200
            @@ -522,7 +522,7 @@
             SEQ_IN_INDEX	1
             COLUMN_NAME	a
             COLLATION	A
            -CARDINALITY	10
            +CARDINALITY	2
             SUB_PART	NULL
             PACKED	NULL
             NULLABLE	YES
             
            mysqltest: Result length mismatch
            

            I found the race condition:

            10.6 7c5c6fa65c9f0ac8862baf8ab74ad69dd0beb4f8

            (rr) rc
            Continuing.
             
            Thread 15 hit Hardware watchpoint 9: -location *index->stat_n_diff_key_vals
             
            Old value = 1
            New value = 0
            0x0000556a020d9421 in dict_stats_analyze_index_level (index=index@entry=0xe5b24215618, level=level@entry=0, n_diff=0xe5b24b87190, total_recs=total_recs@entry=0x7fc91d1a5e50, 
                total_pages=total_pages@entry=0x7fc91d1a5e58, n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x7fc91d1a5e80) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:1222
            1222					n_diff[i]++;
            (rr) 
            Continuing.
            [Switching to Thread 42277.42442]
             
            Thread 8 hit Hardware watchpoint 9: -location *index->stat_n_diff_key_vals
             
            Old value = 0
            New value = 8
            0x0000556a020d0383 in dict_stats_empty_index (index=index@entry=0xe5b24215618, empty_defrag_stats=empty_defrag_stats@entry=false) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:521
            521			index->stat_n_diff_key_vals[i] = 0;
            (rr) thr a 15 bt
             
            Thread 15 (Thread 42277.42574 (mmap_copy_4_mar)):
            #0  0x0000556a01e01e36 in rec_offs_n_fields (offsets=offsets@entry=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:623
            #1  0x0000556a01ed2e92 in rec_offs_nth_type (n=0, offsets=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:642
            #2  rec_offs_nth_extern (n=0, offsets=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:673
            #3  cmp_rec_rec (rec1=rec1@entry=0x2c4a5eba4149 "", rec2=rec2@entry=0x2c4a5eba4166 "", offsets1=offsets1@entry=0xe5b2404b078, offsets2=offsets2@entry=0xe5b2421c8f8, index=index@entry=0xe5b24215618, nulls_unequal=nulls_unequal@entry=false, matched_fields=0x7fc91d1a5cc0) at /mariadb/10.6/storage/innobase/rem/rem0cmp.cc:948
            #4  0x0000556a020d9704 in dict_stats_analyze_index_level (index=index@entry=0xe5b24215618, level=level@entry=0, n_diff=0xe5b24b87190, total_recs=total_recs@entry=0x7fc91d1a5e50, total_pages=total_pages@entry=0x7fc91d1a5e58, n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x7fc91d1a5e80) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:1193
            #5  0x0000556a020da920 in dict_stats_analyze_index (index=index@entry=0xe5b24215618) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2007
            #6  0x0000556a020db598 in dict_stats_update_persistent (table=table@entry=0xe5b24213808) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2262
            #7  0x0000556a020dc9eb in dict_stats_update (table=table@entry=0xe5b24213808, stats_upd_option=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:3233
            #8  0x0000556a01dc3dc0 in ha_innobase::info_low (this=0xe5b24b7f4b0, flag=flag@entry=28, is_analyze=is_analyze@entry=true) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:14004
            #9  0x0000556a01dc48b5 in ha_innobase::analyze (this=<optimized out>) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:14294
            …
            (rr) bt
            #0  0x0000556a020d0383 in dict_stats_empty_index (index=index@entry=0xe5b24215618, empty_defrag_stats=empty_defrag_stats@entry=false) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:521
            #1  0x0000556a020db583 in dict_stats_update_persistent (table=table@entry=0xe5b24213808) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2259
            #2  0x0000556a020dc9eb in dict_stats_update (table=table@entry=0xe5b24213808, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:3233
            #3  0x0000556a020df13b in dict_stats_process_entry_from_recalc_pool () at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:374
            #4  0x0000556a020df14b in dict_stats_func () at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:408
            

            My first attempt at fixing this was not successful, because it still failed like this:

            CURRENT_TEST: innodb.innodb_stats
            --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result	2021-03-10 13:48:23.463589473 +0200
            +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject	2021-03-18 09:44:46.406899999 +0200
            @@ -336,11 +336,11 @@
             stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
             ORDER BY stat_name;
             stat_name	n_diff_pfx01
            -stat_value	2
            +stat_value	0
             sample_size	1
             stat_description	a
             stat_name	n_diff_pfx02
            -stat_value	3
            +stat_value	0
             sample_size	1
             stat_description	a,DB_ROW_ID
             stat_name	n_leaf_pages
             
            mysqltest: Result content mismatch
            

            For the record, this is my first attempt:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 4727141ec7d..8e9d2fad5ea 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -13991,6 +13991,9 @@ ha_innobase::info_low(
             			if (dict_stats_is_persistent_enabled(ib_table)) {
             
             				if (is_analyze) {
            +					dict_sys.mutex_lock();
            +					dict_stats_recalc_pool_del(ib_table);
            +					dict_sys.mutex_unlock();
             					opt = DICT_STATS_RECALC_PERSISTENT;
             				} else {
             					/* This is e.g. 'SHOW INDEXES', fetch
            

            marko Marko Mäkelä added a comment - I analyzed this in 10.6, using a patch: diff --git a/mysql-test/lib/My/SafeProcess/safe_process.cc b/mysql-test/lib/My/SafeProcess/safe_process.cc index 4d0d1e2a3a0..abc167a4300 100644 --- a/mysql-test/lib/My/SafeProcess/safe_process.cc +++ b/mysql-test/lib/My/SafeProcess/safe_process.cc @@ -144,7 +144,7 @@ static int kill_child(bool was_killed) message("Killing child: %d", child_pid); // Terminate whole process group if (! was_killed) - kill(-child_pid, SIGKILL); + kill(-child_pid, SIGABRT); pid_t ret_pid= waitpid(child_pid, &status, 0); if (ret_pid == child_pid) while ./mtr --mysqld=--skip-innodb-use-native-aio --rr=-h --parallel=auto innodb.innodb_stats{,,,,,,,,,,,,,,,,,,,,,}; do :; done Finally, it failed like this: CURRENT_TEST: innodb.innodb_stats --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result 2021-03-10 13:48:23.463589473 +0200 +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject 2021-03-18 09:03:05.692473203 +0200 @@ -522,7 +522,7 @@ SEQ_IN_INDEX 1 COLUMN_NAME a COLLATION A -CARDINALITY 10 +CARDINALITY 2 SUB_PART NULL PACKED NULL NULLABLE YES   mysqltest: Result length mismatch I found the race condition: 10.6 7c5c6fa65c9f0ac8862baf8ab74ad69dd0beb4f8 (rr) rc Continuing.   Thread 15 hit Hardware watchpoint 9: -location *index->stat_n_diff_key_vals   Old value = 1 New value = 0 0x0000556a020d9421 in dict_stats_analyze_index_level (index=index@entry=0xe5b24215618, level=level@entry=0, n_diff=0xe5b24b87190, total_recs=total_recs@entry=0x7fc91d1a5e50, total_pages=total_pages@entry=0x7fc91d1a5e58, n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x7fc91d1a5e80) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:1222 1222 n_diff[i]++; (rr) Continuing. [Switching to Thread 42277.42442]   Thread 8 hit Hardware watchpoint 9: -location *index->stat_n_diff_key_vals   Old value = 0 New value = 8 0x0000556a020d0383 in dict_stats_empty_index (index=index@entry=0xe5b24215618, empty_defrag_stats=empty_defrag_stats@entry=false) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:521 521 index->stat_n_diff_key_vals[i] = 0; (rr) thr a 15 bt   Thread 15 (Thread 42277.42574 (mmap_copy_4_mar)): #0 0x0000556a01e01e36 in rec_offs_n_fields (offsets=offsets@entry=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:623 #1 0x0000556a01ed2e92 in rec_offs_nth_type (n=0, offsets=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:642 #2 rec_offs_nth_extern (n=0, offsets=0xe5b2421c8f8) at /mariadb/10.6/storage/innobase/include/rem0rec.h:673 #3 cmp_rec_rec (rec1=rec1@entry=0x2c4a5eba4149 "", rec2=rec2@entry=0x2c4a5eba4166 "", offsets1=offsets1@entry=0xe5b2404b078, offsets2=offsets2@entry=0xe5b2421c8f8, index=index@entry=0xe5b24215618, nulls_unequal=nulls_unequal@entry=false, matched_fields=0x7fc91d1a5cc0) at /mariadb/10.6/storage/innobase/rem/rem0cmp.cc:948 #4 0x0000556a020d9704 in dict_stats_analyze_index_level (index=index@entry=0xe5b24215618, level=level@entry=0, n_diff=0xe5b24b87190, total_recs=total_recs@entry=0x7fc91d1a5e50, total_pages=total_pages@entry=0x7fc91d1a5e58, n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x7fc91d1a5e80) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:1193 #5 0x0000556a020da920 in dict_stats_analyze_index (index=index@entry=0xe5b24215618) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2007 #6 0x0000556a020db598 in dict_stats_update_persistent (table=table@entry=0xe5b24213808) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2262 #7 0x0000556a020dc9eb in dict_stats_update (table=table@entry=0xe5b24213808, stats_upd_option=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:3233 #8 0x0000556a01dc3dc0 in ha_innobase::info_low (this=0xe5b24b7f4b0, flag=flag@entry=28, is_analyze=is_analyze@entry=true) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:14004 #9 0x0000556a01dc48b5 in ha_innobase::analyze (this=<optimized out>) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:14294 … (rr) bt #0 0x0000556a020d0383 in dict_stats_empty_index (index=index@entry=0xe5b24215618, empty_defrag_stats=empty_defrag_stats@entry=false) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:521 #1 0x0000556a020db583 in dict_stats_update_persistent (table=table@entry=0xe5b24213808) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:2259 #2 0x0000556a020dc9eb in dict_stats_update (table=table@entry=0xe5b24213808, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.6/storage/innobase/dict/dict0stats.cc:3233 #3 0x0000556a020df13b in dict_stats_process_entry_from_recalc_pool () at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:374 #4 0x0000556a020df14b in dict_stats_func () at /mariadb/10.6/storage/innobase/dict/dict0stats_bg.cc:408 My first attempt at fixing this was not successful, because it still failed like this: CURRENT_TEST: innodb.innodb_stats --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result 2021-03-10 13:48:23.463589473 +0200 +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject 2021-03-18 09:44:46.406899999 +0200 @@ -336,11 +336,11 @@ stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size') ORDER BY stat_name; stat_name n_diff_pfx01 -stat_value 2 +stat_value 0 sample_size 1 stat_description a stat_name n_diff_pfx02 -stat_value 3 +stat_value 0 sample_size 1 stat_description a,DB_ROW_ID stat_name n_leaf_pages   mysqltest: Result content mismatch For the record, this is my first attempt: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 4727141ec7d..8e9d2fad5ea 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -13991,6 +13991,9 @@ ha_innobase::info_low( if (dict_stats_is_persistent_enabled(ib_table)) { if (is_analyze) { + dict_sys.mutex_lock(); + dict_stats_recalc_pool_del(ib_table); + dict_sys.mutex_unlock(); opt = DICT_STATS_RECALC_PERSISTENT; } else { /* This is e.g. 'SHOW INDEXES', fetch

            The problem with my initial fix is that it would not wait for existing background processing to terminate. It did reduce the probability of the race condition, but the processing of statistics for the table could have been started already during the execution of an INSERT, before the ANALYZE command was executed. The following fix allowed the test pass for 26×100 runs:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 4727141ec7d..77fc14b5f0d 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -13991,6 +13991,15 @@ ha_innobase::info_low(
             			if (dict_stats_is_persistent_enabled(ib_table)) {
             
             				if (is_analyze) {
            +					row_mysql_lock_data_dictionary(
            +						m_prebuilt->trx);
            +					dict_stats_wait_bg_to_stop_using_table(
            +						ib_table, m_prebuilt->trx);
            +					row_mysql_unlock_data_dictionary(
            +						m_prebuilt->trx);
            +					dict_sys.mutex_lock();
            +					dict_stats_recalc_pool_del(ib_table);
            +					dict_sys.mutex_unlock();
             					opt = DICT_STATS_RECALC_PERSISTENT;
             				} else {
             					/* This is e.g. 'SHOW INDEXES', fetch
            @@ -14003,6 +14012,13 @@ ha_innobase::info_low(
             
             			ret = dict_stats_update(ib_table, opt);
             
            +			if (opt == DICT_STATS_RECALC_PERSISTENT) {
            +				dict_sys.mutex_lock();
            +				ib_table->stats_bg_flag
            +					&= byte(~BG_STAT_SHOULD_QUIT);
            +				dict_sys.mutex_unlock();
            +			}
            +
             			if (ret != DB_SUCCESS) {
             				m_prebuilt->trx->op_info = "";
             				DBUG_RETURN(HA_ERR_GENERIC);
            

            This signaling is very error-prone, of course. It would be much better to remove dict_table_t::stats_bg_flag and have dict_stats_process_entry_from_recalc_pool() acquire a metadata lock (MDL) that would make it mutually exclusive with ANALYZE TABLE and any DDL operations.

            marko Marko Mäkelä added a comment - The problem with my initial fix is that it would not wait for existing background processing to terminate. It did reduce the probability of the race condition, but the processing of statistics for the table could have been started already during the execution of an INSERT , before the ANALYZE command was executed. The following fix allowed the test pass for 26×100 runs: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 4727141ec7d..77fc14b5f0d 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -13991,6 +13991,15 @@ ha_innobase::info_low( if (dict_stats_is_persistent_enabled(ib_table)) { if (is_analyze) { + row_mysql_lock_data_dictionary( + m_prebuilt->trx); + dict_stats_wait_bg_to_stop_using_table( + ib_table, m_prebuilt->trx); + row_mysql_unlock_data_dictionary( + m_prebuilt->trx); + dict_sys.mutex_lock(); + dict_stats_recalc_pool_del(ib_table); + dict_sys.mutex_unlock(); opt = DICT_STATS_RECALC_PERSISTENT; } else { /* This is e.g. 'SHOW INDEXES', fetch @@ -14003,6 +14012,13 @@ ha_innobase::info_low( ret = dict_stats_update(ib_table, opt); + if (opt == DICT_STATS_RECALC_PERSISTENT) { + dict_sys.mutex_lock(); + ib_table->stats_bg_flag + &= byte(~BG_STAT_SHOULD_QUIT); + dict_sys.mutex_unlock(); + } + if (ret != DB_SUCCESS) { m_prebuilt->trx->op_info = ""; DBUG_RETURN(HA_ERR_GENERIC); This signaling is very error-prone, of course. It would be much better to remove dict_table_t::stats_bg_flag and have dict_stats_process_entry_from_recalc_pool() acquire a metadata lock (MDL) that would make it mutually exclusive with ANALYZE TABLE and any DDL operations.

            I started a new test (35×1000 repetitions), and it turns out that unfortunately, even the above fix is insufficient:

            innodb.innodb_stats 'innodb'             w30 [ 547 fail ]
                    Test ended at 2021-03-18 11:04:09
             
            CURRENT_TEST: innodb.innodb_stats
            --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result	2021-03-10 13:48:23.463589473 +0200
            +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject	2021-03-18 11:04:08.705201680 +0200
            @@ -124,11 +124,11 @@
             stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size')
             ORDER BY stat_name;
             stat_name	n_diff_pfx01
            -stat_value	1
            +stat_value	0
             sample_size	1
             stat_description	a
             stat_name	n_diff_pfx02
            -stat_value	2
            +stat_value	0
             sample_size	1
             stat_description	a,DB_ROW_ID
             stat_name	n_leaf_pages
             
            mysqltest: Result content mismatch
            

            I am not sure whether this would be practical to repeat under rr record. I think that my fix, which significantly reduces the probability of the test failure, is a step to the right direction, but we cannot consider this race condition to be fully fixed.

            Proper MDL protection for the background statistics calculation is something that will have to be evaluated in MDEV-15020.

            marko Marko Mäkelä added a comment - I started a new test (35×1000 repetitions), and it turns out that unfortunately, even the above fix is insufficient: innodb.innodb_stats 'innodb' w30 [ 547 fail ] Test ended at 2021-03-18 11:04:09   CURRENT_TEST: innodb.innodb_stats --- /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.result 2021-03-10 13:48:23.463589473 +0200 +++ /mariadb/10.6/mysql-test/suite/innodb/r/innodb_stats.reject 2021-03-18 11:04:08.705201680 +0200 @@ -124,11 +124,11 @@ stat_name IN ('n_diff_pfx01', 'n_diff_pfx02', 'n_leaf_pages', 'size') ORDER BY stat_name; stat_name n_diff_pfx01 -stat_value 1 +stat_value 0 sample_size 1 stat_description a stat_name n_diff_pfx02 -stat_value 2 +stat_value 0 sample_size 1 stat_description a,DB_ROW_ID stat_name n_leaf_pages   mysqltest: Result content mismatch I am not sure whether this would be practical to repeat under rr record . I think that my fix, which significantly reduces the probability of the test failure, is a step to the right direction, but we cannot consider this race condition to be fully fixed. Proper MDL protection for the background statistics calculation is something that will have to be evaluated in MDEV-15020 .

            There is a race condition in my above fix. We must first invoke dict_stats_recalc_pool_del() and then ensure that innodb_stats_auto_recalc is no longer processing the table. With that improved fix, the test passed on 10.6:

            innodb.innodb_stats 'innodb'             w11 [ 1000 pass ]    367
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 19758.330 of 788 seconds executing testcases
             
            Completed: All 35000 tests were successful.
            

            On 10.2, my fix appeared to be solid even without that improvement:

            innodb.innodb_stats 'innodb'             w26 [ pass ]    621
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 10107.392 of 448 seconds executing testcases
             
            Completed: All 35000 tests were successful.
            

            marko Marko Mäkelä added a comment - There is a race condition in my above fix. We must first invoke dict_stats_recalc_pool_del() and then ensure that innodb_stats_auto_recalc is no longer processing the table. With that improved fix, the test passed on 10.6: innodb.innodb_stats 'innodb' w11 [ 1000 pass ] 367 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 19758.330 of 788 seconds executing testcases   Completed: All 35000 tests were successful. On 10.2, my fix appeared to be solid even without that improvement: innodb.innodb_stats 'innodb' w26 [ pass ] 621 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 10107.392 of 448 seconds executing testcases   Completed: All 35000 tests were successful.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.