Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7084

innodb index stats inadequate using constant innodb_stats_sample_pages

Details

    Description

      The rows_per_key for idx_obs_daily_1 and idx_obs_daily_2 are nowhere near the same. The rows_per_key for the first element on ibx_obs_daily_2 on the other 5 database slave servers of identical content was been ~2K and ~8K.

      Its highly suspicious that exactly the same number got calculated here for rows_per_field when the first element of the each of these indexes is a different field.

      ANALYZE table was run on all tables 2 weeks ago when the first instance of an incorrect query plan was observed. Though data changes this index stats also got corrupted again.

      MariaDB [weather]> select * from information_schema.INNODB_INDEX_STATS where table_schema='weather' and table_name='obs_daily';
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | table_schema | table_name | index_name      | fields | rows_per_key     | index_total_pages | index_leaf_pages |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | weather      | obs_daily  | PRIMARY         |      1 | 1                |            204608 |           178951 |
      | weather      | obs_daily  | idx_obs_daily_1 |      3 | 2708708, 1670, 1 |             66944 |            58327 |
      | weather      | obs_daily  | idx_obs_daily_2 |      2 | 2708708, 0       |             27904 |            24298 |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      3 rows in set (0.00 sec)
       
      MariaDB [weather]> analyze table weather.obs_daily;
      +-------------------+---------+----------+----------+
      | Table             | Op      | Msg_type | Msg_text |
      +-------------------+---------+----------+----------+
      | weather.obs_daily | analyze | status   | OK       |
      +-------------------+---------+----------+----------+
      1 row in set (0.04 sec)
       
      MariaDB [weather]> select * from information_schema.INNODB_INDEX_STATS where table_schema='weather' and table_name='obs_daily';
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | table_schema | table_name | index_name      | fields | rows_per_key     | index_total_pages | index_leaf_pages |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | weather      | obs_daily  | PRIMARY         |      1 | 1                |            206976 |           180899 |
      | weather      | obs_daily  | idx_obs_daily_1 |      3 | 2884335, 3533, 1 |             67328 |            58696 |
      | weather      | obs_daily  | idx_obs_daily_2 |      2 | 8472, 1          |             28032 |            24442 |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      3 rows in set (0.00 sec)

      | obs_daily | CREATE TABLE `obs_daily` (
        `iId` int(11) NOT NULL AUTO_INCREMENT, 
        `sLocType` varchar(10) NOT NULL DEFAULT '',  
        `sLocCode` varchar(30) NOT NULL DEFAULT '',  
        `dtDate` date NOT NULL DEFAULT '0000-00-00',  
      ....
        PRIMARY KEY (`iId`),
        UNIQUE KEY `idx_obs_daily_1` (`sLocType`,`sLocCode`,`dtDate`),
        KEY `idx_obs_daily_2` (`dtDate`)
      ) ENGINE=InnoDB AUTO_INCREMENT=23562482 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC |

      Attachments

        Issue Links

          Activity

            I think you are correct and because srv_stats_sample_pages can be > 10 we should have:

            if (srv_stats_sample_traditional) {
            		/* It makes no sense to test more pages than are contained
            		in the index, thus we lower the number if it is too high */
            		if (srv_stats_sample_pages > index->stat_index_size) {
            			if (index->stat_index_size > 0) {
            				n_sample_pages = index->stat_index_size;
            			} else {
            				n_sample_pages = 1;
            			}
            		} else {
            			n_sample_pages = srv_stats_sample_pages;
            		}
            	} else {
            		/* New logaritmic number of pages that are estimated. We
            		first pick minimun from srv_stats_sample_pages and number of
            		pages on index. Then we pick maximum from previous number of
            		pages and log2(number of index pages) * srv_stats_sample_pages. */
            		if (index->stat_index_size > 0) {
            			n_sample_pages = ut_min(index->stat_index_size, 
                                                                               ut_max(ut_min(srv_stats_sample_pages, index->stat_index_size),
            				                                                log2(index->stat_index_size)*srv_stats_sample_pages));
            		} else {
            			n_sample_pages = 1;
            		}
            	}

            jplindst Jan Lindström (Inactive) added a comment - I think you are correct and because srv_stats_sample_pages can be > 10 we should have: if (srv_stats_sample_traditional) { /* It makes no sense to test more pages than are contained in the index, thus we lower the number if it is too high */ if (srv_stats_sample_pages > index->stat_index_size) { if (index->stat_index_size > 0) { n_sample_pages = index->stat_index_size; } else { n_sample_pages = 1; } } else { n_sample_pages = srv_stats_sample_pages; } } else { /* New logaritmic number of pages that are estimated. We first pick minimun from srv_stats_sample_pages and number of pages on index. Then we pick maximum from previous number of pages and log2(number of index pages) * srv_stats_sample_pages. */ if (index->stat_index_size > 0) { n_sample_pages = ut_min(index->stat_index_size, ut_max(ut_min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size)*srv_stats_sample_pages)); } else { n_sample_pages = 1; } }
            danblack Daniel Black added a comment - - edited

            Maths makes some of these paths unreachable.

            So taking index size as I and sample as S and log(I)*S as L

            so the current expression is min(I, max( min(S,I), L)

            case 1: assume S < I

            min(I, max( min(S,I), L) becomes

            min(I , max( S, L))

            but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L.

            so we have:

            min(I , L)

            case 2: assume I < S

            min(I, max( min(S,I), L) becomes

            min(I, max( I, L))

            case 2a: L > I

            min(I, max( I, L))

            becomes

            min(I, L)

            becomes I

            case 2b: when L < I

            min(I, max( I, L))

            becomes

            min(I, I )

            becomes I

            so our expression is:

            n_pages = S < I ? min(I,L) : I

            hence the simple version:

            	} else {
            		/* New logarithmic number of pages that are estimated.
                            If we have only 0 or 1 index pages then we can only take 1 sample.
             
                            So taking index size as I and sample as S and log(I)*S as L
             
                            requirement 1) we want the out limit of the expression to not exceed I;
                            requirement 2) we want the ideal pages to be at least S;
                            so the current expression is min(I, max( min(S,I), L)
             
                            looking for simplifications:
             
                            case 1: assume S < I
                            min(I, max( min(S,I), L) -> min(I , max( S, L))
             
                            but since L=LOG2(I)*S and log2(I) >=1   L>S always so max(S,L) = L.
             
                            so we have: min(I , L)
             
                             case 2: assume I < S
                             min(I, max( min(S,I), L) -> min(I, max( I, L))
             
                             case 2a: L > I
                             min(I, max( I, L)) -> min(I, L) -> I
             
                            case 2b: when L < I
                            min(I, max( I, L))  ->  min(I, I ) -> I
                            
                            so taking all case2 paths is I, our expression is:
                            n_pages = S < I? min(I,L) : I
                            */
                            if (index->stat_index_size > 1) {
                              n_sample_pages = (srv_stats_sample_pages < index->stat_index_size) ? ut_min(index->stat_index_size, log2(index->stat_index_size)*srv_stats_sample_pages) : index->stat_index_size;
                            }
                           else 
                           {
                             n_sample_pages = 1;
                          }
            	}

            edited: to fix a transposition in the comments

            danblack Daniel Black added a comment - - edited Maths makes some of these paths unreachable. So taking index size as I and sample as S and log(I)*S as L so the current expression is min(I, max( min(S,I), L) case 1: assume S < I min(I, max( min(S,I), L) becomes min(I , max( S, L)) but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L. so we have: min(I , L) case 2: assume I < S min(I, max( min(S,I), L) becomes min(I, max( I, L)) case 2a: L > I min(I, max( I, L)) becomes min(I, L) becomes I case 2b: when L < I min(I, max( I, L)) becomes min(I, I ) becomes I so our expression is: n_pages = S < I ? min(I,L) : I hence the simple version: } else { /* New logarithmic number of pages that are estimated. If we have only 0 or 1 index pages then we can only take 1 sample.   So taking index size as I and sample as S and log(I)*S as L   requirement 1) we want the out limit of the expression to not exceed I; requirement 2) we want the ideal pages to be at least S; so the current expression is min(I, max( min(S,I), L)   looking for simplifications:   case 1: assume S < I min(I, max( min(S,I), L) -> min(I , max( S, L))   but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L.   so we have: min(I , L)   case 2: assume I < S min(I, max( min(S,I), L) -> min(I, max( I, L))   case 2a: L > I min(I, max( I, L)) -> min(I, L) -> I   case 2b: when L < I min(I, max( I, L)) -> min(I, I ) -> I so taking all case2 paths is I, our expression is: n_pages = S < I? min(I,L) : I */ if (index->stat_index_size > 1) { n_sample_pages = (srv_stats_sample_pages < index->stat_index_size) ? ut_min(index->stat_index_size, log2(index->stat_index_size)*srv_stats_sample_pages) : index->stat_index_size; } else { n_sample_pages = 1; } } edited: to fix a transposition in the comments

            Thanks, I really should educate myself more on math.

            jplindst Jan Lindström (Inactive) added a comment - Thanks, I really should educate myself more on math.
            danblack Daniel Black added a comment -

            You're welcome. If you hadn't come back with a longer form I woudn't have done the proper proof and the slight improvement on my original 6 hrs ago wouldn't have been realised.

            danblack Daniel Black added a comment - You're welcome. If you hadn't come back with a longer form I woudn't have done the proper proof and the slight improvement on my original 6 hrs ago wouldn't have been realised.
            danblack Daniel Black added a comment -

            thanks Jan. I see you committed this r4377

            much appreciate the credit in the commit too.

            danblack Daniel Black added a comment - thanks Jan. I see you committed this r4377 much appreciate the credit in the commit too.

            People

              jplindst Jan Lindström (Inactive)
              danblack Daniel Black
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.