Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32286

ANALYZE displays a huge number of InnoDB secondary index pages_accessed

Details

    Description

      When a table is accessed by an index, "pages_accessed" in the "r_engine_stats" of ANALYZE FORMAT=JSON output displays suspiciously large numbers. See the test case attached and compare pages_accessed=14 for full table scan against pages_accessed=2003 for an access by index.

      Attachments

        Issue Links

          Activity

            I think that we need mtr test coverage of pages_accessed, starting from something simple, such as ANALYZE FORMAT=JSON SELECT * FROM empty_table, which should access 1 page. When I was debugging the test case that I posted to MDEV-29151 yesterday, I noticed that SELECT * FROM t1 LIMIT 1 would incorrectly report accessing 3 pages, although InnoDB only incremented the thread-local counter by 1.

            marko Marko Mäkelä added a comment - I think that we need mtr test coverage of pages_accessed , starting from something simple, such as ANALYZE FORMAT=JSON SELECT * FROM empty_table , which should access 1 page. When I was debugging the test case that I posted to MDEV-29151 yesterday, I noticed that SELECT * FROM t1 LIMIT 1 would incorrectly report accessing 3 pages, although InnoDB only incremented the thread-local counter by 1.
            psergei Sergei Petrunia added a comment - - edited

            Is it ANALYZE that causes the problem? Or its "source" data that is printed into the slow query log?

            I take the attached testcase, add printing to slow query log:

            --- /tmp/innodb-pages-accessed.test     2023-09-29 12:29:11.591509426 +0300
            +++ main/_a1.test       2023-09-29 11:51:57.407542018 +0300
            @@ -19,6 +19,8 @@
             insert into domains values (1, 'ab'),(2, 'ac'),(3, 'ad'), (4, 'ba'), (5, 'bb'),
               (6, 'be'), (7, 'ca'), (8, 'cb'), (9, 'cc'), (10, 'cd');
             
            +set log_slow_verbosity=engine;
            +set long_query_time=0;
             --echo # Creating initial dataset...
             set unique_checks=0, foreign_key_checks=0;
             insert into t1
            

            run the test and see

            # Pages_accessed: 32  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
            # Pages_read_time: 0.0000  Engine_time: 31.0940
            ...
            # Pages_accessed: 2003  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
            # Pages_read_time: 0.0000  Engine_time: 15.3070
            ...
            # Pages_accessed: 2003  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
            # Pages_read_time: 0.0000  Engine_time: 15.3070
            

            It looks like the slow query log has wrong data, too. So the cause is MDEV-31558.

            psergei Sergei Petrunia added a comment - - edited Is it ANALYZE that causes the problem? Or its "source" data that is printed into the slow query log? I take the attached testcase, add printing to slow query log: --- /tmp/innodb-pages-accessed.test 2023-09-29 12:29:11.591509426 +0300 +++ main/_a1.test 2023-09-29 11:51:57.407542018 +0300 @@ -19,6 +19,8 @@ insert into domains values (1, 'ab'),(2, 'ac'),(3, 'ad'), (4, 'ba'), (5, 'bb'), (6, 'be'), (7, 'ca'), (8, 'cb'), (9, 'cc'), (10, 'cd'); +set log_slow_verbosity=engine; +set long_query_time=0; --echo # Creating initial dataset... set unique_checks=0, foreign_key_checks=0; insert into t1 run the test and see # Pages_accessed: 32 Pages_read: 0 Pages_updated: 0 Old_rows_read: 0 # Pages_read_time: 0.0000 Engine_time: 31.0940 ... # Pages_accessed: 2003 Pages_read: 0 Pages_updated: 0 Old_rows_read: 0 # Pages_read_time: 0.0000 Engine_time: 15.3070 ... # Pages_accessed: 2003 Pages_read: 0 Pages_updated: 0 Old_rows_read: 0 # Pages_read_time: 0.0000 Engine_time: 15.3070 It looks like the slow query log has wrong data, too. So the cause is MDEV-31558 .
            marko Marko Mäkelä added a comment - - edited

            I debugged this a little deeper, this time, using ./mtr --rr and rr replay. It looks like for the last ANALYZE TABLE statement the counter is indeed being incremented that many times by buf_page_get_low(), and it is being reset at the start of the statement:

            10.6 52e7016248d3e3f80c6a912ddc0399a15d868efc

            Thread 4 hit Hardware watchpoint 6: -location hs.pages_accessed
             
            Old value = 2005
            New value = 0
            0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
            59	  return __builtin___memset_chk (__dest, __ch, __len,
            2: x/i $pc
            => 0x561c79cac77e <_ZN5TABLE4initEP3THDP10TABLE_LIST+574>:	vmovdqu %ymm0,0x48(%rdi)
            (rr) bt
            #0  0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
            #1  ha_handler_stats::reset (this=0x7fa0800c13b0) at /mariadb/10.6/sql/ha_handler_stats.h:37
            #2  handler::ha_handler_stats_reset (this=<optimized out>) at /mariadb/10.6/sql/handler.h:4873
            #3  TABLE::init (this=0x7fa0800bfb08, thd=thd@entry=0x7fa080000c68, tl=tl@entry=0x7fa08001da90) at /mariadb/10.6/sql/table.cc:5781
            #4  0x0000561c79afc922 in open_table (thd=thd@entry=0x7fa080000c68, table_list=table_list@entry=0x7fa08001da90, ot_ctx=ot_ctx@entry=0x7fa08de42910) at /mariadb/10.6/sql/sql_base.cc:2156
            #5  0x0000561c79aff55d in open_and_process_table (ot_ctx=0x7fa08de42910, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, 
                tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.cc:3857
            #6  open_tables (thd=thd@entry=0x7fa080000c68, options=@0x7fa0800061a8: {m_options = DDL_options_st::OPT_NONE}, start=start@entry=0x7fa08de42998, counter=counter@entry=0x7fa08de429ac, flags=flags@entry=0, 
                prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:4341
            #7  0x0000561c79b00256 in open_and_lock_tables (thd=thd@entry=0x7fa080000c68, options=<optimized out>, tables=<optimized out>, tables@entry=0x7fa08001da90, derived=derived@entry=true, flags=flags@entry=0, 
                prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:5314
            #8  0x0000561c79baa5b5 in open_and_lock_tables (flags=0, derived=true, tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.h:512
            #9  execute_sqlcom_select (thd=thd@entry=0x7fa080000c68, all_tables=0x7fa08001da90) at /mariadb/10.6/sql/sql_parse.cc:6206
            #10 0x0000561c79bb6906 in mysql_execute_command (thd=thd@entry=0x7fa080000c68, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /mariadb/10.6/sql/sql_parse.cc:3961
            #11 0x0000561c79bb7c73 in mysql_parse (thd=0x7fa080000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /mariadb/10.6/sql/sql_parse.cc:8050
            #12 0x0000561c79bb9892 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fa080000c68, 
                packet=packet@entry=0x7fa08008ccf9 "analyze format=json\nSELECT SQL_NO_CACHE  `cnk`, `domain`\nFROM t1\nFORCE INDEX(domain)\nWHERE `domain` = 'be' AND status = 1 AND\n(`start_date`= '2022-07-18 00:00:00' OR `end_date` = '2022-07-18 00:00:00'"..., packet_length=packet_length@entry=214, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1995
            

            It turns out that this last statement (which will report pages_accessed as 2,003) will actually invoke row_search_mvcc() 1,001 times.

            If I set breakpoints on ha_handler_stats::reset() and row_search_mvcc(), I can see that the first statement (which reported 32 pages accessed) invoked row_search_mvcc() 10,001 times.

            The second statement with FORCE INDEX(active_epims) reports 2,005 pages accessed after invoking row_search_mvcc() 1,001 times.

            The third statement with FORCE INDEX(domain) reports 2,003 pages accessed after invoking row_search_mvcc() 1,001 times.

            I had already filed MDEV-31589 for the order-of-2 difference during secondary index lookup.

            I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups. It looks like InnoDB could be missing some cursor optimization here. In MDEV-31577, monty decided not to count pages accessed as a result of restoring the persistent cursor position when the shortcut is successful.

            marko Marko Mäkelä added a comment - - edited I debugged this a little deeper, this time, using ./mtr --rr and rr replay . It looks like for the last ANALYZE TABLE statement the counter is indeed being incremented that many times by buf_page_get_low() , and it is being reset at the start of the statement: 10.6 52e7016248d3e3f80c6a912ddc0399a15d868efc Thread 4 hit Hardware watchpoint 6: -location hs.pages_accessed   Old value = 2005 New value = 0 0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59 59 return __builtin___memset_chk (__dest, __ch, __len, 2: x/i $pc => 0x561c79cac77e <_ZN5TABLE4initEP3THDP10TABLE_LIST+574>: vmovdqu %ymm0,0x48(%rdi) (rr) bt #0 0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59 #1 ha_handler_stats::reset (this=0x7fa0800c13b0) at /mariadb/10.6/sql/ha_handler_stats.h:37 #2 handler::ha_handler_stats_reset (this=<optimized out>) at /mariadb/10.6/sql/handler.h:4873 #3 TABLE::init (this=0x7fa0800bfb08, thd=thd@entry=0x7fa080000c68, tl=tl@entry=0x7fa08001da90) at /mariadb/10.6/sql/table.cc:5781 #4 0x0000561c79afc922 in open_table (thd=thd@entry=0x7fa080000c68, table_list=table_list@entry=0x7fa08001da90, ot_ctx=ot_ctx@entry=0x7fa08de42910) at /mariadb/10.6/sql/sql_base.cc:2156 #5 0x0000561c79aff55d in open_and_process_table (ot_ctx=0x7fa08de42910, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.cc:3857 #6 open_tables (thd=thd@entry=0x7fa080000c68, options=@0x7fa0800061a8: {m_options = DDL_options_st::OPT_NONE}, start=start@entry=0x7fa08de42998, counter=counter@entry=0x7fa08de429ac, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:4341 #7 0x0000561c79b00256 in open_and_lock_tables (thd=thd@entry=0x7fa080000c68, options=<optimized out>, tables=<optimized out>, tables@entry=0x7fa08001da90, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:5314 #8 0x0000561c79baa5b5 in open_and_lock_tables (flags=0, derived=true, tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.h:512 #9 execute_sqlcom_select (thd=thd@entry=0x7fa080000c68, all_tables=0x7fa08001da90) at /mariadb/10.6/sql/sql_parse.cc:6206 #10 0x0000561c79bb6906 in mysql_execute_command (thd=thd@entry=0x7fa080000c68, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /mariadb/10.6/sql/sql_parse.cc:3961 #11 0x0000561c79bb7c73 in mysql_parse (thd=0x7fa080000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /mariadb/10.6/sql/sql_parse.cc:8050 #12 0x0000561c79bb9892 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fa080000c68, packet=packet@entry=0x7fa08008ccf9 "analyze format=json\nSELECT SQL_NO_CACHE `cnk`, `domain`\nFROM t1\nFORCE INDEX(domain)\nWHERE `domain` = 'be' AND status = 1 AND\n(`start_date`= '2022-07-18 00:00:00' OR `end_date` = '2022-07-18 00:00:00'"..., packet_length=packet_length@entry=214, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1995 It turns out that this last statement (which will report pages_accessed as 2,003) will actually invoke row_search_mvcc() 1,001 times. If I set breakpoints on ha_handler_stats::reset() and row_search_mvcc() , I can see that the first statement (which reported 32 pages accessed) invoked row_search_mvcc() 10,001 times. The second statement with FORCE INDEX(active_epims) reports 2,005 pages accessed after invoking row_search_mvcc() 1,001 times. The third statement with FORCE INDEX(domain) reports 2,003 pages accessed after invoking row_search_mvcc() 1,001 times. I had already filed MDEV-31589 for the order-of-2 difference during secondary index lookup. I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups. It looks like InnoDB could be missing some cursor optimization here. In MDEV-31577 , monty decided not to count pages accessed as a result of restoring the persistent cursor position when the shortcut is successful.

            I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups

            Correct. it's a single-table select.

            psergei Sergei Petrunia added a comment - I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups Correct. it's a single-table select.

            People

              vlad.lesin Vladislav Lesin
              oleg.smirnov Oleg Smirnov
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.