[MDEV-32286] ANALYZE displays a huge number of InnoDB secondary index pages_accessed Created: 2023-09-29  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Critical
Reporter: Oleg Smirnov Assignee: Vladislav Lesin
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: File innodb-pages-accessed.test    
Issue Links:
Relates
relates to MDEV-29151 Index choice depends on index definit... Closed
relates to MDEV-31577 Make ANALYZE FORMAT=JSON print innodb... Closed
relates to MDEV-31589 Secondary index MVCC access is unnece... Stalled

 Description   

When a table is accessed by an index, "pages_accessed" in the "r_engine_stats" of ANALYZE FORMAT=JSON output displays suspiciously large numbers. See the attached test case and compare pages_accessed=14 for a full table scan against pages_accessed=2003 for an access by index.



 Comments   
Comment by Marko Mäkelä [ 2023-09-29 ]

I think that we need mtr test coverage of pages_accessed, starting from something simple, such as ANALYZE FORMAT=JSON SELECT * FROM empty_table, which should access 1 page. When I was debugging the test case that I posted to MDEV-29151 yesterday, I noticed that SELECT * FROM t1 LIMIT 1 would incorrectly report accessing 3 pages, although InnoDB only incremented the thread-local counter by 1.
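The expectation above can be sketched as a toy model (my own sketch, not MariaDB code; the function name, rows_per_page, and btree_height are all illustrative assumptions): a full scan should touch the internal pages on the descent plus each leaf page once, and an empty table is a single root-is-leaf page, so pages_accessed should be exactly 1, not the 3 observed for SELECT * FROM t1 LIMIT 1.

```python
import math

def expected_pages_full_scan(n_rows, rows_per_page=100, btree_height=1):
    """Hypothetical model of how many pages a full table scan should
    count: the internal pages on the descent to the first leaf, plus
    every leaf page. An empty InnoDB table is one root-is-leaf page."""
    leaf_pages = max(1, math.ceil(n_rows / rows_per_page))
    return (btree_height - 1) + leaf_pages

expected_pages_full_scan(0)            # empty table: 1 page
expected_pages_full_scan(1000, 100, 2) # 1 internal + 10 leaf pages = 11
```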

Comment by Sergei Petrunia [ 2023-09-29 ]

Is it ANALYZE that causes the problem, or the underlying "source" data, which is also what gets printed into the slow query log?

I took the attached test case and added printing to the slow query log:

--- /tmp/innodb-pages-accessed.test     2023-09-29 12:29:11.591509426 +0300
+++ main/_a1.test       2023-09-29 11:51:57.407542018 +0300
@@ -19,6 +19,8 @@
 insert into domains values (1, 'ab'),(2, 'ac'),(3, 'ad'), (4, 'ba'), (5, 'bb'),
   (6, 'be'), (7, 'ca'), (8, 'cb'), (9, 'cc'), (10, 'cd');
 
+set log_slow_verbosity=engine;
+set long_query_time=0;
 --echo # Creating initial dataset...
 set unique_checks=0, foreign_key_checks=0;
 insert into t1

Running the test, I see:

# Pages_accessed: 32  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
# Pages_read_time: 0.0000  Engine_time: 31.0940
...
# Pages_accessed: 2003  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
# Pages_read_time: 0.0000  Engine_time: 15.3070
...
# Pages_accessed: 2003  Pages_read: 0  Pages_updated: 0  Old_rows_read: 0
# Pages_read_time: 0.0000  Engine_time: 15.3070

It looks like the slow query log has wrong data, too. So the cause is MDEV-31558.

Comment by Marko Mäkelä [ 2023-09-29 ]

I debugged this a little deeper, this time using ./mtr --rr and rr replay. It looks like for the last ANALYZE FORMAT=JSON statement the counter is indeed being incremented that many times by buf_page_get_low(), and it is being reset at the start of the statement:

10.6 52e7016248d3e3f80c6a912ddc0399a15d868efc

Thread 4 hit Hardware watchpoint 6: -location hs.pages_accessed
 
Old value = 2005
New value = 0
0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
59	  return __builtin___memset_chk (__dest, __ch, __len,
2: x/i $pc
=> 0x561c79cac77e <_ZN5TABLE4initEP3THDP10TABLE_LIST+574>:	vmovdqu %ymm0,0x48(%rdi)
(rr) bt
#0  0x0000561c79cac77e in memset (__len=56, __ch=0, __dest=0x7fa0800c13b0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#1  ha_handler_stats::reset (this=0x7fa0800c13b0) at /mariadb/10.6/sql/ha_handler_stats.h:37
#2  handler::ha_handler_stats_reset (this=<optimized out>) at /mariadb/10.6/sql/handler.h:4873
#3  TABLE::init (this=0x7fa0800bfb08, thd=thd@entry=0x7fa080000c68, tl=tl@entry=0x7fa08001da90) at /mariadb/10.6/sql/table.cc:5781
#4  0x0000561c79afc922 in open_table (thd=thd@entry=0x7fa080000c68, table_list=table_list@entry=0x7fa08001da90, ot_ctx=ot_ctx@entry=0x7fa08de42910) at /mariadb/10.6/sql/sql_base.cc:2156
#5  0x0000561c79aff55d in open_and_process_table (ot_ctx=0x7fa08de42910, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, 
    tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.cc:3857
#6  open_tables (thd=thd@entry=0x7fa080000c68, options=@0x7fa0800061a8: {m_options = DDL_options_st::OPT_NONE}, start=start@entry=0x7fa08de42998, counter=counter@entry=0x7fa08de429ac, flags=flags@entry=0, 
    prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:4341
#7  0x0000561c79b00256 in open_and_lock_tables (thd=thd@entry=0x7fa080000c68, options=<optimized out>, tables=<optimized out>, tables@entry=0x7fa08001da90, derived=derived@entry=true, flags=flags@entry=0, 
    prelocking_strategy=prelocking_strategy@entry=0x7fa08de42a00) at /mariadb/10.6/sql/sql_base.cc:5314
#8  0x0000561c79baa5b5 in open_and_lock_tables (flags=0, derived=true, tables=0x7fa08001da90, thd=0x7fa080000c68) at /mariadb/10.6/sql/sql_base.h:512
#9  execute_sqlcom_select (thd=thd@entry=0x7fa080000c68, all_tables=0x7fa08001da90) at /mariadb/10.6/sql/sql_parse.cc:6206
#10 0x0000561c79bb6906 in mysql_execute_command (thd=thd@entry=0x7fa080000c68, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /mariadb/10.6/sql/sql_parse.cc:3961
#11 0x0000561c79bb7c73 in mysql_parse (thd=0x7fa080000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /mariadb/10.6/sql/sql_parse.cc:8050
#12 0x0000561c79bb9892 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fa080000c68, 
    packet=packet@entry=0x7fa08008ccf9 "analyze format=json\nSELECT SQL_NO_CACHE  `cnk`, `domain`\nFROM t1\nFORCE INDEX(domain)\nWHERE `domain` = 'be' AND status = 1 AND\n(`start_date`= '2022-07-18 00:00:00' OR `end_date` = '2022-07-18 00:00:00'"..., packet_length=packet_length@entry=214, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1995

It turns out that this last statement (which will report pages_accessed as 2,003) will actually invoke row_search_mvcc() 1,001 times.

If I set breakpoints on ha_handler_stats::reset() and row_search_mvcc(), I can see that the first statement (which reported 32 pages accessed) invoked row_search_mvcc() 10,001 times.

The second statement with FORCE INDEX(active_epims) reports 2,005 pages accessed after invoking row_search_mvcc() 1,001 times.

The third statement with FORCE INDEX(domain) reports 2,003 pages accessed after invoking row_search_mvcc() 1,001 times.

I had already filed MDEV-31589 for the factor-of-2 difference during secondary index lookup.

I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups. It looks like InnoDB could be missing some cursor optimization here. In MDEV-31577, monty decided not to count pages accessed as a result of restoring the persistent cursor position when the shortcut is successful.
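The numbers above can be sketched as a toy model (my own reading of the reported figures, not code from the server; the function and its descent_pages parameter are illustrative assumptions): if every row_search_mvcc() call on a secondary index counts the secondary leaf page plus the clustered index page for the row (the factor-of-2 of MDEV-31589), and a small constant covers the initial cursor positioning, the 2,003 and 2,005 figures fall out directly.

```python
def modeled_pages_accessed(row_search_mvcc_calls, descent_pages):
    """Hypothetical model: two page accesses per row_search_mvcc() call
    (secondary index leaf + clustered index record lookup), plus the
    pages touched while initially positioning the cursor."""
    return 2 * row_search_mvcc_calls + descent_pages

modeled_pages_accessed(1001, 1)  # 2003, the FORCE INDEX(domain) statement
modeled_pages_accessed(1001, 3)  # 2005, the FORCE INDEX(active_epims) statement
```

Note that the first statement (10,001 row_search_mvcc() calls, only 32 pages reported) does not fit this model at all, which is consistent with an index range scan where the cursor stays positioned and only leaf-page boundaries increment the counter.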

Comment by Sergei Petrunia [ 2023-10-03 ]

"I assume that handler::ha_index_next_same() is executing an index range scan and not repetitive secondary index lookups"

Correct, it's a single-table select.

Generated at Thu Feb 08 10:30:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.