Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34190

ANALYZE: r_engine_stats shows unrealistically low pages_read_count

Details

    Description

      Consider an attached testcase mdev34190-test.diff

      It creates a table with 160K rows, each row being about 1K.
      This should take about 10K InnoDB disk pages (assuming page fill ratio=100%).

      Then, I restart the server, and do an ANALYZE for a full table scan.

      I take care to prevent any other activity:

      • Disable buffer pool save/load
      • Disable background statistics collection
      • Following Marko's advise, use innodb_fast_shutdown=0

      I get:

      show status like 'innodb_pages_read';
      Variable_name   Value
      Innodb_pages_read       143
      analyze format=json 
      select * from t1 force index (PRIMARY) 
      order by a asc, b asc, c asc, d asc;
      ANALYZE
      

      {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 910.051,
          "table": {
            "table_name": "t1",
            "access_type": "index",
            "key": "PRIMARY",
            "key_length": "1028",
            "used_key_parts": ["a", "b", "c", "d"],
            "r_loops": 1,
            "rows": 1,
            "r_rows": 163840,
            "r_table_time_ms": 794.313,
            "r_other_time_ms": 115.711,
            "r_engine_stats": {
              "pages_accessed": 11708,
              "pages_read_count": 271,
              "pages_read_time_ms": 13.03
            },
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          }
        }
      }
      

      only 271 pages were read? which took just 13 ms of 794 ms spent in the engine?

      show status like 'innodb_pages_read';
      Variable_name   Value
      Innodb_pages_read       12112
      

      Note that Innodb_pages_read shows a more reasonable number of 12K.

      If I get the same query to do a backward scan by using ORDER BY DESC, I get a more reasonable numbers:

      analyze format=json 
      select * from t1 force index (PRIMARY) 
      order by a desc, b desc, c desc, d desc;
      

      ANALYZE
      {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 1129.6,
          "table": {
            "table_name": "t1",
            "access_type": "index",
            "key": "PRIMARY",
            "key_length": "1028",
            "used_key_parts": ["a", "b", "c", "d"],
            "r_loops": 1,
            "rows": 1,
            "r_rows": 163841,
            "r_table_time_ms": 1036.171,
            "r_other_time_ms": 93.424,
            "r_engine_stats": {
              "pages_accessed": 11708,
              "pages_read_count": 11707,
              "pages_read_time_ms": 315.503
            },
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          }
        }
      }
      

      Attachments

        Issue Links

          Activity

            Looking for occurrences where Innodb_pages_read gets incremented but mariadb_increment_pages_read() is not incremented...

            I find this:

            (gdb) wher
              #0  buf_page_init_for_read (mode=132, page_id={m_id = 21474836498}, zip_size=0, unzip=false) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/buf/buf0rea.
              #1  0x00005555569caec4 in buf_read_page_low (space=0x7fffcc047ab8, sync=false, mode=132, page_id={m_id = 21474836498}, zip_size=0, unzip=false) at /home/psergey/dev-git2/10.6-mdev3412
              #2  0x00005555569cbf60 in buf_read_ahead_linear (page_id={m_id = 21474836495}, zip_size=0, ibuf=false) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/bu
              #3  0x0000555556983d7f in btr_pcur_move_to_next_page (cursor=0x7fffcc02eb30, mtr=0x7fffea013af0) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/btr/btr0
              #4  0x00005555568b2ae5 in row_search_mvcc (buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409
              #5  0x00005555566ab55b in ha_innobase::general_fetch (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcd
              #6  0x00005555566ab7e2 in ha_innobase::index_next (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded-
              #7  0x0000555556281ba6 in handler::ha_index_next (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded-1
              #8  0x0000555555f5d78d in join_read_next (info=0x7fffcc01b6e8) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/sql/sql_select.cc:22990
              #9  0x0000555555e11fb9 in READ_RECORD::read_record (this=0x7fffcc01b6e8) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/sql/records.h:81
            

            buf_read_page_low() calls mariadb_increment_pages_read().
            But it does it only when called with sync=true.

            buf_read_ahead_linear() calls buf_read_page_low(...sync=false... ), so nothing gets incremented.

            psergei Sergei Petrunia added a comment - Looking for occurrences where Innodb_pages_read gets incremented but mariadb_increment_pages_read() is not incremented... I find this: (gdb) wher #0 buf_page_init_for_read (mode=132, page_id={m_id = 21474836498}, zip_size=0, unzip=false) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/buf/buf0rea. #1 0x00005555569caec4 in buf_read_page_low (space=0x7fffcc047ab8, sync=false, mode=132, page_id={m_id = 21474836498}, zip_size=0, unzip=false) at /home/psergey/dev-git2/10.6-mdev3412 #2 0x00005555569cbf60 in buf_read_ahead_linear (page_id={m_id = 21474836495}, zip_size=0, ibuf=false) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/bu #3 0x0000555556983d7f in btr_pcur_move_to_next_page (cursor=0x7fffcc02eb30, mtr=0x7fffea013af0) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/storage/innobase/btr/btr0 #4 0x00005555568b2ae5 in row_search_mvcc (buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409 #5 0x00005555566ab55b in ha_innobase::general_fetch (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcd #6 0x00005555566ab7e2 in ha_innobase::index_next (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded- #7 0x0000555556281ba6 in handler::ha_index_next (this=0x7fffcc02b240, buf=0x7fffcc02c400 "\374\060\071bdcded-142a-11ef-a9a6-00d86110704409bdcded-142a-11ef-a9a6-00d86110704409bdcded-1 #8 0x0000555555f5d78d in join_read_next (info=0x7fffcc01b6e8) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/sql/sql_select.cc:22990 #9 0x0000555555e11fb9 in READ_RECORD::read_record (this=0x7fffcc01b6e8) at /home/psergey/dev-git2/10.6-mdev34125-fix-analyze-timings/sql/records.h:81 buf_read_page_low() calls mariadb_increment_pages_read(). But it does it only when called with sync=true. buf_read_ahead_linear() calls buf_read_page_low(...sync=false... ), so nothing gets incremented.

            It might not be trivial to track the time it took to do async reads. If we did 8 async disk reads in 1 ms, do we count it as 8ms or 1ms?

            But counting pages that prefetch code has read seems much easier.
            I did mdev34190-poc1.diff
            with which I get

            analyze format=json 
            select * from t1 force index (PRIMARY) 
            order by a asc, b asc, c asc, d asc;
            ANALYZE
            {
              "query_block": {
                "select_id": 1,
                "r_loops": 1,
                "r_total_time_ms": 873.152,
                "table": {
                  "table_name": "t1",
                  "access_type": "index",
                  "key": "PRIMARY",
                  "key_length": "1028",
                  "used_key_parts": ["a", "b", "c", "d"],
                  "r_loops": 1,
                  "rows": 1,
                  "r_rows": 163841,
                  "r_table_time_ms": 761.335,
                  "r_other_time_ms": 111.789,
                  "r_engine_stats": {
                    "pages_accessed": 11708,
                    "pages_read_count": 255,
                    "pages_prefetch_read_count": 11695,
                    "pages_read_time_ms": 11.218
                  },
                  "filtered": 100,
                  "r_filtered": 100,
                  "using_index": true
                }
              }
            }
            

            Note the

                    "pages_prefetch_read_count": 11695,
            

            Need to discuss this with monty and marko.

            psergei Sergei Petrunia added a comment - It might not be trivial to track the time it took to do async reads. If we did 8 async disk reads in 1 ms, do we count it as 8ms or 1ms? But counting pages that prefetch code has read seems much easier. I did mdev34190-poc1.diff with which I get analyze format=json select * from t1 force index (PRIMARY) order by a asc, b asc, c asc, d asc; ANALYZE { "query_block" : { "select_id" : 1, "r_loops" : 1, "r_total_time_ms" : 873.152, "table" : { "table_name" : "t1" , "access_type" : "index" , "key" : "PRIMARY" , "key_length" : "1028" , "used_key_parts" : [ "a" , "b" , "c" , "d" ], "r_loops" : 1, "rows" : 1, "r_rows" : 163841, "r_table_time_ms" : 761.335, "r_other_time_ms" : 111.789, "r_engine_stats" : { "pages_accessed" : 11708, "pages_read_count" : 255, "pages_prefetch_read_count" : 11695, "pages_read_time_ms" : 11.218 }, "filtered" : 100, "r_filtered" : 100, "using_index" : true } } } Note the "pages_prefetch_read_count": 11695, Need to discuss this with monty and marko .
            psergei Sergei Petrunia added a comment - - edited

            Input from Marko: the patch covers buf_read_ahead_linear but there's also buf_read_ahead_random... I probably need to take it into account in the patch..

            psergei Sergei Petrunia added a comment - - edited Input from Marko: the patch covers buf_read_ahead_linear but there's also buf_read_ahead_random ... I probably need to take it into account in the patch..
            psergei Sergei Petrunia added a comment - - edited

            commit be6b09e6caac9f6407bdb02379babba69029f15b (HEAD -> bb-10.6-MDEV-34190, origin/bb-10.6-MDEV-34190)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Mon May 27 17:00:12 2024 +0300
             
                MDEV-34190: r_engine_stats.pages_read_count is unrealistically low
                
            

            Need a review input for this.

            Created https://github.com/MariaDB/server/pull/3304 to get it.

            psergei Sergei Petrunia added a comment - - edited commit be6b09e6caac9f6407bdb02379babba69029f15b (HEAD -> bb-10.6-MDEV-34190, origin/bb-10.6-MDEV-34190) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon May 27 17:00:12 2024 +0300   MDEV-34190: r_engine_stats.pages_read_count is unrealistically low Need a review input for this. Created https://github.com/MariaDB/server/pull/3304 to get it.
            monty Michael Widenius added a comment - - edited

            Starting review

            monty Michael Widenius added a comment - - edited Starting review

            Patch ok. One small thing that needs to be checked.

            monty Michael Widenius added a comment - Patch ok. One small thing that needs to be checked.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.