Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17886

Benchmark speed of EITS ANALYZE TABLE, histogram collection

Details

    Description

      This task is for writing down the results of EITS' variant of ANALYZE TABLE . That is, we want to know whether it is expensive to collect EITS histograms. (We know it is expensive as it involves a full table scan and a Unique object but we want to know the size of the overhead)

      Attachments

        Issue Links

          Activity

            An example of EITS data load speed

            Hi, so I was messing with a big table with real customer data for an optimizer issue.
            Thought I would use that opportunity to check EITS speed.
             
             
            Table name: changed to t1 
            Table rows:  740M
            Table size on disk: 89G
            Loading data from mysqldump: 151 minutes (this is with flush_log_at_trx_commit=0, etc)
            Machine: AWS with SSD, 16G RAM, 4G buffer pool.
             
            CHECKSUM TABLE t1: 11 minutes
            ANALYZE TABLE t1 PERSISTENT FOR ALL: 1 hour 5 min
             
            I got the same time for two runs regardless of whether histogram_size=0 (so
            it doesn't collect a histogram) or histogram_size=128, DOUBLE_PREC_HB.
             
            In both cases, the server used 3.8G extra disk space at the end of ANALYZE.
             
            Take-ways:
            - Collecting a histogram is heavier than just reading the table. in our case, 6x times
            - for non-utf8 columns, the time is tolerable.
            

            psergei Sergei Petrunia added a comment - An example of EITS data load speed Hi, so I was messing with a big table with real customer data for an optimizer issue. Thought I would use that opportunity to check EITS speed.     Table name: changed to t1 Table rows: 740M Table size on disk: 89G Loading data from mysqldump: 151 minutes (this is with flush_log_at_trx_commit=0, etc) Machine: AWS with SSD, 16G RAM, 4G buffer pool.   CHECKSUM TABLE t1: 11 minutes ANALYZE TABLE t1 PERSISTENT FOR ALL: 1 hour 5 min   I got the same time for two runs regardless of whether histogram_size=0 (so it doesn't collect a histogram) or histogram_size=128, DOUBLE_PREC_HB.   In both cases, the server used 3.8G extra disk space at the end of ANALYZE.   Take-ways: - Collecting a histogram is heavier than just reading the table. in our case, 6x times - for non-utf8 columns, the time is tolerable.
            psergei Sergei Petrunia added a comment - - edited

            Comparison with MySQL 8

            TL;DR:

            • MySQL works at full table scan speed. Histogram collection is minor when compared to scanning through the entire table.
            • MariaDB is about the same when collecting histogram for one column. When collecting histograms for many columns, it's much more expensive than a full table scan.

            Machine: psergey's t480, Samsung PCI-E SSD, 16G RAM

            === MySQL 8.0 ==
            Re-run on MySQL-8.0 current, release build.
             
            Time to load: 4.43 hrs.
            Dataset size: 89G t1.ibd
            Binlog Size: ~34G.  (might have some extra data)
             
            CHECKSUM TABLE t1; -- 5 min 58 sec.
            ANALYZE TABLE t1 UPDATE HISTOGRAM ON <col1, col2, ...>; -- 6 Min
             
            ANALYZE TABLE t1 UPDATE HISTOGRAM on col1; -- 5 min 45 sec
            ANALYZE TABLE t1 UPDATE HISTOGRAM on col2; -- 6 min
            

            === MariaDB 10.3 ===
             
            Time to load: 4.08 hrs.
            Dataset size: 89G t1.ibd
             
            checksum table t1; -- 5 min 40 sec.
             
            analyze table t1 persistent for all;  -- 32 min
             
            analyze table t1 persistent for columns(col1) indexes () -- 5 min
            analyze table t1 persistent for columns(col2) indexes () -- 4 min
            

            psergei Sergei Petrunia added a comment - - edited Comparison with MySQL 8 TL;DR: MySQL works at full table scan speed. Histogram collection is minor when compared to scanning through the entire table. MariaDB is about the same when collecting histogram for one column. When collecting histograms for many columns, it's much more expensive than a full table scan. Machine: psergey's t480, Samsung PCI-E SSD, 16G RAM === MySQL 8.0 == Re-run on MySQL-8.0 current, release build.   Time to load: 4.43 hrs. Dataset size: 89G t1.ibd Binlog Size: ~34G. (might have some extra data)   CHECKSUM TABLE t1; -- 5 min 58 sec. ANALYZE TABLE t1 UPDATE HISTOGRAM ON <col1, col2, ...>; -- 6 Min   ANALYZE TABLE t1 UPDATE HISTOGRAM on col1; -- 5 min 45 sec ANALYZE TABLE t1 UPDATE HISTOGRAM on col2; -- 6 min === MariaDB 10.3 ===   Time to load: 4.08 hrs. Dataset size: 89G t1.ibd   checksum table t1; -- 5 min 40 sec.   analyze table t1 persistent for all; -- 32 min   analyze table t1 persistent for columns(col1) indexes () -- 5 min analyze table t1 persistent for columns(col2) indexes () -- 4 min
            varun Varun Gupta (Inactive) added a comment - - edited

            MariaDB 10.4

            MariaDB [test]> checksum table t1;
               +----------------------+------------+
            | Table                | Checksum   |
            +----------------------+------------+
            | test.t1              | 2149892167 |
            +----------------------+------------+
            1 row in set (8 min 59.812 sec)
            

            Without sampling

            MariaDB [test]> analyze table t1  persistent for columns(taskid) indexes ();
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (8 min 17.281 sec)
             
            MariaDB [test]> analyze table t1  persistent for columns(subid) indexes ();
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (6 min 4.782 sec)
             
            MariaDB [test]> analyze table t1  persistent for all;
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (45 min 2.379 sec)
            

            With sampling

            MariaDB [test]> analyze table t1  persistent for columns(taskid) indexes ();
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (4 min 48.113 sec)
             
            MariaDB [test]> analyze table t1  persistent for all;
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (24 min 27.986 sec)
             
            MariaDB [test]> analyze table t1  persistent for columns(subid) indexes ();
            +----------------------+---------+----------+-----------------------------------------+
            | Table                | Op      | Msg_type | Msg_text                                |
            +----------------------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +----------------------+---------+----------+-----------------------------------------+
            2 rows in set (5 min 54.834 sec)
             
            MariaDB [test]> ANALYZE TABLE t1 persistent for columns( col1, col2, col3, col4, col5, col6, col7, col8) indexes ();
            +---------+---------+----------+-----------------------------------------+
            | Table   | Op      | Msg_type | Msg_text                                |
            +---------+---------+----------+-----------------------------------------+
            | test.t1 | analyze | status   | Engine-independent statistics collected |
            | test.t1 | analyze | status   | OK                                      |
            +---------+---------+----------+-----------------------------------------+
            2 rows in set (6 min 48.440 sec)
            

            varun Varun Gupta (Inactive) added a comment - - edited MariaDB 10.4 MariaDB [test]> checksum table t1; +----------------------+------------+ | Table | Checksum | +----------------------+------------+ | test.t1 | 2149892167 | +----------------------+------------+ 1 row in set (8 min 59.812 sec) Without sampling MariaDB [test]> analyze table t1 persistent for columns(taskid) indexes (); +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (8 min 17.281 sec)   MariaDB [test]> analyze table t1 persistent for columns(subid) indexes (); +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (6 min 4.782 sec)   MariaDB [test]> analyze table t1 persistent for all; +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (45 min 2.379 sec) With sampling MariaDB [test]> analyze table t1 persistent for columns(taskid) indexes (); +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (4 min 48.113 sec)   MariaDB [test]> analyze table t1 persistent for all; +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (24 min 27.986 sec)   MariaDB [test]> analyze table t1 persistent for columns(subid) indexes (); +----------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +----------------------+---------+----------+-----------------------------------------+ 2 rows in set (5 min 54.834 sec)   MariaDB [test]> ANALYZE TABLE t1 persistent for columns( col1, col2, col3, col4, col5, col6, col7, col8) indexes (); +---------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +---------+---------+----------+-----------------------------------------+ 2 rows in set (6 min 48.440 sec)

            histogram benchmarks mysql

            histogram benchmarks mysql
             
            mysql> CHECKSUM TABLE t1;
            +---------+------------+
            | Table   | Checksum   |
            +---------+------------+
            | test.t1 | 1553974534 |
            +---------+------------+
            1 row in set (7 min 58.96 sec)
             
             
            mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2, col3, col4, col5, col6, col7, col8; +---------+-----------+----------+-------------------------------------------------+
            | Table   | Op        | Msg_type | Msg_text                                        |
            +---------+-----------+----------+-------------------------------------------------+
            | test.t1 | histogram | status   | Histogram statistics created for column 'col3'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col8'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col4'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col5'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col2'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col1'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col6'. |
            | test.t1 | histogram | status   | Histogram statistics created for column 'col7'. |
            +---------+-----------+----------+-------------------------------------------------+
            8 rows in set (8 min 38.69 sec)
             
            mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1;
            +---------+-----------+----------+---------------------------------------------------+
            | Table   | Op        | Msg_type | Msg_text                                          |
            +---------+-----------+----------+---------------------------------------------------+
            | test.t1 | histogram | status   | Histogram statistics created for column 'col1'.   |
            +---------+-----------+----------+---------------------------------------------------+
            1 row in set (7 min 24.30 sec)
             
            mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2;
            +---------+-----------+----------+--------------------------------------------------+
            | Table   | Op        | Msg_type | Msg_text                                         |
            +---------+-----------+----------+--------------------------------------------------+
            | test.t1 | histogram | status   | Histogram statistics created for column 'col2'.  |
            +---------+-----------+----------+--------------------------------------------------+
            1 row in set (7 min 29.38 sec)
            

            varun Varun Gupta (Inactive) added a comment - histogram benchmarks mysql histogram benchmarks mysql   mysql> CHECKSUM TABLE t1; +---------+------------+ | Table | Checksum | +---------+------------+ | test.t1 | 1553974534 | +---------+------------+ 1 row in set (7 min 58.96 sec)     mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2, col3, col4, col5, col6, col7, col8; +---------+-----------+----------+-------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +---------+-----------+----------+-------------------------------------------------+ | test.t1 | histogram | status | Histogram statistics created for column 'col3'. | | test.t1 | histogram | status | Histogram statistics created for column 'col8'. | | test.t1 | histogram | status | Histogram statistics created for column 'col4'. | | test.t1 | histogram | status | Histogram statistics created for column 'col5'. | | test.t1 | histogram | status | Histogram statistics created for column 'col2'. | | test.t1 | histogram | status | Histogram statistics created for column 'col1'. | | test.t1 | histogram | status | Histogram statistics created for column 'col6'. | | test.t1 | histogram | status | Histogram statistics created for column 'col7'. | +---------+-----------+----------+-------------------------------------------------+ 8 rows in set (8 min 38.69 sec)   mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1; +---------+-----------+----------+---------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +---------+-----------+----------+---------------------------------------------------+ | test.t1 | histogram | status | Histogram statistics created for column 'col1'. | +---------+-----------+----------+---------------------------------------------------+ 1 row in set (7 min 24.30 sec)   mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2; +---------+-----------+----------+--------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +---------+-----------+----------+--------------------------------------------------+ | test.t1 | histogram | status | Histogram statistics created for column 'col2'. | +---------+-----------+----------+--------------------------------------------------+ 1 row in set (7 min 29.38 sec)
            psergei Sergei Petrunia added a comment - - edited

            Summarizing Varun's results.

            MariaDB 10.4

            checksum table t1; -- 9 min
             
            = Without sampling ==
            analyze table t1 persistent for columns(col1) indexes (); - 8 min 17 sec
            analyze table t1  persistent for columns(col2) indexes (); - 6 min 4 sec
            analyze table t1  persistent for all; - 45 min
             
            == With sampling, automatic sample size ==
            analyze table t1  persistent for columns(col1) indexes (); - 4 min 48 sec
            analyze table t1  persistent for columns(col2) indexes ();  - 5 min 54 sec
            analyze table t1 persistent for columns(col1, col2, col3, col4, 
              col5, col6, col7, col8) indexes ();  - 6 min 48 sec
             
            analyze table t1  persistent for all;                        - 24 min 30 sec
            

            MySQL 8.0

            CHECKSUM TABLE t1;                              --  7 min 58 sec
             
            ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2, 
              col3, col4, col5, col6, col7, col8;            -- 8 min 38 sec
            ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1;       -- 7 min 24 sec
            ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2;       -- 7 min 29 sec
            

            psergei Sergei Petrunia added a comment - - edited Summarizing Varun's results. MariaDB 10.4 checksum table t1; -- 9 min   = Without sampling == analyze table t1 persistent for columns(col1) indexes (); - 8 min 17 sec analyze table t1 persistent for columns(col2) indexes (); - 6 min 4 sec analyze table t1 persistent for all; - 45 min   == With sampling, automatic sample size == analyze table t1 persistent for columns(col1) indexes (); - 4 min 48 sec analyze table t1 persistent for columns(col2) indexes (); - 5 min 54 sec analyze table t1 persistent for columns(col1, col2, col3, col4, col5, col6, col7, col8) indexes (); - 6 min 48 sec   analyze table t1 persistent for all; - 24 min 30 sec MySQL 8.0 CHECKSUM TABLE t1; -- 7 min 58 sec   ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2, col3, col4, col5, col6, col7, col8; -- 8 min 38 sec ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1; -- 7 min 24 sec ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2; -- 7 min 29 sec

            Takeaways:

            • With sampling, the speed of ANALYZE .. PERSISTENT FOR COLUMNS in MariaDB becomes the same as the speed of full table scan.
              ANALYZE ... PERSISTENT FOR INDEXES(...) can be more expensive due to full index scans that are performed to collect index statistics.
              That is, ANALYZE... PERSISTENT FOR ALL () can be noticeably slower than just doing ANALYZE for columns.
            psergei Sergei Petrunia added a comment - Takeaways: With sampling, the speed of ANALYZE .. PERSISTENT FOR COLUMNS in MariaDB becomes the same as the speed of full table scan. ANALYZE ... PERSISTENT FOR INDEXES(...) can be more expensive due to full index scans that are performed to collect index statistics. That is, ANALYZE... PERSISTENT FOR ALL () can be noticeably slower than just doing ANALYZE for columns.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.