Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Done
Description
This task is for writing down the results of EITS' variant of ANALYZE TABLE . That is, we want to know whether it is expensive to collect EITS histograms. (We know it is expensive as it involves a full table scan and a Unique object but we want to know the size of the overhead)
Attachments
Issue Links
- relates to
-
MDEV-18758 Test histograms precision
-
- Open
-
Activity
Comparison with MySQL 8
TL;DR:
- MySQL works at full table scan speed. Histogram collection is minor when compared to scanning through the entire table.
- MariaDB is about the same when collecting histogram for one column. When collecting histograms for many columns, it's much more expensive than a full table scan.
Machine: psergey's t480, Samsung PCI-E SSD, 16G RAM
=== MySQL 8.0 ==
|
Re-run on MySQL-8.0 current, release build.
|
|
Time to load: 4.43 hrs.
|
Dataset size: 89G t1.ibd
|
Binlog Size: ~34G. (might have some extra data)
|
|
CHECKSUM TABLE t1; -- 5 min 58 sec.
|
ANALYZE TABLE t1 UPDATE HISTOGRAM ON <col1, col2, ...>; -- 6 Min
|
|
ANALYZE TABLE t1 UPDATE HISTOGRAM on col1; -- 5 min 45 sec
|
ANALYZE TABLE t1 UPDATE HISTOGRAM on col2; -- 6 min
|
=== MariaDB 10.3 ===
|
|
Time to load: 4.08 hrs.
|
Dataset size: 89G t1.ibd
|
|
checksum table t1; -- 5 min 40 sec.
|
|
analyze table t1 persistent for all; -- 32 min
|
|
analyze table t1 persistent for columns(col1) indexes () -- 5 min
|
analyze table t1 persistent for columns(col2) indexes () -- 4 min
|
MariaDB 10.4
MariaDB [test]> checksum table t1;
|
+----------------------+------------+
|
| Table | Checksum |
|
+----------------------+------------+
|
| test.t1 | 2149892167 |
|
+----------------------+------------+
|
1 row in set (8 min 59.812 sec)
|
Without sampling
MariaDB [test]> analyze table t1 persistent for columns(taskid) indexes ();
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (8 min 17.281 sec)
|
|
MariaDB [test]> analyze table t1 persistent for columns(subid) indexes ();
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (6 min 4.782 sec)
|
|
MariaDB [test]> analyze table t1 persistent for all;
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (45 min 2.379 sec)
|
With sampling
MariaDB [test]> analyze table t1 persistent for columns(taskid) indexes ();
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (4 min 48.113 sec)
|
|
MariaDB [test]> analyze table t1 persistent for all;
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (24 min 27.986 sec)
|
|
MariaDB [test]> analyze table t1 persistent for columns(subid) indexes ();
|
+----------------------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+----------------------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+----------------------+---------+----------+-----------------------------------------+
|
2 rows in set (5 min 54.834 sec)
|
|
MariaDB [test]> ANALYZE TABLE t1 persistent for columns( col1, col2, col3, col4, col5, col6, col7, col8) indexes ();
|
+---------+---------+----------+-----------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+---------+---------+----------+-----------------------------------------+
|
| test.t1 | analyze | status | Engine-independent statistics collected |
|
| test.t1 | analyze | status | OK |
|
+---------+---------+----------+-----------------------------------------+
|
2 rows in set (6 min 48.440 sec)
|
histogram benchmarks mysql
histogram benchmarks mysql
|
|
mysql> CHECKSUM TABLE t1;
|
+---------+------------+
|
| Table | Checksum |
|
+---------+------------+
|
| test.t1 | 1553974534 |
|
+---------+------------+
|
1 row in set (7 min 58.96 sec)
|
|
|
mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2, col3, col4, col5, col6, col7, col8; +---------+-----------+----------+-------------------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+---------+-----------+----------+-------------------------------------------------+
|
| test.t1 | histogram | status | Histogram statistics created for column 'col3'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col8'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col4'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col5'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col2'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col1'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col6'. |
|
| test.t1 | histogram | status | Histogram statistics created for column 'col7'. |
|
+---------+-----------+----------+-------------------------------------------------+
|
8 rows in set (8 min 38.69 sec)
|
|
mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1;
|
+---------+-----------+----------+---------------------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+---------+-----------+----------+---------------------------------------------------+
|
| test.t1 | histogram | status | Histogram statistics created for column 'col1'. |
|
+---------+-----------+----------+---------------------------------------------------+
|
1 row in set (7 min 24.30 sec)
|
|
mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2;
|
+---------+-----------+----------+--------------------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
+---------+-----------+----------+--------------------------------------------------+
|
| test.t1 | histogram | status | Histogram statistics created for column 'col2'. |
|
+---------+-----------+----------+--------------------------------------------------+
|
1 row in set (7 min 29.38 sec)
|
Summarizing Varun's results.
MariaDB 10.4
checksum table t1; -- 9 min
|
|
= Without sampling ==
|
analyze table t1 persistent for columns(col1) indexes (); - 8 min 17 sec
|
analyze table t1 persistent for columns(col2) indexes (); - 6 min 4 sec
|
analyze table t1 persistent for all; - 45 min
|
|
== With sampling, automatic sample size ==
|
analyze table t1 persistent for columns(col1) indexes (); - 4 min 48 sec
|
analyze table t1 persistent for columns(col2) indexes (); - 5 min 54 sec
|
analyze table t1 persistent for columns(col1, col2, col3, col4,
|
col5, col6, col7, col8) indexes (); - 6 min 48 sec
|
|
analyze table t1 persistent for all; - 24 min 30 sec
|
MySQL 8.0
CHECKSUM TABLE t1; -- 7 min 58 sec
|
|
ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1, col2,
|
col3, col4, col5, col6, col7, col8; -- 8 min 38 sec
|
ANALYZE TABLE t1 UPDATE HISTOGRAM ON col1; -- 7 min 24 sec
|
ANALYZE TABLE t1 UPDATE HISTOGRAM ON col2; -- 7 min 29 sec
|
Takeaways:
- With sampling, the speed of ANALYZE .. PERSISTENT FOR COLUMNS in MariaDB becomes the same as the speed of full table scan.
ANALYZE ... PERSISTENT FOR INDEXES(...) can be more expensive due to full index scans that are performed to collect index statistics.
That is, ANALYZE... PERSISTENT FOR ALL () can be noticeably slower than just doing ANALYZE for columns.
An example of EITS data load speed
Hi, so I was messing with a big table with real customer data for an optimizer issue.
Thought I would use that opportunity to check EITS speed.
Table name: changed to t1
Table rows: 740M
Table size on disk: 89G
Loading data from mysqldump: 151 minutes (this is with flush_log_at_trx_commit=0, etc)
Machine: AWS with SSD, 16G RAM, 4G buffer pool.
CHECKSUM TABLE t1: 11 minutes
ANALYZE TABLE t1 PERSISTENT FOR ALL: 1 hour 5 min
I got the same time for two runs regardless of whether histogram_size=0 (so
it doesn't collect a histogram) or histogram_size=128, DOUBLE_PREC_HB.
In both cases, the server used 3.8G extra disk space at the end of ANALYZE.
Take-ways:
- Collecting a histogram is heavier than just reading the table. in our case, 6x times
- for non-utf8 columns, the time is tolerable.