[MDEV-17255] New optimizer defaults and ANALYZE TABLE Created: 2018-09-20  Updated: 2018-12-11  Resolved: 2018-12-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.4.1

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-15253 Default optimizer setting changes for... Closed

 Description   

We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

optimizer_use_condition_selectivity=4
use_stat_tables=PREFERABLY

One of the effects of the new settings is that statement like

ANALYZE TABLE t1

after MDEV-15253 will start to collect EITS stats.

This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

However, it is not appropriate for production uses: If ANALYZE TABLE t1 collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

Solution 1: Make ANALYZE TABLE t1 not collect EITS stats

  • We will need to rollback all of the changes to .result files in MDEV-15253.
  • EITS will have few test coverage.

Solution 2: Make ANALYZE TABLE t1 collect EITS stats for MTR but not users.

  • Let use_stat_tables=preferably remain what it currently is: ANALYZE TABLE t1 collects EITS stats. MTR will run with this setting.
  • Introduce another value of use_stat_tables=preferably_for_reads (name is tentative). This will be the default for the users. It will mean that ANALYZE TABLE t1 does not collect EITS stats.

(One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

Solution 3:

Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.
We do not want to push the defaults change late in the release cycle.



 Comments   
Comment by Sergei Petrunia [ 2018-09-20 ]

(Me and Varun are leaning towards Solution #2)
igor, serg, cvicentiu elenst - any opinions?

Comment by Sergei Golubchik [ 2018-09-20 ]

I prefer Solution #3. I generally don't like adding new variables, we have too many of them already. And particularly not new variables to make mtr happy.

Sampling is not that difficult. Perhaps the sampling task keeps growing and it is difficult now, but going back to the basics, just the functionality to make EITS ANALYZE faster than old-fashioned ANALYZE can be implemented rather quickly.

Comment by Sergei Petrunia [ 2018-10-25 ]

Noting the outcome of discussions on an earlier optimizer call (didn't take notes back then): Solution #3 would created dependency, so we are going to with solution #2. (If a good sampling implementation is pushed into 10.4 release, we would have an option to change this)

Comment by Varun Gupta (Inactive) [ 2018-11-16 ]

Patch
http://lists.askmonty.org/pipermail/commits/2018-November/013111.html

Comment by Sergei Petrunia [ 2018-11-24 ]

Review input provided over email

Comment by Varun Gupta (Inactive) [ 2018-11-27 ]

After discussions with psergey, we came to the conclusion we also need COMPLEMENTARY_FOR_QUERIES
that would not be collecting EITS statistics for ANALYZE table t1.

Comment by Varun Gupta (Inactive) [ 2018-12-05 ]

Patch
http://lists.askmonty.org/pipermail/commits/2018-December/013184.html

Comment by Sergei Petrunia [ 2018-12-06 ]

Review input provided over email. More test coverage is needed.

Comment by Varun Gupta (Inactive) [ 2018-12-06 ]

Fixed the previous patch to show the correct test coverage
http://lists.askmonty.org/pipermail/commits/2018-December/013187.html

Comment by Sergei Petrunia [ 2018-12-08 ]

Ok to push.

Generated at Thu Feb 08 08:35:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.