[MDEV-17255] New optimizer defaults and ANALYZE TABLE - Jira

Sergei Petrunia created issue - 2018-09-20 14:51

Sergei Petrunia made changes - 2018-09-20 14:51

Field	Original Value	New Value
Link		This issue is part of ~~MDEV-15253~~ [ ~~MDEV-15253~~ ]

Sergei Petrunia made changes - 2018-09-20 14:52

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity.

One of the effects of the new settings is that
{code:sql}
ANALYZE TABLE t1
{code}

will collect EITS stats after ~~MDEV-15253~~.
This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: introduce

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity.

One of the effects of the new settings is that
{code:sql}
ANALYZE TABLE t1
{code}

will collect EITS stats after ~~MDEV-15253~~.
This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: introduce

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.

Sergei Petrunia made changes - 2018-09-20 14:59

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity.

One of the effects of the new settings is that
{code:sql}
ANALYZE TABLE t1
{code}

will collect EITS stats after ~~MDEV-15253~~.
This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
* Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
* Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

(One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.

Sergei Petrunia made changes - 2018-09-20 14:59

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity:

{noformat}
optimizer_use_condition_selectivity=4
use_stat_tables=PREFERABLY
{noformat}

One of the effects of the new settings is that
{code:sql}
ANALYZE TABLE t1
{code}

will collect EITS stats after ~~MDEV-15253~~.
This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
* Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
* Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

(One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.

Sergei Petrunia made changes - 2018-09-20 15:00

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity:

{noformat}
optimizer_use_condition_selectivity=4
use_stat_tables=PREFERABLY
{noformat}

One of the effects of the new settings is that statement like
{code:sql}
ANALYZE TABLE t1
{code}

after ~~MDEV-15253~~ will start to collect EITS stats.

This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
* Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
* Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

(One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.

Sergei Petrunia made changes - 2018-09-20 15:01

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity:

{noformat}
optimizer_use_condition_selectivity=4
use_stat_tables=PREFERABLY
{noformat}

One of the effects of the new settings is that statement like
{code:sql}
ANALYZE TABLE t1
{code}

after ~~MDEV-15253~~ will start to collect EITS stats.

This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to rollback all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have few test coverage.

h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
* Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
* Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

(One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.

Sergei Petrunia added a comment - 2018-09-20 15:03

(Varun and I are leaning towards Solution #2.)
igor, serg, cvicentiu, elenst - any opinions?

Sergei Petrunia made changes - 2018-09-20 15:04

Description

We have ~~MDEV-15253~~, which changes optimizer defaults to include using the histograms and their selectivity:

{noformat}
optimizer_use_condition_selectivity=4
use_stat_tables=PREFERABLY
{noformat}

One of the effects of the new settings is that a statement like
{code:sql}
ANALYZE TABLE t1
{code}

after ~~MDEV-15253~~ will start to collect EITS stats.
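
A minimal sketch of how this can be observed in a session, assuming both variables can be set at session level and that a table {{t1}} already exists. The query against {{mysql.column_stats}} is only one way to check what was collected:

{code:sql}
-- Settings introduced as defaults by MDEV-15253
SET SESSION optimizer_use_condition_selectivity = 4;
SET SESSION use_stat_tables = 'PREFERABLY';

-- With use_stat_tables=PREFERABLY, this also collects EITS stats
ANALYZE TABLE t1;

-- EITS per-column stats are stored in mysql.column_stats
SELECT column_name, min_value, max_value, nulls_ratio, avg_frequency
FROM mysql.column_stats
WHERE db_name = DATABASE() AND table_name = 't1';
{code}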

This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

However, this is not appropriate for production use: if {{ANALYZE TABLE t1}} collects EITS stats, it takes much longer (my measurement: 10x the time of a full table scan). This is BAD.

Possible ways out:

h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
* We will need to roll back all of the changes to .result files in ~~MDEV-15253~~.
* EITS will have little test coverage.

h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
* Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
* Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

(One may argue that this is bad, as MTR will run in an environment unlike the one users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats, which made the `rows` column and query plans unstable.)

h2. Solution 3:
Wait until Vicentiu and Teodor are done with EITS-via-sampling.
This is bad as it creates a dependency between these two tasks.
We do not want to push the defaults change late in the release cycle.

Sergei Golubchik added a comment - 2018-09-20 15:25

I prefer Solution #3. I generally don't like adding new variables, we have too many of them already. And particularly not new variables to make mtr happy.

Sampling is not that difficult. Perhaps the sampling task keeps growing and it is difficult now, but going back to the basics, just the functionality to make EITS ANALYZE faster than old-fashioned ANALYZE can be implemented rather quickly.

Elena Stepanova made changes - 2018-09-20 23:37

Issue Type	Bug [ 1 ]	Task [ 3 ]

Sergei Petrunia made changes - 2018-09-27 19:45

Assignee		Varun Gupta [ varun ]

Sergei Petrunia made changes - 2018-09-27 19:46

Fix Version/s		10.4 [ 22408 ]

Sergei Petrunia added a comment - 2018-10-25 07:21

Noting the outcome of discussions on an earlier optimizer call (didn't take notes back then): Solution #3 would create a dependency, so we are going with Solution #2. (If a good sampling implementation is pushed into the 10.4 release, we will have the option to change this.)

Varun Gupta (Inactive) added a comment - 2018-11-16 18:25

Patch
http://lists.askmonty.org/pipermail/commits/2018-November/013111.html

Varun Gupta (Inactive) made changes - 2018-11-16 18:26

Status	Open [ 1 ]	In Progress [ 3 ]

Varun Gupta (Inactive) made changes - 2018-11-16 18:26

Assignee	Varun Gupta [ varun ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia added a comment - 2018-11-24 18:49

Review input provided over email

Sergei Petrunia made changes - 2018-11-24 18:49

Status	In Review [ 10002 ]	Stalled [ 10000 ]

Sergei Petrunia made changes - 2018-11-24 18:49

Assignee	Sergei Petrunia [ psergey ]	Varun Gupta [ varun ]

Varun Gupta (Inactive) added a comment - 2018-11-27 10:49

After discussions with psergey, we concluded that we also need a COMPLEMENTARY_FOR_QUERIES value,
under which ANALYZE TABLE t1 would not collect EITS statistics.
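
For illustration, a hedged sketch of how such a value might be used. COMPLEMENTARY_FOR_QUERIES is the name from the comment above. PERSISTENT FOR ALL is existing ANALYZE TABLE syntax, and the assumption here is that it remains the way to request EITS collection explicitly:

{code:sql}
-- Optimizer may read EITS stats, but plain ANALYZE TABLE should not collect them
SET SESSION use_stat_tables = 'COMPLEMENTARY_FOR_QUERIES';
ANALYZE TABLE t1;

-- Assumed: explicit request for EITS collection, independent of the setting above
ANALYZE TABLE t1 PERSISTENT FOR ALL;
{code}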

Varun Gupta (Inactive) made changes - 2018-12-04 11:46

Status	Stalled [ 10000 ]	In Progress [ 3 ]

Varun Gupta (Inactive) added a comment - 2018-12-05 14:10 - edited

Patch
http://lists.askmonty.org/pipermail/commits/2018-December/013184.html

Varun Gupta (Inactive) made changes - 2018-12-05 14:10

Assignee	Varun Gupta [ varun ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia added a comment - 2018-12-06 19:15

Review input provided over email. More test coverage is needed.

Sergei Petrunia made changes - 2018-12-06 19:15

Assignee	Sergei Petrunia [ psergey ]	Varun Gupta [ varun ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Varun Gupta (Inactive) added a comment - 2018-12-06 21:27

Fixed the previous patch to show the correct test coverage
http://lists.askmonty.org/pipermail/commits/2018-December/013187.html

Varun Gupta (Inactive) made changes - 2018-12-06 21:27

Status	Stalled [ 10000 ]	In Progress [ 3 ]

Varun Gupta (Inactive) made changes - 2018-12-06 21:27

Assignee	Varun Gupta [ varun ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia added a comment - 2018-12-08 12:41

Ok to push.

Sergei Petrunia made changes - 2018-12-08 12:41

Assignee	Sergei Petrunia [ psergey ]	Varun Gupta [ varun ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Varun Gupta (Inactive) made changes - 2018-12-11 13:48

Component/s		Optimizer [ 10200 ]
Fix Version/s		10.4.1 [ 23228 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2021-12-06 21:23

Workflow	MariaDB v3 [ 89690 ]	MariaDB v4 [ 133684 ]
