Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17255

New optimizer defaults and ANALYZE TABLE

Details

    Description

      We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

      optimizer_use_condition_selectivity=4
      use_stat_tables=PREFERABLY
      

      One of the effects of the new settings is that statement like

      ANALYZE TABLE t1
      

      after MDEV-15253 will start to collect EITS stats.

      This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

      However, it is not appropriate for production uses: If ANALYZE TABLE t1 collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

      Possible ways out:

      Solution 1: Make ANALYZE TABLE t1 not collect EITS stats

      • We will need to rollback all of the changes to .result files in MDEV-15253.
      • EITS will have few test coverage.

      Solution 2: Make ANALYZE TABLE t1 collect EITS stats for MTR but not users.

      • Let use_stat_tables=preferably remain what it currently is: ANALYZE TABLE t1 collects EITS stats. MTR will run with this setting.
      • Introduce another value of use_stat_tables=preferably_for_reads (name is tentative). This will be the default for the users. It will mean that ANALYZE TABLE t1 does not collect EITS stats.

      (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

      Solution 3:

      Wait until Vicentiu and Teodor are done with EITS-via-sampling.
      This is bad as it creates a dependency between these two tasks.
      We do not want to push the defaults change late in the release cycle.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity.

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: introduce

            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity.

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: introduce

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity.

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: introduce

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity.

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity.

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that
            {code:sql}
            ANALYZE TABLE t1
            {code}

            will collect EITS stats after MDEV-15253.
            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).
            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it will take much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that statement like
            {code:sql}
            ANALYZE TABLE t1
            {code}

            after MDEV-15253 will start to collect EITS stats.

            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that statement like
            {code:sql}
            ANALYZE TABLE t1
            {code}

            after MDEV-15253 will start to collect EITS stats.

            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predicatable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that statement like
            {code:sql}
            ANALYZE TABLE t1
            {code}

            after MDEV-15253 will start to collect EITS stats.

            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.

            (Me and Varun are leaning towards Solution #2)
            igor, serg, cvicentiu elenst - any opinions?

            psergei Sergei Petrunia added a comment - (Me and Varun are leaning towards Solution #2) igor , serg , cvicentiu elenst - any opinions?
            psergei Sergei Petrunia made changes -
            Description We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that statement like
            {code:sql}
            ANALYZE TABLE t1
            {code}

            after MDEV-15253 will start to collect EITS stats.

            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We have MDEV-15253, which changes optimizer defaults to include using the histograms and their selectivity:

            {noformat}
            optimizer_use_condition_selectivity=4
            use_stat_tables=PREFERABLY
            {noformat}

            One of the effects of the new settings is that statement like
            {code:sql}
            ANALYZE TABLE t1
            {code}

            after MDEV-15253 will start to collect EITS stats.

            This was enabled in MTR and was instrumental in finding a lot of bugs related to EITS (Good).

            However, it is not appropriate for production uses: If {{ANALYZE TABLE t1}} collects EITS stats, it takes much more time (my measurement: 10x time the full table scan). This is BAD.

            Possible ways out:

            h2. Solution 1: Make {{ANALYZE TABLE t1}} not collect EITS stats
            * We will need to rollback all of the changes to .result files in MDEV-15253.
            * EITS will have few test coverage.

            h2. Solution 2: Make {{ANALYZE TABLE t1}} collect EITS stats for MTR but not users.
            * Let {{use_stat_tables=preferably}} remain what it currently is: {{ANALYZE TABLE t1}} collects EITS stats. MTR will run with this setting.
             * Introduce another value of {{use_stat_tables=preferably_for_reads}} (name is tentative). This will be the default for the users. It will mean that {{ANALYZE TABLE t1}} does not collect EITS stats.

            (One may argue that this is bad as MTR will run in an environment that's not like the users have. On the other hand, MTR will run with predictable statistical data. MTR used to run with sampled, non-predictable stats which made `rows` column and query plans unstable)

            h2. Solution 3:
            Wait until Vicentiu and Teodor are done with EITS-via-sampling.
            This is bad as it creates a dependency between these two tasks.
            We do not want to push the defaults change late in the release cycle.

            I prefer Solution #3. I generally don't like adding new variables, we have too many of them already. And particularly not new variables to make mtr happy.

            Sampling is not that difficult. Perhaps the sampling task keeps growing and it is difficult now, but going back to the basics, just the functionality to make EITS ANALYZE faster than old-fashioned ANALYZE can be implemented rather quickly.

            serg Sergei Golubchik added a comment - I prefer Solution #3. I generally don't like adding new variables, we have too many of them already. And particularly not new variables to make mtr happy. Sampling is not that difficult. Perhaps the sampling task keeps growing and it is difficult now, but going back to the basics, just the functionality to make EITS ANALYZE faster than old-fashioned ANALYZE can be implemented rather quickly.
            elenst Elena Stepanova made changes -
            Issue Type Bug [ 1 ] Task [ 3 ]
            psergei Sergei Petrunia made changes -
            Assignee Varun Gupta [ varun ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.4 [ 22408 ]

            Noting the outcome of discussions on an earlier optimizer call (didn't take notes back then): Solution #3 would created dependency, so we are going to with solution #2. (If a good sampling implementation is pushed into 10.4 release, we would have an option to change this)

            psergei Sergei Petrunia added a comment - Noting the outcome of discussions on an earlier optimizer call (didn't take notes back then): Solution #3 would created dependency, so we are going to with solution #2. (If a good sampling implementation is pushed into 10.4 release, we would have an option to change this)
            varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2018-November/013111.html
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Review input provided over email

            psergei Sergei Petrunia added a comment - Review input provided over email
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]

            After discussions with psergey, we came to the conclusion we also need COMPLEMENTARY_FOR_QUERIES
            that would not be collecting EITS statistics for ANALYZE table t1.

            varun Varun Gupta (Inactive) added a comment - After discussions with psergey , we came to the conclusion we also need COMPLEMENTARY_FOR_QUERIES that would not be collecting EITS statistics for ANALYZE table t1.
            varun Varun Gupta (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) added a comment - - edited Patch http://lists.askmonty.org/pipermail/commits/2018-December/013184.html
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Review input provided over email. More test coverage is needed.

            psergei Sergei Petrunia added a comment - Review input provided over email. More test coverage is needed.
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            Fixed the previous patch to show the correct test coverage
            http://lists.askmonty.org/pipermail/commits/2018-December/013187.html

            varun Varun Gupta (Inactive) added a comment - Fixed the previous patch to show the correct test coverage http://lists.askmonty.org/pipermail/commits/2018-December/013187.html
            varun Varun Gupta (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Ok to push.

            psergei Sergei Petrunia added a comment - Ok to push.
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            varun Varun Gupta (Inactive) made changes -
            Component/s Optimizer [ 10200 ]
            Fix Version/s 10.4.1 [ 23228 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 89690 ] MariaDB v4 [ 133684 ]

            People

              varun Varun Gupta (Inactive)
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.