[MDEV-23704] Optimizer does not use semijoin optimization for some WHERE (pk1, pk2, pk3) IN ((1,2,3), ...) queries - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Not a Bug
Affects Version/s: 10.3.22
Fix Version/s: N/A
Component/s: Optimizer
Labels:
- in-to-subquery

Description

For query like this:

select * from t where (PK1, PK2, PK3) in ((v1,v2,v3), (v4,v5,v6), ...)

with long enough list of tuples for multi-column PK values there are 3 possible plans presented by quotes from ANALYZE FORMAT=JSON outputs):

1. Range scan, can be forced in some cases with FORCE INDEX(PRIMARY):

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 19.184,

    "table": {

      "table_name": "t",

      "access_type": "range",

      "possible_keys": ["PRIMARY"],

      "key": "PRIMARY",

      "key_length": "74",

      "used_key_parts": [

        "PK1",

        "PK2",

        "PK3"

],

      "r_loops": 1,

      "rows": 1000,

      "r_rows": 1000,

      "r_total_time_ms": 15.165,

      "filtered": 100,

      "r_filtered": 100,

      "attached_condition": "(t.PK1,t.PK2,t.PK3) in (<cache>((349,'*********','01')),<cache>((349,'*********','01')),)"

2. For the same IN list in other environment we end up with full table scan that is very slow:

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 51060,

    "table": {

      "table_name": "t",

      "access_type": "ALL",

      "possible_keys": [

        "PRIMARY",

...

],

      "r_loops": 1,

      "rows": 7957643,

      "r_rows": 7.96e6,

      "r_total_time_ms": 48324,

      "filtered": 100,

      "r_filtered": 0.0126,

      "attached_condition": "(t.PK1,t.PK2,t.PK3) in (<cache>((***,'*******','01')),<cache>((***,'*******','01')),.......)"

3. Finally in other environment with very similar data for the same IN list we end up with semijoin optimization applied:

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 11.565,

    "duplicates_removal": {

      "table": {

        "table_name": "<derived3>",

        "access_type": "ALL",

        "r_loops": 1,

        "rows": 1000,

        "r_rows": 1000,

        "r_total_time_ms": 0.1065,

        "filtered": 100,

        "r_filtered": 100,

        "materialized": {

          "query_block": {

            "union_result": {

              "table_name": "<unit3>",

              "access_type": "ALL",

              "r_loops": 0,

              "r_rows": null,

              "query_specifications": [

                  "query_block": {

                    "select_id": 3,

                    "table": {

                      "message": "No tables used"

},

      "table": {

        "table_name": "t",

        "access_type": "eq_ref",

        "possible_keys": [

          "PRIMARY",

...

],

        "key": "PRIMARY",

        "key_length": "74",

        "used_key_parts": [

          "PK1",

          "PK2",

          "PK3"

],

        "ref": ["tvc_0._col_1", "func", "func"],

        "r_loops": 1000,

        "rows": 1,

        "r_rows": 0.998,

        "r_total_time_ms": 7.7333,

        "filtered": 100,

        "r_filtered": 100,

        "attached_condition": "t.PK2 = convert(tvc_0._col_2 using utf8) and t.PK3 = convert(tvc_0._col_3 using utf8)"

This plan is the fastest even comparing to range join, but there is no way to force it.

So, I think there is a bug when semijoin optimization is not used even when "range" is extended to full table scan as an alternative. I'd also like to have a way to force the best plan.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

screenshot-1.png
29 kB
2020-10-07 10:08
a2.test
21 kB
2020-10-01 14:07

Issue Links

relates to

MDEV-20900 IN predicate to IN subquery conversion causes performance regression

Closed

MDEV-21265 IN predicate conversion to IN subquery should be allowed for a broader set of datatype comparison

Closed

Activity

People

Assignee:: Sergei Petrunia

Reporter:: Valerii Kravchuk

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2020-09-09 09:20

Updated:: 2024-07-07 21:57

Resolved:: 2020-10-07 13:47

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.