[MDEV-26976] Optimization problem unless LIMIT is included (any LIMIT) Created: 2021-11-04  Updated: 2021-12-26

Status: Open
Project: MariaDB Server
Component/s: Server
Fix Version/s: None

Type: Task Priority: Major
Reporter: Nuno Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File image-2021-11-04-19-52-16-764.png     PNG File image-2021-11-04-19-59-06-320.png     PNG File image-2021-11-04-20-00-09-412.png     PNG File screenshot-1.png    

 Description   

Hello,

I wonder if this is a bug I found here...

Say I have a `posts` table, and I want to query this `posts` table for the latest Posts by my Friends.

Here's an "obvious" query (my User ID being 12345, and reducing the lookup by RID > 49251041):

SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN `friends` ON `id1`="12345" AND `id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

This is taking 3.6s

Looks "optimized" enough...

Since I found this really strange, I tried an alternative:

SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN (SELECT id2 FROM `friends` WHERE `id1`="12345") AS `f` ON `f`.`id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

This is also taking 3.6s

The EXPLAIN looks pretty much the same, with a few more "rows" on the posts table.

HOWEVER, by putting *any* LIMIT inside the subquery, no matter how big or small it is,

SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN (SELECT id2 FROM `friends` WHERE `id1`="12345" LIMIT 5000) AS `f` ON `f`.`id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

It straight away makes the query instant...
0.0387s

In the EXPLAIN, I see an extra "step" that I'm not sure what it means:

Even if I have something like:

LEFT JOIN `friends` ON `id1`="12345" AND `frid`=`posts`.`from`
...
WHERE ... (`friends`.`frid` IS NOT NULL OR (...)) ...

It's still taking its time...
As soon as I remove this "friends" bit, it becomes instant.

In terms of amount of rows...

SELECT COUNT(*) FROM `posts` WHERE `rid`>"49251041";

27754

SELECT COUNT(*) FROM `friends` WHERE `uid`="12345";

649

In terms of indexes,

posts:

friends:

Any ideas?

Why do I need to put a LIMIT, and the basic STRAIGHT_JOIN isn't efficient by default?

Note - this is on a much more complex query, but the above is a minimal example where the issue happens.

Using MariaDB 10.5.12 on CentOS 7
If it makes any difference, my "optimizer_use_condition_selectivity" = 1



 Comments   
Comment by Nuno [ 2021-11-05 ]

After some investigation, I understand now it has to do with the "Derived Table Merge" ("derived_merge") option of the optimizer, and putting "LIMIT" stops the merge from happening.

https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
https://mariadb.com/kb/en/derived-table-with-key-optimization/

However, I still don't understand why the first version of the query isn't efficient, despites the EXPLAIN saying it does use the indexes as expected.

Comment by Daniel Black [ 2021-12-01 ]

So what does your new found tool of EXPLAIN FORMAT=JSON show?

Comment by Nuno [ 2021-12-26 ]

danblack Very sorry for the delay. Great idea about trying `ANALYZE FORMAT=JSON` on this

I see that in the inefficient queries there is an:

"attached_condition": "friends.id2 = posts.`from`"

But I don't understand why it's bad.. both those columns are BIGINT(18) UNSIGNED NOT NULL.

ANALYZE FORMAT=JSON SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN `friends` ON `id1`="2" AND `id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 3896.429325,
    "table": {
      "table_name": "posts",
      "access_type": "range",
      "possible_keys": ["PRIMARY", "from_rid"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["rid"],
      "r_loops": 1,
      "rows": 1539408,
      "r_rows": 902574,
      "r_table_time_ms": 409.6966756,
      "r_other_time_ms": 132.2373258,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "posts.rid > 49251041"
    },
    "table": {
      "table_name": "friends",
      "access_type": "ref",
      "possible_keys": ["PRIMARY", "id1_time", "id2_id1", "id2_time"],
      "key": "id1_time",
      "key_length": "8",
      "used_key_parts": ["id1"],
      "ref": ["const"],
      "r_loops": 902574,
      "rows": 8,
      "r_rows": 8,
      "r_table_time_ms": 2959.085262,
      "r_other_time_ms": 395.4044702,
      "filtered": 100,
      "r_filtered": 0.049982051,
      "attached_condition": "friends.id2 = posts.`from`",
      "using_index": true
    }
  }
}

ANALYZE FORMAT=JSON SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN (SELECT `id2` FROM `friends` WHERE `id1`="2") AS `f` ON `f`.`id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 3914.359764,
    "table": {
      "table_name": "posts",
      "access_type": "range",
      "possible_keys": ["PRIMARY", "from_rid"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["rid"],
      "r_loops": 1,
      "rows": 1545328,
      "r_rows": 902577,
      "r_table_time_ms": 421.4094127,
      "r_other_time_ms": 140.2081879,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "posts.rid > 49251041"
    },
    "table": {
      "table_name": "friends",
      "access_type": "ref",
      "possible_keys": ["PRIMARY", "id1_time", "id2_id1", "id2_time"],
      "key": "id1_time",
      "key_length": "8",
      "used_key_parts": ["id1"],
      "ref": ["const"],
      "r_loops": 902577,
      "rows": 8,
      "r_rows": 8,
      "r_table_time_ms": 2897.589133,
      "r_other_time_ms": 455.1335689,
      "filtered": 100,
      "r_filtered": 0.049981885,
      "attached_condition": "friends.id2 = posts.`from`",
      "using_index": true
    }
  }
}

ANALYZE FORMAT=JSON SELECT `posts`.`rid`
FROM `posts`
STRAIGHT_JOIN (SELECT `id2` FROM `friends` WHERE `id1`="2" LIMIT 5000) AS `f` ON `f`.`id2`=`posts`.`from`
WHERE `posts`.`rid`>"49251041" ORDER BY `posts`.`rid` DESC LIMIT 5000

{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 679.9025479,
    "table": {
      "table_name": "posts",
      "access_type": "range",
      "possible_keys": ["PRIMARY", "from_rid"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["rid"],
      "r_loops": 1,
      "rows": 1547302,
      "r_rows": 902578,
      "r_table_time_ms": 423.5110086,
      "r_other_time_ms": 127.9726824,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "posts.rid > 49251041"
    },
    "table": {
      "table_name": "<derived2>",
      "access_type": "ref",
      "possible_keys": ["key0"],
      "key": "key0",
      "key_length": "8",
      "used_key_parts": ["id2"],
      "ref": ["mydb.posts.from"],
      "r_loops": 902578,
      "rows": 2,
      "r_rows": 0.003998546,
      "r_table_time_ms": 86.41143028,
      "r_other_time_ms": 41.97640026,
      "filtered": 100,
      "r_filtered": 100,
      "materialized": {
        "query_block": {
          "select_id": 2,
          "r_loops": 1,
          "r_total_time_ms": 0.02674722,
          "table": {
            "table_name": "friends",
            "access_type": "ref",
            "possible_keys": ["PRIMARY", "id1_time"],
            "key": "id1_time",
            "key_length": "8",
            "used_key_parts": ["id1"],
            "ref": ["const"],
            "r_loops": 1,
            "rows": 8,
            "r_rows": 8,
            "r_table_time_ms": 0.013238321,
            "r_other_time_ms": 0.005742284,
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          }
        }
      }
    }
  }
}

Generated at Thu Feb 08 09:49:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.