[MDEV-33971] Using NAME_CONST() (or executing query from the stored procedure and referring to a local variable) changes the plan and may make execution slower - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.21, 10.6.17, 10.6.18
Fix Version/s: 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
Component/s: Optimizer
Labels:
None

Description

Consider the following simple table:

Yuliyas-Air:maria10.6 Valerii$ bin/mysql test

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 3

Server version: 10.6.18-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select * from tt1;

+------+------+------+

| id   | c1   | c2   |

+------+------+------+

|    1 |    3 | 1    |

|    2 |    2 | 3    |

+------+------+------+

2 rows in set (0.014 sec)

MariaDB [test]> show create table tt1\G

*************************** 1. row ***************************

       Table: tt1

Create Table: CREATE TABLE `tt1` (

  `id` int(11) DEFAULT NULL,

  `c1` int(11) DEFAULT NULL,

  `c2` varchar(100) DEFAULT NULL,

  KEY `c1` (`c1`),

  KEY `c2` (`c2`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci

1 row in set (0.004 sec)

MariaDB [test]> insert into tt1 select * from tt1;

Query OK, 2 rows affected (0.016 sec)

Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into tt1 select * from tt1;

Query OK, 4 rows affected (0.003 sec)

Records: 4  Duplicates: 0  Warnings: 0

...

MariaDB [test]> insert into tt1 select * from tt1;

Query OK, 32768 rows affected (0.297 sec)

Records: 32768  Duplicates: 0  Warnings: 0

MariaDB [test]> update tt1 set c1 = 4 limit 1;

Query OK, 1 row affected (0.012 sec)

Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [test]> select c1, count(*) from tt1 group by c1;

+------+----------+

| c1   | count(*) |

+------+----------+

|    2 |    32768 |

|    3 |    32767 |

|    4 |        1 |

+------+----------+

3 rows in set (0.042 sec)

The following queries get expected execution plans:

MariaDB [test]> explain select * from tt1 where c1 = 3;

+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

|    1 | SIMPLE      | tt1   | ALL  | c1            | NULL | NULL    | NULL | 65758 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

1 row in set (0.000 sec)

MariaDB [test]> explain select * from tt1 where c1 = 4;

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

|    1 | SIMPLE      | tt1   | ref  | c1            | c1   | 5       | const | 1    |       |

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

1 row in set (0.003 sec)

MariaDB [test]> explain select * from tt1 where c1 = '4';

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

|    1 | SIMPLE      | tt1   | ref  | c1            | c1   | 5       | const | 1    |       |

+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

1 row in set (0.000 sec)

but if we add name_const() as it happens when the query is executed from the stored routine the plan changes to use index condition pushdown:

MariaDB [test]> explain select * from tt1 where c1 = name_const('a',4);

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                 |

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

|    1 | SIMPLE      | tt1   | ref  | c1            | c1   | 5       | const | 1    | Using index condition |

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

1 row in set (0.004 sec)

MariaDB [test]> explain select * from tt1 where c1 = name_const('a','4');

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                 |

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

|    1 | SIMPLE      | tt1   | ref  | c1            | c1   | 5       | const | 1    | Using index condition |

+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

1 row in set (0.000 sec)

which is strange and not expected. Moreover, we may note somewhat slower execution as a result:

MariaDB [test]> analyze format=json select * from tt1 where c1 = 4\G

*************************** 1. row ***************************

ANALYZE: {

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 2.356,

    "table": {

      "table_name": "tt1",

      "access_type": "ref",

      "possible_keys": ["c1"],

      "key": "c1",

      "key_length": "5",

      "used_key_parts": ["c1"],

      "ref": ["const"],

      "r_loops": 1,

      "rows": 1,

      "r_rows": 1,

      "r_table_time_ms": 0.203,

      "r_other_time_ms": 0.01,

      "r_engine_stats": {

        "pages_accessed": 4

},

      "filtered": 100,

      "r_filtered": 100

1 row in set (0.004 sec)

MariaDB [test]> analyze format=json select * from tt1 where c1 = name_const('a',4)\G

*************************** 1. row ***************************

ANALYZE: {

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 2.423,

    "table": {

      "table_name": "tt1",

      "access_type": "ref",

      "possible_keys": ["c1"],

      "key": "c1",

      "key_length": "5",

      "used_key_parts": ["c1"],

      "ref": ["const"],

      "r_loops": 1,

      "rows": 1,

      "r_rows": 1,

      "r_table_time_ms": 2.396,

      "r_other_time_ms": 0.02,

      "r_engine_stats": {

        "pages_accessed": 4

},

      "filtered": 100,

      "r_filtered": 100,

      "index_condition": "tt1.c1 = 4"

1 row in set (0.003 sec)

Why all these happens and can we force the plans to be the same for such cases?

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mdev-33971.numbers
210 kB
2024-04-23 14:50

Activity

Ascending order - Click to sort in descending order

Valerii Kravchuk created issue - 2024-04-23 08:30

Dave Gosselin made changes - 2024-04-23 13:13

Field	Original Value	New Value
Assignee		Dave Gosselin [ JIRAUSER52216 ]

Dave Gosselin made changes - 2024-04-23 13:13

Fix Version/s

10.5.25 [ 29626 ]

Dave Gosselin made changes - 2024-04-23 13:13

Status

Open [ 1 ]

In Progress [ 3 ]

Dave Gosselin made changes - 2024-04-23 14:50

Attachment

mdev-33971.numbers [ 73438 ]

Dave Gosselin added a comment - 2024-04-23 14:54 - edited

I attached a spreadsheet wherein I ran the following queries, named "index" and "NAME_CONST", respectively:

analyze format=json select * from tt1 where c1 = 3\G

analyze format=json select * from tt1 where c1 = name_const('a', 3)\G

I ran this pair of queries five times under a Debug build, capturing the result of r_total_time_ms for each query, and placing it in the corresponding column of the spreadsheet. I then repeated the five queries, but reversing their order, so that "NAME_CONST" was run first, then "index" second, capturing the same information in the spreadsheet. I repeated the entire experiment again for a Release build. I'm running the latest 10.5, git sha e73181112f684debb7246a298fabc4cbce04eb9b which is a pre-10.5.25 version. The results are then used to compute median and average run times.

What we see is that overwhelmingly the order of the queries matters far more than whether NAME_CONST is present. Unless there is some configuration or server variables that have also been modified, there is no bug here as far as I can measure.

I should add, I used a table with 500K rows generated according to the algorithm in the description.

Dave Gosselin added a comment - 2024-04-23 14:54 - edited I attached a spreadsheet wherein I ran the following queries, named "index" and "NAME_CONST", respectively: analyze format=json select * from tt1 where c1 = 3\G analyze format=json select * from tt1 where c1 = name_const('a', 3)\G I ran this pair of queries five times under a Debug build, capturing the result of r_total_time_ms for each query, and placing it in the corresponding column of the spreadsheet. I then repeated the five queries, but reversing their order, so that "NAME_CONST" was run first, then "index" second, capturing the same information in the spreadsheet. I repeated the entire experiment again for a Release build. I'm running the latest 10.5, git sha e73181112f684debb7246a298fabc4cbce04eb9b which is a pre-10.5.25 version. The results are then used to compute median and average run times. What we see is that overwhelmingly the order of the queries matters far more than whether NAME_CONST is present. Unless there is some configuration or server variables that have also been modified, there is no bug here as far as I can measure. I should add, I used a table with 500K rows generated according to the algorithm in the description.

Valerii Kravchuk added a comment - 2024-04-23 15:06

Do you see different plans and if you do, why are they different? Where is this documented?

Valerii Kravchuk added a comment - 2024-04-23 15:06 Do you see different plans and if you do, why are they different? Where is this documented?

Dave Gosselin added a comment - 2024-04-23 15:13

First 'analyze' output, then 'explain' output. Apart from timings, the analyze outputs appear to be identical. I'll repeat this work on 10.5.19 and see if that makes a difference.

MariaDB [test]> analyze format=json select * from tt1 where c1 = 3\G

*************************** 1. row ***************************

ANALYZE: {

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 161.4513492,

    "table": {

      "table_name": "tt1",

      "access_type": "ALL",

      "possible_keys": ["c1"],

      "r_loops": 1,

      "rows": 1046577,

      "r_rows": 1048584,

      "r_table_time_ms": 125.2281306,

      "r_other_time_ms": 36.21630051,

      "filtered": 49.99995041,

      "r_filtered": 50,

      "attached_condition": "tt1.c1 = 3"

1 row in set (0.162 sec)

MariaDB [test]> analyze format=json select * from tt1 where c1 = name_const('a', 3)\G

*************************** 1. row ***************************

ANALYZE: {

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 137.0009049,

    "table": {

      "table_name": "tt1",

      "access_type": "ALL",

      "possible_keys": ["c1"],

      "r_loops": 1,

      "rows": 1046577,

      "r_rows": 1048584,

      "r_table_time_ms": 105.7870582,

      "r_other_time_ms": 31.21084605,

      "filtered": 49.99995041,

      "r_filtered": 50,

      "attached_condition": "tt1.c1 = 3"

1 row in set (0.137 sec)

The explain outputs look identical as well:

MariaDB [test]> explain select * from tt1 where c1 = 3;

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

|    1 | SIMPLE      | tt1   | ALL  | c1            | NULL | NULL    | NULL | 1046577 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

1 row in set (0.000 sec)

MariaDB [test]>

MariaDB [test]> explain select * from tt1 where c1 = '3';

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

|    1 | SIMPLE      | tt1   | ALL  | c1            | NULL | NULL    | NULL | 1046577 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

1 row in set (0.001 sec)

MariaDB [test]>

MariaDB [test]> explain select * from tt1 where c1 = name_const('a', '3');

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

|    1 | SIMPLE      | tt1   | ALL  | c1            | NULL | NULL    | NULL | 1046577 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

1 row in set (0.001 sec)

MariaDB [test]>

MariaDB [test]> explain select * from tt1 where c1 = name_const('a', 3);

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

|    1 | SIMPLE      | tt1   | ALL  | c1            | NULL | NULL    | NULL | 1046577 | Using where |

+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

1 row in set (0.000 sec)

Dave Gosselin added a comment - 2024-04-23 15:13 First 'analyze' output, then 'explain' output. Apart from timings, the analyze outputs appear to be identical. I'll repeat this work on 10.5.19 and see if that makes a difference. MariaDB [test]> analyze format=json select * from tt1 where c1 = 3\G *************************** 1. row *************************** ANALYZE: { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 161.4513492, "table": { "table_name": "tt1", "access_type": "ALL", "possible_keys": ["c1"], "r_loops": 1, "rows": 1046577, "r_rows": 1048584, "r_table_time_ms": 125.2281306, "r_other_time_ms": 36.21630051, "filtered": 49.99995041, "r_filtered": 50, "attached_condition": "tt1.c1 = 3" } } } 1 row in set (0.162 sec) MariaDB [test]> analyze format=json select * from tt1 where c1 = name_const('a', 3)\G *************************** 1. row *************************** ANALYZE: { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 137.0009049, "table": { "table_name": "tt1", "access_type": "ALL", "possible_keys": ["c1"], "r_loops": 1, "rows": 1046577, "r_rows": 1048584, "r_table_time_ms": 105.7870582, "r_other_time_ms": 31.21084605, "filtered": 49.99995041, "r_filtered": 50, "attached_condition": "tt1.c1 = 3" } } } 1 row in set (0.137 sec) The explain outputs look identical as well: MariaDB [test]> explain select * from tt1 where c1 = 3; +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | tt1 | ALL | c1 | NULL | NULL | NULL | 1046577 | Using where | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.000 sec) MariaDB [test]> MariaDB [test]> explain select * from tt1 where c1 = '3'; +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | tt1 | ALL | c1 | NULL | NULL | NULL | 1046577 | Using where | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.001 sec) MariaDB [test]> MariaDB [test]> explain select * from tt1 where c1 = name_const('a', '3'); +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | tt1 | ALL | c1 | NULL | NULL | NULL | 1046577 | Using where | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.001 sec) MariaDB [test]> MariaDB [test]> explain select * from tt1 where c1 = name_const('a', 3); +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | tt1 | ALL | c1 | NULL | NULL | NULL | 1046577 | Using where | +------+-------------+-------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.000 sec)

Valerii Kravchuk added a comment - 2024-04-23 15:18

Please, check also on 10.6.17.

Valerii Kravchuk added a comment - 2024-04-23 15:18 Please, check also on 10.6.17.

Dave Gosselin added a comment - 2024-04-24 20:52

I repeated my prior experiment on both 10.6.17 and 10.5.19. I considered 'Release' type builds only this time. We continue to see that the first query of the two takes longer to execute than the second, regardless of which one is first.

On 10.5.19, the "NAME_CONST" query first runs in a median time of 163ms, then the "index" query runs in a median time of 128ms.
On 10.5.19, the "index" query first runs in a median time of 153ms, then the "NAME_CONST" query runs in a median time of 139ms.
On 10.6.17, the "NAME_CONST" query first runs in a median time of 192ms, then the "index" query runs in a median time of 171ms.
On 10.6.17, the "index" query first runs in a median time of 193ms, then the "NAME_CONST" query runs in a median time of 170ms.

Interestingly and in this latest set of comparisons, we see that when running the "NAME_CONST" query second, it never runs faster than when we run the "index" query second; 139ms < 128ms, and 170ms < 171ms. This wasn't clear at all in the previous experiment, partly because we had less data and looked at one version only.

I have another spreadsheet locally with these data if you're interested in seeing it.

Dave Gosselin added a comment - 2024-04-24 20:52 I repeated my prior experiment on both 10.6.17 and 10.5.19. I considered 'Release' type builds only this time. We continue to see that the first query of the two takes longer to execute than the second, regardless of which one is first. On 10.5.19, the "NAME_CONST" query first runs in a median time of 163ms, then the "index" query runs in a median time of 128ms. On 10.5.19, the "index" query first runs in a median time of 153ms, then the "NAME_CONST" query runs in a median time of 139ms. On 10.6.17, the "NAME_CONST" query first runs in a median time of 192ms, then the "index" query runs in a median time of 171ms. On 10.6.17, the "index" query first runs in a median time of 193ms, then the "NAME_CONST" query runs in a median time of 170ms. Interestingly and in this latest set of comparisons, we see that when running the "NAME_CONST" query second, it never runs faster than when we run the "index" query second; 139ms < 128ms, and 170ms < 171ms. This wasn't clear at all in the previous experiment, partly because we had less data and looked at one version only. I have another spreadsheet locally with these data if you're interested in seeing it.

Valerii Kravchuk added a comment - 2024-04-25 05:38

Let's execute each query many times to remove that "first" vs "second" execution impact. With my original test data mysqlslap of 10000 executions give the following over 10 iterations:

Yuliyas-Air:maria10.6 Valerii$ bin/mysqlslap --concurrency=1 --create-schema=test --no-drop --number-of-queries=10000 --iterations=10 --query='select * from tt1 where c1 = 4;'

Benchmark

	Average number of seconds to run all queries: 1.053 seconds

	Minimum number of seconds to run all queries: 1.037 seconds

	Maximum number of seconds to run all queries: 1.085 seconds

	Number of clients running queries: 1

	Average number of queries per client: 10000

Yuliyas-Air:maria10.6 Valerii$ bin/mysqlslap --concurrency=1 --create-schema=test --no-drop --number-of-queries=10000 --iterations=10 --query="select * from tt1 where c1 = name_const('a',4)";

Benchmark

	Average number of seconds to run all queries: 1.127 seconds

	Minimum number of seconds to run all queries: 1.110 seconds

	Maximum number of seconds to run all queries: 1.163 seconds

	Number of clients running queries: 1

	Average number of queries per client: 10000

The difference is clearly visible at this scale/number of runs. The server is started with --no-defaults (if you think any settings matter).

Also the questions remain for 10.6.17+:

"Do you see different plans and if you do, why are they different? Where is this documented?"

Valerii Kravchuk added a comment - 2024-04-25 05:38 Let's execute each query many times to remove that "first" vs "second" execution impact. With my original test data mysqlslap of 10000 executions give the following over 10 iterations: Yuliyas-Air:maria10.6 Valerii$ bin/mysqlslap --concurrency=1 --create-schema=test --no-drop --number-of-queries=10000 --iterations=10 --query='select * from tt1 where c1 = 4;' Benchmark Average number of seconds to run all queries: 1.053 seconds Minimum number of seconds to run all queries: 1.037 seconds Maximum number of seconds to run all queries: 1.085 seconds Number of clients running queries: 1 Average number of queries per client: 10000 Yuliyas-Air:maria10.6 Valerii$ bin/mysqlslap --concurrency=1 --create-schema=test --no-drop --number-of-queries=10000 --iterations=10 --query="select * from tt1 where c1 = name_const('a',4)"; Benchmark Average number of seconds to run all queries: 1.127 seconds Minimum number of seconds to run all queries: 1.110 seconds Maximum number of seconds to run all queries: 1.163 seconds Number of clients running queries: 1 Average number of queries per client: 10000 The difference is clearly visible at this scale/number of runs. The server is started with --no-defaults (if you think any settings matter). Also the questions remain for 10.6.17+: "Do you see different plans and if you do, why are they different? Where is this documented?"

Dave Gosselin added a comment - 2024-04-25 12:35

Interesting, I haven't heard of mysqlslap before. I've been running the queries over five iterations and gathering the data manually. I agree at this scale it's pretty clear there's a difference. I'll return to your question regarding different plans/documentation today. Please tell me where I can find mysqlslap because I don't see it in the CE server codebase. Thanks.

Dave Gosselin added a comment - 2024-04-25 12:35 Interesting, I haven't heard of mysqlslap before. I've been running the queries over five iterations and gathering the data manually. I agree at this scale it's pretty clear there's a difference. I'll return to your question regarding different plans/documentation today. Please tell me where I can find mysqlslap because I don't see it in the CE server codebase. Thanks.

Valerii Kravchuk added a comment - 2024-04-25 12:48

This is a standard client a.k.a. mariadb-slap:

https://mariadb.com/kb/en/mariadb-slap/

that should be in the basedir/bin. Here is the source code:

https://github.com/MariaDB/server/blob/10.6/client/mysqlslap.c

Valerii Kravchuk added a comment - 2024-04-25 12:48 This is a standard client a.k.a. mariadb-slap: https://mariadb.com/kb/en/mariadb-slap/ that should be in the basedir/bin. Here is the source code: https://github.com/MariaDB/server/blob/10.6/client/mysqlslap.c

Sergei Petrunia added a comment - 2024-04-26 08:13 - edited

Discussion take-aways:

How NAME_CONST adds overhead

NAME_CONST can indeed add some overhead:

In case of full table scan, without NAME_CONST we have the WHERE condition c1=3 represented as:

Item_func_eq( Item_field(c1),  Item_int(3))

with NAME_CONST it becomes:

Item_func_eq( Item_name_const( Item_field(c1)),  Item_int(3) )

Computing that means adding one extra virtual function call. It is surprising that this is visible in performance numbers.

In case of ref access: c1=3 is guaranteed to be true for all rows returned by the index lookup. The optimizer can figure this out and remove the check (see sql_select.cc:test_if_ref()).
When c1= NAME_CONST(...) is used, the optimizer is apparently unable to figure out that the condition will be always true and the check remains.
This means there's some extra CPU overhead.

Is NAME_CONST needed?

NAME_CONST used instead of SP vars. It is necessary in the SELECT list. Is it necessary to use it in the WHERE clause? Names of items in the WHERE clause do not matter. Some attributes might matter (e.g. coercibility? take coercibility of an SP var reference, a string literal and of NAME_CONST(). ) At execution phase, NAME_CONST is definitely not necessary.

Suggested thing to try

What if Item_name_const::fix_fields() substituted itself with the constant it is wrapping? It can assign ((Item*)constant)->name.

Sergei Petrunia added a comment - 2024-04-26 08:13 - edited Discussion take-aways: How NAME_CONST adds overhead NAME_CONST can indeed add some overhead: In case of full table scan, without NAME_CONST we have the WHERE condition c1=3 represented as: Item_func_eq( Item_field(c1), Item_int(3)) with NAME_CONST it becomes: Item_func_eq( Item_name_const( Item_field(c1)), Item_int(3) ) Computing that means adding one extra virtual function call. It is surprising that this is visible in performance numbers. In case of ref access: c1=3 is guaranteed to be true for all rows returned by the index lookup. The optimizer can figure this out and remove the check (see sql_select.cc:test_if_ref()). When c1= NAME_CONST(...) is used, the optimizer is apparently unable to figure out that the condition will be always true and the check remains. This means there's some extra CPU overhead. Is NAME_CONST needed? NAME_CONST used instead of SP vars. It is necessary in the SELECT list. Is it necessary to use it in the WHERE clause? Names of items in the WHERE clause do not matter. Some attributes might matter (e.g. coercibility? take coercibility of an SP var reference, a string literal and of NAME_CONST(). ) At execution phase, NAME_CONST is definitely not necessary. Suggested thing to try What if Item_name_const::fix_fields() substituted itself with the constant it is wrapping? It can assign ((Item*)constant)->name.

Dave Gosselin added a comment - 2024-04-26 15:21

Hi valerii, using mariadb-slap I can clearly see the performance differences between using ref access with and without NAME_CONST. In response to your earlier question ("Do you see different plans and if you do, why are they different? Where is this documented?"), there are no material plan differences, but in 10.6.17 we now include a new metric, r_engine_stats, which was not present in 10.5.19. With respect to the benchmark measurements, I noted that NAME_CONST is slower than direct ref access in both 10.5.19 and 10.6.17, but in the latter version it's yet even slower, by an additional 100ms. With mariadb-slap, I ran 50,000 queries over three iterations in order to clearly demonstrate the difference.

Dave Gosselin added a comment - 2024-04-26 15:21 Hi valerii , using mariadb-slap I can clearly see the performance differences between using ref access with and without NAME_CONST. In response to your earlier question ("Do you see different plans and if you do, why are they different? Where is this documented?"), there are no material plan differences, but in 10.6.17 we now include a new metric, r_engine_stats, which was not present in 10.5.19. With respect to the benchmark measurements, I noted that NAME_CONST is slower than direct ref access in both 10.5.19 and 10.6.17, but in the latter version it's yet even slower, by an additional 100ms. With mariadb-slap, I ran 50,000 queries over three iterations in order to clearly demonstrate the difference.

Sergei Golubchik made changes - 2024-04-29 11:50

Fix Version/s		10.5 [ 23123 ]
Fix Version/s	10.5.25 [ 29626 ]

Dave Gosselin made changes - 2024-04-30 20:27

Assignee	Dave Gosselin [ JIRAUSER52216 ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia added a comment - 2024-05-14 11:00 - edited

A bit more detail about where/how NAME_CONST is generated:

Example:

create table t1 (a int, b int);

insert into t1 values (1,1),(2,2);

delimiter //;

create procedure p1()

begin

  declare var1 int;

  update t1 set b=var1 where a=1;

end

//;

delimiter ;//

Shows NAME_CONST is produced here:

  #0  Item_splocal::append_for_log (this=0x7fffa096e838, thd=0x7fffa0000d78, str=0x7ffff01c4ac0) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:140

  #1  0x0000555555dd765a in Copy_query_with_rewrite::append (this=0x7ffff01c4a90, p=0x7fffa096e8c8) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/item.h:614

  #2  0x0000555555dc9170 in subst_spvars (thd=0x7fffa0000d78, instr=0x7fffa09740a8, query_str=0x7fffa09740e0) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:1129

  #3  0x0000555555dd06cf in sp_instr_stmt::execute (this=0x7fffa09740a8, thd=0x7fffa0000d78, nextp=0x7ffff01c4edc) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:3737

  #4  0x0000555555dc9e44 in sp_head::execute (this=0x7fffa096cd00, thd=0x7fffa0000d78, merge_da_on_success=true) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:1442

  #5  0x0000555555dccc4c in sp_head::execute_procedure (this=0x7fffa096cd00, thd=0x7fffa0000d78, args=0x7fffa0006088) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:2485

  #6  0x0000555555ecdd31 in do_execute_sp (thd=0x7fffa0000d78, sp=0x7fffa096cd00) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:3064

  #7  0x0000555555ece964 in Sql_cmd_call::execute (this=0x7fffa0016ce8, thd=0x7fffa0000d78) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:3310

  #8  0x0000555555ed8a3a in mysql_execute_command (thd=0x7fffa0000d78, is_called_from_prepared_stmt=false) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:6124

  #9  0x0000555555edeabe in mysql_parse (thd=0x7fffa0000d78, rawbuf=0x7fffa0016c40 "call p1()", length=9, parser_state=0x7ffff01c6360) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:8146

subst_spvars does this:

    /* Find rewritable Items used in this statement */

    for (Item *item= instr->free_list; item; item= item->next)

      Rewritable_query_parameter *rqp= item->get_rewritable_query_parameter();

get_rewritable_query_parameter() is non-NULL for Item_splocal and Item_param.
Rewritable_query_parameter has

    my_ptrdiff_t pos_in_query;

    uint len_in_query;

    virtual bool append_for_log(THD *thd, String *str) = 0;

That is, Items use it to replace themselves in the original query text (which is just String).

At the time the substitution is made, Items that have no idea where in the query they are located (and whether it's the same part of the query that they were in the original query text).

Because of this, fixing the MDEV by "do not produce NAME_CONST(...) calls where it is not necessary" is hard.

Sergei Petrunia added a comment - 2024-05-14 11:00 - edited A bit more detail about where/how NAME_CONST is generated: Example: create table t1 (a int , b int ); insert into t1 values (1,1),(2,2); delimiter //; create procedure p1() begin declare var1 int ; update t1 set b=var1 where a=1; end //; delimiter ;// Shows NAME_CONST is produced here: #0 Item_splocal::append_for_log (this=0x7fffa096e838, thd=0x7fffa0000d78, str=0x7ffff01c4ac0) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:140 #1 0x0000555555dd765a in Copy_query_with_rewrite::append (this=0x7ffff01c4a90, p=0x7fffa096e8c8) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/item.h:614 #2 0x0000555555dc9170 in subst_spvars (thd=0x7fffa0000d78, instr=0x7fffa09740a8, query_str=0x7fffa09740e0) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:1129 #3 0x0000555555dd06cf in sp_instr_stmt::execute (this=0x7fffa09740a8, thd=0x7fffa0000d78, nextp=0x7ffff01c4edc) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:3737 #4 0x0000555555dc9e44 in sp_head::execute (this=0x7fffa096cd00, thd=0x7fffa0000d78, merge_da_on_success=true) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:1442 #5 0x0000555555dccc4c in sp_head::execute_procedure (this=0x7fffa096cd00, thd=0x7fffa0000d78, args=0x7fffa0006088) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sp_head.cc:2485 #6 0x0000555555ecdd31 in do_execute_sp (thd=0x7fffa0000d78, sp=0x7fffa096cd00) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:3064 #7 0x0000555555ece964 in Sql_cmd_call::execute (this=0x7fffa0016ce8, thd=0x7fffa0000d78) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:3310 #8 0x0000555555ed8a3a in mysql_execute_command (thd=0x7fffa0000d78, is_called_from_prepared_stmt=false) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:6124 #9 0x0000555555edeabe in mysql_parse (thd=0x7fffa0000d78, rawbuf=0x7fffa0016c40 "call p1()", length=9, parser_state=0x7ffff01c6360) at /home/psergey/dev-git2/10.6-review-mdev33971-name_const/sql/sql_parse.cc:8146 subst_spvars does this: /* Find rewritable Items used in this statement */ for (Item *item= instr->free_list; item; item= item->next) { Rewritable_query_parameter *rqp= item->get_rewritable_query_parameter(); get_rewritable_query_parameter() is non-NULL for Item_splocal and Item_param. Rewritable_query_parameter has my_ptrdiff_t pos_in_query; uint len_in_query; virtual bool append_for_log(THD *thd, String *str) = 0; That is, Items use it to replace themselves in the original query text (which is just String). At the time the substitution is made, Items that have no idea where in the query they are located (and whether it's the same part of the query that they were in the original query text). Because of this, fixing the MDEV by "do not produce NAME_CONST(...) calls where it is not necessary" is hard.

Sergei Petrunia added a comment - 2024-05-14 11:11

Not producing NAME_CONST vs removing NAME_CONST on execution.

Not producing NAME_CONST when it is not necessary

+: is cleaner.
-: Is a more intrusive fix. If we skip NAME_CONST where we should not have, this will produce broken binary logs.

Removing NAME_CONST when we've got a query with it.

+: Is a more conservative fix, although not as clean.

Sergei Petrunia added a comment - 2024-05-14 11:11 Not producing NAME_CONST vs removing NAME_CONST on execution. Not producing NAME_CONST when it is not necessary +: is cleaner. -: Is a more intrusive fix. If we skip NAME_CONST where we should not have, this will produce broken binary logs. Removing NAME_CONST when we've got a query with it. +: Is a more conservative fix, although not as clean.

Sergei Petrunia made changes - 2024-05-14 12:19

Assignee	Sergei Petrunia [ psergey ]	Dave Gosselin [ JIRAUSER52216 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Sergei Petrunia added a comment - 2024-05-14 12:20

Review input provided in https://github.com/MariaDB/server/pull/3236

Sergei Petrunia added a comment - 2024-05-14 12:20 Review input provided in https://github.com/MariaDB/server/pull/3236

Dave Gosselin made changes - 2024-05-14 15:39

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Dave Gosselin made changes - 2024-05-14 20:31

Assignee	Dave Gosselin [ JIRAUSER52216 ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Sergei Petrunia added a comment - 2024-05-17 14:28

When trying to see if we could always remove NAME_CONST, I've stumbled on a case where removing it creates issues even in the WHERE clause:

create procedure p1(var1 varchar(10))

select coercibility(var1);

call p1('abc');

coercibility(var1)

SP variable has coercibility =2.

COERCIBILITY(NAME_CONST('name','test'))

NAME_CONST has the same, good.
String literal has:

SELECT COERCIBILITY('test');

COERCIBILITY('test')

Moreover, one can get NAME_CONST with string literals with explicit collation (which have COERCIBILITY=0):

--source include/have_log_bin.inc

create table t1 (

  a varchar(100) collate utf8_unicode_ci,

  b int

);

insert into t1 values ('foo', 1),('bar', 1);

create procedure p1(var1 varchar(10))

  update t1 set b=b+1 where a=var1;

call p1('foo');

drop procedure p1;

drop table t1;

show binlog events;

gives

master-bin.000001       1003    Query   1       1158    use `test`;

update t1 set b=b+1 where a= NAME_CONST('var1',_latin1'foo' COLLATE 'latin1_swedish_ci')

Trying to see how replication slave would work:

--source include/have_log_bin.inc

create table t1 (

  a varchar(100) collate utf8_unicode_ci,

  b int

);

insert into t1 values ('foo', 1),('bar', 1);

explain format=json

update t1 set b=b+1 where a= NAME_CONST('var1',_latin1'foo' COLLATE 'latin1_swedish_ci');

drop table t1;

before this MDEV, I get:

      "attached_condition": "t1.a = convert(NAME_CONST('var1',_latin1'foo' collate latin1_swedish_ci) using utf8mb3)"

after this MDEV, I get:

mysqltest: At line 8: query 'explain format=json

update t1 set b=b+1 where a= NAME_CONST('var1',_latin1'foo' COLLATE 'latin1_swedish_ci')' failed: ER_CANT_AGGREGATE_2COLLATIONS (1267): Illegal mix of collations (utf8mb3_unicode_ci,IMPLICIT) and (latin1_swedish_ci,EXPLICIT) for operation '='

Gosselin, any thoughts?

Sergei Petrunia added a comment - 2024-05-17 14:28 When trying to see if we could always remove NAME_CONST, I've stumbled on a case where removing it creates issues even in the WHERE clause: create procedure p1(var1 varchar (10)) select coercibility(var1); call p1( 'abc' ); coercibility(var1) 2 SP variable has coercibility =2. COERCIBILITY(NAME_CONST('name','test')) 2 NAME_CONST has the same, good. String literal has: SELECT COERCIBILITY('test'); COERCIBILITY('test') 4 Moreover, one can get NAME_CONST with string literals with explicit collation (which have COERCIBILITY=0): --source include/have_log_bin.inc create table t1 ( a varchar (100) collate utf8_unicode_ci, b int ); insert into t1 values ( 'foo' , 1),( 'bar' , 1); create procedure p1(var1 varchar (10)) update t1 set b=b+1 where a=var1; call p1( 'foo' ); drop procedure p1; drop table t1; show binlog events; gives master-bin.000001 1003 Query 1 1158 use `test`; update t1 set b=b+1 where a= NAME_CONST('var1',_latin1'foo' COLLATE 'latin1_swedish_ci') Trying to see how replication slave would work: --source include/have_log_bin.inc create table t1 ( a varchar (100) collate utf8_unicode_ci, b int ); insert into t1 values ( 'foo' , 1),( 'bar' , 1); explain format=json update t1 set b=b+1 where a= NAME_CONST( 'var1' ,_latin1 'foo' COLLATE 'latin1_swedish_ci' ); drop table t1; before this MDEV, I get: "attached_condition": "t1.a = convert(NAME_CONST('var1',_latin1'foo' collate latin1_swedish_ci) using utf8mb3)" after this MDEV, I get: mysqltest: At line 8: query 'explain format=json update t1 set b=b+1 where a= NAME_CONST('var1',_latin1'foo' COLLATE 'latin1_swedish_ci')' failed: ER_CANT_AGGREGATE_2COLLATIONS (1267): Illegal mix of collations (utf8mb3_unicode_ci,IMPLICIT) and (latin1_swedish_ci,EXPLICIT) for operation '=' Gosselin , any thoughts?

Sergei Petrunia made changes - 2024-05-17 14:32

Assignee	Sergei Petrunia [ psergey ]	Dave Gosselin [ JIRAUSER52216 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Dave Gosselin made changes - 2024-05-21 19:49

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Dave Gosselin made changes - 2024-05-22 15:51

Assignee	Dave Gosselin [ JIRAUSER52216 ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Julien Fritsch made changes - 2024-05-23 08:23

Labels

triage

Julien Fritsch made changes - 2024-05-23 08:23

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Petrunia added a comment - 2024-05-27 09:26

Got questions on the updated pull request.

Sergei Petrunia added a comment - 2024-05-27 09:26 Got questions on the updated pull request.

Sergei Petrunia made changes - 2024-05-27 09:56

Assignee	Sergei Petrunia [ psergey ]	Dave Gosselin [ JIRAUSER52216 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Julien Fritsch made changes - 2024-05-29 14:03

Fix Version/s

10.6 [ 24028 ]

Dave Gosselin made changes - 2024-05-31 17:52

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Sergei Petrunia added a comment - 2024-06-07 12:17 - edited

The patch uses

   thd->change_item_tree(ref, value_item);

call.
Should it do this or just do ?

  *ref=value_item;

instead?

Discussed this with sanja_byelkin. Take-aways:
change_item_tree() is to be used when we want/need the change to be rolled back at the end of execution.
In our case, this seems to be unnecessary. Although it's not harmful, either.

or wait.. what if our argument is Item_func_set_collation...

Sergei Petrunia added a comment - 2024-06-07 12:17 - edited The patch uses thd->change_item_tree( ref , value_item); call. Should it do this or just do ? * ref =value_item; instead? Discussed this with sanja_byelkin . Take-aways: change_item_tree() is to be used when we want/need the change to be rolled back at the end of execution. In our case, this seems to be unnecessary. Although it's not harmful, either. or wait.. what if our argument is Item_func_set_collation...

Dave Gosselin added a comment - 2024-06-07 12:42

If there's no harm in leaving it as is, then let's leave it, especially if that's the more general case anyhow.

Dave Gosselin added a comment - 2024-06-07 12:42 If there's no harm in leaving it as is, then let's leave it, especially if that's the more general case anyhow.

Dave Gosselin made changes - 2024-06-14 12:16

Assignee	Dave Gosselin [ JIRAUSER52216 ]	Sergei Petrunia [ psergey ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Jira Automation (IT) made changes - 2024-07-04 00:11

Zendesk Related Tickets		201680
Zendesk active tickets		201680

Sergei Petrunia added a comment - 2024-07-10 12:41

Review input provided. Almost there.

Sergei Petrunia added a comment - 2024-07-10 12:41 Review input provided. Almost there.

Sergei Petrunia made changes - 2024-07-10 12:41

Assignee	Sergei Petrunia [ psergey ]	Dave Gosselin [ JIRAUSER52216 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Dave Gosselin made changes - 2024-07-11 17:11

Fix Version/s		10.6.19 [ 29833 ]
Fix Version/s	10.5 [ 23123 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

JiraAutomate made changes - 2024-07-11 17:11

Fix Version/s		10.11.9 [ 29834 ]
Fix Version/s		11.1.6 [ 29835 ]
Fix Version/s		11.2.5 [ 29836 ]
Fix Version/s		11.4.3 [ 29837 ]

Ralf Gebhardt made changes - 2024-07-18 07:24

Link

This issue blocks TODO-4803 [ TODO-4803 ]

Jira Automation (IT) made changes - 2024-07-19 00:59

Zendesk Related Tickets	201680
Zendesk active tickets	201680

Julien Fritsch made changes - 2024-08-08 12:42

Labels

triage

People

Assignee:: Dave Gosselin

Reporter:: Valerii Kravchuk

Votes:: 1 Vote for this issue

Watchers:: 11 Start watching this issue

Dates

Created:: 2024-04-23 08:30

Updated:: 2024-08-08 12:42

Resolved:: 2024-07-11 17:11

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.