[MDEV-32907] spider incorrectly translates sum() query to a column Created: 2023-11-29  Updated: 2023-12-18

Status: Confirmed
Project: MariaDB Server
Component/s: Embedded Server, Storage Engine - Spider
Affects Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug Priority: Major
Reporter: Iacovos Botsaris Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: spider, spider-gbh

Issue Links:
Relates
relates to MDEV-32273 Broken queries are passed to the grou... Open

 Description   

Hello,
I am running a MariaDB Standalone node, which acts as a Spider proxy to multiple back end Data Nodes.

These are the spider configuration options I have in place:

[mysqld]
plugin_load_add = "ha_spider"
# Spider engine settings
spider_skip_default_condition=0
spider_quick_mode=0
spider_direct_dup_insert=0
spider_direct_order_limit=1
spider_sts_bg_mode=1
spider_crd_sync=2
spider_crd_interval=0
spider_bka_mode=0
spider_bgs_mode=0
spider_multi_split_read=1
spider_bka_engine=memory
spider_index_hint_pushdown=1
spider_use_pushdown_udf=1
spider_remote_autocommit=1
spider_connect_timeout=1800
spider_net_read_timeout=1800
spider_net_write_timeout=1800
spider_local_lock_table=0
spider_bulk_update_mode=2
spider_general_log=ON
spider_log_result_errors=2
spider_log_result_error_with_sql=2
spider_conn_recycle_mode=1
use_stat_tables=preferably
spider_auto_increment_mode=2
# spider_sync_trx_isolation=
# log-error = /var/lib/mysql/error_spider.log
# general_log_file = /var/lib/mysql/query_spider.log

The problem I have encountered, is related with combining the IFNULL() function with the SUM() function.

A simple example is the following:

MariaDB [testing]>  SELECT target_1, NVL(SUM(target_2),0) AS TARGET_2 FROM testing.metrics limit 1;
ERROR 1054 (42S22): Unknown column 'SUM(target_2)' in 'field list'

Whereas, while using COALESCE ,on the spider, this works as expected:

MariaDB [testing]> SELECT target_1, SUM(COALESCE(CASE WHEN target_2 IS NOT NULL THEN target_2 ELSE 0 END)) AS TARGET_2 FROM testing.metrics limit 1;
+-------------+-----------------+
| target_1 | TARGET_2 |
+-------------+-----------------+
|     5000    |    30474       |
+--------------+----------------+

Essentially, the spider tries to find a column 'SUM(target_2)' instead of evaluating the function.

Here is the create table statement on the Spider:

CREATE TABLE `metrics` (
  `id` int(3) NOT NULL AUTO_INCREMENT,
  `code` varchar(20) NOT NULL DEFAULT '',
  `description` varchar(64) NOT NULL,
  `target_1` int(5) NOT NULL,
  `target_2` int(5) NOT NULL,
  `target_3` int(5) NOT NULL,
  `target_4` int(5) NOT NULL,
  `target_5` int(3) unsigned DEFAULT 0,
  `branch` varchar(4) NOT NULL,
  `enabled` tinyint(1) NOT NULL DEFAULT 1,
  `datetime_created` timestamp(3) NOT NULL DEFAULT current_timestamp(3),
  `datetime_updated` timestamp(3) NOT NULL DEFAULT current_timestamp(3) ON UPDATE current_timestamp(3),
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_description_branch` (`description`,`branch`)
) ENGINE=SPIDER AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci COMMENT='wrapper "mysql", srv "dataNode2", database "testing", table "metrics"'

And here is the create table statement on the dataNode:

CREATE TABLE `metrics` (
  `id` int(3) NOT NULL AUTO_INCREMENT,
  `code` varchar(20) NOT NULL DEFAULT '',
  `description` varchar(64) NOT NULL,
  `target_1` int(5) NOT NULL,
  `target_2` int(5) NOT NULL,
  `target_3` int(5) NOT NULL,
  `target_4` int(5) NOT NULL,
  `target_5` int(3) unsigned DEFAULT 0,
  `branch` varchar(4) NOT NULL,
  `enabled` tinyint(1) NOT NULL DEFAULT 1,
  `datetime_created` timestamp(3) NOT NULL DEFAULT current_timestamp(3),
  `datetime_updated` timestamp(3) NOT NULL DEFAULT current_timestamp(3) ON UPDATE current_timestamp(3),
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_description_branch` (`description`,`branch`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci

Of course, when I run the query directly against the dataNode, it executes successfully:

SELECT target_1, NVL(SUM(target_2),0) AS TARGET_2 FROM testing.metrics limit 1;
+-------------+-----------------+
| target_1 | TARGET_2 |
+-------------+-----------------+
|     5000    |    30474       |
+--------------+----------------+

If removing SUM(), or IFNULL(), the query executes; the error occurs in the combination of both, whatever the query.

IFNULL(SUM...) fails in all queries.
charsets and collation are identical across the nodes, if relevant.

Thank you



 Comments   
Comment by Yuchen Pei [ 2023-11-30 ]

Hi, thanks for reporting. I could reproduce it in a simple case in
10.11 3e90efe4c98f00e96e081bd1c571ea1e25b06258 with the following
case:

--echo #
--echo # MDEV-32907
--echo #
--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
evalp CREATE SERVER srv FOREIGN DATA WRAPPER mysql
OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
 
create table t2 (c int);
create table t1 (c int) ENGINE=Spider
COMMENT='WRAPPER "mysql", srv "srv",TABLE "t2"';
 
insert into t1 values (3), (NULL);
select nvl(sum(c), 0) from t1;
# select sum(c) from t1;
# select sum(nvl(c, 0)) from t1;
drop table t1, t2;
drop server srv;
 
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log
--echo #
--echo # end of test mdev_32907
--echo #

However, sum(nvl(c, 0)) works, have you tried to swap NVL with SUM in
your case, i.e. SUM(NVL(target_2,0)) ibotsaris?

Comment by Yuchen Pei [ 2023-11-30 ]

I could reproduce it at 10.5 387b92df97e70680641ad0bcaed83b44373f13c5.
It is caused by the query generated by the group by handler of course:

select sum(t0.`c`) `sum(c)`,(ifnull(`sum(c)` , 0)) `nvl(sum(c), 0)` from `test`.`t2` t0

Comment by Yuchen Pei [ 2023-11-30 ]

However, when trying to build in 10.4
cd79f102110a3543bc78ef4bec4dbeadaf1f1a83 I get strange mtr failures
(for any test I tried, including the one in the previous comment, not
just the one below)

Could not execute 'check-testcase' before testcase 'spider/bugfix.mdev_29562' (res: 1):
mysqltest: Logging to '/home/ycp/source/mariadb-server/10.4/build/mysql-test/var/tmp/check-mysqld_2_1.log'.
mysqltest: Results saved in '/home/ycp/source/mariadb-server/10.4/build/mysql-test/var/tmp/check-mysqld_2_1.result'.
=================================================================
==313014==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5614259cc098 at pc 0x7ff4d2c5e061 bp 0x7fff379f5180 sp 0x7fff379f4930
WRITE of size 64 at 0x5614259cc098 thread T0
    #0 0x7ff4d2c5e060 in __interceptor_regcomp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:7909
    #1 0x5614253c79ba in init_re_comp(regex_t*, char const*) /home/ycp/source/mariadb-server/10.4/src/client/mysqltest.cc:9391
    #2 0x5614253c7adb in init_re /home/ycp/source/mariadb-server/10.4/src/client/mysqltest.cc:9480
    #3 0x5614253c984e in main /home/ycp/source/mariadb-server/10.4/src/client/mysqltest.cc:9864
    #4 0x7ff4d26461c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #5 0x7ff4d2646284 in __libc_start_main_impl ../csu/libc-start.c:360
    #6 0x5614253980f0 in _start (/home/ycp/source/mariadb-server/10.4/build/client/mysqltest+0x10e0f0)

Comment by Roel Van de Paar [ 2023-11-30 ]

Confirmed, all versions 10.4+ dbg+opt

Comment by Iacovos Botsaris [ 2023-11-30 ]

Hello Yuchen Pei, thank you for the prompt response.
Yes, I did try inverting the functions, and the queries get executed.
Nevertheless, if I were to rewrite all queries, I would opt for COALESCE.
Resolving the issue would be greatly appreciated.
Thank you

Comment by Yuchen Pei [ 2023-11-30 ]

> Nevertheless, if I were to rewrite all queries, I would opt for COALESCE.

May I ask why?

Comment by Iacovos Botsaris [ 2023-12-02 ]

Hello,
Just to be consistent across queries - technologies etc.
Majority of queries in our case were written using CASE, so if I were to rewrite everything I would try and standardize it somehow.
Nevertheless, reality is that rewriting all queries is not easily achievable, so again, if this can be resolved it will be greatly appreciated.
Thank you

Comment by Yuchen Pei [ 2023-12-14 ]

I did some further analysis. The problem is that at exec stage, when
init_scan() is called on the spider group by handler, the optimizer
has already transformed the query to an invalid one (the `sum(c)`
inside ifnull is an invalid column):

(rr) dbp select_lex
$13 = 0x56302c8019c0 <dbug_item_print_buf> "select ifnull(`sum(c)`,0) AS `nvl(sum(c), 0)` from t1"

However, at optimizer stage, when the spider group by handler is
created, the query is still ok:

(rr) dbp select_lex
$12 = 0x56302c8019c0 <dbug_item_print_buf> "select ifnull(sum(t1.c),0) AS `nvl(sum(c), 0)` from t1"

So, this is an example of MDEV-32273.

Generated at Thu Feb 08 10:34:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.