[MDEV-14907] FEDERATEDX doesn't respect DISTINCT Created: 2018-01-09  Updated: 2023-01-30  Resolved: 2022-01-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 10.0, 10.1, 10.2.6, 10.2.12, 10.2
Fix Version/s: 10.4.23, 10.5.14, 10.6.6, 10.7.2

Type: Bug Priority: Major
Reporter: Brian Grossman Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None


 Description   

The FEDERATEDX engine doesn't seem to respect DISTINCT. I see the following behavior in 10.2.6 and 10.2.12. (Names and values redacted.)

Here's a table FOO in bar.

CREATE TABLE `FOO` (
`foo_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`foo_name` varchar(255) DEFAULT NULL,
`parent_foo_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`foo_id`),
KEY `foo_name` (`foo_name`),
KEY `parent_foo_id` (`parent_foo_id`)
) ENGINE=InnoDB AUTO_INCREMENT=978304 DEFAULT CHARSET=utf8

There's another table bar_FOO in another mariadb instance, defined the same except for using FEDERATEDX:

CREATE TABLE `bar_FOO` (
...
) ENGINE=FEDERATED DEFAULT CHARSET=utf8 CONNECTION='mysql://user:pw@bar-host:3306/bar/FOO'

The FEDERATEDX plugin is installed. From show plugins:

+-------------------------------+----------+--------------------+------------------+---------+
| Name                          | Status   | Type               | Library          | License |
+-------------------------------+----------+--------------------+------------------+---------+
| ...                           |          |                    |                  |         |
| FEDERATED                     | ACTIVE   | STORAGE ENGINE     | ha_federatedx.so | GPL     |
+-------------------------------+----------+--------------------+------------------+---------+

Here's the query run against the InnoDB table:

MariaDB [bar]> select DISTINCT foo_id,parent_foo_id,foo_name from FOO where parent_foo_id = 822857 or foo_name like 'STRING%' order by foo_id ;
+------------+---------------+------------+
| foo_id     | parent_foo_id | foo_name   |
+------------+---------------+------------+
|     822857 |          NULL | STRING     |
|     968903 |        822857 | STRING - 0 |
|     968953 |        822857 | STRING - 1 |
|     971603 |        822857 | STRING - 2 |
|     971803 |        822857 | STRING - 3 |
|     975103 |        822857 | STRING - 4 |
+------------+---------------+------------+
6 rows in set (0.01 sec)

Here's the query run against the FEDERATEDX table. The result is the same with or without DISTINCT.

MariaDB [rab]> select DISTINCT foo_id,parent_foo_id,foo_name from bar_FOO where parent_foo_id = 822857 or foo_name like 'STRING%' order by foo_id ;
+--------+---------------+------------+
| foo_id | parent_foo_id | foo_name   |
+--------+---------------+------------+
| 822857 |          NULL | STRING     |
| 968903 |        822857 | STRING - 0 |
| 968903 |        822857 | STRING - 0 |
| 968953 |        822857 | STRING - 1 |
| 968953 |        822857 | STRING - 1 |
| 971603 |        822857 | STRING - 2 |
| 971603 |        822857 | STRING - 2 |
| 971803 |        822857 | STRING - 3 |
| 971803 |        822857 | STRING - 3 |
| 975103 |        822857 | STRING - 4 |
| 975103 |        822857 | STRING - 4 |
+--------+---------------+------------+
11 rows in set (0.00 sec)



 Comments   
Comment by Elena Stepanova [ 2018-01-09 ]

Thanks for the report and test case, reproducible as described.

MTR-friendly version of test case

INSTALL SONAME 'ha_federatedx';
 
CREATE TABLE t1 (
  `foo_id` bigint(20) unsigned NOT NULL,
  `foo_name` varchar(255) DEFAULT NULL,
  `parent_foo_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`foo_id`),
  KEY `foo_name` (`foo_name`),
  KEY `parent_foo_id` (`parent_foo_id`)
) DEFAULT CHARSET=utf8;
 
eval CREATE TABLE `fed_t1` ENGINE=FEDERATED DEFAULT CHARSET=utf8 CONNECTION='mysql://root@127.0.0.1:$MASTER_MYPORT/test/t1';
 
INSERT INTO t1 VALUES (968903, 'STRING - 0', 822857);
INSERT INTO t1 VALUES (968953, 'STRING - 1', 822857);
INSERT INTO t1 VALUES (971603, 'STRING - 2', 822857);
INSERT INTO t1 VALUES (971803, 'STRING - 3', 822857);
INSERT INTO t1 VALUES (975103, 'STRING - 4', 822857);
INSERT INTO t1 VALUES (822857, 'STRING', NULL);
 
select foo_id,parent_foo_id,foo_name from t1 where parent_foo_id = 822857 or foo_name like 'STRING%';
select foo_id,parent_foo_id,foo_name from fed_t1 where parent_foo_id = 822857 or foo_name like 'STRING%';
 
# Cleanup
DROP TABLE fed_t1, t1;
UNINSTALL SONAME 'ha_federatedx';

The queries internally produced by Federated are awful, unfortunately JIRA cannot handle them pasted as they are, so I replaced strings of \0 and FF below.

5 Query     SELECT `foo_id`, `foo_name`, `parent_foo_id` FROM `t1` WHERE  ( (`parent_foo_id` = 822857) ) AND ( (1=1) )
5 Query     SELECT `foo_id`, `foo_name`, `parent_foo_id` FROM `t1` WHERE  ( (`foo_name` >= 'STRING <bunch of \0s>') ) AND ( (`foo_name` <= 'STRING<bunch of FFs>') )

Comment by Sergei Golubchik [ 2018-07-14 ]

This is happening because

  • there's an index on parent_foo_id and on foo_name
  • the query has OR, and the optimizer decides to do an index merge
  • meaning, it does two index searches, one for parent_foo_id and one for foo_name.
  • It merges two result sets by looking at row positions
  • In FederatedX a row "position" is the address of MYSQL_ROW result structure in memory
  • So the same row in two different result sets will have a different "position" and the server thinks these are two different rows
Comment by Michael Widenius [ 2022-01-07 ]

Federated::position() and Federatex::position() is storing in 'ref' a
pointer into a local result set buffer. This means that one cannot
compare 'ref' from different handler instances to see if they point to the
same physical record.

This bug caused federated.federatedx to return wrong results when the
optimizer tried to use index_merge to resolve some queries.

Fixed by introducing table flag HA_NON_COMPARABLE_ROWID and using this
with the above handlers.

Future TODO (another task) to fix multi-table updates and deletes with Federated:

  • Fix multi_delete(), multi_update and read_records() to use primary key
    instead of 'ref' if case HA_NON_COMPARABLE_ROWID is set. The current
    code only works if we have only one range (like table scan) for the
    tables that will be updated in the second pass.
  • Enable DBUG_ASSERT() in ha_federated::cmp_ref() and
    ha_federatedx::cmp_ref().
Generated at Thu Feb 08 08:17:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.