[MDEV-24204] inconsistent query results using a LEFT JOIN Created: 2020-11-12  Updated: 2022-04-06  Resolved: 2022-04-06

Status: Closed
Project: MariaDB Server
Component/s: N/A
Affects Version/s: 10.3.25, 10.3, 10.4, 10.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Patrice Duroux Assignee: Oleg Smirnov
Resolution: Duplicate Votes: 1
Labels: None
Environment:

Ubuntu and Debian


Attachments: File 1_my.test     File 3DstructureDB_ANC.sql.gz     File cnf.tgz     File db.sql.7z     File taxonomy.sql.gz    
Issue Links:
Duplicate
duplicates MDEV-25128 Wrong result from join with material... Closed

 Description   

Hi,

After trying without success to post my trouble on the mariadb-discuss mailing list, so I posted it here:
https://alioth-lists.debian.net/pipermail/pkg-mysql-maint/2020-November/014430.html
with one follow up here:
https://alioth-lists.debian.net/pipermail/pkg-mysql-maint/2020-November/014445.html

This is strange because the same query works fine both on MySQL 8.0.21 (Ubuntu) and MySQL 5.7.26 (Debian).

I don't know how to start with this and I am not familiar with the EXPLAIN output.

Thanks,
Patrice



 Comments   
Comment by Patrice Duroux [ 2020-11-12 ]

One may observe this effect using our public website.

Here the result is complete (including values for the broken case):
http://www.imgt.org/3Dstructure-DB/cgi/3Dquery.cgi?type-entry=PDB

Here again the result is complete:
http://www.imgt.org/3Dstructure-DB/cgi/3Dquery.cgi?type-entry=PDB&ReceptorType=TR

Here the 3 LEFT JOIN columns (molecule name, species, receptor description) are all empty:
http://www.imgt.org/3Dstructure-DB/cgi/3Dquery.cgi?type-entry=PDB&ReceptorType=IG

Comment by Alice Sherepa [ 2020-11-13 ]

Could you please add SHOW CREATE TABLE and the problematic query?

Comment by Patrice Duroux [ 2020-11-13 ]

Hi,

The real query is more complex but even reduced to the following one, the problem presists:

SELECT pdbcode AS code,name,description,scientific_name,typcode,complex_name
FROM PDB
LEFT JOIN (
SELECT pdbcode,GROUP_CONCAT(DISTINCT CASE name_INN WHEN '' THEN CASE name_commercial WHEN '' THEN name_common ELSE CONCAT(name_common,'; ',name_commercial) END ELSE CASE name_commercial WHEN '' THEN CONCAT(name_INN,'; ',name_common) ELSE CONCAT(name_INN,'; ',name_common,'; ',name_commercial) END END ORDER BY name_common SEPARATOR '; ') AS name,GROUP_CONCAT(DISTINCT StrucQuat.description ORDER BY name_common SEPARATOR '; ') AS description,GROUP_CONCAT(DISTINCT scientific_name SEPARATOR ', ') AS scientific_name
FROM StrucQuat
LEFT JOIN Chain USING (quatcode)
LEFT JOIN taxonomy.organism_new co ON co.organism_id=Chain.especeid
JOIN PDB USING (pdbcode)
WHERE (Immuno=1 OR typid!=1)
GROUP BY pdbcode
) AS n USING (pdbcode)
LEFT JOIN TypeEntry USING (typid)
WHERE typcode='INN' AND pdbcode IN (SELECT DISTINCT pdbcode FROM StrucQuat WHERE typeprot='IG')

Only the final clause WHERE may change regarding the previous given cases.
Which tables would you like to 'SHOW' ?

Comment by Alice Sherepa [ 2020-11-13 ]

SHOW CREATE TABLE PDB;
SHOW CREATE TABLE StrucQuat;
SHOW CREATE TABLE CHAIN;
SHOW CREATE TABLE TypeEntry;
SHOW CREATE TABLE taxonomy.organism_new;
show variables like 'optimizer_switch';

+.cnf file(s)

Comment by Patrice Duroux [ 2020-11-13 ]

CREATE TABLE `PDB` (
`pdbcode` varchar(6) COLLATE utf8_bin NOT NULL,
`refid` int(10) unsigned DEFAULT NULL,
`experiment` varchar(50) COLLATE utf8_bin DEFAULT NULL,
`resolution` varchar(5) COLLATE utf8_bin DEFAULT NULL,
`release_date` varchar(10) COLLATE utf8_bin DEFAULT NULL,
`UID` varchar(30) COLLATE utf8_bin DEFAULT NULL,
`remarques` mediumtext COLLATE utf8_bin DEFAULT NULL,
`typid` int(10) unsigned NOT NULL,
`defid` int(10) unsigned DEFAULT NULL,
`complex_name` varchar(400) COLLATE utf8_bin DEFAULT NULL,
`IMGT_release_date` varchar(10) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`pdbcode`),
KEY `refid` (`refid`),
KEY `FK` (`typid`),
KEY `defid` (`defid`),
CONSTRAINT `PDB_ibfk_1` FOREIGN KEY (`typid`) REFERENCES `TypeEntry` (`typid`),
CONSTRAINT `PDB_ibfk_3` FOREIGN KEY (`refid`) REFERENCES `referenceNEW` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE `StrucQuat` (
`quatcode` varchar(8) COLLATE utf8_bin NOT NULL,
`name_common` varchar(250) CHARACTER SET utf8 DEFAULT NULL,
`name_INN` varchar(100) CHARACTER SET utf8 NOT NULL,
`name_commercial` varchar(100) CHARACTER SET utf8 NOT NULL,
`typeprot` varchar(30) COLLATE utf8_bin DEFAULT NULL,
`Immuno` tinyint(1) NOT NULL,
`pdbcode` varchar(6) COLLATE utf8_bin NOT NULL,
`complexcode` varchar(6) COLLATE utf8_bin DEFAULT NULL,
`description` varchar(255) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`quatcode`),
KEY `pdbcode` (`pdbcode`),
KEY `complexcode` (`complexcode`),
KEY `typeprot` (`typeprot`),
KEY `Immuno` (`Immuno`),
KEY `description` (`description`),
CONSTRAINT `StrucQuat_ibfk_1` FOREIGN KEY (`pdbcode`) REFERENCES `PDB` (`pdbcode`) ON DELETE CASCADE,
CONSTRAINT `StrucQuat_ibfk_2` FOREIGN KEY (`complexcode`) REFERENCES `Complex` (`complexcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE `Chain` (
`chaincode` varchar(7) COLLATE utf8_bin NOT NULL,
`quatcode` varchar(8) COLLATE utf8_bin NOT NULL,
`espece` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`remarques` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`especeid` int(11) DEFAULT NULL,
`seqextent` varchar(9) COLLATE utf8_bin DEFAULT NULL,
`extent` varchar(9) COLLATE utf8_bin DEFAULT NULL,
`description` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`dbref` varchar(10) COLLATE utf8_bin DEFAULT NULL,
`conflictsdbref` varchar(100) COLLATE utf8_bin DEFAULT NULL,
`accnumdbref` varchar(20) COLLATE utf8_bin DEFAULT NULL,
`iddbref` varchar(20) COLLATE utf8_bin DEFAULT NULL,
`conflictsres` varchar(100) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`chaincode`),
KEY `quatcode` (`quatcode`),
KEY `especeid` (`especeid`),
KEY `description` (`description`),
CONSTRAINT `Chain_ibfk_1` FOREIGN KEY (`quatcode`) REFERENCES `StrucQuat` (`quatcode`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Chain_ibfk_2` FOREIGN KEY (`especeid`) REFERENCES `taxonomy`.`organism_new` (`organism_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE `TypeEntry` (
`typid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`typcode` varchar(3) COLLATE utf8_bin NOT NULL DEFAULT '',
`typdata` varchar(20) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`typid`),
UNIQUE KEY `typcode` (`typcode`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE `organism_new` (
`organism_id` int(11) NOT NULL AUTO_INCREMENT,
`scientific_name` varchar(255) COLLATE utf8_bin NOT NULL,
`common_name` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`taxon_id` int(11) DEFAULT NULL,
`version` int(11) NOT NULL DEFAULT 0,
`rank` varchar(255) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`organism_id`),
UNIQUE KEY `UK_orga_taxon_id` (`taxon_id`),
KEY `IDX_orga_sci_name` (`scientific_name`),
KEY `IDX_orga_com_name` (`common_name`),
KEY `rank` (`rank`)
) ENGINE=InnoDB AUTO_INCREMENT=2257661 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_con
dition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=o
n,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_wi
th_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=
on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on

Comment by Patrice Duroux [ 2020-11-13 ]

cnf.tgz

Comment by Patrice Duroux [ 2020-11-16 ]

Hi,

I have done a test on a Debian Sid system using the recent push of the mariadb server version 10.5.8 and the situation is worst now because the problem occurs more often. By that I mean for the same query changing some constant values (like 'IG' to 'TR') in its WHERE clause. In a way it gained some consistency.

Comment by Alice Sherepa [ 2020-11-16 ]

Could you please provide the test case to demonstrate the problem? I tried to reproduce but failed

Comment by Patrice Duroux [ 2020-11-16 ]

That exactly also my point. Currently I didn't manage to get a « small » test case that I can exchange with you.
I just have done some tests using different systems by dumping all the related databases.
Just I was expecting another way to see what going on with this. But now I will work on this.

Comment by Patrice Duroux [ 2020-11-16 ]

Here are the dump of the two databases. 3DstructureDB_ANC.sql.gz taxonomy.sql.gz

Comment by Alice Sherepa [ 2020-11-16 ]

Thanks a lot! Now I can repeat it, trying to simplify it a little bit.
Please try to set optimizer_switch='materialization=off' or 'semijoin=off'; as a temporary workaround.

MariaDB [test]> set optimizer_switch='semijoin=off';
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> SELECT scientific_name
    -> FROM pdb
    -> LEFT JOIN (
    -> SELECT pdbcode, group_concat(scientific_name) AS scientific_name
    -> FROM strucquat
    ->  JOIN chain USING (quatcode)
    -> LEFT JOIN organism_new co ON co.organism_id=chain.especeid
    -> GROUP BY pdbcode) AS n USING (pdbcode)
    -> WHERE pdbcode IN (SELECT  pdbcode FROM strucquat)
    -> limit 5;
+----------------------------------------------------------+
| scientific_name                                          |
+----------------------------------------------------------+
| Mus musculus,Mus musculus,Mus musculus,Mus musculus      |
| Mus musculus,Mus musculus                                |
| Mus musculus,Mus musculus                                |
| Mus musculus,Mus musculus,Influenza A virus              |
| Homo sapiens,Homo sapiens,Human immunodeficiency virus 2 |
+----------------------------------------------------------+
5 rows in set (0.294 sec)
 
MariaDB [test]> set optimizer_switch='semijoin=on';
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> SELECT scientific_name FROM pdb LEFT JOIN ( SELECT pdbcode, group_concat(scientific_name) AS scientific_name FROM strucquat  JOIN chain USING (quatcode) LEFT JOIN organism_new co ON co.organism_id=chain.especeid GROUP BY pdbcode) AS n USING (pdbcode) WHERE pdbcode IN (SELECT  pdbcode FROM strucquat) limit 5;
+-----------------+
| scientific_name |
+-----------------+
| NULL            |
| NULL            |
| NULL            |
| NULL            |
| NULL            |
+-----------------+
5 rows in set (0.293 sec)
 
MariaDB [test]> set optimizer_switch='materialization=off';
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> SELECT scientific_name FROM pdb LEFT JOIN ( SELECT pdbcode, group_concat(scientific_name) AS scientific_name FROM strucquat  JOIN chain USING (quatcode) LEFT JOIN organism_new co ON co.organism_id=chain.especeid GROUP BY pdbcode) AS n USING (pdbcode) WHERE pdbcode IN (SELECT  pdbcode FROM strucquat) limit 5;
+----------------------------------------------------------+
| scientific_name                                          |
+----------------------------------------------------------+
| Mus musculus,Mus musculus,Mus musculus,Mus musculus      |
| Mus musculus,Mus musculus                                |
| Mus musculus,Mus musculus                                |
| Mus musculus,Mus musculus,Influenza A virus              |
| Homo sapiens,Homo sapiens,Human immunodeficiency virus 2 |
+----------------------------------------------------------+
5 rows in set (0.012 sec)

Comment by Alice Sherepa [ 2020-11-17 ]

Repeatable on 10.3-10.5 with InnoDB/MyIsam.
I attached db.sql - test with the initial data -to be checked after the fix - and 1_my.test - slightly shorter version of the test

Comment by Alice Sherepa [ 2022-04-06 ]

This was fixed in 480a06718d137c9ee7784012cc by Igor Babaev (10.3.29, MDEV-25128)

Generated at Thu Feb 08 09:28:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.