[MDEV-32348] Random missing rows from result with jemalloc Created: 2023-10-04  Updated: 2024-02-06

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.6.15, 10.11.5
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Major
Reporter: Cuchac Assignee: Igor Babaev
Resolution: Unresolved Votes: 2
Labels: wrong_result
Environment:

official docker image


Attachments: File mariadb_jemalloc-1.sql    

 Description   

Hello,
after enabling jemalloc, we started to get random missing rows from result - sometimes 4 rows, sometimes 3 rows, sometime 2 rows ....

I was able to create minimal reproducible that shows error both with mariadb:10.6.15 and current master quay.io/mariadb-foundation/mariadb-devel:10.6 with jemalloc turned on. Without jemalloc everything works.

This is preview of repro: https://dbfiddle.uk/rXJGvk_h - e-shop with order items, every item has link to product and each product has more sections

The main cause seems to be this part of query. We use subselect in JOIN to join always only one section with highest priority:

...
LEFT JOIN sections s ON s.id = (
SELECT s2.id
FROM sections s2
LEFT JOIN products_sections ps ON s2.id = ps.id_section
WHERE ps.id_product = p.id
ORDER BY s2.priority DESC
LIMIT 1
)
...

when I remove ORDER BY it behaves normally. The problem seems to be that subselect matches two rows with same priority, so it picks random row. And sometimes it leads to completely missing row in result - it is an error, LEFT JOIN should never remove rows from result. When I run this query 10 times, I get 5 times 4 rows, 5 times only 3 rows.

I'm able to test this on newer Mariadb versions if it will help.



 Comments   
Comment by Cuchac [ 2023-10-04 ]

I can reproduce this using official mariadb:10.11.5 with LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2

Comment by Tomas Leypold [ 2023-10-04 ]

I can reproduce it using the official MariaDB docker images for versions: 10.5, 10.6, and 10.11. Versions 10.4 and below are not affected.

1. docker run --name some-mariadb -e LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 -e MYSQL_ROOT_PASSWORD=pwd -d mariadb:10.6
2. Import the mariadb_jemmaloc-1.sql file attached by the OP.
3. Run the last select in the file multiple times."

MariaDB [test]> SELECT p.id, s.id, s.name FROM order_items oi LEFT JOIN products p ON p.id = oi.id_product LEFT JOIN sections s ON s.id = (     SELECT sec.id     FROM sections sec     LEFT JOIN products_sections ps ON sec.id = ps.id_section     WHERE ps.id_product = p.id     ORDER BY sec.priority DESC     LIMIT 1 ) WHERE oi.id_order = '1';
+------+------+-----------+
| id   | id   | name      |
+------+------+-----------+
|    1 |    1 | section 1 |
|    2 | NULL | NULL      |
|    4 | NULL | NULL      |
+------+------+-----------+
3 rows in set (0.000 sec)
 
MariaDB [test]> SELECT p.id, s.id, s.name FROM order_items oi LEFT JOIN products p ON p.id = oi.id_product LEFT JOIN sections s ON s.id = (     SELECT sec.id     FROM sections sec     LEFT JOIN products_sections ps ON sec.id = ps.id_section     WHERE ps.id_product = p.id     ORDER BY sec.priority DESC     LIMIT 1 ) WHERE oi.id_order = '1';
+------+------+-----------+
| id   | id   | name      |
+------+------+-----------+
|    1 |    1 | section 1 |
|    2 | NULL | NULL      |
|    3 | NULL | NULL      |
|    4 | NULL | NULL      |
+------+------+-----------+
4 rows in set (0.000 sec)

Comment by MG [ 2023-10-04 ]

I see the same unstable results with glibc and jemalloc, with tcmalloc seeming to have stable results:

[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl cat mariadb | grep malloc
# Library substitutions. previously [mysqld_safe] malloc-lib with explicit paths
#Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
#Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4"
[root@localhost ~]# systemctl restart mysqld
[root@localhost ~]# for i in {1..300}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
    298 39f5b6078d1ed519d3cba1b24ea7abe5  -
      1 429c173badd1b5d3264065fc4bbb68ab  -
      1 c8eb0b6192029990c6a57b31116e5853  -

[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl cat mariadb | grep malloc
# Library substitutions. previously [mysqld_safe] malloc-lib with explicit paths
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
#Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4"
[root@localhost ~]# systemctl restart mysqld
[root@localhost ~]# for i in {1..300}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
    127 39f5b6078d1ed519d3cba1b24ea7abe5  -
      8 429c173badd1b5d3264065fc4bbb68ab  -
     56 a8bae478aa5d1f637fab4c8d6fee9381  -
    109 ba883c3c02ae91d4570b948336a7f3b3  -

[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl cat mariadb | grep malloc
# Library substitutions. previously [mysqld_safe] malloc-lib with explicit paths
#Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4"
[root@localhost ~]# systemctl restart mysqld
[root@localhost ~]# for i in {1..300}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
    300 39f5b6078d1ed519d3cba1b24ea7abe5  -

Comment by MG [ 2023-10-05 ]

I can only reproduce this intermittently on glibc malloc, which is used throughout this comment.

Environment:

[root@localhost ~]# grep MHz /proc/cpuinfo
cpu MHz		: 2593.454
cpu MHz		: 2593.454
[root@localhost ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           2
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               45
Model name:          Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Stepping:            7
CPU MHz:             2593.454
BogoMIPS:            5187.02
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            20480K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp xsaveopt md_clear flush_l1d
 
[root@localhost ~]# lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	Rocky
Description:	Rocky Linux release 8.8 (Green Obsidian)
Release:	8.8
Codename:	GreenObsidian
 
[root@localhost ~]# uname -r
4.18.0-477.27.1.el8_8.x86_64

I was trying to repeat this and noticed that I had a different test running, which I stopped:

[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl cat mariadb | grep malloc
# Library substitutions. previously [mysqld_safe] malloc-lib with explicit paths
#Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
#Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4"
[root@localhost ~]# systemctl restart mariadb
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -

I restarted that other workload:

[root@localhost ~]# while true; do mysql test -sse 'insert into t3 select null, null'; done;
ERROR 1146 (42S02) at line 1: Table 'test.t3' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'test.t3' doesn't exist
[...]

[root@localhost ~]# ### start insert to test.t3
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   2953 39f5b6078d1ed519d3cba1b24ea7abe5  -
      1 429c173badd1b5d3264065fc4bbb68ab  -
     46 76ce1f2daaf0c347c05cbbdf6c77029e  -

I stopped the other workload again:

[root@localhost ~]# ### stop insert to test.t3
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -

Start different workloads, revert to insert to test.t3:

[root@localhost ~]# while true; do mysql test -sse 'insert into t3 select null, null'; done;
ERROR 1146 (42S02) at line 1: Table 'test.t3' doesn't exist
ERROR 1146 (42S02) at line 1: Table 'test.t3' doesn't exist
[...]

Now I can't reproduce:

[root@localhost ~]# ### start insert to test.t3
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -

Finally, restart mariadb for a second time while test.t3 workload already running

[root@localhost ~]# systemctl restart mariadb
[root@localhost ~]# ### insert to test.t3 already running
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   3000 39f5b6078d1ed519d3cba1b24ea7abe5  -
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   2415 39f5b6078d1ed519d3cba1b24ea7abe5  -
    585 429c173badd1b5d3264065fc4bbb68ab  -

Reproduces after stopping insert test.t3 and start insert test.t1:

ERROR 1136 (21S01) at line 1: Column count doesn't match value count at row 1
ERROR 1136 (21S01) at line 1: Column count doesn't match value count at row 1
ERROR 1136 (21S01) at line 1: Column count doesn't match value count at row 1

Still inserting to test.t1 where Column count doesn't match value count

[root@localhost ~]# tuned-adm active
Current active profile: virtual-guest
[root@localhost ~]# tuned-adm profile latency-performance
[root@localhost ~]# tuned-adm active
Current active profile: latency-performance

[root@localhost ~]# ## changed tuned
[root@localhost ~]# for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | md5sum; done | sort | uniq -c
   1528 429c173badd1b5d3264065fc4bbb68ab  -
   1472 76ce1f2daaf0c347c05cbbdf6c77029e  -

Unable to test rr:

https://lists.xen.org/archives/html/xen-devel/2017-07/msg02242.html

I know it's not great, but maybe this helps

Comment by Cuchac [ 2023-10-11 ]

Hello,
is there anything I can do to help you with debugging this problem?

I can install debug symbols and provide some debug log or stack trace from query with error or .... ?
I can debug C code but trying to debug this is too much for me without any mariadb internals knowledge.

What about https://sourceware.org/gdb/wiki/ProcessRecord/Tutorial from thread with wrong result? Never used it before but I can try to produce the log.

Comment by Cuchac [ 2023-10-11 ]

Hello again,

I just realized one hopefully important detail. I believe your diagnosis is too strict. Database is allowed to pick "random" row order when sorting and many rows have same sort value. This is exactly the example I created. So using md5sum on result is too strict. It is perfectly correct to have different results.

What database should NOT do is to throw away row. That is an error. So I believe you should use command:

for i in {1..3000}; do mysql mdev32348 < MDEV-32348.txt.query | wc -l; done | sort | uniq -c

to detect the error I'm reporting.

I'm able to reproduce this wrong behavior (different rows count) only using jemalloc.

$ ## Using glibc
$ for i in {1..3000}; do mysql jemalloc < jemalloc.sql | wc -l; done | sort | uniq -c
   3000 5
 
$ ## Using jemalloc
$ for i in {1..3000}; do mysql jemalloc < jemalloc.sql | wc -l; done | sort | uniq -c
    606 4
   2394 5

Comment by Sergei Golubchik [ 2023-10-12 ]

Thanks, I've repeated it. The following test file (your test case, columns and tables renamed for brevity):

source include/have_innodb.inc;
#select @@version_malloc_library;
create table o (o1 int(11) not null, o2 int(11) not null, key o2 (o2)) engine=innodb;
insert o values (1, 1), (1, 2), (1, 3), (1, 4);
create table p (p1 int(11) primary key) engine=innodb;
insert p values (1), (2), (3), (4);
create table ps (ps1 int(11) not null, ps2 int(11) not null, primary key (ps1,ps2), key ps2 (ps2)) engine=innodb;
insert ps  values (1, 1), (1, 3), (2, 1), (2, 3), (3, 3), (4, 3);
create table s (s1 int(11) primary key, s2 varchar(100) not null, s3 int(11) not null) engine=innodb;
insert s values (1, 's:1', 1), (2, 's:2', 1), (3, 's:3', 1), (4, 's:4', 1);
 
replace_column 2 # 3 #;
SELECT p1, s1, s2
FROM o
LEFT JOIN p ON p1 = o2
LEFT JOIN s ON s1 = (
    SELECT s2.s1
    FROM s s2
    LEFT JOIN ps ON s2.s1 = ps1
    WHERE ps2 = p1
    ORDER BY s2.s3 DESC
    LIMIT 1
)
WHERE o1 = '1';
 
drop tables ps,s,o,p;

run as

$ LD_PRELOAD=/usr/lib64/libjemalloc.so  ./mtr main.b --repeat 200

fails as

main.b 'innodb'                          [ pass ]    129
main.b 'innodb'                          [ 2 pass ]    102
main.b 'innodb'                          [ 3 fail ]
        Test ended at 2023-10-12 15:14:28
 
CURRENT_TEST: main.b
--- mysql-test/main/b.result       2023-10-12 15:09:01.226194632 +0200
+++ mysql-test/main/b.reject       2023-10-12 15:14:27.350211773 +0200
@@ -34,7 +34,6 @@
 )
 WHERE o1 = '1';
 p1     s1      s2
-1      #       #
 2      #       #
 3      #       #
 4      #       #
 
mysqltest: Result length mismatch

Happens on 10.4 too, but not on 11.3

Comment by Tomas Leypold [ 2023-10-12 ]

I just tried to run your mtr test on 10.4.31 (on Ubuntu 20.04) to be sure that 10.4 is really affected, but I can't reproduce it on this version, what patch version of mariadb and OS did you use @Sergei?

main.b 'innodb'                          [ 1998 pass ]     84
main.b 'innodb'                          [ 1999 pass ]     78
main.b 'innodb'                          [ 2000 pass ]    113
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 167.609 of 317 seconds executing testcases
 
Completed: All 2000 tests were successful.

Then I tested on two other versions (10.6, 11.2) to be sure that I tested correctly:

main.b 'innodb'                          [ 196 pass ]     49
main.b 'innodb'                          [ 197 fail ]
        Test ended at 2023-10-12 17:05:48
 
CURRENT_TEST: main.c
--- /usr/share/mysql/mysql-test/main/b.result   2023-10-12 16:56:55.047760483 +0200
+++ /usr/share/mysql/mysql-test/main/b.reject   2023-10-12 17:05:47.912103398 +0200
@@ -21,6 +21,5 @@
 p1     s1      s2
 1      #       #
 2      #       #
-3      #       #
 4      #       #
 drop tables ps,s,o,p;
 
mysqltest: Result length mismatch
 
 - skipping '/usr/share/mysql/mysql-test/var/log/main.b-innodb/'
main.b 'innodb'                          [ 198 pass ]     53
main.b 'innodb'                          [ 199 pass ]     50
main.b 'innodb'                          [ 200 pass ]     58
--------------------------------------------------------------------------
The servers were restarted 1 times
Spent 9.841 of 50 seconds executing testcases
 
Completed: Failed 35/200 tests, 82.50% were successful.

Comment by Sergei Golubchik [ 2023-10-12 ]

I had the latest 10.4 from github. And a debug build — that's just what I happened to have ready.

May be it depends on the compiler? I've got gcc 13.2.1 20230826.

It doesn't matter much where it repeats, when we'll have the fix, we'll see if it applies to 10.4 too or only to 10.5 and up

Comment by Cuchac [ 2023-11-01 ]

Hello,
is there anything I can prepare to help you debug this issue?

Comment by Bruno Bear [ 2024-02-06 ]

We really need to try jemalloc to avoid OOMs on our 10.6.12 MariaDB Servers.
Now we are scared to use it in production, because of this bug. Less because of this specific query, but more of the possibility that there are more querys that give wrong results if we use jemalloc.
Did maybe someone from MariaDB already look into this issue, found the reason and can therefore say if its really only this specific edge case query with left join, subselect, order by and limit?

Generated at Thu Feb 08 10:30:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.