[MDEV-30274] Massive trigger performance drop after update Created: 2022-12-19  Updated: 2022-12-20  Resolved: 2022-12-20

Status: Closed
Project: MariaDB Server
Component/s: Stored routines, Triggers
Affects Version/s: 10.6.11
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Andreas Pillath Assignee: Sergei Golubchik
Resolution: Duplicate Votes: 2
Labels: regression
Environment:

Official MariaDB 10.6.10/11 Docker Images and MariaDB 10.6.11 client on Ubuntu 22.4


Attachments: File test.sql     File testv2.sql    
Issue Links:
Duplicate
duplicates MDEV-29988 Major performance regression with 10.... Closed

 Description   

We discovered a performance degration after updating from 10.6.10 to 10.6.11:

  • 10.6.10 executes the update of 4Mio rows with an attached trigger in about 8 seconds.
  • 10.6.11 needs nearly 10 minutes for the same update.

The issue is reproducible with the attached 'test.sql' script. The update without a trigger is not affected.

// 10.6.10
$ time mariadb -uroot -proot --port=3303 --protocol=TCP test < test.sql 
info	rtime
UPDATE t1	00:00:14
info	rtime
UPDATE t2	00:00:08
 
real	1m30,441s
user	0m0,002s
sys	0m0,015s{quote}

// 10.6.11
$ time mariadb -uroot -proot --port=3303 --protocol=TCP test < test.sql 
info	rtime
UPDATE t1	00:00:13
info	rtime
UPDATE t2	00:09:50
 
real	11m12,014s
user	0m0,012s
sys	0m0,007s

The server was started in a container with

docker run -p 3303:3306 --name maridadb -e MARIADB_DATABASE=test -e MARIADB_ROOT_PASSWORD=root -d mariadb:10.6.10
// and
docker run -p 3303:3306 --name maridadb -e MARIADB_DATABASE=test -e MARIADB_ROOT_PASSWORD=root -d mariadb:10.6.11

Addtional observation: when looking at the strace output of the mariadb-process, there seem to be many futex-ETIMEDOUT errors with a duration of 0.1-0.4secs.

[pid  1290] futex(0x55b52c067c2c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1670854395, tv_nsec=849093000}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT <0.400039>
[pid  1290] futex(0x55b52c067c40, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000016>
[pid  1290] futex(0x55b52c067c2c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1670854396, tv_nsec=17567000}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT  <0.168297>
[pid  1290] futex(0x55b52dbac238, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000016>
[pid  1290] futex(0x55b52c067c40, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000014>
[pid  1290] futex(0x55b52c067c2c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1670854396, tv_nsec=249321000}, FUTEX_BITSET_MATCH_ANY <unfinished ...>



 Comments   
Comment by Sergei Golubchik [ 2022-12-19 ]

I don't seem to be able to repeat this issue.
On 10.6.10 I've got

UPDATE t1       00:00:38
UPDATE t2       00:00:06

And on 10.6.11 it was

UPDATE t1       00:00:45
UPDATE t2       00:00:08

t1 is slower, because you've created an index there. If I remove ALTER TABLE, I can the timing within ±1sec.

May be it's something specific to your setup? Like, depending on your my.cnf settings?

Comment by Andreas Pillath [ 2022-12-20 ]

@serg Sorry, there have been two flaws(wrong index and nondeterministic) in the script, hopefully you will now also be able to reproduce the problem with the attached testv2.sql script. It is now also printing the affected rows which have been random before.
Do you have some special my.cnf settings? I have used the docker images to hopefully have the default ones.

10.6.10

info	rtime	row_ct
UPDATE t1	00:00:08	333334
info	rtime	row_ct
UPDATE t2	00:00:11	333334

10.6.11

info	rtime	row_ct
UPDATE t1	00:00:09	333334
info	rtime	row_ct
UPDATE t2	00:20:03	333334

When looking at the mariadbd process during the update with gdb, it seems to be stuck at the following location:

#0  0x0000563c67c9614e in Item::cleanup() ()
#1  0x0000563c67a46a61 in cleanup_items(Item*) ()
#2  0x0000563c6799d34d in sp_head::execute(THD*, bool) ()
#3  0x0000563c6799e1b6 in sp_head::execute_trigger(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, st_grant_info*) ()
#4  0x0000563c67b00da4 in Table_triggers_list::process_triggers(THD*, trg_event_type, trg_action_time_type, bool) ()
#5  0x0000563c679e6f79 in fill_record_n_invoke_before_triggers(THD*, TABLE*, List<Item>&, List<Item>&, bool, trg_event_type) ()
#6  0x0000563c67b1bb99 in mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*) ()
#7  0x0000563c67a51efe in mysql_execute_command(THD*, bool) ()
#8  0x0000563c67a40b07 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
#9  0x0000563c67a4d0a5 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) ()
#10 0x0000563c67a4f1ae in do_command(THD*, bool) ()
#11 0x0000563c67b614d7 in do_handle_one_connection(CONNECT*, bool) ()
#12 0x0000563c67b6182d in handle_one_connection ()
#13 0x0000563c67ec86e6 in ?? ()
#14 0x00007f4371786609 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f4371372133 in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Comment by Valerii Kravchuk [ 2022-12-20 ]

If the time is spent in cleanup_items(), it is probably a yet another duplicate of MDEV-29988...

Comment by Andreas Pillath [ 2022-12-20 ]

one minute of perf

  99.72%     0.00%             0  mariadbd  libpthread-2.31.so   [.] start_thread
            |
            ---start_thread
               |          
                --99.72%--MyCTX_nopad::finish(unsigned char*, unsigned int*)
                          handle_one_connection
                          |          
                           --99.72%--do_handle_one_connection(CONNECT*, bool)
                                     |          
                                      --99.72%--do_command(THD*, bool)
                                                dispatch_command(enum_server_command, THD*, char*, unsigned int, bool)
                                                mysql_parse(THD*, char*, unsigned int, Parser_state*)
                                                mysql_execute_command(THD*, bool)
                                                |          
                                                 --99.68%--mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*)
                                                           |          
                                                           |--97.81%--fill_record_n_invoke_before_triggers(THD*, TABLE*, List<Item>&, List<Item>&, bool, trg_event_type)
                                                           |          |          
                                                           |           --97.69%--Table_triggers_list::process_triggers(THD*, trg_event_type, trg_action_time_type, bool)
                                                           |                     |          
                                                           |                      --97.62%--sp_head::execute_trigger(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, st_grant_info*)
                                                           |                                |          
                                                           |                                 --97.55%--sp_head::execute(THD*, bool)
                                                           |                                           |          
                                                           |                                           |--90.39%--cleanup_items(Item*)
                                                           |                                           |          |          
                                                           |                                           |           --10.48%--Item::cleanup()
                                                           |                                           |          
                                                           |                                            --6.75%--Item::cleanup()
                                                           |          
                                                            --1.44%--handler::ha_update_row(unsigned char const*, unsigned char const*)
                                                                      |          
                                                                       --1.41%--wsrep_notify_status(wsrep::server_state::state, wsrep::view const*)
                                                                                 |          
                                                                                  --1.34%--void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
                                                                                            |          
                                                                                             --1.31%--void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
                                                                                                       |          
                                                                                                        --1.29%--void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
                                                                                                                  |          
                                                                                                                   --1.14%--void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
:
...

Comment by Sergei Golubchik [ 2022-12-20 ]

valerii, right, looks like it is. I was trying to repeat it one commit before the fix to see if it'll disappear after the fix, but failed to repeat it even on vanilla 10.6.10.

But now with the perf data, I'm fairly certain you're right. It's MDEV-29988.

pskiwi, thanks. I'll close this as a duplicate now. Note that MDEV-29988 is a blocker, it means it will be fixed in the next release. In fact, we already have a fix for it.

If you need a workaround, try to avoid implicit charset conversions in the trigger. Perhaps the connection character set (and thus '') is utf8, but txt in the table is latin1, or vice versa? An explicit CONVERT() should help to avoid the buggy code path.

Generated at Thu Feb 08 10:15:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.