MariaDB 10.3 commit 8b087c63b56408edfae21f3234bae0b5391759b6 (2018-05-09)
compiled with ASAN.
I have a rather simple RQG test containing mostly DDL.
When executing this test via combinations.pl in parallel (which leads to a heavily loaded box) with many trials, a significant share of the test runs fail with ASAN failures like
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/row/row0upd.cc:3422 in row_upd_step(que_thr_t*)
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/trx/trx0purge.cc:224 in trx_purge_add_undo_to_history(trx_t const*, trx_undo_t*&, mtr_t*)
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/trx/trx0purge.cc:226 in trx_purge_add_undo_to_history(trx_t const*, trx_undo_t*&, mtr_t*)
There were >= 52 unique ASAN summary lines.
(grep -h 'SUMMARY: AddressSanitizer: ' last_comb_workdir/trial*.log | sort -u)
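To see which failures dominate, the same grep can be extended to count how often each summary line occurs. This is only a minimal sketch: the function name asan_summary_histogram is made up for illustration, and it assumes the trial logs are plain text files as in the command above.

```shell
# Hypothetical helper (not part of RQG): rank the unique ASAN summary
# lines found in the given log files by number of occurrences.
asan_summary_histogram() {
    # -h suppresses file names so identical summaries from different
    # trials collapse into a single line under uniq -c.
    grep -h 'SUMMARY: AddressSanitizer: ' "$@" | sort | uniq -c | sort -rn
}
```

Usage, assuming the same work directory as above: asan_summary_histogram last_comb_workdir/trial*.log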
I am aware that
- a significant fraction of these ASAN failures are already reported
  But these reports often lack a fast replay test case.
- a clear decision about which part of MariaDB is "guilty" (InnoDB, the server, or both) cannot be made based on the information currently available
- there is a significant, though not large, likelihood that the failures reported during testing are caused by
  - exceeding the resources of the OS/testing box -> server/InnoDB meet conditions they currently cannot handle well enough -> ....
    At least there are no signs that the OS starts to "attack" the mass of perl processes because of resource shortages or similar.
  - weaknesses in the RQG mechanics
    RQG also sometimes has problems handling slowly reacting servers/processes.
Sorry if that turns out to be the case.
The dilemma is that we need extreme CPU and memory I/O load to get a short bug replay time etc. On a system with low load the test passes nearly all the time.