[MDEV-16136] Various ASAN failures when testing 10.2/10.3 - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.15, 10.3.6
Fix Version/s: 10.2.18, 10.3.10
Component/s: Storage Engine - InnoDB
Labels:
- affects-tests
Environment:
Ubuntu 17.04 but I assume this is not important.

Description

MariaDB 10.3 commit 8b087c63b56408edfae21f3234bae0b5391759b6 (2018-05-09)
compiled with ASAN.

I have a rather simple RQG test containing mostly DDL.
When this test is executed via combinations.pl in parallel (which leads to a highly loaded box) with many trials, a significant share of the test runs fail with ASAN failures like
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/row/row0upd.cc:3422 in row_upd_step(que_thr_t*)
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/trx/trx0purge.cc:224 in trx_purge_add_undo_to_history(trx_t const*, trx_undo_t*&, mtr_t*)
SUMMARY: AddressSanitizer: use-after-poison .../storage/innobase/trx/trx0purge.cc:226 in trx_purge_add_undo_to_history(trx_t const*, trx_undo_t*&, mtr_t*)

There were at least 52 distinct ASAN summary lines.
(grep -h 'SUMMARY: AddressSanitizer: ' last_comb_workdir/trial*.log | sort -u)
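
To see which of these summaries dominate, the occurrences of each distinct line can also be counted. The following is only a minimal sketch in Python, assuming the same last_comb_workdir/trial*.log layout as the grep command above. The function name count_asan_summaries is hypothetical and not part of RQG. Unlike grep -h, it keeps only the part of each line starting at "SUMMARY:".

```python
import glob
import re
from collections import Counter

# Matches the summary that AddressSanitizer prints at the end of a report.
SUMMARY_RE = re.compile(r"SUMMARY: AddressSanitizer: .*")


def count_asan_summaries(log_texts):
    """Count how often each distinct ASAN summary occurs in the given log texts."""
    counts = Counter()
    for text in log_texts:
        for line in text.splitlines():
            match = SUMMARY_RE.search(line)
            if match:
                counts[match.group(0).rstrip()] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the logs live where the grep command above expects them.
    texts = []
    for path in sorted(glob.glob("last_comb_workdir/trial*.log")):
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    # Most frequent summaries first.
    for summary, hits in count_asan_summaries(texts).most_common():
        print(f"{hits:6d}  {summary}")
```

Sorting by frequency may help to decide which failure to reduce to a replay test case first.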

I am aware that

- a significant fraction of these ASAN failures are already reported.
  But these reports often lack a fast replay test case.
- a clear decision about which part of MariaDB is "guilty" (InnoDB, the server, or both) cannot be made based on the information currently available.
- there is a significant but not large likelihood that the failures reported during testing might be caused by
  - exceeding OS/testing box resources -> server/InnoDB meet conditions they cannot handle well enough at the moment -> ....
    At least there are no signs that the OS starts to "attack" the mass of perl processes because of resource shortages or similar.
  - weaknesses in RQG mechanics.
    Basically, RQG sometimes also has problems handling slowly reacting servers/processes.
    Sorry in case that is valid.

The dilemma is that we need extreme CPU and memory IO load to get a short bug replay time etc. On a system with low load the test passes nearly all the time.

Issue Links

is blocked by
- MDEV-16063 [Draft] ASAN use-after-poison in row_sel / row_sel_step / que_thr_step (Closed)

is caused by
- MDEV-15030 Add ASAN instrumentation (Closed)

relates to
- MDEV-16781 InnoDB: AddressSanitizer: use-after-poison during DDL (Closed)

People

Assignee: Marko Mäkelä
Reporter: Matthias Leich
Votes: 0
Watchers: 4

Dates

Created: 2018-05-10 13:00
Updated: 2019-08-27 13:54
Resolved: 2018-08-16 03:49

MariaDB Server