[MDEV-4334] Flush DBUG output in DBUG_ASSERT() in a way that does not cause crashes - Jira

XML

Word

Printable

Details

Type: Task
Status: Open (View Workflow)
Priority: Trivial
Resolution: Unresolved
Fix Version/s: None
Component/s: None
Labels:
None

Description

This revision adds a _db_flush() call to every DBUG_ASSERT():

    revid:monty@mysql.com-20110130104144-v3osvsbyl45zd5o3

    -#define DBUG_ASSERT(A) assert(A)

    +#define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)

Unfortunately this causes occasional crashes inside DBUG_ASSERT. I am starting
to hit this a lot in my ~~MDEV-26~~ tree, but it is also visible elsewhere, like
here:

http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/2456/steps/test_2/logs/stdio

I suspect that the original change is just wrong and that it needs protection
against concurrent threads updating the DBUG data structures. Unfortunately
this is not entirely trivial to do. I assume adding lock/unlock to every
DBUG_ASSERT would make the test suite a lot slower, which is undesirable. On
the other hand, the obvious

  #define DBUG_ASSERT(A) do { if (!A) _db_flush_(); assert(A); } while(0)

evaluates A twice, which may cause problems, and assigning A to a temp
variable also is not good, as we want the assert() macro to have access to the
actual expression A to print it in the error log. So I am not sure how to fix
this properly.

For now, I will remove the _db_flush() from DBUG_ASSERT(), so I avoid
crashes. So I filed this task for you in case you want to bring it back in, in
a way that does not cause crash.

Here are details of my analysis of the crash:

#3  <signal handler called>

#4  0x0000000000d0aa44 in _db_flush_ () at /home/knielsen/my/10.0/work-10.0-mdev26/dbug/dbug.c:2139

#5  0x00000000005b3dd6 in next_event (rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:5316

#6  0x00000000005ad98e in exec_relay_log_event (thd=0x4859480, rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3042

#7  0x00000000005b016f in handle_slave_sql (arg=0x481e940) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3965

#8  0x0000000000ca07fd in pfs_spawn_thread (arg=0x4915890) at /home/knielsen/my/10.0/work-10.0-mdev26/storage/perfschema/pfs.cc:1015

#9  0x00007f98cd89cb50 in start_thread (arg=<optimized out>) at pthread_create.c:304

The crash is inside an assert:

5316	      DBUG_ASSERT(rli->cur_log_fd == -1); // foreign descriptor

#define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)

2135	void _db_flush_()

2136	{

2137	  CODE_STATE *cs;

2138	  get_code_state_or_return;

2139	  (void) fflush(cs->stack->out_file->file);

2140	}

The crash happens because cs->stack->out_file is NULL, as shown by
disassembly:

Dump of assembler code for function _db_flush_:

   0x0000000000d0aa20 <+0>:	push   %rbp

   0x0000000000d0aa21 <+1>:	mov    %rsp,%rbp

   0x0000000000d0aa24 <+4>:	sub    $0x10,%rsp

   0x0000000000d0aa28 <+8>:	callq  0xd06f48 <code_state>

   0x0000000000d0aa2d <+13>:	mov    %rax,-0x8(%rbp)

   0x0000000000d0aa31 <+17>:	cmpq   $0x0,-0x8(%rbp)

   0x0000000000d0aa36 <+22>:	je     0xd0aa4f <_db_flush_+47>

   0x0000000000d0aa38 <+24>:	mov    -0x8(%rbp),%rax

   0x0000000000d0aa3c <+28>:	mov    0x20(%rax),%rax

   0x0000000000d0aa40 <+32>:	mov    0x10(%rax),%rax

=> 0x0000000000d0aa44 <+36>:	mov    (%rax),%rax

   0x0000000000d0aa47 <+39>:	mov    %rax,%rdi

   0x0000000000d0aa4a <+42>:	callq  0x584280 <fflush@plt>

   0x0000000000d0aa4f <+47>:	leaveq

   0x0000000000d0aa50 <+48>:	retq

End of assembler dump.

(gdb) info reg

rax            0x0	0

However, the cs->stack->out_file is not NULL when printed in the
debugger. Maybe because there is a race between different threads?

(gdb) p cs->stack->out_file

$19 = (sFILE *) 0x142a5c0

(gdb) p *cs->stack->out_file

$20 = {file = 0x7f98cc051880, used = 1073741823}

The "used" value seems quite suspicious, looks corrupt. Maybe it was free()d?

It looks to me that there is missing protection in _db_flush() against another
thread changing cs->stack->out_file, eg FreeState() sets it to NULL... ?

Attachments

Activity

People

Assignee:: Michael Widenius

Reporter:: Kristian Nielsen

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2013-03-26 14:59

Updated:: 2013-03-26 15:00

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.