[MDEV-4334] Flush DBUG output in DBUG_ASSERT() in a way that does not cause crashes Created: 2013-03-26  Updated: 2013-03-26

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Trivial
Reporter: Kristian Nielsen Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None


 Description   

This revision adds a _db_flush() call to every DBUG_ASSERT():

    revid:monty@mysql.com-20110130104144-v3osvsbyl45zd5o3
 
    -#define DBUG_ASSERT(A) assert(A)
    +#define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)

Unfortunately this causes occasional crashes inside DBUG_ASSERT. I am starting
to hit this a lot in my MDEV-26 tree, but it is also visible elsewhere, like
here:

http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/2456/steps/test_2/logs/stdio

I suspect that the original change is just wrong and that it needs protection
against concurrent threads updating the DBUG data structures. Unfortunately
this is not entirely trivial to do. I assume adding lock/unlock to every
DBUG_ASSERT would make the test suite a lot slower, which is undesirable. On
the other hand, the obvious

  #define DBUG_ASSERT(A) do { if (!A) _db_flush_(); assert(A); } while(0)

evaluates A twice, which may cause problems, and assigning A to a temp
variable also is not good, as we want the assert() macro to have access to the
actual expression A to print it in the error log. So I am not sure how to fix
this properly.

For now, I will remove the _db_flush() from DBUG_ASSERT(), so I avoid
crashes. So I filed this task for you in case you want to bring it back in, in
a way that does not cause crash.

Here are details of my analysis of the crash:

#3  <signal handler called>
#4  0x0000000000d0aa44 in _db_flush_ () at /home/knielsen/my/10.0/work-10.0-mdev26/dbug/dbug.c:2139
#5  0x00000000005b3dd6 in next_event (rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:5316
#6  0x00000000005ad98e in exec_relay_log_event (thd=0x4859480, rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3042
#7  0x00000000005b016f in handle_slave_sql (arg=0x481e940) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3965
#8  0x0000000000ca07fd in pfs_spawn_thread (arg=0x4915890) at /home/knielsen/my/10.0/work-10.0-mdev26/storage/perfschema/pfs.cc:1015
#9  0x00007f98cd89cb50 in start_thread (arg=<optimized out>) at pthread_create.c:304

The crash is inside an assert:

5316	      DBUG_ASSERT(rli->cur_log_fd == -1); // foreign descriptor

#define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)

2135	void _db_flush_()
2136	{
2137	  CODE_STATE *cs;
2138	  get_code_state_or_return;
2139	  (void) fflush(cs->stack->out_file->file);
2140	}

The crash happens because cs->stack->out_file is NULL, as shown by
disassembly:

Dump of assembler code for function _db_flush_:
   0x0000000000d0aa20 <+0>:	push   %rbp
   0x0000000000d0aa21 <+1>:	mov    %rsp,%rbp
   0x0000000000d0aa24 <+4>:	sub    $0x10,%rsp
   0x0000000000d0aa28 <+8>:	callq  0xd06f48 <code_state>
   0x0000000000d0aa2d <+13>:	mov    %rax,-0x8(%rbp)
   0x0000000000d0aa31 <+17>:	cmpq   $0x0,-0x8(%rbp)
   0x0000000000d0aa36 <+22>:	je     0xd0aa4f <_db_flush_+47>
   0x0000000000d0aa38 <+24>:	mov    -0x8(%rbp),%rax
   0x0000000000d0aa3c <+28>:	mov    0x20(%rax),%rax
   0x0000000000d0aa40 <+32>:	mov    0x10(%rax),%rax
=> 0x0000000000d0aa44 <+36>:	mov    (%rax),%rax
   0x0000000000d0aa47 <+39>:	mov    %rax,%rdi
   0x0000000000d0aa4a <+42>:	callq  0x584280 <fflush@plt>
   0x0000000000d0aa4f <+47>:	leaveq 
   0x0000000000d0aa50 <+48>:	retq   
End of assembler dump.
(gdb) info reg
rax            0x0	0

However, the cs->stack->out_file is not NULL when printed in the
debugger. Maybe because there is a race between different threads?

(gdb) p cs->stack->out_file
$19 = (sFILE *) 0x142a5c0
(gdb) p *cs->stack->out_file
$20 = {file = 0x7f98cc051880, used = 1073741823}

The "used" value seems quite suspicious, looks corrupt. Maybe it was free()d?

It looks to me that there is missing protection in _db_flush() against another
thread changing cs->stack->out_file, eg FreeState() sets it to NULL... ?


Generated at Thu Feb 08 06:55:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.