[MDEV-3861] TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr<p->end' failed Created: 2012-11-16  Updated: 2013-11-29  Resolved: 2012-11-20

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.0
Fix Version/s: 10.0.1, 5.5.28a, 5.2.13, 5.1.66

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-232 Remove one fsync() inside engine's co... Closed

 Description   

I have a simple test case that replaces a row in innodb and tokudb and
then commits. Eventually, mysqld hits this assert:

mysqld: /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547:
int TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr <
p->end' failed.

The test works great when the binlog is enabled. The test crashes
when the binlog is OFF. So, maybe the problem is isolated to the
TC_LOG_MMAP logic. I noticed that the TC_LOG_MMAP::unlog function is
quite different than the mariadb-5.5 version.

What is the best way to get this fixed? Would it help if i made the
test case available?

Thanks
Rich Prohaska

Here is the stack trace:

Program received signal SIGABRT, Aborted.
0x0000003006a32885 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003006a32885 in raise () from /lib64/libc.so.6
#1 0x0000003006a34065 in abort () from /lib64/libc.so.6
#2 0x0000003006a2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003006a2bac0 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000008862bf in TC_LOG_MMAP::log_one_transaction
(this=0x1541f00 <tc_log_mmap>, xid=142018) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547
#5 0x0000000000885442 in TC_LOG_MMAP::log_and_order (this=0x1541f00
<tc_log_mmap>, thd=0x27f0000, xid=142018, all=true,
need_prepare_ordered=false, need_commit_ordered=true) at
/home/prohaska/maria10-\
build/mariadb-10.0.0/sql/log.cc:7180
#6 0x00000000007b819f in ha_commit_trans (thd=0x27f0000, all=true) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/handler.cc:1329
#7 0x0000000000710307 in trans_commit (thd=0x27f0000) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/transaction.cc:213
#8 0x000000000060a2df in mysql_execute_command (thd=0x27f0000) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:4084
#9 0x000000000060f1a7 in mysql_parse (thd=0x27f0000,
rawbuf=0x7fff6c0050b8 "commit", length=6, parser_state=0x7fffe4112660)
at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:6056
#10 0x00000000006029a3 in dispatch_command (command=COM_QUERY,
thd=0x27f0000, packet=0x27fa341 "commit", packet_length=6) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:1216
#11 0x0000000000601bdb in do_command (thd=0x27f0000) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:945
#12 0x00000000006ff6d1 in do_handle_one_connection (thd_arg=0x27f0000)
at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1254
#13 0x00000000006ff179 in handle_one_connection (arg=0x27f0000) at
/home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1168
#14 0x0000003006e077f1 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003006ae592d in clone () from /lib64/libc.so.6



 Comments   
Comment by Kristian Nielsen [ 2012-11-20 ]

Fix pushed to 5.1, merged to 5.2 and 5.3.
Part of the fix only exists in 10.0 - push that to 10.0-base and merge to 10.0

Comment by Laurynas Biveinis [ 2013-11-28 ]

I am looking at the 5.5 version of this fix (rev 2502.565.20 in lp:maria/5.5) and I am not sure I understand why last_pool was replaced with last_pool_ptr there? Any help would be greatly appreciated.

Thanks,

Comment by Laurynas Biveinis [ 2013-11-28 ]

Sorry, pool_last and pool_last_ptr.

Comment by Kristian Nielsen [ 2013-11-28 ]

> I am not sure I understand why last_pool was replaced with last_pool_ptr
> there?

It looks like this is to correctly handle the case of an empty list.

TC_LOG_MMAP::sync() links in a page at the end of the linked list. If the list
is non-empty, then this means changing the `next' pointer to point to the new
element. But if the list is empty, then there is no `next' pointer to change;
rather the TC_LOG::pool member must be changed to point to the new element.

The last_pool_ptr is a standard technique for linked lists to handle this
without special cases. If the list is empty, last_pool_ptr points to
TC_LOG::pool, else it points to the `next' pointer of the last element. So
linking in the new page in all cases can be done by assigning to
(*last_pool_ptr).

Note that this usage matches the variable `best_p' in function
TC_LOG_MMAP::get_active_from_pool(), which is probably why I did it this way.

Does this make things clear for you?

Comment by Laurynas Biveinis [ 2013-11-29 ]

Kristian -

Thanks for your explanation, I haven't considered the case of an
empty pool list.

But I don't see how that list can be possibly empty? TC_LOG_MMAP
requires at least three pages, at most one can be active, at most one
can be syncing, thus one page should be present in the pool at all
times, thus TC_LOG_MMAP::sync() should invariantly have at least one
page in the pool in the LOCK_pool critical section.
TC_LOG_MMAP::get_active_from_pool() is called with
active == NULL, thus there should be at least two pages in the pool
there.

Comment by Kristian Nielsen [ 2013-11-29 ]

> But I don't see how that list can be possibly empty?

Well, there were multiple problems in the pool list handling. I suppose I just
fixed all of them that I found. It could be that it was another problem that
caused the actual bug. If you check the code before the patch, you will see
that if get_active_from_pool() selected the last page in the list, then the
page was not unlinked from the pool.

So maybe the list can not be empty, but it still seems better to not leave
completely wrong code in there, doesn't it?

Another problem in the bug was a race condition that get_active_from_pool()
could select a page with no free slots.

IIRC, this bug was found because of a different bug elsewhere in the code.
The slots in the pages were never freed, so eventually the pool would run out
of free slots, thus exercising this code that was just completely wrong,
having never been tested before. Without the other bug, it may be hard to
reproduce in practice, though I think theoretically it is possible.

Out of curiosity, what makes you interested in this bug? Did you experience it
in practise?

Comment by Laurynas Biveinis [ 2013-11-29 ]

Kristian -

Now I see, thank you. I understood all the other changes by this patch fixing the problems you describe, it's only the empty list handling which I had doubts about.

I haven't experienced this in practice. My interest is caused by preparing Percona Server to work with TokuDB, which obviously requires production-quality TC_LOG_MMAP. Thus I am reviewing MariaDB changes, MySQL 5.7 fix for http://bugs.mysql.com/bug.php?id=47134, etc.

Thanks again

Generated at Thu Feb 08 06:51:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.