Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-3861

TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr<p->end' failed

Details

    Description

      I have a simple test case that replaces a row in innodb and tokudb and
      then commits. Eventually, mysqld hits this assert:

      mysqld: /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547:
      int TC_LOG_MMAP::log_one_transaction(my_xid): Assertion `p->ptr <
      p->end' failed.

      The test works great when the binlog is enabled. The test crashes
      when the binlog is OFF. So, maybe the problem is isolated to the
      TC_LOG_MMAP logic. I noticed that the TC_LOG_MMAP::unlog function is
      quite different than the mariadb-5.5 version.

      What is the best way to get this fixed? Would it help if i made the
      test case available?

      Thanks
      Rich Prohaska

      Here is the stack trace:

      Program received signal SIGABRT, Aborted.
      0x0000003006a32885 in raise () from /lib64/libc.so.6
      (gdb) bt
      #0 0x0000003006a32885 in raise () from /lib64/libc.so.6
      #1 0x0000003006a34065 in abort () from /lib64/libc.so.6
      #2 0x0000003006a2b9fe in __assert_fail_base () from /lib64/libc.so.6
      #3 0x0000003006a2bac0 in __assert_fail () from /lib64/libc.so.6
      #4 0x00000000008862bf in TC_LOG_MMAP::log_one_transaction
      (this=0x1541f00 <tc_log_mmap>, xid=142018) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/log.cc:7547
      #5 0x0000000000885442 in TC_LOG_MMAP::log_and_order (this=0x1541f00
      <tc_log_mmap>, thd=0x27f0000, xid=142018, all=true,
      need_prepare_ordered=false, need_commit_ordered=true) at
      /home/prohaska/maria10-\
      build/mariadb-10.0.0/sql/log.cc:7180
      #6 0x00000000007b819f in ha_commit_trans (thd=0x27f0000, all=true) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/handler.cc:1329
      #7 0x0000000000710307 in trans_commit (thd=0x27f0000) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/transaction.cc:213
      #8 0x000000000060a2df in mysql_execute_command (thd=0x27f0000) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:4084
      #9 0x000000000060f1a7 in mysql_parse (thd=0x27f0000,
      rawbuf=0x7fff6c0050b8 "commit", length=6, parser_state=0x7fffe4112660)
      at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:6056
      #10 0x00000000006029a3 in dispatch_command (command=COM_QUERY,
      thd=0x27f0000, packet=0x27fa341 "commit", packet_length=6) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:1216
      #11 0x0000000000601bdb in do_command (thd=0x27f0000) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_parse.cc:945
      #12 0x00000000006ff6d1 in do_handle_one_connection (thd_arg=0x27f0000)
      at /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1254
      #13 0x00000000006ff179 in handle_one_connection (arg=0x27f0000) at
      /home/prohaska/maria10-build/mariadb-10.0.0/sql/sql_connect.cc:1168
      #14 0x0000003006e077f1 in start_thread () from /lib64/libpthread.so.0
      #15 0x0000003006ae592d in clone () from /lib64/libc.so.6

      Attachments

        Issue Links

          Activity

            Fix pushed to 5.1, merged to 5.2 and 5.3.
            Part of the fix only exists in 10.0 - push that to 10.0-base and merge to 10.0

            knielsen Kristian Nielsen added a comment - Fix pushed to 5.1, merged to 5.2 and 5.3. Part of the fix only exists in 10.0 - push that to 10.0-base and merge to 10.0

            I am looking at the 5.5 version of this fix (rev 2502.565.20 in lp:maria/5.5) and I am not sure I understand why last_pool was replaced with last_pool_ptr there? Any help would be greatly appreciated.

            Thanks,

            laurynas Laurynas Biveinis added a comment - I am looking at the 5.5 version of this fix (rev 2502.565.20 in lp:maria/5.5) and I am not sure I understand why last_pool was replaced with last_pool_ptr there? Any help would be greatly appreciated. Thanks,

            Sorry, pool_last and pool_last_ptr.

            laurynas Laurynas Biveinis added a comment - Sorry, pool_last and pool_last_ptr.

            > I am not sure I understand why last_pool was replaced with last_pool_ptr
            > there?

            It looks like this is to correctly handle the case of an empty list.

            TC_LOG_MMAP::sync() links in a page at the end of the linked list. If the list
            is non-empty, then this means changing the `next' pointer to point to the new
            element. But if the list is empty, then there is no `next' pointer to change;
            rather the TC_LOG::pool member must be changed to point to the new element.

            The last_pool_ptr is a standard technique for linked lists to handle this
            without special cases. If the list is empty, last_pool_ptr points to
            TC_LOG::pool, else it points to the `next' pointer of the last element. So
            linking in the new page in all cases can be done by assigning to
            (*last_pool_ptr).

            Note that this usage matches the variable `best_p' in function
            TC_LOG_MMAP::get_active_from_pool(), which is probably why I did it this way.

            Does this make things clear for you?

            knielsen Kristian Nielsen added a comment - > I am not sure I understand why last_pool was replaced with last_pool_ptr > there? It looks like this is to correctly handle the case of an empty list. TC_LOG_MMAP::sync() links in a page at the end of the linked list. If the list is non-empty, then this means changing the `next' pointer to point to the new element. But if the list is empty, then there is no `next' pointer to change; rather the TC_LOG::pool member must be changed to point to the new element. The last_pool_ptr is a standard technique for linked lists to handle this without special cases. If the list is empty, last_pool_ptr points to TC_LOG::pool, else it points to the `next' pointer of the last element. So linking in the new page in all cases can be done by assigning to (*last_pool_ptr). Note that this usage matches the variable `best_p' in function TC_LOG_MMAP::get_active_from_pool(), which is probably why I did it this way. Does this make things clear for you?

            Kristian -

            Thanks for your explanation, I haven't considered the case of an
            empty pool list.

            But I don't see how that list can be possibly empty? TC_LOG_MMAP
            requires at least three pages, at most one can be active, at most one
            can be syncing, thus one page should be present in the pool at all
            times, thus TC_LOG_MMAP::sync() should invariantly have at least one
            page in the pool in the LOCK_pool critical section.
            TC_LOG_MMAP::get_active_from_pool() is called with
            active == NULL, thus there should be at least two pages in the pool
            there.

            laurynas Laurynas Biveinis added a comment - Kristian - Thanks for your explanation, I haven't considered the case of an empty pool list. But I don't see how that list can be possibly empty? TC_LOG_MMAP requires at least three pages, at most one can be active, at most one can be syncing, thus one page should be present in the pool at all times, thus TC_LOG_MMAP::sync() should invariantly have at least one page in the pool in the LOCK_pool critical section. TC_LOG_MMAP::get_active_from_pool() is called with active == NULL, thus there should be at least two pages in the pool there.

            > But I don't see how that list can be possibly empty?

            Well, there were multiple problems in the pool list handling. I suppose I just
            fixed all of them that I found. It could be that it was another problem that
            caused the actual bug. If you check the code before the patch, you will see
            that if get_active_from_pool() selected the last page in the list, then the
            page was not unlinked from the pool.

            So maybe the list can not be empty, but it still seems better to not leave
            completely wrong code in there, doesn't it?

            Another problem in the bug was a race condition that get_active_from_pool()
            could select a page with no free slots.

            IIRC, this bug was found because of a different bug elsewhere in the code.
            The slots in the pages were never freed, so eventually the pool would run out
            of free slots, thus exercising this code that was just completely wrong,
            having never been tested before. Without the other bug, it may be hard to
            reproduce in practice, though I think theoretically it is possible.

            Out of curiosity, what makes you interested in this bug? Did you experience it
            in practise?

            knielsen Kristian Nielsen added a comment - > But I don't see how that list can be possibly empty? Well, there were multiple problems in the pool list handling. I suppose I just fixed all of them that I found. It could be that it was another problem that caused the actual bug. If you check the code before the patch, you will see that if get_active_from_pool() selected the last page in the list, then the page was not unlinked from the pool. So maybe the list can not be empty, but it still seems better to not leave completely wrong code in there, doesn't it? Another problem in the bug was a race condition that get_active_from_pool() could select a page with no free slots. IIRC, this bug was found because of a different bug elsewhere in the code. The slots in the pages were never freed, so eventually the pool would run out of free slots, thus exercising this code that was just completely wrong, having never been tested before. Without the other bug, it may be hard to reproduce in practice, though I think theoretically it is possible. Out of curiosity, what makes you interested in this bug? Did you experience it in practise?

            Kristian -

            Now I see, thank you. I understood all the other changes by this patch fixing the problems you describe, it's only the empty list handling which I had doubts about.

            I haven't experienced this in practice. My interest is caused by preparing Percona Server to work with TokuDB, which obviously requires production-quality TC_LOG_MMAP. Thus I am reviewing MariaDB changes, MySQL 5.7 fix for http://bugs.mysql.com/bug.php?id=47134, etc.

            Thanks again

            laurynas Laurynas Biveinis added a comment - Kristian - Now I see, thank you. I understood all the other changes by this patch fixing the problems you describe, it's only the empty list handling which I had doubts about. I haven't experienced this in practice. My interest is caused by preparing Percona Server to work with TokuDB, which obviously requires production-quality TC_LOG_MMAP. Thus I am reviewing MariaDB changes, MySQL 5.7 fix for http://bugs.mysql.com/bug.php?id=47134 , etc. Thanks again

            People

              knielsen Kristian Nielsen
              knielsen Kristian Nielsen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.