[MDEV-10695] Numerous failures in valgrind builder: void MYSQL_BIN_LOG::cleanup(): Assertion `!binlog_xid_count_list.head()' failed Created: 2016-08-28  Updated: 2016-12-05  Resolved: 2016-12-05

Status: Closed
Project: MariaDB Server
Component/s: Compiling, Tests
Affects Version/s: 5.5, 10.0, 10.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-10671 10.1.17 release Closed

 Description   

It first happened on this push:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/9256
However, buildbot skipped the three previous pushes, so that build tested a combination of four.
Also, this test run suffered from disk space issues. The next one is cleaner, but still has the same assertion failures:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/9261/steps/test/logs/stdio

rpl.rpl_special_charset 'mix'            w4 [ pass ]   7305
worker[4] > Restart [mysqld.1 - pid: 27531, winpid: 27531] - running with different options '--binlog-format=mixed --ignore-builtin-innodb --plugin-load-add=ha_innodb.so --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-tables --innodb-metrics --log-bin=master-bin --log-bin=master-bin --innodb-flush-log-at-trx-commit=2 --loose-skip-innodb-use-sys-malloc' != '--binlog-format=mixed --log-bin=master-bin --character-set-server=utf16 --loose-skip-innodb-use-sys-malloc'
worker[4] > Restart [mysqld.2 - pid: 27533, winpid: 27533] - running with different options '--binlog-format=mixed --ignore-builtin-innodb --plugin-load-add=ha_innodb.so --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-tables --innodb-metrics --log-bin=slave-bin --log-bin=slave-bin --innodb-flush-log-at-trx-commit=2 --loose-skip-innodb-use-sys-malloc' != '--binlog-format=mixed --log-bin=slave-bin --character-set-server=utf16 --loose-skip-innodb-use-sys-malloc'
***Warnings generated in error logs during shutdown after running tests: rpl.rpl_special_charset
 
mysqld: /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/sql/log.cc:3218: void MYSQL_BIN_LOG::cleanup(): Assertion `!binlog_xid_count_list.head()' failed.
Attempting backtrace. You can use the following information to find out
mysqld: /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/sql/log.cc:3218: void MYSQL_BIN_LOG::cleanup(): Assertion `!binlog_xid_count_list.head()' failed.
Attempting backtrace. You can use the following information to find out
 
rpl.rpl_parallel_partition 'innodb_plugin,mix' w1 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2016-08-28 01:05:28
line
mysqld: /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/sql/log.cc:3218: void MYSQL_BIN_LOG::cleanup(): Assertion `!binlog_xid_count_list.head()' failed.
Attempting backtrace. You can use the following information to find out
2016-08-28  1:05:06 67648848 [ERROR] mysqld: Table './mysql/gtid_slave_pos' is marked as crashed and should be repaired
2016-08-28  1:05:06 67648848 [Warning] Checking table:   './mysql/gtid_slave_pos'
2016-08-28  1:05:06 67648848 [ERROR] mysql.gtid_slave_pos: 1 client is using or hasn't closed the table properly
^ Found warnings in /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/1/log/mysqld.1.err
ok

etc.

According to cross-reference, it never happened before.



 Comments   
Comment by Elena Stepanova [ 2016-08-28 ]

ATTN nirbhay_c. I can't assign it to two people at once, but whoever sees it first, please check it.

Comment by Elena Stepanova [ 2016-08-28 ]

Please note that there are other kinds of failures in that build – corruption, server startup failures. They might be related or unrelated; it's impossible to say at this point.

Comment by Sergei Golubchik [ 2016-08-29 ]

This crash is caused by strict aliasing. Our release builds are not affected because they are compiled with -fno-strict-aliasing, but the valgrind builds are not compiled with it. Here are the relevant parts of the code:

sql/log.cc

  I_List<xid_count_per_binlog> binlog_xid_count_list;
  ...
  while ((b= binlog_xid_count_list.get()))
  {
    DBUG_ASSERT(b->xid_count == 0);
    DBUG_ASSERT(!binlog_xid_count_list.head());

sql/log.h

  I_List<xid_count_per_binlog> binlog_xid_count_list;

sql/sql_list.h

struct ilink
{
  struct ilink **prev,*next;
...
  inline void unlink()
  {
    if (prev) *prev= next;
    if (next) next->prev=prev;
    prev=0 ; next=0;
  }
};
...
class base_ilist
{
  struct ilink *first;
  struct ilink last;
...
  inline struct ilink *get()
  {
    struct ilink *first_link=first;
    if (first_link == &last)
      return 0;
    first_link->unlink();			// Unlink from list
    return first_link;
  }
  inline struct ilink *head()
  {
    return (first != &last) ? first : 0;
  }

What happens here: the compiler evaluates the while condition and, in doing so, loads binlog_xid_count_list.first into a register. It then reuses this register when evaluating the assertion. It does not notice that binlog_xid_count_list.first was modified in the meantime by ilink::unlink(), called from base_ilist::get(), so head() sees the stale value and the assertion fails.

Comment by Sergei Golubchik [ 2016-12-05 ]

Updated the buildbot config to disable strict aliasing in valgrind builds.
