[MDEV-5618] TokuDB tests fail when building 5.5.35 in buildd at Launchpad.net Created: 2014-02-05  Updated: 2014-07-19  Resolved: 2014-07-19

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.35
Fix Version/s: 5.5.37

Type: Bug
Priority: Major
Reporter: Otto Kekäläinen
Assignee: Otto Kekäläinen
Resolution: Fixed
Votes: 0
Labels: buildd, debian, tests


 Description   

I've been working on Debian packaging. My latest version, https://github.com/ottok/mariadb-5.5, builds OK on localhost with git-buildpackage and dpkg-buildpackage, and also on another build machine that runs git-buildpackage with pbuilder chroots.

However, when I upload the same package to Launchpad.net, every build that includes TokuDB fails because the test run has TokuDB-related failures:

Only  1358  of 3485 completed.
--------------------------------------------------------------------------
The servers were restarted 533 times
Spent 1884.283 of 2996 seconds executing testcases
 
Check of testcase failed for: rpl.rpl_ddl
 
Too many failed: Failed 10/952 tests, 98.95% were successful.
 
Failing test(s): rpl-tokudb.tokudb_innodb_xa_crash tokudb_alter_table.ai_part tokudb_alter_table.drop_add_pk_part_104 tokudb_alter_table.hcad_part tokudb_alter_table.rename_column_cold_part_104

Above was from build log at https://launchpadlibrarian.net/165042892/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.35-1~trusty1~ppa3_FAILEDTOBUILD.txt.gz

More build logs at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all

The build failure for saucy-amd64 is identical.

Launchpad uses buildd to build, and as that is the main difference from the other build environments, I suspect there is some issue with 5.5.35 and buildd.



 Comments   
Comment by Elena Stepanova [ 2014-02-05 ]

Corresponding fragments in the build log look like this one:

rpl-tokudb.tokudb_innodb_xa_crash 'innodb_plugin,mix' [ fail ]
        Test ended at 2014-02-05 11:43:25
 
CURRENT_TEST: rpl-tokudb.tokudb_innodb_xa_crash
 
 
Failed to start mysqld.1
mysqltest failed but provided no output
 
 
 - saving '/build/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/log/rpl-tokudb.tokudb_innodb_xa_crash-innodb_plugin,mix/' to '/build/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/log/rpl-tokudb.tokudb_innodb_xa_crash-innodb_plugin,mix/'
 - found 'core.4720' (0/5)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace
Compressed file /build/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/log/rpl-tokudb.tokudb_innodb_xa_crash-innodb_plugin,mix/mysqld.1/data/core.4720
 - found 'core.4723' (1/5)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace
Compressed file /build/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/log/rpl-tokudb.tokudb_innodb_xa_crash-innodb_plugin,mix/mysqld.2/data/core.4723
***Warnings generated in error logs during shutdown after running tests: rpl-tokudb.tokudb_innodb_xa_crash
 
140205 11:43:25 [ERROR] mysqld got signal 11 ;
Attempting backtrace. You can use the following information to find out
140205 11:43:25 [ERROR] mysqld got signal 11 ;
Attempting backtrace. You can use the following information to find out

That is, apparently TokuDB crashes on startup in all TokuDB tests.
Buildbot runs tests on Saucy with TokuDB, and I also tried building it on my Saucy VM with the same cmake option as the build above uses and didn't get any crashes. So it is evidently not a generic problem between TokuDB and Saucy/Trusty, but something related to the particular environment.
Unfortunately there are no clues in the log to start guessing what could have gone wrong, so as discussed on IRC earlier, we'll need additional information – either a server error log, or data from the stack traces from the coredump.

Comment by Elena Stepanova [ 2014-02-14 ]

140214 13:52:22 [Note] Plugin 'ARCHIVE' is disabled.
140214 13:52:22 [Note] Plugin 'SPHINX' is disabled.
140214 13:52:22 [Note] Plugin 'FEDERATED' is disabled.
140214 13:52:22 InnoDB: The InnoDB memory heap is disabled
140214 13:52:22 InnoDB: Mutexes and rw_locks use GCC atomic builtins
140214 13:52:22 InnoDB: Compressed tables use zlib 1.2.8
140214 13:52:22 InnoDB: Using Linux native AIO
140214 13:52:22 InnoDB: Initializing buffer pool, size = 8.0M
140214 13:52:22 InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
140214 13:52:22  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
140214 13:52:22  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
140214 13:52:22  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: 127 rollback segment(s) active.
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
140214 13:52:23  InnoDB: Waiting for the background threads to start
140214 13:52:24 Percona XtraDB (http://www.percona.com) 5.5.35-MariaDB-33.0 started; log sequence number 0
140214 13:52:24 [Note] Plugin 'INNODB_RSEG' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_UNDO_LOGS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_LOCK_WAITS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_CMP' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_TABLES' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_INDEXES' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_COLUMNS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_FIELDS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_FOREIGN' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_FOREIGN_COLS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_SYS_STATS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_TABLE_STATS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_INDEX_STATS' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_BUFFER_POOL_PAGES' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_BUFFER_POOL_PAGES_INDEX' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_BUFFER_POOL_PAGES_BLOB' is disabled.
140214 13:52:24 [Note] Plugin 'XTRADB_ADMIN_COMMAND' is disabled.
140214 13:52:24 [Note] Plugin 'INNODB_CHANGED_PAGES' is disabled.
140214 13:52:24 [Note] Plugin 'BLACKHOLE' is disabled.
140214 13:52:24 [Note] Plugin 'QUERY_CACHE_INFO' is disabled.
140214 13:52:24 [Note] Plugin 'FEEDBACK' is disabled.
140214 13:52:24 [Note] Plugin 'partition' is disabled.
140214 13:52:24 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 5.5.35-MariaDB-1-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62493 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(my_print_stacktrace+0x2e)[0x2b82b054866e]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(handle_fatal_signal+0x457)[0x2b82b013ec07]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x2b82b1f47330]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(+0x12fa5c)[0x2b82bab34a5c]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(+0x12eab8)[0x2b82bab33ab8]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(free+0x305)[0x2b82bab22335]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z31toku_os_get_processor_frequencyPm+0x17e)[0x2b82baa7b8de]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z21toku_portability_initv+0x1d)[0x2b82baa7ba5d]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z18toku_ft_layer_initv+0xc)[0x2b82baaab2bc]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(+0x2ff55)[0x2b82baa34f55]
/lib64/ld-linux-x86-64.so.2(+0x1001a)[0x2b82b14c101a]
/lib64/ld-linux-x86-64.so.2(+0x10103)[0x2b82b14c1103]
/lib64/ld-linux-x86-64.so.2(+0x14b50)[0x2b82b14c5b50]
/lib64/ld-linux-x86-64.so.2(+0xfea4)[0x2b82b14c0ea4]
/lib64/ld-linux-x86-64.so.2(+0x1429b)[0x2b82b14c529b]
/lib/x86_64-linux-gnu/libdl.so.2(+0x102b)[0x2b82b1d3402b]
/lib64/ld-linux-x86-64.so.2(+0xfea4)[0x2b82b14c0ea4]
/lib/x86_64-linux-gnu/libdl.so.2(+0x162d)[0x2b82b1d3462d]
/lib/x86_64-linux-gnu/libdl.so.2(dlopen+0x31)[0x2b82b1d340c1]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(+0x390d53)[0x2b82b0006d53]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(_Z11plugin_initPiPPci+0x725)[0x2b82b000a465]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(+0x2f5184)[0x2b82aff6b184]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(_Z11mysqld_mainiPPc+0x5fa)[0x2b82aff6e28a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2b82b2dbbed5]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/sql/mysqld(+0x2ee696)[0x2b82aff64696]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file

Comment by Otto Kekäläinen [ 2014-02-14 ]

I managed to repeat this on a pbuilder instance on a machine that I control.

See logs and file listings at http://labs.seravo.fi/~otto/mariadb-repo/MDEV-5618/

The error occurs only for trusty. Sid and wheezy with exact same build script, pbuilder setup and commit id built OK.

Comment by Elena Stepanova [ 2014-02-15 ]

Hi Rich,

Could you please take a look at the stack trace above and Otto's notes to see if it looks at all familiar?

I am not getting the crash on my Trusty VM, so it's certainly not something general for Trusty, but specific to the particular machine/environment that Otto is using.

Comment by Elena Stepanova [ 2014-02-15 ]

I downloaded Otto's binaries from http://labs.seravo.fi/~otto/mariadb-repo/MDEV-5618/builddir-639ccb0-pbuilder.tar.bz, and with them I did get the crash:

#2  <signal handler called>
#3  extent_ad_comp (a=0x7fff22e3f930, b=0x0) at extra/jemalloc/src/extent.c:32
#4  jemalloc_internal_extent_tree_ad_search (rbtree=rbtree@entry=0x7f5ceadfb0c0 <huge>, key=key@entry=0x7fff22e3f930) at extra/jemalloc/src/extent.c:38
#5  0x00007f5ceab7fab8 in jemalloc_internal_huge_salloc (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/huge.c:229
#6  0x00007f5ceab6e335 in jemalloc_internal_isalloc (demote=false, ptr=0x7f5cedf7ce00) at include/jemalloc/internal/jemalloc_internal.h:863
#7  free (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/jemalloc.c:1267
#8  0x00007f5ceaac78de in toku_get_processor_frequency_cpuinfo (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:371
#9  toku_os_get_processor_frequency (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:409
#10 0x00007f5ceaac7a5d in toku_portability_init () at storage/tokudb/ft-index/portability/portability.cc:139
#11 0x00007f5ceaaf72bc in toku_ft_layer_init () at storage/tokudb/ft-index/ft/ft-ops.cc:6275
#12 0x00007f5ceaa80f55 in _GLOBAL__I_65535_0_libtokufractaltree_static.a_0x235798 () at storage/tokudb/ft-index/src/ydb_lib.cc:103
#13 0x00007f5cf139401a in call_init (l=<optimized out>, argc=argc@entry=17, argv=argv@entry=0x7fff22e40df8, env=env@entry=0x7fff22e40e88) at dl-init.c:78
#14 0x00007f5cf1394103 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:36
#15 _dl_init (main_map=main_map@entry=0x7f5ceec14000, argc=17, argv=0x7fff22e40df8, env=0x7fff22e40e88) at dl-init.c:126
#16 0x00007f5cf1398b50 in dl_open_worker (a=a@entry=0x7fff22e3fd38) at dl-open.c:577
#17 0x00007f5cf1393ea4 in _dl_catch_error (objname=objname@entry=0x7fff22e3fd28, errstring=errstring@entry=0x7fff22e3fd30, mallocedp=mallocedp@entry=0x7fff22e3fd20, operate=operate@entry=0x7f5cf1398880 <dl_open_worker>, args=args@entry=0x7fff22e3fd38) at dl-error.c:177
#18 0x00007f5cf139829b in _dl_open (file=0x7fff22e400a0 "/home/elenst/mariadb-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so", mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=17, argv=0x7fff22e40df8, env=0x7fff22e40e88) at dl-open.c:661
#19 0x00007f5cf0b2502b in dlopen_doit (a=a@entry=0x7fff22e3ff40) at dlopen.c:66
#20 0x00007f5cf1393ea4 in _dl_catch_error (objname=0x7f5ceeffb250, errstring=0x7f5ceeffb258, mallocedp=0x7f5ceeffb248, operate=0x7f5cf0b24fd0 <dlopen_doit>, args=0x7fff22e3ff40) at dl-error.c:177
#21 0x00007f5cf0b2562d in _dlerror_run (operate=operate@entry=0x7f5cf0b24fd0 <dlopen_doit>, args=args@entry=0x7fff22e3ff40) at dlerror.c:163
#22 0x00007f5cf0b250c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#23 0x00007f5cf1939d53 in plugin_dl_add (report=1, dl=0x7fff22e405b0) at sql/sql_plugin.cc:750
#24 plugin_add (tmp_root=<optimized out>, name=<optimized out>, dl=0x7fff22e405b0, report=1) at sql/sql_plugin.cc:1056
#25 0x00007f5cf193d465 in plugin_load_list (list=0x0, tmp_root=0x7fff22e40570) at sql/sql_plugin.cc:1845
#26 plugin_init (argc=argc@entry=0x7f5cf25a3380 <remaining_argc>, argv=0x7f5ceec44450, flags=0) at sql/sql_plugin.cc:1633
#27 0x00007f5cf189e184 in init_server_components () at sql/mysqld.cc:4334
#28 0x00007f5cf18a128a in mysqld_main (argc=101, argv=0x7f5ceec44450) at sql/mysqld.cc:4934
#29 0x00007f5cef91ded5 in __libc_start_main (main=0x7f5cf18854d0 <main(int, char**)>, argc=17, argv=0x7fff22e40df8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff22e40de8) at libc-start.c:287

The difference hides somewhere in ha_tokudb.so. If I replace the one from Otto's binaries with the one from mine, and keep the rest of the binaries the same, the test passes all right.

mine:

$ ldd storage/tokudb/ha_tokudb.so
	linux-vdso.so.1 =>  (0x00007fffd73fe000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff776fef000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff776ceb000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff776923000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff777600000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff77671f000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff776418000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff776201000)

Otto's:

	linux-vdso.so.1 =>  (0x00007fff6cbc5000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff01f1d8000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff01eed4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff01eb0c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff01f7b9000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff01e8f3000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff01e6ef000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff01e3e7000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff01e1d1000)
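
For reference, the comparison of the two listings can be sketched in Python as follows. This is only an illustration: `linked_libs` is a hypothetical helper, and the listings are pasted in as strings rather than obtained by running `ldd`.

```python
def linked_libs(ldd_output):
    """Return the library names listed by `ldd` (the first token on each line)."""
    names = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


# Listings copied from the two `ldd` runs above.
MINE = """
linux-vdso.so.1 =>  (0x00007fffd73fe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff776fef000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff776ceb000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff776923000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff777600000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff77671f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff776418000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff776201000)
"""

OTTO = """
linux-vdso.so.1 =>  (0x00007fff6cbc5000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff01f1d8000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff01eed4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff01eb0c000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff01f7b9000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff01e8f3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff01e6ef000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff01e3e7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff01e1d1000)
"""

only_in_otto = linked_libs(OTTO) - linked_libs(MINE)
only_in_mine = linked_libs(MINE) - linked_libs(OTTO)
print(sorted(only_in_otto))  # ['libz.so.1']
print(sorted(only_in_mine))  # []
```

The only entry unique to Otto's build is `libz.so.1`. That alone does not show that zlib is involved in the crash; it only confirms that the two libraries were linked differently.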

Comment by Rich Prohaska [ 2014-02-15 ]

the crashes occur in the tokudb's 'toku_os_get_processor_frequency' function. there is a standalone test for this function in ft-index/portability/tests/test-cpu-freq.cc which can be used to see if the same crash occurs.
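
As an illustration of how that function name can be read off the raw server backtrace, here is a minimal Python sketch. The helper names (`named_frames`, `plain_name`) are hypothetical, the frame format is assumed to be `path(symbol+0xoffset)[0xaddress]` as in the log above, and only the simplest `_Z<length><name>` form of C++ mangling is decoded.

```python
import re

# One backtrace frame: object path, optional symbol, offset, address.
FRAME = re.compile(
    r"^(?P<obj>[^\s()]+)\((?P<sym>[^+()]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]$"
)
SIMPLE_MANGLED = re.compile(r"^_Z(\d+)")


def plain_name(symbol):
    """Decode only the simple `_Z<len><name>...` case; return anything else as is."""
    m = SIMPLE_MANGLED.match(symbol)
    if not m:
        return symbol
    start = m.end()
    return symbol[start:start + int(m.group(1))]


def named_frames(trace, object_suffix):
    """List the named frames, in order, that belong to the given object file."""
    names = []
    for line in trace.splitlines():
        m = FRAME.match(line.strip())
        if m and m.group("obj").endswith(object_suffix) and m.group("sym"):
            names.append(plain_name(m.group("sym")))
    return names


# Frames copied from the server error log earlier in this issue.
TRACE = """
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x2b82b1f47330]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(+0x12fa5c)[0x2b82bab34a5c]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(+0x12eab8)[0x2b82bab33ab8]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(free+0x305)[0x2b82bab22335]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z31toku_os_get_processor_frequencyPm+0x17e)[0x2b82baa7b8de]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z21toku_portability_initv+0x1d)[0x2b82baa7ba5d]
/tmp/buildd/mariadb-5.5-5.5.35/builddir/mysql-test/var/plugins/ha_tokudb.so(_Z18toku_ft_layer_initv+0xc)[0x2b82baaab2bc]
"""

tokudb_frames = named_frames(TRACE, "ha_tokudb.so")
print(tokudb_frames)
```

The first named frame is `free`, called from `toku_os_get_processor_frequency`; the two unnamed frames above it in the log are skipped because they carry no symbol.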

Comment by Elena Stepanova [ 2014-02-16 ]

> the crashes occur in the tokudb's 'toku_os_get_processor_frequency' function. there is a standalone test for this function in ft-index/portability/tests/test-cpu-freq.cc which can be used to see if the same crash occurs.

I did, and it passed. However, the tests are not built by default, so I had to build it on my machine, and when I build on my machine the main library also works; it is only the one built on Otto's machine that crashes. So I guess this result doesn't give us any new information.

I've tried to replicate all build parameters from Otto's build, but haven't managed to build a library that crashes the same way so far. However, I see something strange in those binaries. INFO_BIN says that the build was done on Linux-3.2.0-38-generic. Is that not too old for Trusty? Trusty comes with 3.13, while 3.2 is more like the Precise kernel. I'm now trying to downgrade my Trusty to make it work on 3.2, but I'm not even sure it's feasible. What kind of machine can have this combination?

Comment by Otto Kekäläinen [ 2014-02-16 ]

Please remember the title of this issue. The main goal is to figure out why the tests fail and fix them so that we can build MariaDB on Launchpad.net. Or, if that is not possible, to make a well-argued decision to disable the tests for the Ubuntu PPA builds (Launchpad.net).

The pbuilder machine where I managed to repeat this Launchpad.net build error runs pbuilder on an Ubuntu 12.04 64-bit host, therefore the old kernel. I don't think the kernel version is an issue here. But, related to the kernel, the issue might be in some code that checks what processor is used or something like that. The chroot system in pbuilder and buildd (the build system at Launchpad.net) might not show processor information the same way a real machine would.

Comment by Elena Stepanova [ 2014-02-16 ]

> Please remember the title of this issue. The main goal is to figure out why the tests fail and fix them so that we can build MariaDB on Launchpad.net. Or, if that is not possible, to make a well-argued decision to disable the tests for the Ubuntu PPA builds (Launchpad.net).

> The pbuilder machine where I managed to repeat this Launchpad.net build error runs pbuilder on an Ubuntu 12.04 64-bit host, therefore the old kernel

We cannot do anything about building binaries on a machine we don't have access to, don't have any consistent information about, and can't even get logs from.
The only reason I've been spending time on trying to reproduce the problem on Trusty is because you stated specifically that you reproduced it in your builder on Trusty, and narrowed it down to be a Trusty issue, I quote from IRC:

Feb 14 13:56:40 <otto_> I got now https://mariadb.atlassian.net/browse/MDEV-5618 reproduced on my pbuilder, with exactly the same 10 tests failing and TokuDB crashing
Feb 14 13:57:07 <otto_> the same commit-ID was OK for sid and wheezy but crashed on trusty
Feb 14 13:57:26 <otto_> so we know it for sure has something to do with trusty (Ubuntu 14.04)

Ubuntu 12.04 Precise had never even come up in the conversation till now.

Please keep in mind that the only reason we thought you encountered the same failure on your machine was the assumption that you ran the same Trusty. If we take it off the table, your failures can just as well be completely unrelated to what you are getting on the LP machine. The fact that they happen on the same tests means nothing – they are just the first TokuDB tests that are run, they fail because TokuDB fails to start for whatever reason, and we know nothing about this reason on the LP machine.

I suppose we just have to discard everything we assumed to be reliable information about this issue, and start from scratch.

Comment by Otto Kekäläinen [ 2014-02-16 ]

I suppose you are right that the root cause of the failures on Launchpad.net and in my pbuilder might be different. However, at the moment I think it is likely that they are caused by the same error, as the 10 tests that fail in https://launchpadlibrarian.net/165042892/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.35-1~trusty1~ppa3_FAILEDTOBUILD.txt.gz and http://labs.seravo.fi/~otto/mariadb-repo/trusty-amd64/mariadb-5.5_5.5.35-1_amd64.build-639ccb0-pbuilder.log are exactly the same:

Failing test(s): rpl-tokudb.tokudb_innodb_xa_crash tokudb_alter_table.ai_part tokudb_alter_table.drop_add_pk_part_104 tokudb_alter_table.hcad_part tokudb_alter_table.rename_column_cold_part_104

Also, in both build batches (precise and trusty on Launchpad.net/buildd vs. sid, wheezy, trusty on my pbuilder) all builds but trusty run successfully, so that indicates the issue is trusty related. Buildd and pbuilder are both systems that run the builds in chroot-based environments.

Comment by Elena Stepanova [ 2014-02-16 ]

> the 10 tests that fail in ... and ... are exactly the same

I just explained in the previous comment why this fact is irrelevant.
Your TokuDB fails to start, for whatever reason. Because of that, every TokuDB test will fail. MTR runs tests in a particular order. MTR gives up after the first 10 failed tests. Thus, as long as you run MTR with the same parameters on the machines where TokuDB fails to start, you will get the exact same 10 failures.
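
The argument can be illustrated with a small Python simulation. It is not MTR code: `run_suite`, the test names, and the two "machines" are all made up for the example, and the only assumptions are the ones stated above (a fixed test order and a cutoff after 10 failures).

```python
def run_suite(test_order, passes, max_failures=10):
    """Run tests in order and stop as soon as max_failures tests have failed."""
    failed = []
    for name in test_order:
        if not passes(name):
            failed.append(name)
            if len(failed) >= max_failures:
                break
    return failed


# Hypothetical suite: a few healthy tests followed by 30 TokuDB tests.
ORDER = [f"main.t{i}" for i in range(5)] + [f"tokudb.t{i}" for i in range(30)]


def passes_unless_tokudb(name):
    """Model a server where TokuDB cannot start, so every TokuDB test fails."""
    return not name.startswith("tokudb.")


# Two unrelated machines with two possibly different root causes,
# but the same symptom: TokuDB does not start.
machine_a = run_suite(ORDER, passes_unless_tokudb)
machine_b = run_suite(ORDER, passes_unless_tokudb)

print(machine_a == machine_b)  # True
print(len(machine_a))          # 10
```

Both runs report the same first 10 TokuDB tests regardless of why TokuDB failed to start, which is why an identical failure list says nothing about the root cause.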

> Also, in both build batches (precise and trusty on Launchpad.net/buildd vs. sid, wheezy, trusty on my pbuilder) all builds but trusty run successfully, so that indicates the issue is trusty related. Buildd and pbuilder are both systems that run the builds in chroot-based environments.

It is getting hard to make sense out of it.
I suspect it's not just me who has no idea what "build batches" are and how they differ from "builds". I don't know what it means that "precise build batch does not run successfully on trusty". Again, you are saying that your pbuilder is "sid, wheezy, trusty" while previously you said that your pbuilder was precise.
Could you please just drop the previous story and tell it from scratch, with accurate information on what exactly fails where, only facts without assumptions, and with accurate links if there are any?

Thanks.

Comment by Otto Kekäläinen [ 2014-02-16 ]

Ok, sorry for the confusion. I'll start explaining from scratch:

Basically, what I've been doing here is that I have MariaDB 5.5.35 in a Git repository at https://github.com/ottok/mariadb-5.5.git where I've been working on the contents of debian/* to improve the packaging. In debian/patches/* there is also stuff that affects code outside the debian/ directory.

For quality assurance reasons I have set up a pbuilder system (https://en.wikipedia.org/wiki/Debian_build_toolchain#Isolated_build_environments) that downloads my sources and runs git-buildpackage (which runs dpkg-buildpackage) and builds the .debs for multiple distros, in one batch run. Here is a copy of that pbuilder script: http://labs.seravo.fi/~otto/mariadb-repo/build.sh

When runs are successful it uploads binaries and build logs to http://labs.seravo.fi/~otto/mariadb-repo/ so that further testing can be done by installing the .debs from those repositories. I have also uploaded some of the failed build logs so that others can see them and help debug the failures. When the commit seems good, I push from my git repo to git.debian.org. From that repo the official Debian packages are built.

I can also push to Launchpad.net where Ubuntu builds are done. Ubuntu build logs show up at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all
and Ubuntu binaries (when build is successful) at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+packages

The default build routine (as defined in debian/rules) includes a test run, which I haven't disabled, as I thought it is good to run so that we can be sure that the built binaries are OK. As the first 10 TokuDB tests all fail and apparently TokuDB crashes completely, there seems to be something wrong with how TokuDB is built by pbuilder/buildd in an Ubuntu 14.04 Trusty bootstrapped chroot.

Comment by Elena Stepanova [ 2014-02-17 ]

Thanks a lot, it is so much clearer this way.

It is definitely not just Trusty, though. At the very least you are getting the same failure on Saucy: https://launchpadlibrarian.net/165028636/buildlog_ubuntu-saucy-amd64.mariadb-5.5_5.5.35-1~saucy1~ppa3_FAILEDTOBUILD.txt.gz
You even mentioned it in the initial description and at some point on IRC, but later it somehow became Trusty-only.

Now, let me summarize the results that I see by the links.

LP builds (ubuntu, amd64) from https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all:

Trusty: TokuDB crashes on startup
Saucy: TokuDB crashes on startup
Raring: 5.5.35 was not built, 5.5.32 was built without TokuDB
Precise: built without TokuDB (expectedly so).

Local pbuilder builds per system, amd64:

Precise: built without TokuDB
Saucy: 5.5.32 build only, no TokuDB
Trusty: TokuDB crashes on startup
Sid, Wheezy: TokuDB starts

Thus, we currently don't have any successful example of building an operational TokuDB for any Ubuntu release in either buildd or pbuilder at all.

We also know that it is definitely about building TokuDB in these synthetic environments rather than running the tests there, since I downloaded your binaries and got the same failure on a "normal" Trusty (and Saucy) – it means that at this point you should not disable the tests, they certainly show the real problem, it is the build that needs to be fixed.

With all this, we are almost ready to ask Rich for advice (again).
But first, since you earlier mentioned that you have total control over your pbuilder machine,
can you build and run there the unit test that Rich mentioned (storage/tokudb/ft-index/portability/tests/test-cpu-freq.cc)?

Thanks.

Comment by Otto Kekäläinen [ 2014-02-17 ]

I ran the test for trusty build binaries inside a trusty chroot like this:
root@htpc:/tmp/buildd/mariadb-5.5-5.5.35/mysql-test# ./mtr storage/tokudb/ft-index/portability/tests/test-cpu-freq.cc

The CPU went to 100% and the system was unresponsive for 40 minutes, until I decided to walk to the machine, which was also unresponsive, and I was forced to do a power reset. There was no output in the terminal, and the logs are lost because the chroot was on a tmpfs-mounted drive.

I'm sorry, but I think my time budget for this is used up for now, and I'll upload without TokuDB. It can be added back later.

Comment by Elena Stepanova [ 2014-02-17 ]

> I ran the test for trusty build binaries inside a trusty chroot like this:
> root@htpc:/tmp/buildd/mariadb-5.5-5.5.35/mysql-test# ./mtr storage/tokudb/ft-index/portability/tests/test-cpu-freq.cc
>
> The CPU went to 100% and the system was unresponsive for 40 minutes

It is a cc unit test (hence the name), it has nothing to do with MTR. In fact, it should not have even started since its name does not meet MTR requirements; but I saw the behavior you described several times when I had my mtr script corrupted.

> I'm sorry, but I think my time budget for this is used up for now, and I'll upload without TokuDB. It can be added back later.

I see. I'm afraid without any clear information on what is happening inside those machines, there is not much we can do, but I'll keep it open in case something new comes up.

Comment by Elena Stepanova [ 2014-03-31 ]

Please comment to re-open or ping me on IRC when/if you have more information so that we can proceed with this.

Comment by Otto Kekäläinen [ 2014-04-01 ]

I intend to re-engineer how TokuDB is built/tested in Debian builds to get this one solved.

Comment by Tim Callaghan (Inactive) [ 2014-04-16 ]

From the maria-dev list:

The answer to your first question is, that's how CMake works. CMake's Cross Compiling guide says that it can't guess the target processor details, and you're supposed to provide that information either by explicitly setting the variables, or by providing a toolchain file: http://www.cmake.org/Wiki/CMake_Cross_Compiling

I would be surprised if launchpad.net's infrastructure did not include suitable toolchain files, but this really isn't my area of expertise. If you can find suitable ones to use, then you should use them, otherwise I think you should probably just add something to the rules file to set those variables explicitly.
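
As a rough sketch of what "setting those variables explicitly" could look like, the following Python snippet builds the `-D` options for a given Debian architecture. It is an assumption-laden illustration: the architecture-to-processor mapping and the helper `cmake_target_flags` are hypothetical and not taken from the packaging, and `CMAKE_SYSTEM_NAME` is included because CMake only honours a user-supplied `CMAKE_SYSTEM_PROCESSOR` when the system name is also given.

```python
# Assumed mapping from Debian architecture names to processor names;
# it is an illustration only and would need checking against the real targets.
DEB_ARCH_TO_PROCESSOR = {
    "amd64": "x86_64",
    "i386": "i686",
    "arm64": "aarch64",
}


def cmake_target_flags(deb_arch):
    """Build -D options that pin the target system instead of letting CMake guess."""
    processor = DEB_ARCH_TO_PROCESSOR[deb_arch]
    return [
        "-DCMAKE_SYSTEM_NAME=Linux",
        "-DCMAKE_SYSTEM_PROCESSOR=" + processor,
    ]


print(" ".join(cmake_target_flags("i386")))
# -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_PROCESSOR=i686
```

Setting `CMAKE_SYSTEM_NAME` by hand makes CMake treat the build as a cross-compile, which can change other configure checks, so a proper toolchain file may still be the better route if one is available.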

Regarding your second problem, it sounds like your packaging scripts aren't properly linking with jemalloc as the first library, with --whole-archive. I say that because we get a failure (from Elena's stacktrace) inside jemalloc code when calling free() inside the library constructor:

#2 <signal handler called>
#3 extent_ad_comp (a=0x7fff22e3f930, b=0x0) at extra/jemalloc/src/extent.c:32
#4 jemalloc_internal_extent_tree_ad_search (rbtree=rbtree@entry=0x7f5ceadfb0c0 <huge>, key=key@entry=0x7fff22e3f930) at extra/jemalloc/src/extent.c:38
#5 0x00007f5ceab7fab8 in jemalloc_internal_huge_salloc (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/huge.c:229
#6 0x00007f5ceab6e335 in jemalloc_internal_isalloc (demote=false, ptr=0x7f5cedf7ce00) at include/jemalloc/internal/jemalloc_internal.h:863
#7 free (ptr=0x7f5cedf7ce00) at extra/jemalloc/src/jemalloc.c:1267
#8 0x00007f5ceaac78de in toku_get_processor_frequency_cpuinfo (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:371
#9 toku_os_get_processor_frequency (hzret=0x7fff22e3fa78) at storage/tokudb/ft-index/portability/portability.cc:409
#10 0x00007f5ceaac7a5d in toku_portability_init () at storage/tokudb/ft-index/portability/portability.cc:139
#11 0x00007f5ceaaf72bc in toku_ft_layer_init () at storage/tokudb/ft-index/ft/ft-ops.cc:6275
#12 0x00007f5ceaa80f55 in _GLOBAL__I_65535_0_libtokufractaltree_static.a_0x235798 () at storage/tokudb/ft-index/src/ydb_lib.cc:103

I've seen this happen before when a buffer is allocated with the system allocator's malloc() (as likely happens in this call to getline(3) https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L360), and then the fractal tree tries to free it with jemalloc (https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L371).

Please make sure that jemalloc is being linked properly (as the first library, and with --whole-archive) into mysqld. It is not sufficient to only link it to ha_tokudb.so, because in that case the process (mysqld) will be using the system allocator, and there will be some possibly inlined calls to jemalloc's interface inside ha_tokudb.so. If you need help with this, please show me how your linking is being done and I'll try to give the right advice. If it is against policy to ship with the allocator statically linked in a binary, then you should make sure jemalloc isn't linked in ha_tokudb.so anywhere, but I strongly recommend against that.
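
The ownership problem described above can be modelled with a toy Python sketch. This is not how jemalloc or the system allocator work internally: `ToyAllocator` is a made-up class that simply remembers which blocks it handed out, so that a mismatched free becomes a visible error instead of a crash.

```python
class ToyAllocator:
    """Toy stand-in for an allocator that can only free what it handed out."""

    def __init__(self, name):
        self.name = name
        self.live = set()
        self._counter = 0

    def malloc(self):
        self._counter += 1
        handle = (self.name, self._counter)
        self.live.add(handle)
        return handle

    def free(self, handle):
        if handle not in self.live:
            # A real allocator has no such check; it consults its own
            # metadata for the pointer and may crash while doing so.
            raise RuntimeError(f"{self.name} asked to free foreign block {handle}")
        self.live.remove(handle)


system = ToyAllocator("system")
jemalloc = ToyAllocator("jemalloc")

line_buffer = system.malloc()   # stands in for the buffer getline(3) allocates

try:
    jemalloc.free(line_buffer)  # the mismatched free
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True

system.free(line_buffer)        # the matching free succeeds
print(mismatch_detected)        # True
```

In the real process there is no exception to catch, which is why the fix has to be in the linking: one allocator must serve both the `malloc` and the `free` side.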

On Mon, Apr 14, 2014 at 5:01 PM, Rich Prohaska <prohaska@tokutek.com> wrote:
Hello Otto,
Have not investigated these problems yet. Created a tokudb issue to track: https://github.com/Tokutek/mariadb-5.5/issues/53

On Mon, Apr 14, 2014 at 5:03 AM, Otto Kekäläinen <otto@seravo.fi> wrote:
Hello Richard,

Any chance of getting your comments on this..? Thanks!

2014-04-01 12:25 GMT+03:00 Otto Kekäläinen <otto@seravo.fi>:
> Hello Rick,
>
> Last year I spent a lot of time packaging MariaDB 5.5 for Debian and
> finally this year it has landed in Ubuntu 14.04 and Debian testing.
> Unfortunately the Debian/Ubuntu version does not include TokuDB and I
> need your help to get it there.
>
> In 5.5.35 (I think) the TokuDB plugin was added to MariaDB but I had
> issues getting it to build 100% correctly and I eventually dropped it
> (added build parameter -DWITHOUT_TOKUDB=true), as getting MariaDB in
> Debian at all was a bigger priority than getting it there with every
> possible plugin.
>
> The root cause seems to be that when Debian and Ubuntu packages are
> built in chroot environments (the build systems of Debian and Ubuntu
> use pbuilder/sbuilder systems, see
> https://en.wikipedia.org/wiki/Debian_build_toolchain#Isolated_build_environments)
> the code that builds the plugin does not seem to correctly detect the
> CPU features. It seems to read the values from the build machine and
> not the inputted target values (in a cross-compile situation).
>
>
> There are two related issues that need a solution:
>
>
> 1) Currently the code that checks what the architecture is
> (32-bit/64-bit) is the first lines of
> https://bazaar.launchpad.net/~maria-captains/maria/10.0/view/head:/storage/tokudb/CMakeLists.txt.
> This works well for real and virtual machines, but it does not seem
> to work in the pbuilder/sbuilder chroots, as CMAKE_SYSTEM_PROCESSOR
> always shows the chroot host CPU, not the cross-compile target CPU.
>
> Could you please investigate pbuilder/sbuilder and search for some
> solution that works for reliable target CPU checking?
>
>
> 2) When building TokuDB in Ubuntu (amd64) sbuilder environments
> something crashes in the 'toku_os_get_processor_frequency'
> function. For this too, could you investigate the sbuilder chroot
> environment and figure out what goes on and how to fix it?
>
> Issue 2 has a bug report with the (a bit messy) debugging history
> documented: https://mariadb.atlassian.net/browse/MDEV-5618
>
>
> Both of these issues require learning a bit about sbuilder CPU
> things, so I assume it is most efficient if the same person looks
> into both of these.
>
>
> Thanks!
>

Comment by Otto Kekäläinen [ 2014-07-19 ]

Later versions of TokuDB have built OK on Launchpad, e.g. https://launchpadlibrarian.net/174495437/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.37-1~trusty1~ppa6_UPLOADING.txt.gz

Thus I can close this particular issue though other issues with TokuDB builds remain (MDEV-6449).
