[MDEV-5618] TokuDB tests fail when building 5.5.35 in buildd at Launchpad.net Created: 2014-02-05 Updated: 2014-07-19 Resolved: 2014-07-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 5.5.35 |
| Fix Version/s: | 5.5.37 |
| Type: | Bug | Priority: | Major |
| Reporter: | Otto Kekäläinen | Assignee: | Otto Kekäläinen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | buildd, debian, tests | ||
| Description |
|
I've been working on Debian packaging. My latest version, https://github.com/ottok/mariadb-5.5, builds OK on localhost with git-buildpackage and dpkg-buildpackage, and also on another build machine that runs git-buildpackage with pbuilder chroots. However, when I upload the same package to Launchpad.net, every version that builds TokuDB fails to build because the test run has TokuDB-related failures:
The above is from the build log at https://launchpadlibrarian.net/165042892/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.35-1~trusty1~ppa3_FAILEDTOBUILD.txt.gz More build logs are at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all The build failure for saucy-amd64 is identical. Launchpad uses buildd to build, and as that is the main difference from the other build environments, I suspect there is some issue with 5.5.35 and buildd. |
| Comments |
| Comment by Elena Stepanova [ 2014-02-05 ] |
|
Corresponding fragments in the build log look like this one:
That is, apparently TokuDB crashes on startup in all TokuDB tests. |
| Comment by Elena Stepanova [ 2014-02-14 ] |
|
|
| Comment by Otto Kekäläinen [ 2014-02-14 ] |
|
I managed to reproduce this on a pbuilder instance on a machine that I control. See logs and file listings at http://labs.seravo.fi/~otto/mariadb-repo/MDEV-5618/ The error occurs only for trusty. Sid and wheezy, with the exact same build script, pbuilder setup and commit id, built OK. |
| Comment by Elena Stepanova [ 2014-02-15 ] |
|
Hi Rich, Could you please take a look at the stack trace above and Otto's notes to see if it looks at all familiar? I am not getting the crash on my Trusty VM, so it's certainly not something general to Trusty, but specific to the particular machine/environment that Otto is using. |
| Comment by Elena Stepanova [ 2014-02-15 ] |
|
I downloaded Otto's binaries from http://labs.seravo.fi/~otto/mariadb-repo/MDEV-5618/builddir-639ccb0-pbuilder.tar.bz, and with them I did get the crash:
The difference hides somewhere in ha_tokudb.so. If I replace the one from Otto's binaries with the one from mine, and keep the rest of the binaries the same, the test passes all right.
Mine:
Otto's:
|
| Comment by Rich Prohaska [ 2014-02-15 ] |
|
The crashes occur in TokuDB's 'toku_os_get_processor_frequency' function. There is a standalone test for this function in ft-index/portability/tests/test-cpu-freq.cc, which can be used to see if the same crash occurs. |
| Comment by Elena Stepanova [ 2014-02-16 ] |
I did, and it passed all right; but the tests are not built by default, so I had to build it on my machine, and when I build on my machine, the main library also works all right; it is the one that was built on Otto's machine that crashes. So I guess this result doesn't give us any new information. I've tried to replicate all build parameters from Otto's build, but so far I haven't managed to build a library that crashes the same way. However, I see something strange in those binaries. INFO_BIN says that the build was done on Linux-3.2.0-38-generic. Is that not too old for Trusty? Trusty comes with 3.13, while 3.2 is more like a Precise kernel. I'm now trying to downgrade my Trusty to make it work on 3.2, but I'm not even sure it's feasible. What kind of machine can have this combination? |
| Comment by Otto Kekäläinen [ 2014-02-16 ] |
|
Please remember the title of this issue. The main goal is to figure out why the tests fail and to fix them so that we can build MariaDB on Launchpad.net. Or, if that is not possible, to make a reasoned decision to disable the tests for the Ubuntu PPA builds (Launchpad.net). The pbuilder machine where I managed to repeat this Launchpad.net build error runs pbuilder on an Ubuntu 12.04 64-bit host, therefore the old kernel. I don't think the kernel version is an issue here. But, related to the kernel, could the issue be in some code that checks what processor is used, or something like that? The chroot system in pbuilder and buildd (the build system at Launchpad.net) might not show processor information the same way a real machine would. |
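To make the processor-information hypothesis above testable inside a chroot, here is a minimal Python sketch. It assumes that the relevant information is the `cpu MHz` field of `/proc/cpuinfo`; whether `toku_os_get_processor_frequency` reads exactly that field is an assumption here, not something confirmed in this issue. The helper names `cpu_mhz_from_cpuinfo` and `read_cpu_mhz` are hypothetical and only serve the illustration.

```python
from typing import Optional


def cpu_mhz_from_cpuinfo(text: str) -> Optional[float]:
    """Return the first 'cpu MHz' value in /proc/cpuinfo-style text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            try:
                return float(value.strip())
            except ValueError:
                # Field present but not numeric: treat as unavailable.
                return None
    return None


def read_cpu_mhz(path: str = "/proc/cpuinfo") -> Optional[float]:
    """Read the file at `path`; return None if it is missing or unreadable."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return cpu_mhz_from_cpuinfo(f.read())
    except OSError:
        # Typical for a chroot where /proc is not mounted.
        return None


if __name__ == "__main__":
    mhz = read_cpu_mhz()
    if mhz is None:
        print("cpu MHz: not available in this environment")
    else:
        print("cpu MHz:", mhz)
```

Running the script once on the host and once inside the pbuilder chroot would show whether the two environments expose the field differently. A missing value would only support the hypothesis, not prove that it causes the crash.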
| Comment by Elena Stepanova [ 2014-02-16 ] |
We cannot do anything about building binaries on a machine we don't have access to, don't have any consistent information about, and can't even get logs from.
Ubuntu 12.04 Precise had never even come up in the conversation till now. Please keep in mind that the only reason we thought you encountered the same failure on your machine was the assumption that you ran the same Trusty. If we take that off the table, your failures can just as well be completely unrelated to what you are getting on the LP machine. The fact that they happen on the same tests means nothing: they are just the first TokuDB tests that are run, they fail because TokuDB fails to start for whatever reason, and we know nothing about this reason on the LP machine. I suppose we just have to discard everything we assumed to be reliable information about this issue, and start from scratch. |
| Comment by Otto Kekäläinen [ 2014-02-16 ] |
|
I suppose you are right that the root cause of the failure on Launchpad.net and on my pbuilder might be different. However, at the moment I think it is likely that they are caused by the same error, as the 10 tests that fail in https://launchpadlibrarian.net/165042892/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.35-1~trusty1~ppa3_FAILEDTOBUILD.txt.gz and http://labs.seravo.fi/~otto/mariadb-repo/trusty-amd64/mariadb-5.5_5.5.35-1_amd64.build-639ccb0-pbuilder.log are exactly the same: Failing test(s): rpl-tokudb.tokudb_innodb_xa_crash tokudb_alter_table.ai_part tokudb_alter_table.drop_add_pk_part_104 tokudb_alter_table.hcad_part tokudb_alter_table.rename_column_cold_part_104 Also, in both build batches (precise and trusty on Launchpad.net/buildd vs. sid, wheezy, trusty on my pbuilder) all builds but trusty run successfully, which indicates the issue is trusty related. Buildd and pbuilder are both systems that run the builds in chroot-based environments. |
| Comment by Elena Stepanova [ 2014-02-16 ] |
I just explained in the previous comment why this fact is irrelevant.
It is getting hard to make sense out of it. Thanks. |
| Comment by Otto Kekäläinen [ 2014-02-16 ] |
|
Ok, sorry for the confusion. I'll start explaining from scratch:
Basically, what I've been doing here is that I have MariaDB 5.5.35 in a Git repository at https://github.com/ottok/mariadb-5.5.git where I've been working on the contents of debian/* to have better packaging. In debian/patches/* there is also stuff that affects code outside the debian/ directory.
For quality assurance reasons I have set up a pbuilder system (https://en.wikipedia.org/wiki/Debian_build_toolchain#Isolated_build_environments) that downloads my sources, runs git-buildpackage (which runs dpkg-buildpackage) and builds the .debs for multiple distros in one batch run. Here is a copy of that pbuilder script: http://labs.seravo.fi/~otto/mariadb-repo/build.sh
When runs are successful, it uploads binaries and build logs to http://labs.seravo.fi/~otto/mariadb-repo/ so that further testing can be done by installing the .debs from those repositories. I have also uploaded some of the failed build logs so that others can see them and help debug the failures.
When the commit seems good, I push from my git repo to git.debian.org. From that repo the official Debian packages are built. I can also push to Launchpad.net, where Ubuntu builds are done. Ubuntu build logs show up at https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all
The default build routine (as defined in debian/rules) includes a test run, which I haven't disabled, as I thought it is good to run so that we can be sure that the built binaries are OK.
As the first 10 TokuDB tests all fail and apparently TokuDB crashes completely, there seems to be something wrong with how TokuDB is built by pbuilder/buildd in an Ubuntu 14.04 Trusty bootstrapped chroot. |
| Comment by Elena Stepanova [ 2014-02-17 ] |
|
Thanks a lot, it is so much clearer this way. It is definitely not just Trusty, though. At the very least you are getting the same failure on Saucy: https://launchpadlibrarian.net/165028636/buildlog_ubuntu-saucy-amd64.mariadb-5.5_5.5.35-1~saucy1~ppa3_FAILEDTOBUILD.txt.gz
Now, let me summarize the results that I see by the links.
LP builds (ubuntu, amd64) from https://launchpad.net/~mysql-ubuntu/+archive/mariadb/+builds?build_text=&build_state=all:
Trusty: TokuDB crashes on startup
Local pbuilder builds per system, amd64:
Precise: built without TokuDB
Thus, we currently don't have any successful example of building an operational TokuDB for any Ubuntu in either buildd or pbuilder at all. We also know that it is definitely about building TokuDB in these synthetic environments rather than running the tests there, since I downloaded your binaries and got the same failure on a "normal" Trusty (and Saucy). This means that at this point you should not disable the tests; they certainly show the real problem, and it is the build that needs to be fixed.
With all this, we are almost ready to ask Rich for advice (again). Thanks. |
| Comment by Otto Kekäläinen [ 2014-02-17 ] |
|
I ran the test for the trusty-built binaries inside a trusty chroot like this:
The CPU went to 100% and the system was unresponsive for 40 minutes, until I decided to walk to the machine, which was also unresponsive, and I was forced to do a power reset. There was no output in the terminal, and the logs are lost as the chroot was on a tmpfs-mounted drive. I'm sorry, but I think my time budget for this is used up for now, and I'll upload without TokuDB. It can be added back later. |
| Comment by Elena Stepanova [ 2014-02-17 ] |
It is a cc unit test (hence the name), it has nothing to do with MTR. In fact, it should not have even started since its name does not meet MTR requirements; but I saw the behavior you described several times when I had my mtr script corrupted.
I see. I'm afraid that without any clear information on what is happening inside those machines, there is not much we can do, but I'll keep it open in case something new comes up. |
| Comment by Elena Stepanova [ 2014-03-31 ] |
|
Please comment to re-open or ping me on IRC when/if you have more information so that we can proceed with this. |
| Comment by Otto Kekäläinen [ 2014-04-01 ] |
|
I intend to re-engineer how TokuDB is built/tested in Debian builds to get this one solved. |
| Comment by Tim Callaghan (Inactive) [ 2014-04-16 ] |
|
From the maria-dev list:
The answer to your first question is, that's how CMake works. CMake's Cross Compiling guide says that it can't guess the target processor details, and you're supposed to provide that information either by explicitly setting the variables, or by providing a toolchain file: http://www.cmake.org/Wiki/CMake_Cross_Compiling
I would be surprised if launchpad.net's infrastructure did not include suitable toolchain files, but this really isn't my area of expertise. If you can find suitable ones to use, then you should use them, otherwise I think you should probably just add something to the rules file to set those variables explicitly.
Regarding your second problem, it sounds like your packaging scripts aren't properly linking with jemalloc as the first library, with --whole-archive. I say that because we get a failure (from Elena's stacktrace) inside jemalloc code when calling free() inside the library constructor:
#2 <signal handler called>
I've seen this happen before when a buffer is allocated with the system allocator's malloc() (as likely happens in this call to getline(3) https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L360), and then the fractal tree tries to free it with jemalloc (https://github.com/Tokutek/ft-index/blob/releases/tokudb-7.1/portability/portability.cc#L371).
Please make sure that jemalloc is being linked properly (as the first library, and with --whole-archive) into mysqld. It is not sufficient to only link it to ha_tokudb.so, because in that case the process (mysqld) will be using the system allocator, and there will be some possibly inlined calls to jemalloc's interface inside ha_tokudb.so. If you need help with this, please show me how your linking is being done and I'll try to give the right advice.
If it is against policy to ship with the allocator statically linked in a binary, then you should make sure jemalloc isn't linked in ha_tokudb.so anywhere, but I strongly recommend against that.
On Mon, Apr 14, 2014 at 5:01 PM, Rich Prohaska <prohaska@tokutek.com> wrote:
On Mon, Apr 14, 2014 at 5:03 AM, Otto Kekäläinen <otto@seravo.fi> wrote:
Any chance of getting your comments on this..? Thanks!
2014-04-01 12:25 GMT+03:00 Otto Kekäläinen <otto@seravo.fi>: |
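As a rough aid for the linking advice above, the following Python sketch compares the dynamic dependencies of the server binary and the plugin, as printed by `readelf -d`. It is only a heuristic under stated assumptions: it flags the case where ha_tokudb.so has a dynamic (NEEDED) dependency on a jemalloc library while mysqld does not. A jemalloc that is statically linked (for example with --whole-archive) does not appear in NEEDED entries at all, so a negative result here proves nothing about static linking. The function names are hypothetical.

```python
import re
from typing import List

# Matches lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_dynamic_output: str) -> List[str]:
    """Extract NEEDED library names from the text output of `readelf -d`."""
    return _NEEDED.findall(readelf_dynamic_output)


def depends_on_jemalloc(readelf_dynamic_output: str) -> bool:
    """True if any NEEDED entry names a jemalloc shared library."""
    return any("jemalloc" in lib
               for lib in needed_libraries(readelf_dynamic_output))


def jemalloc_mismatch(server_output: str, plugin_output: str) -> bool:
    """True if the plugin dynamically needs jemalloc but the server does not."""
    return (depends_on_jemalloc(plugin_output)
            and not depends_on_jemalloc(server_output))
```

A possible use is to save the output of `readelf -d` for mysqld and for ha_tokudb.so from the failing build into two text files, read them, and pass the contents to `jemalloc_mismatch`. A `True` result would be consistent with the allocator mix-up described above, but it would not confirm it.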
| Comment by Otto Kekäläinen [ 2014-07-19 ] |
|
Later versions of TokuDB have built OK on Launchpad, e.g. https://launchpadlibrarian.net/174495437/buildlog_ubuntu-trusty-amd64.mariadb-5.5_5.5.37-1~trusty1~ppa6_UPLOADING.txt.gz Thus I can close this particular issue though other issues with TokuDB builds remain ( |