[MDEV-24307] Crashes in ALTER TABLE IMPORT TABLESPACE, Assertion dict0dict.cc, table->get_ref_count() == 0
| Created: | 2020-11-30 | Updated: | 2024-01-16 |
| Status: | Needs Feedback |
| Project: | MariaDB Server |
| Component/s: | Backup, Data Definition - Alter Table, Storage Engine - InnoDB |
| Affects Version/s: | 10.5.5 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Nilsson | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Red Hat Enterprise Linux release 8.2 (Ootpa). x86 64-bit. |
||
| Description |
|
I replicate some tables daily between two machines by transferring .ibd files, one schema at a time, and I randomly get crashes when doing this. On source machine:
On target:
On target for each table:
On target:
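The commands for the four steps above did not survive in this export. As a rough sketch only, assuming the flow follows MariaDB's documented transportable tablespace procedure (`FLUSH TABLES ... FOR EXPORT` on the source, then `DISCARD TABLESPACE` and `IMPORT TABLESPACE` on the target), the statements could be generated as below. The helper names and the `shop`, `orders`, and `items` identifiers are hypothetical, and this is not the reporter's actual script.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified(schema: str, table: str) -> str:
    """Return a quoted schema.table name."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


def source_export_statements(schema: str, tables: list[str]) -> list[str]:
    """Statements for the source server, to be run in ONE session.

    The .ibd and .cfg files must be copied away after the FLUSH and
    before UNLOCK TABLES, while the tables are still quiesced.
    """
    names = ", ".join(qualified(schema, t) for t in tables)
    return [f"FLUSH TABLES {names} FOR EXPORT", "UNLOCK TABLES"]


def target_import_statements(schema: str, table: str) -> list[str]:
    """Statements for one table on the target server.

    Assumes the table already exists there with an identical definition.
    The copied files are placed in the schema directory between the
    DISCARD and the IMPORT.
    """
    name = qualified(schema, table)
    return [
        f"ALTER TABLE {name} DISCARD TABLESPACE",
        f"ALTER TABLE {name} IMPORT TABLESPACE",
    ]


if __name__ == "__main__":
    for stmt in source_export_statements("shop", ["orders", "items"]):
        print(stmt + ";")
    for stmt in target_import_statements("shop", "orders"):
        print(stmt + ";")
```

Generating the statements separately from executing them keeps the ordering constraints (copy while locked, copy before import) visible in one place.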
Example log below. I tried to follow the procedure in the docs but might be overlooking something. I do things like dropping indexes on large tables at the source when reloading to gain speed, but the crash also occurs on tables where I do not do this, so I do not see how it would be related to a specific table modification at the source. Both machines are running the same version and the same OS.
My workaround for now is simply to make another attempt after the db service restarts itself, and then it seems to work. It keeps me afloat, but it's a really ugly "solution" as I have plenty of services connected to this provider machine. I want to "hotswap" all the data at once, and only at a specific trigger once a day when it is ready cooked at the source; that is why I am using this approach instead of a Galera cluster setup.
Thank you so much in advance for having a look into this. Let me know what more data I could provide to ease the review.
Best Regards
From log
|
| Comments |
| Comment by Marko Mäkelä [ 2021-01-13 ] | ||||||
|
dene, are you using Galera replication? The stack trace seems to hint at that, but I wanted to double-check. | ||||||
| Comment by Daniel Nilsson [ 2021-01-13 ] | ||||||
|
Hi Marko, No, Galera is not used; I also checked that it's not enabled by mistake on either machine. I will simplify my code, make sure I can provoke the crash with that sample, and provide a small self-contained Python script. I'll be back. Br /Daniel | ||||||
| Comment by Marko Mäkelä [ 2021-01-14 ] | ||||||
|
dene, thanks, that sounds good. | ||||||
| Comment by Daniel Nilsson [ 2021-01-15 ] | ||||||
|
Thanks, I will review that ticket before reporting the next trace. I have built a script that reproduces exactly what I do in my service core, but using representative sample data (approximately the same row size and the same indexes) and synchronous calls to keep it clear, and I verified it transaction by transaction. It does not break, even when running on the production installation where I have the issue. I analyzed the logs, and there are days where it crashes the db after two small tables. So I will break out the smallest scenario and then build a service (running Twisted Matrix as in prod, with the same but stripped-down code) to replicate my real scenario and see if I can find out where things go bad. Above is my current test script (you would init, backup, and restore with it, then provoke with restore or even --loop xxx when restoring; I cannot get a failure), so you can see the flow. It does not cause issues even when replicating from one machine to another, so don't spend time trying to reproduce based on this. | ||||||
| Comment by Daniel Black [ 2021-02-01 ] | ||||||
|
dene, thanks for the script and information. I tried on 10.5-73c43ee9ed067dd22a85f7f8241e33b14be7dd37 (pre 10.5.9 release) and 10.4-542d769ea1a22a7a6a87c9fe76ff911a162ade44 (latest as of now), and no assert or segfault was detected (my test environment is Fedora 33, with builds using clang-11.0.0-2.fc33). I made small modifications to your script to use https://mariadb.com/docs/appdev/connector-python/ (attached), as I didn't have Python2 and MySQLdb didn't seem to be updated to support it. If your data is OK to share with a limited number of developers, there is an FTP server, https://mariadb.com/kb/en/meta/mariadb-ftp-server/, where a tar.xz of the datadir/coredump with the binary could be uploaded. The corepattern in your original bug report indicates you have coredumpctl installed. You might still have the original recorded coredump available. Look with `coredumpctl list`. If it is still available, run `coredumpctl [dump|debug] /usr/sbin/mysqld`. Use
For the `debug` option and take the output from gdb.txt (from https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-backtraces-with-gdb-on-linux). Note this will probably contain your table names, but not contents. | ||||||
| Comment by Daniel Nilsson [ 2021-02-14 ] | ||||||
|
Hi, I now have a table that always causes the receiving machine to crash, but it's a different exception. I made fresh installs of MariaDB 10.5.8 on both sides and started with fresh datadirs on both machines. I have the exact same table definitions in both places. If I export the .ibd file on machine 1, then discard the tablespace on the same machine, copy the same file back, and import it, it works fine (so all on the same machine). The same on machine 2 works fine when exporting and importing. But if I export the table on machine 1 and import it on machine 2, it breaks. Attached are case1_gdb.txt for this and the MariaDB logfile from the crash. Also attached are the gdb log for one case of the initially reported issue (case2_gdb.txt) and the logfile (which does not provide debugging info that time; it is the only one I have), visible in case2_log.log. Best Regards | ||||||
| Comment by Marko Mäkelä [ 2021-11-10 ] | ||||||
|
dene, sorry, I just now noticed your update. Unfortunately, no debugging symbols are installed, so neither case1_gdb.txt The most recent release (10.5.13) included several fixes to ALTER TABLE…IMPORT TABLESPACE. Does 10.5.13 work for you? | ||||||
| Comment by Marko Mäkelä [ 2023-12-14 ] | ||||||
|
dene, did you try a more recent version of MariaDB? | ||||||
| Comment by Daniel Nilsson [ 2023-12-16 ] | ||||||
|
Hi there, Short answer: no, I have not so far. I ended up using mydumper (/myloader) compiled into the backend container, loading into shadow schemas of the "provider". Once the load is done, we lock, drop the old tables, and move the new tables into place, which takes a split second. This has worked well and takes an hour for our 250M rows in total, which is not ideal but stable. Today I have dev, qual and prod setups, and it's all containerized, so it's good timing to retry this. Development is now on MDB 11.2.2. Let me schedule some time, have a go at it, and come back. I actually kept the replication mode using .ibd transfer in my code, so it should be easy to test. I need to revisit it, but it should just be a configuration change plus reinstating the NFS share to test it. Br /Daniel | ||||||
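The final swap step described in this comment could be sketched as follows. This is a variant, not necessarily what the reporter runs: instead of dropping the old tables first, it builds a single multi-table `RENAME TABLE` statement that moves each live table into a retired schema and each shadow table into place. The schema names `provider`, `provider_shadow`, and `provider_old` are hypothetical, and the retired schema is assumed to exist and to hold no tables with those names.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def swap_statement(live: str, shadow: str, retired: str, tables: list[str]) -> str:
    """Build one RENAME TABLE statement that swaps all tables together.

    For each table, the live copy is moved to the retired schema and the
    freshly loaded shadow copy is moved into the live schema. The renames
    in a single statement are processed left to right.
    """
    if not tables:
        raise ValueError("at least one table is required")
    moves = []
    for table in tables:
        t = quote_ident(table)
        moves.append(f"{quote_ident(live)}.{t} TO {quote_ident(retired)}.{t}")
        moves.append(f"{quote_ident(shadow)}.{t} TO {quote_ident(live)}.{t}")
    return "RENAME TABLE " + ", ".join(moves)


if __name__ == "__main__":
    print(swap_statement("provider", "provider_shadow", "provider_old", ["t1", "t2"]) + ";")
```

The retired tables can be dropped afterwards, once nothing depends on them.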
| Comment by Marko Mäkelä [ 2024-01-16 ] | ||||||
|
Hi dene, in MariaDB 11.2 you could use a cleaner workflow that was introduced in |