The dump file is around 5G and consists of roughly 10 tables. The load can run several times successfully before it crashes and causes the database to restart. It always crashes on the same table, 'notes'. When we take that table and move it from the middle of the dump to the front, the script does not cause a crash. The 'notes' table has about 19M rows in it.
Options used for mysqldump:
--max_allowed_packet=1048576000 --disable-keys -t --quick --quote-names --single-transaction
The structure of the 'notes' table....
CREATE TABLE `notes` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`accid` int(255) DEFAULT NULL,
`ndesc` longtext,
`ndate` varchar(14) DEFAULT NULL,
`nbycsr` varchar(255) DEFAULT NULL,
`permanent` tinyint(1) DEFAULT '0',
`PDNoteContent` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_notes_ndate` (`ndate`),
KEY `idx_notes_permanent` (`permanent`),
KEY `idx_notes_accidpermanent` (`accid`,`permanent`),
KEY `PDNoteContent` (`PDNoteContent`)
) ENGINE=InnoDB AUTO_INCREMENT=89895601 DEFAULT CHARSET=latin1;
Thanks for the report. We are very interested in reproducing this issue.
Does the dump file contain highly confidential information? If not, maybe you could upload it to ftp.askmonty.org/private (only MariaDB developers will have access to it). 5G is quite all right, but if you can compress it, it's even better.
So, you've upgraded from 5.5 to 10.1, but since you're loading the data dump, I presume that you started 10.1 on a fresh data directory, not on the old 5.5 one, is that right?
Was there anything in the error log before the assertion failure?
Which exact binaries are you using?
Please also paste or attach your cnf file(s) (regardless of whether you can provide the dump file or not).
Does the dump file contain highly confidential information?
I don't think we will be able to upload the dump file due to privacy concerns.
So, you've upgraded from 5.5 to 10.1, but since you're loading the data dump, I presume that you started 10.1 on a fresh data directory, not on the old 5.5 one, is that right?
We upgraded the existing database from 5.5 to 10.1 via a traditional upgrade of the binaries followed by the mysql_upgrade script. This is a data warehouse, and the dump file that is causing the crash is just a nightly batch load of data from a source database.
Was there anything in the error log before the assertion failure?
There were no other errors in the error log
Which exact binaries are you using?
MariaDB-client.x86_64 10.1.32-1.el6 @mariadb10
MariaDB-common.x86_64 10.1.32-1.el6 @mariadb10
MariaDB-compat.x86_64 10.1.32-1.el6 @mariadb10
MariaDB-server.x86_64 10.1.32-1.el6 @mariadb10
MariaDB-shared.x86_64 10.1.32-1.el6 @mariadb10
my.cnf:

[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/tmp/mysql.sock
pid_file=mysqld.pid
port=3306
log_error=error.log
#log_output=FILE

#INNODB OPTIONS
innodb_buffer_pool_size=50000M #Set to 70% of available RAM
innodb_log_buffer_size=16M
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path=ibdata1:100M:autoextend
#You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=10
innodb_write_io_threads=10
innodb_io_capacity=100
innodb_log_file_size=512M
innodb_buffer_pool_instances=5
innodb_log_files_in_group=2
innodb_thread_concurrency=0
innodb_buffer_pool_dump_at_shutdown=0
innodb_buffer_pool_load_at_startup=0
innodb_flush_method=O_DIRECT
innodb_autoinc_lock_mode=2
#avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb
long_query_time=1
slow_query_log=1
slow_query_log_file=slow.log
userstat=1

#CHARACTER SET
#collation_server=utf8_unicode_ci
#init_connect='SET NAMES utf8'
#character_set_server=utf8

#REPLICATION SPECIFIC _ GENERAL
#server_id must be unique across all mysql servers participating in replication.
server_id=1
auto_increment_increment=1
auto_increment_offset=1
expire_logs_days=5
max_binlog_size=500M
sync_binlog=0
gtid_domain_id=2008

#REPLICATION SPECIFIC _ MASTER
binlog_format=ROW
log_bin=/var/lib/mysqlbinlogs/mysql-bin.log
master_info_file=/var/lib/mysqlbinlogs/master.info

#REPLICATION SPECIFIC _ SLAVE
#log_slave_updates=1
#relay_log=/var/lib/mysqlbinlogs/mysqld-relay-bin
#slave_skip_errors=1007,1008,1032,1048,1050,1051,1060,1061,1062,1136
#relay_log_info_file=/var/lib/mysqlbinlogs/relay-log.info

#REPLICATION IGNORE DB OR TABLES
replicate_wild_ignore_table=mysql.%

#OTHER THINGS, BUFFERS ETC
key_buffer_size=16384M
tmp_table_size=512M
max_heap_table_size=512M
max_allowed_packet=1000M
#sort_buffer_size=256K
#read_buffer_size=256K
#read_rnd_buffer_size=512K
myisam_sort_buffer_size=8M
skip_name_resolve
memlock=0
sysdate_is_now=1
max_connections=500
max_connect_errors=10000
thread_cache_size=512
query_cache_type=0
query_cache_size=0
table_open_cache=10240
lower_case_table_names=0
table_definition_cache=4096
open_files_limit=65535

[MYSQL]
socket=/tmp/mysql.sock

[client]
socket=/tmp/mysql.sock

[mysqldump]
max_allowed_packet=1600M
joe.oreste@xpressbet.com,
Thanks for the info.
So, was the table notes originally created still on 5.5 server, or after upgrade on 10.1 server?
Do you happen to remember which exact version it was? If not, can you check the notes.frm file? You can do it yourself and extract 4 bytes at offset 0033, or just send us the frm file itself, or the hexdump -v -C notes.frm output. It's only the structure, not the data, so no confidential information.
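For reference, those 4 bytes at offset 0033 (decimal 51) hold MYSQL_VERSION_ID as a little-endian integer. A sketch of how one might read it; the fallback that fabricates a demo file is illustration only, and on the affected server you would point this at the real notes.frm:

```shell
# The 4 bytes at offset 51 (0x33) of an .frm file store MYSQL_VERSION_ID,
# e.g. 100132 for a table created/altered by MariaDB 10.1.32.
frm=notes.frm
if [ ! -f "$frm" ]; then
  # Demo only: fabricate a minimal header holding 100132 (0x00018724, LE)
  dd if=/dev/zero of="$frm" bs=1 count=51 2>/dev/null
  printf '\x24\x87\x01\x00' >> "$frm"
fi
# Assumes a little-endian host (as on x86), so hexdump's %d matches
# the on-disk byte order.
hexdump -s 51 -n 4 -e '1/4 "%d\n"' "$frm"
```

Running this prints the numeric server version id that created the table.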
Which 5.5 version did you upgrade from? Also, if you still have the config file from 5.5, it might help as well.
Thanks.
Below is the requested information. One reminder: when we take the 'notes' table and move it from the middle of the list to the first position in the generated dump file, the crash does not occur. Normally this dump file is generated and loaded nightly. For testing of this specific crash we have it running in a continuous loop; it takes about 20 minutes to complete, so it runs many times in a day. It will run for 2 to 6 hours before we get a crash, sometimes longer, sometimes shorter.
We enabled core dumps so I am uploading the latest core dump 'core.9430.gz' to ftp.askmonty.org/private.
Below is the requested information from the previous post...
So, was the table notes originally created still on 5.5 server, or after upgrade on 10.1 server?
I'm not sure of what exact version the 'notes' table was originally created under.
Which 5.5 version did you upgrade from?
MariaDB-client.x86_64 5.5.52-1.el6 @xor-mariadb
MariaDB-common.x86_64 5.5.52-1.el6 @xor-mariadb
MariaDB-compat.x86_64 5.5.52-1.el6 @xor-mariadb
MariaDB-server.x86_64 5.5.52-1.el6 @xor-mariadb
MariaDB-shared.x86_64 5.5.52-1.el6 @xor-mariadb
just send us the frm file itself, or hexdump -v -C notes.frm output
I uploaded the notes.frm file to ftp.askmonty.org/private
if you still have the config file from 5.5, it might help as well.
It was essentially the same. Here are the notes from our upgrade describing what we changed in the my.cnf parameters.
Removed:
innodb-doublewrite (no longer exists)
engine-condition-pushdown (default is on anyways)
Added:
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
gtid_domain_id=<Unique per master>--Used in multi-source replication
We have also commented out several further parameters, but the process still crashed. Here is the latest my.cnf used that still produced a crash.
[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/tmp/mysql.sock
pid_file=mysqld.pid
port=3306
log_error=error.log
#log_output=FILE
#INNODB OPTIONS
innodb_buffer_pool_size=50000M #Set to 70% of available RAM
innodb_log_buffer_size=16M
#JPO innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path=ibdata1:100M:autoextend
#You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=10
innodb_write_io_threads=10
#innodb_empty_free_list_algorithm=legacy
#JPO innodb_io_capacity=200
innodb_log_file_size=512M
#JPO innodb_buffer_pool_instances=5
innodb_log_files_in_group=2
#JPO(default)innodb_thread_concurrency=0
#JPOinnodb_buffer_pool_dump_at_shutdown=0
#JOPinnodb_buffer_pool_load_at_startup=0
innodb_flush_method=O_DIRECT
#JPO innodb_autoinc_lock_mode=2
#avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb
long_query_time=1
slow_query_log=1
slow_query_log_file=slow.log
userstat=1
stack-trace
core-file
disable-gdb
#CHARACTER SET
#collation_server=utf8_unicode_ci
#init_connect='SET NAMES utf8'
#character_set_server=utf8
#REPLICATION SPECIFIC _ GENERAL
#server_id must be unique across all mysql servers participating in replication.
server_id=1
auto_increment_increment=1
auto_increment_offset=1
expire_logs_days=5
max_binlog_size=500M
sync_binlog=0
gtid_domain_id=2008
#REPLICATION SPECIFIC _ MASTER
binlog_format=ROW
log_bin=/var/lib/mysqlbinlogs/mysql-bin.log
master_info_file=/var/lib/mysqlbinlogs/master.info
#REPLICATION SPECIFIC _ SLAVE
#log_slave_updates=1
#relay_log=/var/lib/mysqlbinlogs/mysqld-relay-bin
#slave_skip_errors=1007,1008,1032,1048,1050,1051,1060,1061,1062,1136
#relay_log_info_file=/var/lib/mysqlbinlogs/relay-log.info
#REPLICATION IGNORE DB OR TABLES
replicate_wild_ignore_table=mysql.%
#OTHER THINGS, BUFFERS ETC
key_buffer_size=16384M
tmp_table_size=512M
max_heap_table_size=512M
max_allowed_packet=1024M
#sort_buffer_size=256K
#read_buffer_size=256K
#read_rnd_buffer_size=512K
myisam_sort_buffer_size=8M
skip_name_resolve
#JPO(default)memlock=0
#JPOsysdate_is_now=1
max_connections=500
max_connect_errors=10000
thread_cache_size=512
#JPO query_cache_type=0
#JPO(Default)query_cache_size=0
table_open_cache=10240
lower_case_table_names=0
table_definition_cache=4096
open_files_limit=65535
[MYSQL]
socket=/tmp/mysql.sock
[client]
socket=/tmp/mysql.sock
[mysqldump]
max_allowed_packet=1024M
[MYSQLD_SAFE]
#transaction-isolation=READ-COMMITTED
pid_file=/var/lib/mysql/mysqld.pid
log_error=/var/lib/mysql/error.log
basedir=/usr/
datadir=/var/lib/mysql
core-file-size=unlimited
When you create the gdb.txt file, the SQL statement that is present at the time of the crash succeeds in being inserted into the destination location on subsequent imports. The database does not crash on the same INSERT statement each time; sometimes it does, but generally it crashes on a different one.
Thanks for all the information.
The frm file says 10.1.32, so the table must have been re-created, or at least altered, since the upgrade (my far-fetched suspicion was that it might have something to do with the table having been created by 5.5; of course there are other factors which make the failure sporadic and dependent on the sequence of events).
You said that you run the dump loading nightly, and in a loop during testing. How does it work? According to the parameters of your dump, there is no table (re-)creation, so the table stays the same, unless you drop/create it between the runs. Do you? Or if not, do you truncate it between the runs, or does it grow ~19M rows bigger every time you run the script?
I'm asking in order to attempt to re-create your exact scenario as closely as possible.
The script truncates the destination tables, generates a mysqldump of the source tables, then loads the dump file into the destination tables. Normally this script runs once per day, but we have it set to run in a loop so that we can speed up the occurrences of a crash. The process takes about 30 minutes to run. With it running in a continuous loop, it last ran 23 times successfully before it crashed. Then, after the db restarted, it ran 4 times before it crashed again.
It appears I didn't ask, so just in case: did you try to run CHECK TABLE on the `notes` table?
joe.oreste@xpressbet.com, this is interesting:
the script truncates the destination tables, generates a mysqldump of the source tables, then loads the dump file into the destination tables.
Are there any FOREIGN KEY constraints? If yes (and if you do not SET foreign_key_checks=0), TRUNCATE TABLE would be executed as DELETE FROM (without a WHERE clause). In that case, the TRUNCATE would involve 2 operations for every index record in the table: delete-marking, and eventually deleting (purging) the row.
If an INSERT of a key arrives before the purge of that key, then it could be executed as an update of the purgeable delete-marked record (updating it to not being delete-marked). There could be some glitch in this area.
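As a sketch (using the reporter's notes table; whether this applies depends on whether any foreign keys actually exist), a load script can keep TRUNCATE on its fast path by disabling the checks around the statement:

```sql
-- Sketch: with foreign_key_checks disabled, TRUNCATE keeps its fast
-- drop-and-recreate path instead of degrading to a row-by-row DELETE
-- when other tables reference this one.
SET foreign_key_checks = 0;
TRUNCATE TABLE notes;
SET foreign_key_checks = 1;
```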
If the failure repeats reasonably often for you, maybe you could validate my hypothesis that the InnoDB change buffering has something to do with this. Does the server crash if you try one of these settings:
SET GLOBAL innodb_change_buffering=changes;
SET GLOBAL innodb_change_buffering=inserts;
SET GLOBAL innodb_change_buffering=none;
The assertion fails when trying to delete-unmark a secondary index record. The problem there is that the record already existed in delete-unmarked form.
Any table row operation in InnoDB involves modifying each index in a separate mini-transaction, starting from the primary key (clustered index). If any operation on any index fails, the whole row operation should be rolled back in every index. This rollback is done based on an undo log record that was written before the clustered index was modified, so the row-rollback should take place even if the server was killed and restarted in between. (Usually this row-rollback is followed by a rollback to the start of the statement, or a rollback of the whole transaction.)
We can have 'orphan' delete-marked records in secondary indexes. If an indexed column was updated N times, we could have N delete-marked secondary index records pointing to the same PK and 1 delete-unmarked record that contains the latest value of the secondary key. The delete-marked records should eventually be removed by purge, when they are no longer accessible by any old read views. An insert operation may update a delete-marked record into a not-delete-marked one. The rollback of such inserts should perform a purge-like operation (remove the delete-marked record if it is purgeable).
Normally we should never have a delete-unmarked record in a secondary index that is pointing to a non-existing primary key. Because we must already have checked that the PRIMARY KEY value did not exist in the clustered index (before we inserted or delete-unmarked it), we are expecting the record not to exist in the secondary indexes either. But, the assertion fails. Note: before the assertion we successfully asserted that the secondary index record is identical with the one that we are trying to insert (same values for the secondary index and same primary key columns).
The question is: How can we get a delete-unmarked record in a secondary index while the primary key record is deleted or delete-marked?
did you try to run CHECK TABLE on the `notes` table?
After the database crashed and restarted, I ran CHECK TABLE notes EXTENDED and the status was OK.
Are there any FOREIGN KEY constraints?
There are no foreign key constraints defined either to or from the 'notes' table. There are no foreign key constraints defined in information_schema.TABLE_CONSTRAINTS
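One way to double-check is a query along these lines (a possible query, not necessarily the exact one the reporter used):

```sql
-- Any FOREIGN KEY constraints defined on `notes` or referencing it;
-- an empty result set confirms there are none.
SELECT constraint_name, table_name, referenced_table_name
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE table_name = 'notes' OR referenced_table_name = 'notes';
```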
Does the server crash if you try one of these settings:
SET GLOBAL innodb_change_buffering=changes;
SET GLOBAL innodb_change_buffering=inserts;
SET GLOBAL innodb_change_buffering=none;
The server crashes for all of the above settings.
joe.oreste@xpressbet.com, thank you for the answers. This one is the most interesting:
After the database crashed and restarted, I ran CHECK TABLE notes EXTENDED and the status was OK.
Years ago, I remember a QA engineer mentioning something similar: an InnoDB corruption that would seem to heal itself after server restart. This makes me suspect yet another possibility: corruption of the adaptive hash index. Does the corruption occur if you disable it?
SET GLOBAL innodb_adaptive_hash_index=OFF;
This can also be set as innodb_adaptive_hash_index=OFF in the my.cnf file.
The adaptive hash index aims to speed up key lookups in B-tree indexes. By design, it can deliver false negatives (fail to find a key that exists), but it should never deliver false positives (return a key that was deleted). Whenever a record is deleted (purged permanently) from a page (or moved between pages on a split or merge), it should be deleted from the adaptive hash index. If this fails to happen, that could explain such ‘transient’ corruption. Sometimes purged records may remain in the same place on the page, in the PAGE_FREE stack. For such corruption to be possible, the space occupied by the purged record should not be reused for a subsequently inserted record on the same page.
One more thing:
The server crashes for all of the above settings.
I would like you to repeat the problem in a more controlled fashion. A feature of the InnoDB change buffer is that it is persistent, and the buffered changes can be applied (merged) to secondary indexes much later. If the corruption is related to that, it would not be detected until the merge has taken place. The setting innodb_change_buffering will only control the buffering of future operations; any buffered operations may still be merged unless the index is dropped. If there still exist buffered changes (for some other index page), these could cause further corruption.
You did mention that after restart, CHECK TABLE is not reporting any issue. Still, to play it safe, I would suggest dropping and re-creating all the indexes before each retry. Unfortunately the crash message does not indicate which secondary index is being operated on, so we would have to rebuild them all:
ALTER TABLE notes DROP INDEX idx_notes_ndate, DROP INDEX idx_notes_permanent, DROP INDEX idx_notes_accidpermanent, DROP INDEX PDNoteContent, ALGORITHM=INPLACE;
ALTER TABLE notes ADD INDEX idx_notes_ndate (ndate), ADD INDEX idx_notes_permanent (permanent), ADD INDEX idx_notes_accidpermanent (accid,permanent), ADD INDEX (PDNoteContent), ALGORITHM=INPLACE;
joe.oreste@xpressbet.com,
Thanks a lot for all the provided data. Out of all the reports of the same problem, this was the one that eventually allowed us to reproduce it in-house.
I don't have a good quick test case yet, it takes me on average 20-40 min to hit the failure and it doesn't happen every time, but now we can definitely reproduce it on the given version (10.1.32) with the given configuration. I think we can do further experimenting locally.
It appears the issue is with innodb_adaptive_hash_index. With innodb_adaptive_hash_index=OFF and innodb_change_buffering=(all or none), there are no further crashes.
You did mention that after restart, CHECK TABLE is not reporting any issue. Still, to play it safe, I would suggest dropping and re-creating all the indexes before each retry. Unfortunately the crash message does not indicate which secondary index is being operated on, so we would have to rebuild them all:
Instead of doing the drop/recreate index as you recommend, we just commented out our script truncate statements and used flags in mysqldump to drop/create tables and indexes and then insert data. When we did this there were no crashes with innodb_adaptive_hash_index=ON and innodb_change_buffering=all.
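The reworked dump would look roughly like this (a sketch: the database name is a placeholder, and the only real change from the original options is dropping -t so that mysqldump emits its default DROP TABLE / CREATE TABLE statements, making every load rebuild the table and all of its indexes):

```shell
mysqldump --max_allowed_packet=1048576000 --disable-keys --quick \
          --quote-names --single-transaction --add-drop-table \
          source_db > nightly_dump.sql
```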
The corruption happens when using TRUNCATE statements with innodb_adaptive_hash_index=ON.
I brought the test case to the point where I could run a bisect. It appears that at least the non-debug assertion failure in question is gone in the current 10.1 tree, specifically after this bugfix:
commit c4eb4bcef648eb2ebdc6edc06905f39f95ef7f6b
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Tue Jun 26 11:34:51 2018 +0300

    MDEV-16515 InnoDB: Failing assertion: ++retries < 10000 in file dict0dict.cc

    buf_LRU_drop_page_hash_for_tablespace(): Return whether any adaptive
    hash index entries existed. If yes, the caller should keep retrying to
    drop the adaptive hash index.

    row_import_for_mysql(), row_truncate_table_for_mysql(),
    row_drop_table_for_mysql(): Ensure that the adaptive hash index was
    entirely dropped for the table.
I'm not yet sure that all related problems are gone, as my test case doesn't reproduce all variations easily.
marko, what do you think – is it possible that the patch above really fixes the problem, or does it just mask it?
Later update
I've realized we don't have a proper stack trace anywhere in this issue, so I'm adding it to make the issue searchable and also for reference, as I keep comparing this issue to other similar ones.
10.1.29 non-debug:

Version: '10.1.29-MariaDB' socket: '/data/logs/vardir/mysql.sock' port: 19300 MariaDB Server
InnoDB: error in sec index entry update in
InnoDB: index `idx_c` of table `test`.`t1`
InnoDB: tuple DATA TUPLE: 2 fields;
0: len 8; hex 656d657267696e67; asc emerging;;
1: len 4; hex 80000076; asc v;;

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 9; hex 656d657267656e6379; asc emergency;;
1: len 4; hex 80000f9b; asc ;;

TRANSACTION 1349, ACTIVE 0 sec updating or deleting
mysql tables in use 2, locked 2
71 lock struct(s), heap size 13864, 5193 row lock(s), undo log entries 118
MySQL thread id 6, OS thread handle 0x7efd1ff40b00, query id 63 localhost 127.0.0.1 root Sending data
REPLACE INTO `t1` SELECT * FROM `t1` /* QNO 42 CON_ID 6 */

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
2018-08-02 13:45:14 7efd1ff40b00 InnoDB: Assertion failure in thread 139625627912960 in file row0ins.cc line 285
InnoDB: Failing assertion: *cursor->index->name == TEMP_INDEX_PREFIX
#4 0x00007efd1e9ee3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x000055ef4c6388ad in row_ins_sec_index_entry_by_modify (mtr=0x7efd1ff3c840, thr=0x7efcf1c2b130, entry=0x7efcf1c96068, heap=0x7efcf1c96a00, offsets_heap=0x7efcf1c96500, offsets=0x7efd1ff3c6a0, cursor=0x7efd1ff3c6e0, mode=2, flags=0) at /home/buildbot/buildbot/build/storage/xtradb/row/row0ins.cc:285
#6 row_ins_sec_index_entry_low (flags=flags@entry=0, mode=mode@entry=2, index=index@entry=0x7efcf1c2f068, offsets_heap=offsets_heap@entry=0x7efcf1c96500, heap=heap@entry=0x7efcf1c96a00, entry=entry@entry=0x7efcf1c96068, trx_id=trx_id@entry=0, thr=thr@entry=0x7efcf1c2b130) at /home/buildbot/buildbot/build/storage/xtradb/row/row0ins.cc:2869
#7 0x000055ef4c63aac4 in row_ins_sec_index_entry (index=index@entry=0x7efcf1c2f068, entry=0x7efcf1c96068, thr=thr@entry=0x7efcf1c2b130) at /home/buildbot/buildbot/build/storage/xtradb/row/row0ins.cc:3063
#8 0x000055ef4c66641e in row_upd_sec_index_entry (node=node@entry=0x7efcf1ccfc48, thr=thr@entry=0x7efcf1c2b130) at /home/buildbot/buildbot/build/storage/xtradb/row/row0upd.cc:2043
#9 0x000055ef4c66a322 in row_upd_sec_step (thr=<optimized out>, node=0x7efcf1ccfc48) at /home/buildbot/buildbot/build/storage/xtradb/row/row0upd.cc:2070
#10 row_upd (thr=0x7efcf1c2b130, node=0x7efcf1ccfc48) at /home/buildbot/buildbot/build/storage/xtradb/row/row0upd.cc:2877
#11 row_upd_step (thr=thr@entry=0x7efcf1c2b130) at /home/buildbot/buildbot/build/storage/xtradb/row/row0upd.cc:2994
#12 0x000055ef4c648cd3 in row_update_for_mysql (mysql_rec=mysql_rec@entry=0x7efcf1c2e948 "\bv", prebuilt=0x7efcf1ccf068) at /home/buildbot/buildbot/build/storage/xtradb/row/row0mysql.cc:1875
#13 0x000055ef4c597ad3 in ha_innobase::update_row (this=<optimized out>, old_row=0x7efcf1c2e948 "\bv", new_row=0x7efcf1c2e820 "\374v") at /home/buildbot/buildbot/build/storage/xtradb/handler/ha_innodb.cc:9446
#14 0x000055ef4c37d98a in handler::ha_update_row (this=0x7efcf1c6f020, old_data=0x7efcf1c2e948 "\bv", new_data=0x7efcf1c2e820 "\374v") at /home/buildbot/buildbot/build/sql/handler.cc:6014
#15 0x000055ef4c1e28c4 in write_record (thd=0x7efcfaeaa008, table=0x7efcf1cf1c08, info=info@entry=0x7efcf1c24f58) at /home/buildbot/buildbot/build/sql/sql_insert.cc:1856
#16 0x000055ef4c1e2f80 in send_data (this=0x7efcf1c24f18, values=...) at /home/buildbot/buildbot/build/sql/sql_insert.cc:3710
#17 select_insert::send_data (this=0x7efcf1c24f18, values=...) at /home/buildbot/buildbot/build/sql/sql_insert.cc:3674
#18 0x000055ef4c2410b3 in end_send (join=0x7efcf1c24fb8, join_tab=<optimized out>, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:19547
#19 end_send (join=0x7efcf1c24fb8, join_tab=<optimized out>, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:19517
#20 0x000055ef4c22744f in evaluate_join_record (join=join@entry=0x7efcf1c24fb8, join_tab=join_tab@entry=0x7efcf1c277d0, error=error@entry=0) at /home/buildbot/buildbot/build/sql/sql_select.cc:18636
#21 0x000055ef4c23102b in sub_select (join=0x7efcf1c24fb8, join_tab=0x7efcf1c277d0, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:18452
#22 0x000055ef4c23ef4d in do_select (join=join@entry=0x7efcf1c24fb8, fields=fields@entry=0x7efcf1c25388, table=table@entry=0x0, procedure=0x0) at /home/buildbot/buildbot/build/sql/sql_select.cc:18068
#23 0x000055ef4c25285e in JOIN::exec_inner (this=this@entry=0x7efcf1c24fb8) at /home/buildbot/buildbot/build/sql/sql_select.cc:3225
#24 0x000055ef4c2547ed in JOIN::exec (this=this@entry=0x7efcf1c24fb8) at /home/buildbot/buildbot/build/sql/sql_select.cc:2512
#25 0x000055ef4c2511ba in mysql_select (thd=thd@entry=0x7efcfaeaa008, rref_pointer_array=rref_pointer_array@entry=0x7efcfaeae2f0, tables=<optimized out>, wild_num=<optimized out>, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489958656, result=result@entry=0x7efcf1c24f18, unit=unit@entry=0x7efcfaead948, select_lex=select_lex@entry=0x7efcfaeae048) at /home/buildbot/buildbot/build/sql/sql_select.cc:3449
#26 0x000055ef4c254ad5 in handle_select (thd=thd@entry=0x7efcfaeaa008, lex=lex@entry=0x7efcfaead880, result=result@entry=0x7efcf1c24f18, setup_tables_done_option=setup_tables_done_option@entry=1073741824) at /home/buildbot/buildbot/build/sql/sql_select.cc:384
#27 0x000055ef4c202df3 in mysql_execute_command (thd=thd@entry=0x7efcfaeaa008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:4021
#28 0x000055ef4c2038a5 in mysql_parse (thd=0x7efcfaeaa008, rawbuf=<optimized out>, length=<optimized out>, parser_state=0x7efd1ff3f640) at /home/buildbot/buildbot/build/sql/sql_parse.cc:7344
#29 0x000055ef4c20639e in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7efcfaeaa008, packet=0x7efcf1c24020 "REPLACE INTO `t1` SELECT * FROM `t1` /* QNO 42 CON_ID 6 */", packet_length=58, packet_length@entry=60) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1477
#30 0x000055ef4c20690e in do_command (thd=0x7efcfaeaa008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1106
#31 0x000055ef4c2c721f in do_handle_one_connection (thd_arg=thd_arg@entry=0x7efcfaeaa008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1349
#32 0x000055ef4c2c7357 in handle_one_connection (arg=0x7efcfaeaa008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1261
#33 0x00007efd1f3e5494 in start_thread (arg=0x7efd1ff40b00) at pthread_create.c:333
#34 0x00007efd1eaa293f in clone () from /lib/x86_64-linux-gnu/libc.so.6

10.1.29 debug:

Version: '10.1.29-MariaDB-debug' socket: '/data/logs/vardir/mysql.sock' port: 19300 Source distribution
InnoDB: error in sec index entry update in
InnoDB: index `idx_c` of table `test`.`t1`
InnoDB: tuple DATA TUPLE: 2 fields;
0: len 10; hex 696e7370656374696f6e; asc inspection;;
1: len 4; hex 80000072; asc r;;

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 6; hex 696e73697374; asc insist;;
1: len 4; hex 80001035; asc 5;;

TRANSACTION 1364, ACTIVE 0 sec updating or deleting
mysql tables in use 2, locked 2
71 lock struct(s), heap size 14776, 5189 row lock(s), undo log entries 114
MySQL thread id 6, OS thread handle 0x7fe4544b9b00, query id 73 localhost 127.0.0.1 root Sending data
REPLACE INTO `t1` SELECT * FROM `t1` /* QNO 52 CON_ID 6 */

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
2018-08-02 13:49:06 7fe4544b9b00 InnoDB: Assertion failure in thread 140618643512064 in file row0upd.cc line 1962
InnoDB: Failing assertion: 0
#5 0x00007fe4524ab3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000556347346425 in row_upd_sec_index_entry (node=0x7fe425127c70, thr=0x7fe42512f140) at /data/src/10.1.29/storage/xtradb/row/row0upd.cc:1962
#7 0x000055634734691f in row_upd_sec_step (node=0x7fe425127c70, thr=0x7fe42512f140) at /data/src/10.1.29/storage/xtradb/row/row0upd.cc:2070
#8 0x00005563473485ae in row_upd (node=0x7fe425127c70, thr=0x7fe42512f140) at /data/src/10.1.29/storage/xtradb/row/row0upd.cc:2877
#9 0x0000556347348893 in row_upd_step (thr=0x7fe42512f140) at /data/src/10.1.29/storage/xtradb/row/row0upd.cc:2994
#10 0x0000556347309b29 in row_update_for_mysql (mysql_rec=0x7fe42505acb0 "\244r", prebuilt=0x7fe425127078) at /data/src/10.1.29/storage/xtradb/row/row0mysql.cc:1875
#11 0x00005563471f9661 in ha_innobase::update_row (this=0x7fe4250bc888, old_row=0x7fe42505acb0 "\244r", new_row=0x7fe42505ab88 "\371r") at /data/src/10.1.29/storage/xtradb/handler/ha_innodb.cc:9446
#12 0x0000556346ec95ff in handler::ha_update_row (this=0x7fe4250bc888, old_data=0x7fe42505acb0 "\244r", new_data=0x7fe42505ab88 "\371r") at /data/src/10.1.29/sql/handler.cc:6014
#13 0x0000556346c60571 in write_record (thd=0x7fe42eb56070, table=0x7fe425138e70, info=0x7fe42504ffc0) at /data/src/10.1.29/sql/sql_insert.cc:1857
#14 0x0000556346c6548b in select_insert::send_data (this=0x7fe42504ff80, values=...) at /data/src/10.1.29/sql/sql_insert.cc:3710
#15 0x0000556346cec3d8 in end_send (join=0x7fe425050020, join_tab=0x7fe425052c80, end_of_records=false) at /data/src/10.1.29/sql/sql_select.cc:19547
#16 0x0000556346cea151 in evaluate_join_record (join=0x7fe425050020, join_tab=0x7fe425052938, error=0) at /data/src/10.1.29/sql/sql_select.cc:18636
#17 0x0000556346ce9bf7 in sub_select (join=0x7fe425050020, join_tab=0x7fe425052938, end_of_records=false) at /data/src/10.1.29/sql/sql_select.cc:18452
#18 0x0000556346ce92c6 in do_select (join=0x7fe425050020, fields=0x7fe4250503f0, table=0x0, procedure=0x0) at /data/src/10.1.29/sql/sql_select.cc:18068
#19 0x0000556346cc3c7e in JOIN::exec_inner (this=0x7fe425050020) at /data/src/10.1.29/sql/sql_select.cc:3225
#20 0x0000556346cc0f0f in JOIN::exec (this=0x7fe425050020) at /data/src/10.1.29/sql/sql_select.cc:2512
#21 0x0000556346cc44a5 in mysql_select (thd=0x7fe42eb56070, rref_pointer_array=0x7fe42eb5a518, tables=0x7fe42504f8f0, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489958656, result=0x7fe42504ff80, unit=0x7fe42eb59b70, select_lex=0x7fe42eb5a270) at /data/src/10.1.29/sql/sql_select.cc:3449
#22 0x0000556346cb9eae in handle_select (thd=0x7fe42eb56070, lex=0x7fe42eb59aa8, result=0x7fe42504ff80, setup_tables_done_option=1073741824) at /data/src/10.1.29/sql/sql_select.cc:384
#23 0x0000556346c82ff8 in mysql_execute_command (thd=0x7fe42eb56070) at /data/src/10.1.29/sql/sql_parse.cc:4021
#24 0x0000556346c8d4bb in mysql_parse (thd=0x7fe42eb56070, rawbuf=0x7fe42504f088 "REPLACE INTO `t1` SELECT * FROM `t1` /* QNO 52 CON_ID 6 */", length=58, parser_state=0x7fe4544b8630) at /data/src/10.1.29/sql/sql_parse.cc:7344
#25 0x0000556346c7c064 in dispatch_command (command=COM_QUERY, thd=0x7fe42eb56070, packet=0x7fe4251a2071 " REPLACE INTO `t1` SELECT * FROM `t1` /* QNO 52 CON_ID 6 */ ", packet_length=60) at /data/src/10.1.29/sql/sql_parse.cc:1477
#26 0x0000556346c7ade9 in do_command (thd=0x7fe42eb56070) at /data/src/10.1.29/sql/sql_parse.cc:1106
#27 0x0000556346db3af1 in do_handle_one_connection (thd_arg=0x7fe42eb56070) at /data/src/10.1.29/sql/sql_connect.cc:1349
#28 0x0000556346db3855 in handle_one_connection (arg=0x7fe42eb56070) at /data/src/10.1.29/sql/sql_connect.cc:1261
#29 0x00007fe4541a6494 in start_thread (arg=0x7fe4544b9b00) at pthread_create.c:333
#30 0x00007fe45255f93f in clone () from /lib/x86_64-linux-gnu/libc.so.6
elenst, your test case uses TRUNCATE TABLE and innodb_adaptive_hash_index=ON. I think that before the MDEV-16515 fix, this can lead to corruption of the adaptive hash index (AHI). Given that the TRUNCATE would reuse the same dict_index_t objects, which buf_block_t::index is pointing to for the AHI, the corruption will not lead to an immediate crash. In DROP INDEX or DROP TABLE or similar, we would free the dict_index_t objects, and the server could crash in more spectacular ways if the freed memory was reused for something else before AHI tries to dereference a stale block->index.
The failure mechanism would be that TRUNCATE is leaving some stale adaptive hash index entries for pages that are freed. Later, as more data is inserted, the page numbers will be reused for something else. Given that the inserted data is similar to the existing contents of the table, the byte addresses to the records in the adaptive hash index could remain valid (or they are simply pointing to pre-truncate freed pages that were not yet reused in the buffer pool). Timing is critical here.
As joe.oreste@xpressbet.com mentioned earlier, he only gets the failure with innodb_adaptive_hash_index=ON and with TRUNCATE, so it fits.
I think we can consider this particular variation of the problem fixed in scope of MDEV-16515, although remaining ones are still in question.
joe.oreste@xpressbet.com, 10.1.35 should be out soon enough; then you can upgrade and see if it helps. You might also want to watch MDEV-9663, which is where most of the activity regarding related issues is happening.
For testing purposes we also ran for 24 hours with the TRUNCATE replaced by a DELETE FROM statement and innodb_adaptive_hash_index=ON, and there were no crashes.
Also for the record, this variation of the failure (at least what's triggered by my test case) appeared in 10.1 with this commit (released in 10.1.29):
commit 51b4366bfb8a77c6341d7811f71c851e52f36f68
Author: Marko Mäkelä
Date:   Thu Nov 2 22:38:37 2017 +0200

    MDEV-13328 ALTER TABLE…DISCARD TABLESPACE takes a lot of time