Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Versions: 10.2.10, 10.2(EOL), 10.3(EOL)
- Components: None
- Environment: Ubuntu 64 bits. CPU: Intel Xeon E5-1660v3 - 8c/16t - 3GHz/3.5GHz. RAM: 128GB DDR4 ECC 2400 MHz. Disks: SoftRaid 2x4TB
Description
We are getting the following assertion failure, and the server is crashing in production every 3-5 days.
Oct 24 17:46:15 ns535551 mysqld[30669]: 2017-10-24 17:46:15 0x7f1044218700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.9/storage/innobase/row/row0ins.cc line 274
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: Failing assertion: !cursor->index->is_committed()
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: We intentionally generate a memory trap.
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: If you get repeated assertion failures or crashes, even
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: immediately after the mysqld startup, there may be
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
Oct 24 17:46:15 ns535551 mysqld[30669]: InnoDB: about forcing recovery.
Oct 24 17:46:15 ns535551 mysqld[30669]: 171024 17:46:15 [ERROR] mysqld got signal 6 ;
Oct 24 17:46:15 ns535551 mysqld[30669]: This could be because you hit a bug. It is also possible that this binary
Oct 24 17:46:15 ns535551 mysqld[30669]: or one of the libraries it was linked against is corrupt, improperly built,
Oct 24 17:46:15 ns535551 mysqld[30669]: or misconfigured. This error can also be caused by malfunctioning hardware.
Oct 24 17:46:15 ns535551 mysqld[30669]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Oct 24 17:46:15 ns535551 mysqld[30669]: We will try our best to scrape up some info that will hopefully help
Oct 24 17:46:15 ns535551 mysqld[30669]: diagnose the problem, but since we have already crashed,
Oct 24 17:46:15 ns535551 mysqld[30669]: something is definitely wrong and this may fail.
Oct 24 17:46:15 ns535551 mysqld[30669]: Server version: 10.2.9-MariaDB-10.2.9+maria~zesty-log
Oct 24 17:46:15 ns535551 mysqld[30669]: key_buffer_size=134217728
Oct 24 17:46:15 ns535551 mysqld[30669]: read_buffer_size=2097152
Oct 24 17:46:15 ns535551 mysqld[30669]: max_used_connections=887
Oct 24 17:46:15 ns535551 mysqld[30669]: max_threads=20002
Oct 24 17:46:15 ns535551 mysqld[30669]: thread_count=845
Oct 24 17:46:15 ns535551 mysqld[30669]: It is possible that mysqld could use up to
Oct 24 17:46:15 ns535551 mysqld[30669]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 82388470 K bytes of memory
Oct 24 17:46:15 ns535551 mysqld[30669]: Hope that's ok; if not, decrease some variables in the equation.
Oct 24 17:46:15 ns535551 mysqld[30669]: Thread pointer: 0x7f0f3c13f0a8
Oct 24 17:46:15 ns535551 mysqld[30669]: Attempting backtrace. You can use the following information to find out
Oct 24 17:46:15 ns535551 mysqld[30669]: where mysqld died. If you see no messages after this, something went
Oct 24 17:46:15 ns535551 mysqld[30669]: terribly wrong...
Oct 24 17:46:15 ns535551 mysqld[30669]: stack_bottom = 0x7f1044217cf8 thread_stack 0x49000
Oct 24 17:46:15 ns535551 mysqld[30669]: *** buffer overflow detected ***: /usr/sbin/mysqld terminated
Oct 24 17:46:15 ns535551 mysqld[30669]: ======= Backtrace: =========
Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f18bc07308b]
Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x54)[0x7f18bc114f84]
Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x118f00)[0x7f18bc112f00]
Oct 24 17:46:15 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(+0x11aeda)[0x7f18bc114eda]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(my_addr_resolve+0x4c)[0x56114b087c4c]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x56114b06e8eb]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(handle_fatal_signal+0x3cd)[0x56114ab0c6cd]
Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11670)[0x7f18bca63670]
Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x9f)[0x7f18bc02f77f]
Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f18bc03137a]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x409d3b)[0x56114a8c2d3b]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8918ea)[0x56114ad4a8ea]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8945dc)[0x56114ad4d5dc]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8d801e)[0x56114ad9101e]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8d8619)[0x56114ad91619]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x8a7b14)[0x56114ad60b14]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(+0x7e8813)[0x56114aca1813]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhPh+0x152)[0x56114ab16652]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x1c3d)[0x56114aa0ef6d]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x26af)[0x56114a974a5f]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x28a)[0x56114a97b40a]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1fb6)[0x56114a97e266]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z10do_commandP3THD+0x167)[0x56114a97eba7]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x25a)[0x56114aa451ca]
Oct 24 17:46:16 ns535551 mysqld[30669]: /usr/sbin/mysqld(handle_one_connection+0x3d)[0x56114aa4533d]
Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da)[0x7f18bca596da]
Oct 24 17:46:16 ns535551 mysqld[30669]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f)[0x7f18bc102d7f]
I checked the source code; the assertion is raised here:
if (!rec_get_deleted_flag(rec, rec_offs_comp(*offsets))) {
	/* We should never insert in place of a record that
	has not been delete-marked. The only exception is when
	online CREATE INDEX copied the changes that we already
	made to the clustered index, and completed the
	secondary index creation before we got here. In this
	case, the change would already be there. The CREATE
	INDEX should be waiting for a MySQL meta-data lock
	upgrade at least until this INSERT or UPDATE
	returns. After that point, set_committed(true)
	would be invoked in commit_inplace_alter_table(). */
	ut_a(update->n_fields == 0);
	ut_a(!cursor->index->is_committed());
	ut_ad(!dict_index_is_online_ddl(cursor->index));
	return(DB_SUCCESS);
}
We started having this problem after making the following configuration change:
innodb_flush_log_at_trx_commit = 0
We have a chat application that inserts about 1 million records into the message table every day.
One of these inserts tries to reuse a secondary index record that is already deleted and committed but not delete-marked, which is why the database crashes with the assertion failure. It looks like a synchronization problem between the change buffer and the data on disk: the delete is committed in the table, but the record does not carry the delete-mark flag, probably because the change buffer has not yet been merged.
This case should be reproducible with many inserts and deletes on the same table with innodb_flush_log_at_trx_commit = 0 or innodb_flush_log_at_trx_commit = 2.
In our case, it happens after 3-5 days.
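For anyone attempting to reproduce this, a hypothetical load generator along these lines could approximate the workload; the procedure name, the 5000-profile range and the column values are illustrative assumptions, the real load comes from the chat application:
DELIMITER //
CREATE PROCEDURE chat_load(IN n INT)   -- hypothetical helper, not part of the application
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    SET @f = FLOOR(RAND() * 5000), @t = FLOOR(RAND() * 5000);
    -- insert a short message between two random profiles
    INSERT INTO MessageContent (content, crDate, fromProfileId, toProfileId, status, type)
    VALUES (REPEAT('x', 200), NOW(), @f, @t, 'C', 'M');
    -- delete a small slice so that delete-marked entries keep appearing in the secondary indexes
    DELETE FROM MessageContent WHERE fromProfileId = @f AND toProfileId = @t LIMIT 10;
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
-- run from several sessions in parallel, e.g. CALL chat_load(100000);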
I configured the database with the following parameter
innodb_change_buffering = inserts
and restarted. With change buffering disabled for delete operations, the problem went away; the database has been working without a crash for 17 days.
However, delete operations are now very slow, about 10 times slower than before, and we are having replication delays.
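For reference, a sketch of how the setting can be applied; innodb_change_buffering is a dynamic variable, but the value also needs to be added to my.cnf under [mysqld] to survive a restart:
-- allow buffering only for inserts, not for delete-marking or purge
SET GLOBAL innodb_change_buffering = 'inserts';
-- verify the current value
SELECT @@GLOBAL.innodb_change_buffering;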
Attachments
- all_threads_Jan9 (54 kB)
- my.cnf (6 kB)
- t65.test (7.84 MB)
Issue Links
- is blocked by
  - MDEV-22924 Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37. (Closed)
- relates to
  - MDEV-9663 InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed() (Closed)
  - MDEV-13637 InnoDB change buffer housekeeping can cause redo log overrun and possibly deadlocks (Closed)
  - MDEV-13899 IMPORT TABLESPACE may corrupt ROW_FORMAT=REDUNDANT tables (Closed)
  - MDEV-13980 InnoDB fails to discard record lock when discarding an index page (Closed)
  - MDEV-18272 InnoDB fails to rollback after exceeding FOREIGN KEY recursion depth (Closed)
  - MDEV-19338 InnoDB: Failing assertion: !cursor->index->is_committed() (Closed)
  - MDEV-22759 Failing assertion: !cursor->index->is_committed() upon update on table with HASH index (Closed)
  - MDEV-24449 Corruption of system tablespace or last recovered page (Closed)
  - MDEV-16797 Node keep on IST every a few hours (InnoDB: Failing assertion: !cursor->index->is_committed()) (Closed)
  - MDEV-19775 Bug in WSREP/Galera with virtual columns (keeps crashing on startup) (Closed)
  - MDEV-20396 Server crashes after DELETE with SET NULL Foreign key and a virtual column in index (Closed)
  - MDEV-22061 InnoDB: Assertion of missing row in sec index row_start upon REPLACE on a system-versioned table (Closed)
  - MDEV-22076 InnoDB: Failing assertion: !cursor->index->is_committed() or Assertion `0' in row_purge_remove_sec_if_poss_leaf after DML/DDL with XA (Closed)
  - MDEV-22739 !cursor->index->is_committed() in row0ins.cc after update to 10.4.13 from 10.3.21 (Closed)
Activity
This is the table structure on which we are getting the crash. We have 11 indexes on the table, and the assertion failure is happening on index data.
CREATE TABLE `messagecontent` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`content` longtext COLLATE utf8mb4_unicode_ci,
`crDate` datetime DEFAULT NULL,
`fromProfileId` bigint(20) DEFAULT NULL,
`readDate` datetime DEFAULT NULL,
`status` varchar(1) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`subject` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`toProfileId` bigint(20) DEFAULT NULL,
`type` varchar(1) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`msgId` bigint(20) DEFAULT NULL,
`updDate` datetime DEFAULT NULL,
`updProfileId` bigint(20) DEFAULT NULL,
`subType` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `messagecontent_toprofileid_index` (`toProfileId`),
KEY `messagecontent_fromprofileid_index` (`fromProfileId`),
KEY `messagecontent_type_index` (`type`),
KEY `messagecontent_crDate_index` (`crDate`),
KEY `messagecontent_msgid_index` (`msgId`),
KEY `messagecontent_status_index` (`status`),
KEY `messagecontent_subtype_index` (`subType`),
KEY `messagecontent_fromtoprofileid_index` (`fromProfileId`,`toProfileId`),
KEY `messagecontent_toprofileidstatus_index` (`toProfileId`,`status`),
KEY `messagecontent_toprofileidtype_index` (`toProfileId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=92716335 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
We have 3 master-to-master replicated servers. Our chat socket is connected to only one of them, and that server has a lot of insert/update/delete transactions. The crash happens only on the server the chat socket is connected to; the other servers never crash even though they receive the same transactions through replication. When we switch our chat socket to any other server, that server crashes within 3 days with the exception trace I sent before. Our servers have crashed at least 10 times in the production environment.
mehmet, I believe that this issue may share a root cause with MDEV-9663, which is a known but rarely occurring problem in MySQL as well. When I was employed at Oracle, I remember seeing this class of failures already during early 5.6 development, maybe even reported for 5.5.
I would greatly appreciate any efforts to reproduce this problem. There have been several bug fixes that may have fixed some cases leading to this error, but I believe that the bug is still out there. Like I have written in MDEV-9663, I suspect that innodb_change_buffering=none or innodb_change_buffering=inserts may help avoid hitting this corruption. But we do not know even that; nobody has reported that it occurred frequently with the default setting and not at all with the changed setting. (Note that if the database is already corrupted, disabling delete buffering will not help.)
mehmet, it looks very promising that you claim that you saw this problem reasonably often before you configured innodb_change_buffering=inserts. Is it also repeatable with innodb_change_buffering=changes? This setting enables insert and delete-mark buffering, but disables purge buffering.
Due to the nondeterministic nature of the InnoDB purge and the InnoDB change buffer, it is very hard to reproduce this. It may require the right innodb_buffer_pool_size and the right timing, and enough "padding operations" to force page eviction of some secondary indexes. To improve reproducibility, we could use the option innodb_change_buffering_debug=1 that is available in the debug server.
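For illustration, on a debug build the option mentioned above could be enabled roughly like this (a sketch; the variable only exists in debug servers):
-- force change buffering to be attempted whenever possible (debug builds only)
SET GLOBAL innodb_change_buffering_debug = 1;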
The setting innodb_flush_log_at_trx_commit=0 should not affect any InnoDB correctness, except for one thing: If the server is killed, some transaction commits may be lost. Everything should still be fully crash-safe, but the server may ‘jump to the past’, in case some redo log that corresponds to transaction commits had not yet been written to the redo log files. The redo log is linear, and the logical time is called LSN (log sequence number). It could simply be that changing this option changed the timing just enough to make the bug trigger more easily.
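For what it is worth, the current LSN and how far the redo log has been written can be checked from the LOG section of the InnoDB status output (a sketch; exact field names may differ between versions):
SHOW ENGINE INNODB STATUS\G
-- in the LOG section, look for lines such as:
--   Log sequence number ...
--   Log flushed up to   ...
--   Last checkpoint at  ...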
Yes, this bug also exists in older versions. I read a lot on the internet trying to fix this issue and also tried a full database repair and optimization, but the problem was not resolved until I set innodb_change_buffering=inserts.
But now our deletes are very slow and we are having replication lag. We need to delete messages older than 30 days in chunks of 1000 rows, but it takes a very long time and server performance keeps getting worse. Table and index data keep growing, decreasing overall server performance, and we cannot take any action to fix this.
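For context, the chunked purge looks roughly like the following; this is an illustrative sketch based on the 30-day retention and 1000-row chunks mentioned above, repeated until no rows remain:
DELETE FROM MessageContent
 WHERE crDate < DATE_ADD(CURDATE(), INTERVAL -30 DAY)
 ORDER BY id
 LIMIT 1000;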
The default settings do not cause this crash, but after we set innodb_flush_log_at_trx_commit = 0 we started to get this problem. We have millions of transactions on the message table and had performance issues, so setting innodb_flush_log_at_trx_commit = 0 helped us a lot. We have tried both innodb_flush_log_at_trx_commit = 0 and innodb_flush_log_at_trx_commit = 2; the server crashes in both cases.
Do you think that innodb_change_buffering=changes will help us avoid this issue? It may also help to increase delete performance.
We have never used innodb_change_buffering=changes but wonder whether it will help or not; when the server is down, it is a really bad experience for our users.
I know it is hard to reproduce this issue, but once, in production, I executed the following query at least 20 times to delete old records:
delete from MessageContent where id < 360619487 limit 10000;
Then the server crashed immediately; we had only just restarted it.
Also, setting innodb_buffer_pool_size to lower values seems to cause a crash within a short time.
I hope you can reproduce and fix this issue. We have to use the innodb_flush_log_at_trx_commit = 0 setting for performance reasons; otherwise, with the default settings, our system gets 50% slower.
We have understood the risk of innodb_flush_log_at_trx_commit = 0 and it is all right for us.
mehmet, thank you.
Your table schema (SHOW CREATE TABLE) and a sample of INSERT/DELETE/UPDATE statements would be helpful in reproducing this.
I wonder if you could try this in a test environment:
- Load a dump of the table into a freshly initialized database.
- Shut down and restart the server, with innodb_buffer_pool_load_at_startup=0 and with a small buffer pool.
- Perform the mass DELETE.
- Run CHECK TABLE.
Hopefully it would reproduce the issue. If the problem reproduces, then the experiment should be repeated with innodb_change_buffering=changes to see if it is related to the purge-buffering, or if already the delete-mark buffering is the culprit.
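A minimal sketch of that experiment, assuming the dump has been restored and the server restarted with a small buffer pool (for example innodb_buffer_pool_size=128M) and innodb_buffer_pool_load_at_startup=0:
-- mass delete, as in production (the id cutoff is the one quoted earlier in this report)
DELETE FROM MessageContent WHERE id < 360619487 LIMIT 10000;
-- repeat the DELETE a number of times, then verify the indexes
CHECK TABLE MessageContent;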
When I was employed at Oracle until last year, we did run into this problem from time to time, but not enough resources were assigned to it, and I think that any directed efforts to reproduce it were stopped several years ago. There have been several minor fixes to the change buffering, but something clearly still remains.
I am considering changing the MariaDB default to innodb_change_buffering=inserts, to trade some performance for correctness.
In InnoDB, DELETE is a very inefficient operation. It will internally first perform a locking SELECT, then UPDATE the delete-mark of the records in all indexes. Some time after COMMIT, the InnoDB purge will perform index lookups to check if the delete-marked records can be safely deleted, and finally delete them. For bulk deletes, it would be much better if you could partition your data, so that old data could be removed by dropping partitions. If that is not possible, another possibility could be to copy the not-to-be-deleted records into a new table, and drop the old table.
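As an illustration of the partitioning approach (a sketch only: the partition names and boundaries are hypothetical, and partitioning this table would first require the partitioning column, e.g. crDate, to be included in the primary key):
-- hypothetical monthly partitions on the creation date
ALTER TABLE MessageContent
  PARTITION BY RANGE (TO_DAYS(crDate)) (
    PARTITION p2017_11 VALUES LESS THAN (TO_DAYS('2017-12-01')),
    PARTITION p2017_12 VALUES LESS THAN (TO_DAYS('2018-01-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  );
-- dropping a whole month of old data is then a cheap metadata operation
ALTER TABLE MessageContent DROP PARTITION p2017_11;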
Thank you @Marko for the information.
Unfortunately this problem is only happening on production when used by real members.
We have 2 MariaDB slave databases. All insert/delete/update statements on the master database are replicated to these slaves, but the slaves never crashed. I guess this problem is a timing and data consistency issue; somehow it is not triggered on the slaves even though they process the same transactions, and only the master database crashes. That is why it is not reproducible with regular insert/delete/update statements. We have a chat application, and our 5000 online members delete, update and insert message content into the table randomly and in parallel. Somehow, during an insert, a secondary index record that was expected to be delete-marked, but was in fact physically deleted and committed without the delete-mark, is chosen to be reused as the new index entry, and at that point the current code raises the assertion failure because the entry is committed but not delete-marked. I guess there is latency between the InnoDB change buffer and the index data being merged; with innodb_flush_log_at_trx_commit=0 the commit happens automatically after 1 second, whereas with the default setting each commit is performed when the transaction completes, so there is no latency for merging the buffers.
I do not know the internal logic of InnoDB, but as far as I can see, we need to find a way to reproduce this case. It is not happening with regular mass insert/update/delete statements; if it were, all our slave databases would also crash.
We are currently using the innodb_change_buffering=inserts parameter only on the master database and it is not crashing, but delete operations were slow and were causing replication delays.
We used the following method to delete data from the message table and reduce the amount of data:
CREATE TABLE MessageContent_new LIKE MessageContent;
INSERT INTO MessageContent_new SELECT * FROM MessageContent where crDate > DATE_ADD(CURDATE(), INTERVAL -30 DAY);
ALTER TABLE MessageContent RENAME MessageContent_old;
ALTER TABLE MessageContent_new RENAME MessageContent;
This helped performance a lot, and deletes on this table are faster now.
So we have found a workaround to overcome this issue.
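Side note: the two ALTER ... RENAME steps above could also be written as a single RENAME TABLE, which swaps both names in one atomic statement (a sketch of the same swap):
RENAME TABLE MessageContent TO MessageContent_old,
             MessageContent_new TO MessageContent;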
We have tried the latest versions of MySQL, Percona Server and MariaDB on our system and observed that MariaDB gives the best performance for most of our queries. Everything was fine until we found this bug. I think this bug should be reproduced and fixed, because a regular server crash in production is a big headache: you know it will crash within 5 days but do not know exactly when, so you do not feel comfortable with your services.
Hi mehmet, I am glad that you were able to work around the issue.
While I understand that nobody wants the production server to crash, it would be very useful to know if the problem is repeatable with innodb_change_buffering=changes. I am not aware of anyone else being able to reproduce this issue, and this includes the MySQL and MariaDB QA departments. So, I would still want to ask for your help.
The setting innodb_change_buffering=changes should only slow down the background operation of purging old history; it allows the buffering of delete-mark and insert operations. The experiment would be useful for two aspects: If the bug does not occur with innodb_change_buffering=changes, we would have less code to suspect for the bug, and we could consider making it the default setting, because it should not degrade the performance of delete or update operations.
The change buffer works independently of user transactions. Changes are buffered when a qualifying secondary index leaf page is not in the buffer pool. Changes will be merged to the leaf page when the page is being read (either due to a secondary index lookup, or due to a background process for merging the change buffer). The change buffer B-tree is a persistent data structure that resides in the InnoDB system tablespace. All operations are redo-logged. I believe that the bug should be independent of the innodb_flush_log_at_trx_commit setting; the setting likely only affects the timing. And we seem to require very lucky timing for this bug to occur.
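To observe whether the change buffer is actually being used on a given server, its size and merge activity can be checked roughly like this (a sketch; some of the ibuf_* counters may need to be enabled with innodb_monitor_enable):
-- the INSERT BUFFER AND ADAPTIVE HASH INDEX section shows the change buffer size and merge counts
SHOW ENGINE INNODB STATUS\G
-- per-counter view, if the metrics are enabled
SELECT name, count FROM information_schema.innodb_metrics WHERE name LIKE 'ibuf%';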
This bug should be present in all MySQL derivatives. I implemented the change buffering in MySQL 5.5, and there have not been significant changes since then.
Last night, in the middle of sleep, I got an idea of a possible scenario that could explain this, but unfortunately it was a wrong guess. I was thinking that when an INSERT operation is being rolled back (for example due to a duplicate key in another unique index, or due to a deadlock with another transaction) and the secondary index leaf page was meanwhile evicted from the buffer pool, then the change buffer would kick in and do something wrong. But, looking at row_undo_ins_remove_sec_low() I see that the change buffering is disabled in this case. Also in the other ROLLBACK operations (file row0umod.cc) the change buffering is disabled.
Side note: I suspect that I found one more bug in the InnoDB R-tree (SPATIAL INDEX) implementation that was introduced in MySQL 5.7 (and MariaDB 10.2). The function row_undo_ins_remove_sec_low() is using the flag BTR_RTREE_DELETE_MARK which suggests that it would not remove the record, but only mark it as deleted. The delete-marked garbage record would likely remain forever in the R-tree index.
Hi @Marko,
I understand your point; you need to know exactly where the bug is happening.
We can try the innodb_change_buffering=changes setting the next time we need to restart our servers.
Our system ran on MariaDB without any crash for at least 1 year with the innodb_flush_log_at_trx_commit=1 (default) setting. I also think that it is a timing issue.
I guess a new insert reuses an existing secondary index record that is expected to be delete-marked once the change buffer has been merged. Generally the buffered delete changes are merged before the inserts, but in some cases the merge is delayed, the insert happens first, and this crashes the server. From this point of view I am afraid innodb_change_buffering=changes will also lead to a server crash, since it does not change the merge logic between the change buffer and the buffer pool.
As I indicated before, this crash does not happen on the slave databases even though they execute the same insert/delete/update transactions in the same order, because somehow the timing of the buffers works differently on the master and the slaves.
We still have innodb_change_buffering=all on the slaves and there is no problem there. innodb_change_buffering=inserts is only configured on the master database, and we have no crashes with this setup.
I understand that this problem is not easy to fix since it is not reproducible. Your suspected bug in the R-tree (SPATIAL INDEX) code could also be the reason. It is better to focus on reproducing this bug; after that, I believe fixing it will be easier. Maybe developing some test scripts that perform transactions on the database could help. Our chat application performs many inserts, deletes and updates at the same time, but the interval between them is not static, always changing; this could be leading to a data synchronization problem or a delay between the change buffer and the buffer pool. I believe this bug is about the timing between the change buffer and the buffer pool.
mehmet,
I've been trying to reproduce the problem (without any luck so far). One approach to it is imitating your workflow as closely as possible. We have your table structure and the config, and you also gave examples of your DELETE statements, so that's good; and we know the approximate amount of data that you insert at once; what's missing is how you insert the data. I don't know if it's important, but for the lack of better ideas, I'm going to imitate that as well. Is it big INSERT statements, or big transactions with small INSERT statements, or is it LOAD DATA, or..?
Also, you mentioned that you had this problem once after a set of DELETE statements and restarts; but is it always like so, does the problem always occur after you DELETE upon further server restart, or does it happen differently? And do you restart the server after the big data insertion?
All in all, if you noticed any pattern in the circumstances under which the problem occurs, could you please describe it?
Hi Elena,
I am sending insert/update/delete statements from our production environment; it may be important that the same keys are used for the update and delete transactions. We do not have bulk inserts, updates or deletes; it is a chat application with 5000 online people writing, reading and deleting messages.
I did a bulk delete operation once to clear old data and the server crashed. I think we should execute many insert, update and delete statements until all buffers are full, then perform the bulk deletes. Our server crashed maybe 10 times in 2 months.
It might be reproducible with random transactions.
update MessageContent use index(messagecontent_fromtoprofileid_index) set status='R',readDate='2017-12-29 19:05:19' where toProfileId = 2501508 and fromProfileId=2418870 and status ='C' and type in ('M','C') order by id ;
update MessageContent set status='R',readDate='2017-12-29 19:05:49' where id in (511123704) ;
insert into MessageContent (content, crDate, fromProfileId, msgId, readDate, status, subType, subject, toProfileId, type, updDate, updProfileId) values ('<br><span img-attach="T" img-profileid="294695" img-fileid="7287085" onclick="openImageUrl(this);" style="cursor: pointer;"><img src="/image?fileId=7287085&attachmentImage=true"></span><br><span img-attach="T" img-profileid="294695" img-fileid="7287103" onclick="openImageUrl(this);" style="cursor: pointer;"><img src="/image?fileId=7287103&attachmentImage=true"></span><br><span img-attach="T" img-profileid="294695" img-fileid="7287121" onclick="openImageUrl(this);" style="cursor: pointer;"><img src="/image?fileId=7287121&attachmentImage=true"></span><br><span img-attach="T" img-profileid="294695" img-fileid="7290602" onclick="openImageUrl(this);" style="cursor: pointer;"><img src="/image?fileId=7290602&attachmentImage=true"></span><br><span img-attach="T" img-profileid="294695" img-fileid="7291400" onclick="openImageUrl(this);" style="cursor: pointer;"><img src="/image?fileId=7291400&attachmentImage=true"></span>', '2017-12-29 19:06:07', 294695, 1514577936477, null, 'C', null, null, 2495079, 'C', null, null);
insert into MessageContent (content, crDate, fromProfileId, msgId, readDate, status, subType, subject, toProfileId, type, updDate, updProfileId) values ('Hola . <font color="#898989"><small>
{hi . thanks..}</small></font>', '2017-12-29 19:06:49', 2367426, 1514574409311, null, 'C', null, null, 2125355, 'C', null, null) ;
delete from MessageContent where fromProfileId=1914221 and toProfileId=2241632 ;
delete from MessageContent where fromProfileId=1730091 and toProfileId=2491365 ;
For what it is worth, the timing that allows this bug to be repeated may have been affected by an earlier fix that is only present in MariaDB, not MySQL:
MDEV-13637 InnoDB change buffer housekeeping can cause redo log overrun and possibly deadlocks
I got this exact assertion failure in last night's tests on 10.3.
10.3 c5ac1f953bf6d4279967f03a8343303f715ad7aa
Version: '10.3.4-MariaDB-debug-log' socket: '/home/travis/logs/vardir1_4/mysql.sock' port: 19300 Source distribution
2018-01-09 01:33:18 0x7f61f9651700 InnoDB: Assertion failure in file /home/travis/src/storage/innobase/row/row0ins.cc line 274
InnoDB: Failing assertion: !cursor->index->is_committed()
Some pointers may be invalid and cause the dump to abort.
Query (0x7f61b0015ae8): INSERT INTO t1 (col1,col2,col3,col4) VALUES ( 4, 4 - 1, 4, REPEAT(CAST(4 AS CHAR(1)),@fill_amount)) /* QNO 6953 CON_ID 15 */
Connection ID (thread ID): 15
#5 0x00007f6210ddd79b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000ff3163 in ut_dbg_assertion_failed (expr=0x1543118 "!cursor->index->is_committed()", file=0x1542ff8 "/home/travis/src/storage/innobase/row/row0ins.cc", line=274) at /home/travis/src/storage/innobase/ut/ut0dbg.cc:61
#7 0x0000000000ee92fc in row_ins_sec_index_entry_by_modify (flags=0, mode=2, cursor=0x7f61f964dc20, offsets=0x7f61f964dce8, offsets_heap=0x7f61b01cf1e0, heap=0x7f61b01da6d0, entry=0x7f61b0091e70, thr=0x7f61c80fec88, mtr=0x7f61f964dd40) at /home/travis/src/storage/innobase/row/row0ins.cc:274
#8 0x0000000000ef0690 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7f61c8136098, offsets_heap=0x7f61b01cf1e0, heap=0x7f61b01da6d0, entry=0x7f61b0091e70, trx_id=0, thr=0x7f61c80fec88, dup_chk_only=false) at /home/travis/src/storage/innobase/row/row0ins.cc:3134
#9 0x0000000000ef0ef1 in row_ins_sec_index_entry (index=0x7f61c8136098, entry=0x7f61b0091e70, thr=0x7f61c80fec88, dup_chk_only=false) at /home/travis/src/storage/innobase/row/row0ins.cc:3314
#10 0x0000000000ef1080 in row_ins_index_entry (index=0x7f61c8136098, entry=0x7f61b0091e70, thr=0x7f61c80fec88) at /home/travis/src/storage/innobase/row/row0ins.cc:3360
#11 0x0000000000ef1616 in row_ins_index_entry_step (node=0x7f61c80fe9b8, thr=0x7f61c80fec88) at /home/travis/src/storage/innobase/row/row0ins.cc:3508
#12 0x0000000000ef19f7 in row_ins (node=0x7f61c80fe9b8, thr=0x7f61c80fec88) at /home/travis/src/storage/innobase/row/row0ins.cc:3650
#13 0x0000000000ef229e in row_ins_step (thr=0x7f61c80fec88) at /home/travis/src/storage/innobase/row/row0ins.cc:3907
#14 0x0000000000f122b9 in row_insert_for_mysql (mysql_rec=0x7f61c80cadd8 "\376\001 @\264\a\260a\177", prebuilt=0x7f61c80fdce8) at /home/travis/src/storage/innobase/row/row0mysql.cc:1481
#15 0x0000000000dba7bd in ha_innobase::write_row (this=0x7f61c80ef1d8, record=0x7f61c80cadd8 "\376\001 @\264\a\260a\177") at /home/travis/src/storage/innobase/handler/ha_innodb.cc:8367
#16 0x0000000000a86d2e in handler::ha_write_row (this=0x7f61c80ef1d8, buf=0x7f61c80cadd8 "\376\001 @\264\a\260a\177") at /home/travis/src/sql/handler.cc:6196
#17 0x0000000000771e94 in write_record (thd=0x7f61b0000c00, table=0x7f61c80d5320, info=0x7f61f964f0f0) at /home/travis/src/sql/sql_insert.cc:1932
#18 0x000000000076f8e5 in mysql_insert (thd=0x7f61b0000c00, table_list=0x7f61b0015c90, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false) at /home/travis/src/sql/sql_insert.cc:1060
#19 0x00000000007a4e31 in mysql_execute_command (thd=0x7f61b0000c00) at /home/travis/src/sql/sql_parse.cc:4697
#20 0x00000000007af65f in mysql_parse (thd=0x7f61b0000c00, rawbuf=0x7f61b0015ae8 "INSERT INTO t1 (col1,col2,col3,col4) VALUES ( 4, 4 - 1, 4, REPEAT(CAST(4 AS CHAR(1)),@fill_amount)) /* QNO 6953 CON_ID 15 */", length=124, parser_state=0x7f61f9650490, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:7968
#21 0x000000000079cd32 in dispatch_command (command=COM_QUERY, thd=0x7f61b0000c00, packet=0x7f61b001c3f1 "INSERT INTO t1 (col1,col2,col3,col4) VALUES ( 4, 4 - 1, 4, REPEAT(CAST(4 AS CHAR(1)),@fill_amount)) /* QNO 6953 CON_ID 15 */ ", packet_length=125, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:1825
#22 0x000000000079b759 in do_command (thd=0x7f61b0000c00) at /home/travis/src/sql/sql_parse.cc:1369
#23 0x00000000008f2f73 in do_handle_one_connection (connect=0x350c100) at /home/travis/src/sql/sql_connect.cc:1401
#24 0x00000000008f2cf5 in handle_one_connection (arg=0x350c100) at /home/travis/src/sql/sql_connect.cc:1307
#25 0x00007f621197ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#26 0x00007f6210e9a2ed in clone () from /lib/x86_64-linux-gnu/libc.so.6
#27 0x0000000000000000 in ?? ()
All threads are attached as all_threads_Jan9.
I'm not sure yet why it wasn't happening before and only happened now – maybe it's really just such low probability, or maybe recent tweaks in tests (tests get changed quite often) moved them in the right direction.
Since it's auto-tests in remote environment, I don't have a coredump or a general log, but now that we know we have the test flow which can potentially trigger it, we should be able to make some progress.
Test log (4th test run) is the one that hit this problem, other ones are irrelevant:
https://api.travis-ci.org/v3/job/326434774/log.txt
Test command line (this is for my information, for now)
experimental 3279504130c10724b13e69de355f1ba4b6b03bd6
perl /home/travis/rqg/runall-new.pl --duration=400 --threads=6 seed= --reporters=Backtrace,ErrorLog,Deadlock --validators=TransformerNoComparator --transformers=ConvertSubqueriesToViews,ConvertTablesToDerived,Count,DisableOptimizations,Distinct,EnableOptimizations,ExecuteAsCTE,ExecuteAsDeleteReturning,ExecuteAsDerived,ExecuteAsExcept,ExecuteAsExecuteImmediate,ExecuteAsInsertSelect,ExecuteAsIntersect,ExecuteAsSelectItem,ExecuteAsUnion,ExecuteAsUpdateDelete,ExecuteAsView,ExecuteAsWhereSubquery,Having,InlineSubqueries,InlineVirtualColumns,LimitRowsExamined,OrderBy,StraightJoin,ExecuteAsPreparedTwice,ExecuteAsTrigger,ExecuteAsSPTwice,ExecuteAsFunctionTwice --redefine=conf/mariadb/general-workarounds.yy --mysqld=--log_output=FILE --mysqld=--log_bin_trust_function_creators=1 --mysqld=--log-bin --mysqld=--max-statement-time=30 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --views --redefine=conf/mariadb/sequences.yy --basedir=/home/travis/server --grammar=conf/runtime/alter_online.yy --gendata=conf/runtime/alter_online.zz --engine=Aria --mysqld=--default-storage-engine=Aria --vardir=/home/travis/logs/vardir1_4
Great job, Elena!
I am very glad that you have reproduced the bug and are one step closer to a fix.
Yes, as you indicated, it happens very rarely and only in some cases, but it is very annoying for production systems.
Your trace log is a little different from ours, maybe because of the MariaDB version, but it happened at the same line,
/src/storage/innobase/row/row0ins.cc:274, with the same error description.
I hope you can make progress on fixing this issue. I wish you good luck.
Even with the innodb_change_buffering=inserts parameter, our server is down again with the same exception.
Here is the trace log:
Jan 30 13:36:14 ns548787 mysqld[1938]: 2018-01-30 13:36:14 0x7fdb88813700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.11/storage/innobase/row/row0ins.cc line 274
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: Failing assertion: !cursor->index->is_committed()
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: We intentionally generate a memory trap.
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: If you get repeated assertion failures or crashes, even
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: immediately after the mysqld startup, there may be
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
Jan 30 13:36:14 ns548787 mysqld[1938]: InnoDB: about forcing recovery.
Jan 30 13:36:15 ns548787 mysqld[1938]: 180130 13:36:14 [ERROR] mysqld got signal 6 ;
Jan 30 13:36:15 ns548787 mysqld[1938]: This could be because you hit a bug. It is also possible that this binary
Jan 30 13:36:15 ns548787 mysqld[1938]: or one of the libraries it was linked against is corrupt, improperly built,
Jan 30 13:36:15 ns548787 mysqld[1938]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jan 30 13:36:15 ns548787 mysqld[1938]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jan 30 13:36:15 ns548787 mysqld[1938]: We will try our best to scrape up some info that will hopefully help
Jan 30 13:36:15 ns548787 mysqld[1938]: diagnose the problem, but since we have already crashed,
Jan 30 13:36:15 ns548787 mysqld[1938]: something is definitely wrong and this may fail.
Jan 30 13:36:15 ns548787 mysqld[1938]: Server version: 10.2.11-MariaDB-10.2.11+maria~zesty-log
Jan 30 13:36:15 ns548787 mysqld[1938]: key_buffer_size=134217728
Jan 30 13:36:15 ns548787 mysqld[1938]: read_buffer_size=131072
Jan 30 13:36:15 ns548787 mysqld[1938]: max_used_connections=1214
Jan 30 13:36:15 ns548787 mysqld[1938]: max_threads=5002
Jan 30 13:36:15 ns548787 mysqld[1938]: thread_count=813
Jan 30 13:36:15 ns548787 mysqld[1938]: It is possible that mysqld could use up to
Jan 30 13:36:15 ns548787 mysqld[1938]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11122164 K bytes of memory
Jan 30 13:36:15 ns548787 mysqld[1938]: Hope that's ok; if not, decrease some variables in the equation.
Jan 30 13:36:15 ns548787 mysqld[1938]: Thread pointer: 0x7fda38402e58
Jan 30 13:36:15 ns548787 mysqld[1938]: Attempting backtrace. You can use the following information to find out
Jan 30 13:36:15 ns548787 mysqld[1938]: where mysqld died. If you see no messages after this, something went
Jan 30 13:36:15 ns548787 mysqld[1938]: terribly wrong...
Jan 30 13:36:15 ns548787 mysqld[1938]: stack_bottom = 0x7fdb88812cf8 thread_stack 0x49000
Jan 30 13:36:15 ns548787 mysqld[1938]: (my_addr_resolve failure: fork)
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x556c6a6c443e]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(handle_fatal_signal+0x3cd) [0x556c6a164a9d]
Jan 30 13:36:15 ns548787 mysqld[1938]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11670) [0x7fe767a4a670]
Jan 30 13:36:15 ns548787 mysqld[1938]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x9f) [0x7fe76701677f]
Jan 30 13:36:15 ns548787 mysqld[1938]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7fe76701837a]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x40a67b) [0x556c69f1a67b]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x8924da) [0x556c6a3a24da]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x8951cc) [0x556c6a3a51cc]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x8d7f3e) [0x556c6a3e7f3e]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x8d8539) [0x556c6a3e8539]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x8a86f4) [0x556c6a3b86f4]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(+0x7e9fb3) [0x556c6a2f9fb3]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x152) [0x556c6a16ea82]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bool, unsigned long long*, unsigned long long*)+0x1c39) [0x556c6a0667e9]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(mysql_execute_command(THD*)+0x2726) [0x556c69fcc416]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool)+0x28a) [0x556c69fd2b8a]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+0x215f) [0x556c69fd5b8f]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(do_command(THD*)+0x167) [0x556c69fd64e7]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(do_handle_one_connection(CONNECT*)+0x25a) [0x556c6a09ccda]
Jan 30 13:36:15 ns548787 mysqld[1938]: /usr/sbin/mysqld(handle_one_connection+0x3d) [0x556c6a09ce4d]
Jan 30 13:36:15 ns548787 mysqld[1938]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da) [0x7fe767a406da]
Jan 30 13:36:15 ns548787 mysqld[1938]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f) [0x7fe7670e9d7f]
Jan 30 13:36:15 ns548787 mysqld[1938]: Trying to get some variables.
Jan 30 13:36:15 ns548787 mysqld[1938]: Some pointers may be invalid and cause the dump to abort.
Jan 30 13:36:15 ns548787 mysqld[1938]: Query (0x7fda3844efb0): update MessageContent use index(messagecontent_fromtoprofileid_index) set status='R',readDate='2018-01-30 18:36:14' where toProfileId = 2745668 and fromProfileId=2660432 and status ='C' and type in ('M','C') order by id
Jan 30 13:36:15 ns548787 mysqld[1938]: Connection ID (thread ID): 633523
Jan 30 13:36:15 ns548787 mysqld[1938]: Status: NOT_KILLED
Jan 30 13:36:15 ns548787 mysqld[1938]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
Jan 30 13:36:15 ns548787 mysqld[1938]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jan 30 13:36:15 ns548787 mysqld[1938]: information that should help you find out what is causing the crash.
Jan 30 13:36:16 ns548787 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Jan 30 13:36:16 ns548787 systemd[1]: mariadb.service: Unit entered failed state.
Jan 30 13:36:16 ns548787 systemd[1]: mariadb.service: Failed with result 'signal'.
Jan 30 13:36:21 ns548787 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Jan 30 13:36:21 ns548787 systemd[1]: Stopped MariaDB database server.
Jan 30 13:36:21 ns548787 systemd[1]: Starting MariaDB database server...
mehmet, thank you for the update. Maybe the change buffer is "unsafe at any speed" then.
Or, maybe you will report again in a few months that you still got this error after setting innodb_change_buffering=none.
I am afraid that we are still a long way from a fix. First we need to reproduce this deterministically enough, then we have to reduce the test case so that the problem can be debugged. Because this has to do with long-lived persistent data structures, core dumps or data files may be useless if the seed of the corruption was planted several minutes or days or weeks earlier.
With innodb_change_buffering=all it was crashing every 3-5 days, but with innodb_change_buffering=inserts
it crashed after 3 months. The setting only reduced the probability of the failure; it did not fix the problem completely.
Our 5000 online members perform all transactions on the message table on only one MariaDB server, and these transactions are then replicated to the other MariaDB databases. The crash happens only on this master database. I guess it is about timing, since the slave databases handle it in a better way.
We have 3 physical MariaDB servers in a master-to-master configuration, and when one of the servers crashes we have problems with data consistency and performance.
It seems that this issue will not be fixed in the short term. Maybe we should use MariaDB Galera Cluster to tolerate a crash in a better way? Do you suggest it? I did some investigation about it; I just worry that it is going to decrease performance.
marko, I have a coredump and a datadir now. Will it help, or do you still need a reliable way to reproduce?
I am going to try to narrow it down further, but it might be impossible if it's a low-level race condition.
Update:
Unfortunately, "my" failure on 10.3, however similar it looks, apparently has a different cause than the originally reported one. I have a reproducible non-concurrent test case, but it only causes the failure on 10.3, and only before the MDEV-14837 fix. No crash afterwards.
Here is the test case for the record.
--source include/have_innodb.inc

CREATE TABLE t1 (col1 INT, col2 INT, col3 INT, col4 TEXT) ENGINE = InnoDB;
INSERT INTO t1 (col1,col2,col3,col4) VALUES ( 1, 1, 1 - 1, REPEAT('1',8193));
ALTER TABLE t1 ADD PRIMARY KEY ( col4(10), col3 );
ALTER TABLE t1 ADD COLUMN extra INT;
UPDATE t1 SET extra = col3;
ALTER TABLE t1 ADD KEY idx ( col4(10), col3 );
INSERT INTO t1 (col1,col2,col3,col4) VALUES ( 1, 1, 1 - 1, REPEAT('1',8193));
For the record, as noted in MDEV-15199, secondary indexes can also become corrupted in MariaDB 10.2.2 to 10.2.13 (and MySQL 5.7.2 to 5.7.21) because of FOREIGN KEY constraints with CASCADE or SET NULL options. Furthermore, MariaDB 10.2.8 introduced another related bug that was also fixed in 10.2.13 by MDEV-15219.
mehmet, if the server was killed and restarted, or if you copied files by hot backup, then my comment in MDEV-9663 offers a possible explanation. It would also explain corruption with innodb_change_buffering=inserts.
Could this explain your latest crash?
Yes, we used a hot backup and migrated all data from one server to another using the mariabackup tool.
It could have caused the latest crash with innodb_change_buffering=inserts.
mehmet, if you used ALTER TABLE or TRUNCATE TABLE, it could be that the adaptive hash index (a transient data structure) was corrupted, causing corruption of the persistent data. Please see my comment in MDEV-9663 that refers to MDEV-16759.
I also optimized all tables; in other words, I created new tables and migrated all data to the new tables.
In spite of this, I had the same crash.
Our database is working without any problem with innodb_change_buffering=inserts,
but when we change it to innodb_change_buffering=all, the database crashes within 3-5 days.
I have a big ugly test case (see t65.test) which reproduces the assertion failure on 10.2 and 10.3 non-debug builds. It can probably be significantly reduced, but I don't see much point in it anymore, because it appears the test case has become irrelevant after this commit:
commit 1b4c5b7327e84f9c9dd86d72b5ae706d9e82d9b0
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Fri Aug 24 09:38:52 2018 +0300

    MDEV-16868 Same query gives different results
Please note that the commit doesn't cure the assertion failure completely, it's still there, I got it at least once by running a concurrent test (which was a base for this test case). However, after the above-mentioned commit the failure has become much harder to reproduce.
Run the test case with
perl ./mtr bug.t65 --mysqld=--innodb --mysqld=--partition --mysqld=--default-storage-engine=InnoDB --mysqld=--sql-mode= --mem --mysqld=--innodb_change_buffering=none --mysqld=--innodb-adaptive-hash-index=off --mysqld=--max-statement-time=6 --mysqld=--lock-wait-timeout=4 --mysqld=--loose-innodb-lock-wait-timeout=2 --mysqld=--innodb_buffer_pool_size=128M --mysqld=--innodb_log_buffer_size=16M --mysqld=--innodb_log_file_size=48M --mysqld=--innodb_lru_scan_depth=1024 --mysqld=--innodb_stats_persistent=ON --mysqld=--innodb_read_io_threads=4 --mysqld=--innodb_write_io_threads=4
Possibly some options are not needed, but most of them definitely are.
Here is the stack trace caused by this test on a build before the commit above:
10.2 c164d0cc622
#4 0x00007f69828ed3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000557e3292921f in ut_dbg_assertion_failed (expr=expr@entry=0x557e33174cf8 "!cursor->index->is_committed()", file=file@entry=0x557e33174ba0 "/data/src/10.2-bug/storage/innobase/row/row0ins.cc", line=line@entry=268) at /data/src/10.2-bug/storage/innobase/ut/ut0dbg.cc:61
#6 0x0000557e32e2e27d in row_ins_sec_index_entry_by_modify (mtr=0x7f69800b26e0, thr=0x7f690c050330, entry=0x7f690c02f0a0, heap=0x7f690c02dbc0, offsets_heap=0x7f690c0ba860, offsets=0x7f69800b17d0, cursor=0x7f69800b1850, mode=2, flags=2) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:268
#7 row_ins_sec_index_entry_low (flags=flags@entry=2, mode=<optimized out>, mode@entry=2, index=index@entry=0x7f690c084470, offsets_heap=offsets_heap@entry=0x7f690c0ba860, heap=heap@entry=0x7f690c02dbc0, entry=entry@entry=0x7f690c02f0a0, trx_id=0, thr=0x7f690c050330, dup_chk_only=false) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3065
#8 0x0000557e32e30fdc in row_ins_sec_index_entry (index=0x7f690c084470, entry=0x7f690c02f0a0, thr=0x7f690c050330, dup_chk_only=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3259
#9 0x0000557e32e333d8 in row_ins_index_entry (thr=0x7f690c050330, entry=<optimized out>, index=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3305
#10 row_ins_index_entry_step (thr=0x7f690c050330, node=0x7f690c04ffd8) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3453
#11 row_ins (thr=0x7f690c050330, node=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3595
#12 row_ins_step (thr=thr@entry=0x7f690c050330) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3830
#13 0x0000557e32e428e8 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7f690c010900 "", prebuilt=0x7f690c04f8e0) at /data/src/10.2-bug/storage/innobase/row/row0mysql.cc:1412
#14 0x0000557e32d92941 in ha_innobase::write_row (this=0x7f690c062c60, record=0x7f690c010900 "") at /data/src/10.2-bug/storage/innobase/handler/ha_innodb.cc:8217
#15 0x0000557e32b71c3f in handler::ha_write_row (this=0x7f690c062c60, buf=0x7f690c010900 "") at /data/src/10.2-bug/sql/handler.cc:5984
#16 0x0000557e329bab2b in write_record (thd=thd@entry=0x7f690c0009a8, table=table@entry=0x7f690c03bfb8, info=info@entry=0x7f69800b3720) at /data/src/10.2-bug/sql/sql_insert.cc:1930
#17 0x0000557e32c9e5c4 in read_sep_field (ignore_check_option_errors=<optimized out>, skip_lines=<optimized out>, enclosed=..., read_info=..., set_values=..., set_fields=..., fields_vars=..., table_list=0x7f690c025aa8, info=..., thd=0x7f690c0009a8) at /data/src/10.2-bug/sql/sql_load.cc:1256
#18 mysql_load (thd=<optimized out>, ex=<optimized out>, table_list=0x7f690c025aa8, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_ERROR, ignore=<optimized out>, read_file_from_client=false) at /data/src/10.2-bug/sql/sql_load.cc:649
#19 0x0000557e329d8b31 in mysql_execute_command (thd=0x7f690c0009a8) at /data/src/10.2-bug/sql/sql_parse.cc:4833
#20 0x0000557e329dc2b9 in mysql_parse (thd=0x7f690c0009a8, rawbuf=<optimized out>, length=73, parser_state=0x7f69800b5260, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.2-bug/sql/sql_parse.cc:8009
#21 0x0000557e329defae in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f690c0009a8, packet=packet@entry=0x7f690c006d39 "LOAD DATA INFILE 'load_DD' IGNORE INTO TABLE DD /* QNO 10838 CON_ID 14 */", packet_length=packet_length@entry=73, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.2-bug/sql/sql_parse.cc:1824
#22 0x0000557e329df7f8 in do_command (thd=0x7f690c0009a8) at /data/src/10.2-bug/sql/sql_parse.cc:1378
#23 0x0000557e32aa85b4 in do_handle_one_connection (connect=connect@entry=0x557e369f7cb8) at /data/src/10.2-bug/sql/sql_connect.cc:1335
#24 0x0000557e32aa8754 in handle_one_connection (arg=0x557e369f7cb8) at /data/src/10.2-bug/sql/sql_connect.cc:1241
#25 0x00007f69845bb494 in start_thread (arg=0x7f69800b6700) at pthread_create.c:333
#26 0x00007f69829a193f in clone () from /lib/x86_64-linux-gnu/libc.so.6
This is the stack trace caused by a concurrent test on a build after the commit:
10.2 9a815401c6c19
#4 0x00007ffa57ab33fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x000055efabe6421f in ut_dbg_assertion_failed (expr=expr@entry=0x55efac6afa70 "!cursor->index->is_committed()", file=file@entry=0x55efac6af920 "/data/src/10.2/storage/innobase/row/row0ins.cc", line=line@entry=268) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:61
#6 0x000055efac36929d in row_ins_sec_index_entry_by_modify (mtr=0x7ffa4dd1d6e0, thr=0x7ff9e00510e0, entry=0x7ff9e009fdf0, heap=0x7ff9e004dc20, offsets_heap=0x7ff9e00bc7f0, offsets=0x7ffa4dd1c7d0, cursor=0x7ffa4dd1c850, mode=2, flags=2) at /data/src/10.2/storage/innobase/row/row0ins.cc:268
#7 row_ins_sec_index_entry_low (flags=flags@entry=2, mode=<optimized out>, mode@entry=2, index=index@entry=0x7ff9e00703b0, offsets_heap=offsets_heap@entry=0x7ff9e00bc7f0, heap=heap@entry=0x7ff9e004dc20, entry=entry@entry=0x7ff9e009fdf0, trx_id=0, thr=0x7ff9e00510e0, dup_chk_only=false) at /data/src/10.2/storage/innobase/row/row0ins.cc:3065
#8 0x000055efac36bffc in row_ins_sec_index_entry (index=0x7ff9e00703b0, entry=0x7ff9e009fdf0, thr=0x7ff9e00510e0, dup_chk_only=<optimized out>) at /data/src/10.2/storage/innobase/row/row0ins.cc:3259
#9 0x000055efac36e3f8 in row_ins_index_entry (thr=0x7ff9e00510e0, entry=<optimized out>, index=<optimized out>) at /data/src/10.2/storage/innobase/row/row0ins.cc:3305
#10 row_ins_index_entry_step (thr=0x7ff9e00510e0, node=0x7ff9e0050d88) at /data/src/10.2/storage/innobase/row/row0ins.cc:3453
#11 row_ins (thr=0x7ff9e00510e0, node=<optimized out>) at /data/src/10.2/storage/innobase/row/row0ins.cc:3595
#12 row_ins_step (thr=thr@entry=0x7ff9e00510e0) at /data/src/10.2/storage/innobase/row/row0ins.cc:3830
#13 0x000055efac37d908 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7ff9e008ce40 "", prebuilt=0x7ff9e0050690) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1412
#14 0x000055efac2cd941 in ha_innobase::write_row (this=0x7ff9e0068fd0, record=0x7ff9e008ce40 "") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:8217
#15 0x000055efac0acc3f in handler::ha_write_row (this=0x7ff9e0068fd0, buf=0x7ff9e008ce40 "") at /data/src/10.2/sql/handler.cc:5984
#16 0x000055efabef5b2b in write_record (thd=thd@entry=0x7ff9e00009a8, table=table@entry=0x7ff9e004b558, info=info@entry=0x7ffa4dd1e720) at /data/src/10.2/sql/sql_insert.cc:1930
#17 0x000055efac1d95c4 in read_sep_field (ignore_check_option_errors=<optimized out>, skip_lines=<optimized out>, enclosed=..., read_info=..., set_values=..., set_fields=..., fields_vars=..., table_list=0x7ff9e000f208, info=..., thd=0x7ff9e00009a8) at /data/src/10.2/sql/sql_load.cc:1256
#18 mysql_load (thd=<optimized out>, ex=<optimized out>, table_list=0x7ff9e000f208, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_ERROR, ignore=<optimized out>, read_file_from_client=false) at /data/src/10.2/sql/sql_load.cc:649
#19 0x000055efabf13b31 in mysql_execute_command (thd=0x7ff9e00009a8) at /data/src/10.2/sql/sql_parse.cc:4833
#20 0x000055efabf172b9 in mysql_parse (thd=0x7ff9e00009a8, rawbuf=<optimized out>, length=73, parser_state=0x7ffa4dd20260, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.2/sql/sql_parse.cc:8009
#21 0x000055efabf19fae in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7ff9e00009a8, packet=packet@entry=0x7ff9e0006d39 " LOAD DATA INFILE 'load_DD' IGNORE INTO TABLE DD /* QNO 10838 CON_ID 14 */ ", packet_length=packet_length@entry=75, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.2/sql/sql_parse.cc:1824
#22 0x000055efabf1a7f8 in do_command (thd=0x7ff9e00009a8) at /data/src/10.2/sql/sql_parse.cc:1378
#23 0x000055efabfe35b4 in do_handle_one_connection (connect=connect@entry=0x55efaeddf698) at /data/src/10.2/sql/sql_connect.cc:1335
#24 0x000055efabfe3754 in handle_one_connection (arg=0x55efaeddf698) at /data/src/10.2/sql/sql_connect.cc:1241
#25 0x00007ffa59781494 in start_thread (arg=0x7ffa4dd21700) at pthread_create.c:333
#26 0x00007ffa57b6793f in clone () from /lib/x86_64-linux-gnu/libc.so.6
marko, please take a look at the above. If the test case which fails before the mentioned commit is enough for you to figure out the general problem, then I leave it to you. If not, please let me know and I'll see if I can get something out of the concurrent test.
elenst, MDEV-16868 fixed a class of corruption for temporary tables. That corruption could also cause mismatch between the indexes of a temporary table. The fix should not have changed anything for persistent tables. So, I am afraid that more effort will be needed to reproduce the remaining problem.
It appears that I was mistaken about the sequence of commits: 9a815401c6c19, where the problem was last observed, still predates 1b4c5b7327e8, where the problem (with temporary tables) was presumably fixed. So there is no certainty that the failure still exists in any form after the MDEV-16868 fix. Now, do we have any reason to be sure that the initial problem reported here (as well as in the other sibling bug reports) wasn't related to temporary tables?
mehmet, do you use CREATE TEMPORARY TABLE with the InnoDB storage engine? A bug introduced in 10.2.9 and 10.3.2 and fixed by MDEV-16868 (10.2.18, 10.3.10) could cause writes to InnoDB temporary tables to be omitted, leading to data loss and corruption when such pages are evicted from the InnoDB buffer pool.
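For reference, the usage pattern in question is an explicit InnoDB temporary table; a minimal illustrative sketch follows (the table and column names are made up and not taken from this report):
-- Hypothetical example of the usage being asked about: an explicit
-- InnoDB temporary table that is written to and read back within a session.
CREATE TEMPORARY TABLE tmp_example (
  id BIGINT NOT NULL PRIMARY KEY,
  payload VARCHAR(100)
) ENGINE=InnoDB;
INSERT INTO tmp_example VALUES (1, 'example row');
SELECT * FROM tmp_example;
DROP TEMPORARY TABLE tmp_example;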
We have never used temporary tables.
This crash is happening on the MessageContent table. We have optimized this table several times.
CREATE TABLE `messagecontent` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`content` longtext COLLATE utf8mb4_unicode_ci,
`crDate` datetime DEFAULT NULL,
`fromProfileId` bigint(20) DEFAULT NULL,
`readDate` datetime DEFAULT NULL,
`status` varchar(1) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`subject` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`toProfileId` bigint(20) DEFAULT NULL,
`type` varchar(1) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`msgId` bigint(20) DEFAULT NULL,
`updDate` datetime DEFAULT NULL,
`updProfileId` bigint(20) DEFAULT NULL,
`subType` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `messagecontent_toprofileid_index` (`toProfileId`),
KEY `messagecontent_fromprofileid_index` (`fromProfileId`),
KEY `messagecontent_type_index` (`type`),
KEY `messagecontent_crDate_index` (`crDate`),
KEY `messagecontent_msgid_index` (`msgId`),
KEY `messagecontent_status_index` (`status`),
KEY `messagecontent_fromtoprofileid_index` (`fromProfileId`,`toProfileId`),
KEY `messagecontent_toprofileidstatus_index` (`toProfileId`,`status`),
KEY `messagecontent_toprofileidtype_index` (`toProfileId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=92717114 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
mehmet, I wonder if you can provide any additional data that could help us reproduce this.
A few thoughts:
- innodb_flush_log_at_trx_commit=0 should not cause any corruption, other than the loss of some transaction commits: when the server is killed and restarted, the latest durably committed and recovered transaction could be somewhat older than the last transaction that successfully returned to the client before the server was killed. Even the setting 1 is currently broken; see MDEV-15740. It could be that the setting 0 will slightly change timings between I/O and user threads, enough to trigger this elusive bug.
- On the slaves, you would typically have more reads, and reads via the secondary indexes will cause any buffered changes to be merged. So, the change buffer would be merged more often. It is also possible that due to fewer writes, more changes will be applied directly on the slaves, and the change buffer will not kick in.
- It is only my hypothesis that the change buffer could be the cause.
Which versions were you using when encountering this problem? Did you ever get a crash with innodb_change_buffering=changes? Or inserts?
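For anyone experimenting with these settings while trying to reproduce the failure, both variables mentioned above are dynamic globals, so they can be switched without a restart; a minimal sketch (the values shown are only examples):
-- Change buffering can be restricted or disabled on the fly
-- (valid values: none, inserts, deletes, changes, purges, all).
SET GLOBAL innodb_change_buffering = 'changes';
-- The redo-log flushing policy discussed above is also dynamic.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
-- Verify the active values:
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';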
Hello,
My servers have crashed with different versions, starting from 10.2.x.
With innodb_change_buffering=inserts, it crashed only once after 2 months,
but with innodb_change_buffering=all, it was crashing every 3-5 days.
I remember Elena prepared a test case, and my configuration crashed the server after 2 days, so you were able to reproduce it.
We have been using MariaDB for 3 years and there was no problem. When we set
innodb_flush_log_at_trx_commit = 0
the server started to crash.
I have not tried
innodb_change_buffering=changes
Now the setting is
innodb_change_buffering=inserts
innodb_flush_log_at_trx_commit = 0
and the server is not crashing, but deletes are very slow.
We restart the servers every 4 months, and it has not crashed so far with this setting,
but when I change it to innodb_change_buffering=all, it crashes within 3-5 days.
We have an Android chat application and insert about 1 million records into the message table every day,
and the problem is happening on this table.
I think it is a timing issue. We have 2 other physical servers and replicate data to the slaves,
but they never crashed. All transactions on the message table are executed first on the master database and then on the slave databases, but only the master database is crashing, not the slaves, so it must be something about timing on the master; I guess timing is managed differently on the slave databases.
I hope you can reproduce and fix this issue; we would like to use innodb_change_buffering=all.
Unfortunately we are seeing latency in data synchronization between servers due to slow deletes.
I've got a test case which does not use temporary tables and produces the assertion failure discussed here; but it involves indexed virtual columns, so apparently it still doesn't explain the problem that mehmet observed. Anyway, for the record, the test case is below. The problem it reveals was apparently fixed in the scope of MDEV-15114. The test case needs to be run with --mem.
--source include/have_innodb.inc

CREATE TABLE t1 (
  pk SERIAL,
  b LONGTEXT NOT NULL DEFAULT '',
  c CHAR(179) NOT NULL DEFAULT '',
  v LONGTEXT AS (b) VIRTUAL,
  PRIMARY KEY(pk),
  INDEX (v(64)),
  INDEX (b(64))
) ENGINE=InnoDB;

INSERT INTO t1 (b) VALUES ('foo');
SELECT pk, b, c FROM t1 INTO OUTFILE 'load.data';

--let $run= 1000
while ($run)
{
  LOAD DATA INFILE 'load.data' REPLACE INTO TABLE t1 (pk,b,c);
  --dec $run
}

# Cleanup
DROP TABLE t1;
--let $datadir= `SELECT @@datadir`
--remove_file $datadir/test/load.data

10.2 831df10981
2018-12-02 02:22:42 0x7f8e6c075700 InnoDB: Assertion failure in file /data/src/10.2-bug/storage/innobase/row/row0ins.cc line 268
InnoDB: Failing assertion: !cursor->index->is_committed()
#4 0x00007f8e71da33fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00005628dd81409f in ut_dbg_assertion_failed (expr=expr@entry=0x5628de055138 "!cursor->index->is_committed()", file=file@entry=0x5628de054fe0 "/data/src/10.2-bug/storage/innobase/row/row0ins.cc", line=line@entry=268) at /data/src/10.2-bug/storage/innobase/ut/ut0dbg.cc:61
#6 0x00005628ddd1371d in row_ins_sec_index_entry_by_modify (mtr=0x7f8e6c0716b0, thr=0x7f8e200b9318, entry=0x7f8e200b9aa0, heap=0x7f8e2002a040, offsets_heap=0x7f8e20029bc0, offsets=0x7f8e6c0707a0, cursor=0x7f8e6c070820, mode=2, flags=0) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:268
#7 row_ins_sec_index_entry_low (flags=flags@entry=0, mode=<optimized out>, mode@entry=2, index=index@entry=0x7f8e200b8120, offsets_heap=offsets_heap@entry=0x7f8e20029bc0, heap=heap@entry=0x7f8e2002a040, entry=entry@entry=0x7f8e200b9aa0, trx_id=0, thr=0x7f8e200b9318, dup_chk_only=false) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3050
#8 0x00005628ddd162ec in row_ins_sec_index_entry (index=0x7f8e200b8120, entry=0x7f8e200b9aa0, thr=0x7f8e200b9318, dup_chk_only=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3244
#9 0x00005628ddd18418 in row_ins_index_entry (thr=0x7f8e200b9318, entry=<optimized out>, index=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3290
#10 row_ins_index_entry_step (thr=0x7f8e200b9318, node=0x7f8e200b8fb8) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3438
#11 row_ins (thr=0x7f8e200b9318, node=<optimized out>) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3580
#12 row_ins_step (thr=thr@entry=0x7f8e200b9318) at /data/src/10.2-bug/storage/innobase/row/row0ins.cc:3815
#13 0x00005628ddd2797f in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7f8e200a3c10 "\376\001", prebuilt=0x7f8e200b8a10) at /data/src/10.2-bug/storage/innobase/row/row0mysql.cc:1404
#14 0x00005628ddc78891 in ha_innobase::write_row (this=0x7f8e200a3490, record=0x7f8e200a3c10 "\376\001") at /data/src/10.2-bug/storage/innobase/handler/ha_innodb.cc:8350
#15 0x00005628dda5925a in handler::ha_write_row (this=0x7f8e200a3490, buf=0x7f8e200a3c10 "\376\001") at /data/src/10.2-bug/sql/handler.cc:5958
#16 0x00005628dd8a5f54 in write_record (thd=thd@entry=0x7f8e200009a8, table=table@entry=0x7f8e200a6aa8, info=info@entry=0x7f8e6c072700) at /data/src/10.2-bug/sql/sql_insert.cc:1655
#17 0x00005628ddb84fc4 in read_sep_field (ignore_check_option_errors=<optimized out>, skip_lines=<optimized out>, enclosed=..., read_info=..., set_values=..., set_fields=..., fields_vars=..., table_list=0x7f8e2000f218, info=..., thd=0x7f8e200009a8) at /data/src/10.2-bug/sql/sql_load.cc:1256
#18 mysql_load (thd=<optimized out>, ex=<optimized out>, table_list=0x7f8e2000f218, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_REPLACE, ignore=<optimized out>, read_file_from_client=false) at /data/src/10.2-bug/sql/sql_load.cc:649
#19 0x00005628dd8c3d31 in mysql_execute_command (thd=0x7f8e200009a8) at /data/src/10.2-bug/sql/sql_parse.cc:4830
#20 0x00005628dd8c74c9 in mysql_parse (thd=0x7f8e200009a8, rawbuf=<optimized out>, length=59, parser_state=0x7f8e6c074240, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.2-bug/sql/sql_parse.cc:7999
#21 0x00005628dd8ca374 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f8e200009a8, packet=packet@entry=0x7f8e20006ce9 "LOAD DATA INFILE 'load.data' REPLACE INTO TABLE t1 (pk,b,c)", packet_length=packet_length@entry=59, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.2-bug/sql/sql_parse.cc:1821
#22 0x00005628dd8cabb8 in do_command (thd=0x7f8e200009a8) at /data/src/10.2-bug/sql/sql_parse.cc:1375
#23 0x00005628dd990d34 in do_handle_one_connection (connect=connect@entry=0x5628dfeb70f8) at /data/src/10.2-bug/sql/sql_connect.cc:1335
#24 0x00005628dd990ed4 in handle_one_connection (arg=arg@entry=0x5628dfeb70f8) at /data/src/10.2-bug/sql/sql_connect.cc:1241
#25 0x00005628ddc53404 in pfs_spawn_thread (arg=0x5628dfe79038) at /data/src/10.2-bug/storage/perfschema/pfs.cc:1862
#26 0x00007f8e73a71494 in start_thread (arg=0x7f8e6c075700) at pthread_create.c:333
#27 0x00007f8e71e5793f in clone () from /lib/x86_64-linux-gnu/libc.so.6
Alternatively or additionally the test case caused the error in the log:
2018-12-02 2:22:42 140249674503936 [ERROR] InnoDB: Record in index `v` of table `test`.`t1` was not found on update: TUPLE (info_bits=0, 2 fields): {[3] (0x000000),[8] (0x0000000000000001)} at: COMPACT RECORD(info_bits=0, 1 fields): {[8]infimum (0x090E06090D050D00)}
It also disappeared with commit ab194666564a (MDEV-15114).
mehmet, I see quite a few indexes in your table, but no FOREIGN KEY. Were there any FOREIGN KEY relations, possibly from other tables referring to this table?
If there were any FOREIGN KEY constraints on the table, then the corruption could be explained by MDEV-18272, which wrongly allows InnoDB to commit a transaction in an inconsistent state in a rare case where the SQL layer would not request InnoDB to roll back to the start of the statement or transaction.
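For reference, the kind of relation being asked about would look roughly like the sketch below; the child table is hypothetical and only illustrates a FOREIGN KEY pointing at the messagecontent table posted above:
-- Hypothetical child table; only meant to illustrate a FOREIGN KEY
-- relation referring to the messagecontent table shown earlier.
CREATE TABLE messageattachment (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  msgContentId BIGINT NOT NULL,
  fileName VARCHAR(255),
  CONSTRAINT fk_attachment_content
    FOREIGN KEY (msgContentId) REFERENCES messagecontent (id)
    ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;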
Would that be related? Running 10.3.13 on Amazon RDS.
2019-04-16 10:02:54 0x2b2cf5d70700 InnoDB: Assertion failure in file /local/p4clients/pkgbuild-7z9Zh/workspace/src/RDSMariaDB/storage/innobase/row/row0ins.cc line 270
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
190416 10:02:54 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.3.13-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=265
max_threads=802
thread_count=272
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1882058 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x2b3958c12008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x2b2cf5d6fe48 thread_stack 0x40000
/rdsdbbin/mysql/bin/mysqld(my_print_stacktrace+0x29)[0x559c6b0678c9]
/rdsdbbin/mysql/bin/mysqld(handle_fatal_signal+0x57d)[0x559c6ab2f07d]
/lib64/libpthread.so.0(+0xf100)[0x2b2cf6df4100]
/lib64/libc.so.6(gsignal+0x37)[0x2b2cf70365f7]
/lib64/libc.so.6(abort+0x148)[0x2b2cf7037ce8]
/rdsdbbin/mysql/bin/mysqld(+0x59c22d)[0x559c6a88f22d]
/rdsdbbin/mysql/bin/mysqld(+0xa4f06b)[0x559c6ad4206b]
/rdsdbbin/mysql/bin/mysqld(+0xa4f5b4)[0x559c6ad425b4]
/rdsdbbin/mysql/bin/mysqld(+0xa89c9f)[0x559c6ad7cc9f]
/rdsdbbin/mysql/bin/mysqld(+0xa8fd13)[0x559c6ad82d13]
/rdsdbbin/mysql/bin/mysqld(+0xa61a0c)[0x559c6ad54a0c]
/rdsdbbin/mysql/bin/mysqld(+0x9a3eb8)[0x559c6ac96eb8]
/rdsdbbin/mysql/bin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0x112)[0x559c6ab391d2]
ut/ut0rbt.cc:469(rbt_eject_node)[0x559c6a9fb726]
row/row0ins.cc:293(row_ins_sec_index_entry_by_modify)[0x559c6a95f930]
row/row0ins.cc:3363(row_ins_sec_index_entry(dict_index_t*, dtuple_t*, que_thr_t*, bool))[0x559c6a966d3c]
row/row0upd.cc:2536(row_upd_sec_index_entry)[0x559c6a96968c]
row/row0upd.cc:3333(row_upd)[0x559c6a96a207]
row/row0mysql.cc:1890(row_update_for_mysql(row_prebuilt_t*))[0x559c6aa34dcc]
handler/ha_innodb.cc:8841(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x559c6aa34fa4]
/lib64/libpthread.so.0(+0x7dc5)[0x2b2cf6decdc5]
/lib64/libc.so.6(clone+0x6d)[0x2b2cf70f7c9d]
Coincidentally, our use case is very similar to the OP's: it's a write-heavy messages table within the chat feature of our application.
Note: for indexes on virtual columns, there is a separate bug MDEV-19338.
Let us keep this ticket for assertion failures on secondary indexes other than those that include virtual columns.
I have unconfirmed the report. elenst had confirmed it, because she repeated the assertion with a test case that involved temporary tables. She was essentially repeating MDEV-16868, which was fixed in MariaDB 10.2.18 and 10.3.10.
mehmet and madsheep, can you repeat the issue without involving indexed virtual columns? Without a way to repeat the bug, we cannot work on a fix.
We still have the following configuration on our messaging server, and it is working without a problem;
the only issue is very slow delete operations:
innodb_change_buffering = inserts
This bug is only happening in our production environment; we cannot remove indexes from the table and repeat it.
It would be better if you reproduce this problem in your test environment using the table structure I have provided.
This table keeps messages, with about 1 million inserts/deletes/updates daily,
and it was crashing after 3 days.
In related news, MDEV-22497 fixed a bug that was introduced in MySQL 5.5’s change buffering. I do not immediately see how it could address change-buffer-related failures other than the one mentioned in the bug summary. If we manage to reproduce something internally, I believe that with https://rr-project.org/ it will be possible to find and fix the cause.
The originally reported bug cannot have been caused by MDEV-20301, which introduced MDEV-22924.
I was having this exact bug on one of my replicas with 10.4.15. I also saw some random errors about indexes, so I ran mysqlcheck on the whole DB. It turned out some tables had corrupt indexes, so I ran OPTIMIZE TABLE on them, and so far there has been no crash for 8 hours.
I also experienced this only with innodb_flush_log_at_trx_commit != 1
shalogo, yes, rebuilding the table should fix corrupted secondary indexes.
I am fairly certain that MDEV-24449 (to be fixed in 10.2.37, 10.3.28, 10.4.18) could explain this kind of corruption. The bug should not affect the 10.5 series, because MDEV-19514 removed the problematic code. As I noted in MDEV-24449, I was able to repeat some kind of corruption within the system tablespace, but not within a secondary index.
mehmet, did you copy data files around using mariabackup or xtrabackup or file system snapshots? If not, then we would still need some other explanation for the original corruption.
Note that even very old corruption may remain undetected for a long time. To make matters worse, as hholzgra recently noted somewhere, CHECK TABLE in InnoDB will not always detect the corruption, because it only counts records instead of actually checking that the secondary index records match the records in the clustered index.
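For reference, checking and rebuilding a suspect table can be done along these lines (a sketch only; the table name is taken from the schema posted earlier in this report):
-- Counting-based consistency check; as noted above, this may miss
-- secondary index records that do not match the clustered index.
CHECK TABLE messagecontent;
-- Rebuilding the table recreates all secondary indexes from the
-- clustered index, which repairs this kind of corruption.
OPTIMIZE TABLE messagecontent;
-- Equivalent explicit rebuild:
ALTER TABLE messagecontent ENGINE=InnoDB;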
Yes, we always take backups using mariabackup and migrate them to other server clusters.
Moreover, we clear historical data and optimize all tables using the mysqlcheck command every 6 months.
We have installed the 10.5.8-MariaDB-1:10.5.8+maria~xenial-log version on all servers
and updated settings like innodb_change_buffering=all; the system has been working without any problem since January 10.
It seems that the problem has been fixed in the latest release.
If we have the same problem again, I will update you.
mehmet, thank you, this is excellent news. Recently, we found and fixed another related corner case MDEV-24709, which I believe should affect anything up to and including 10.5.8. That one will be fixed in 10.5.9.
mehmet, how does it look; can we close this as likely fixed by MDEV-24449?
It has been almost 2 months that our system has been working with the following setting:
innodb_change_buffering=all
Everything is fine. It seems that the problem has been fixed.
Thank you
It looks like MDEV-19514 fixed this in 10.5.0, and MDEV-24449 fixed it in 10.2.37, 10.3.28, 10.4.18.
marko, please take a look at whether you have enough here for a fix. I haven't been able to reproduce it so far. If the information here is not enough to fix it by code analysis alone, please feel free to unassign it from yourself, and we'll keep trying to reproduce.