[MDEV-14506] [winx64] rocksdb.truncate_table3 failed in buildbot, lost connection to server Created: 2017-11-27  Updated: 2018-09-27  Resolved: 2018-09-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB, Tests
Affects Version/s: 10.2
Fix Version/s: 10.2.18

Type: Bug Priority: Major
Reporter: Alice Sherepa Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-16649 [log_writer.cc:44] rocksdb.bloomfilte... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/5855/steps/test/logs/stdio

rocksdb.truncate_table3                  w2 [ fail ]
        Test ended at 2017-11-24 19:37:27
 
CURRENT_TEST: rocksdb.truncate_table3
mysqltest: In included file "D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\drop_table3_repopulate_table.inc": 
included from D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\drop_table3.inc at line 31:
included from D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\truncate_table3.test at line 6:
At line 14: query '$insert' failed: 2013: Lost connection to MySQL server during query
 
The result from queries just before the failure was:
call mtr.add_suppression("Column family 'cf1' not found");
call mtr.add_suppression("Column family 'rev:cf2' not found");
DROP TABLE IF EXISTS t1;
call mtr.add_suppression("Column family 'cf1' not found");
call mtr.add_suppression("Column family 'rev:cf2' not found");
set global rocksdb_compact_cf = 'cf1';
set global rocksdb_compact_cf = 'rev:cf2';
set global rocksdb_signal_drop_index_thread = 1;
CREATE TABLE t1 (
a int not null,
b int not null,
c varchar(500) not null,
primary key (a,b) comment 'cf1',
key (b) comment 'rev:cf2'
) ENGINE=RocksDB;
DELETE FROM t1;

2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown variable 'loose-feedback-debug-startup-interval=20'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown variable 'loose-feedback-debug-first-interval=60'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown variable 'loose-feedback-debug-interval=60'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown option '--loose-pam-debug'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown option '--loose-enable-rocksdb_lock_info'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown option '--loose-enable-rocksdb_lock_info'
2017-11-24 19:37:08 4756 [Warning] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: unknown variable 'loose-debug-sync-timeout=300'
2017-11-24 19:37:08 4756 [Note] Server socket created on IP: '::'.
2017-11-24 19:37:08 4756 [Note] D:/winx64-packages/build/sql//RelWithDebInfo/mysqld.exe: ready for connections.
Version: '10.2.11-MariaDB-log'  socket: ''  port: 16020  mariadb.org binary distribution
171124 19:37:23 [ERROR] mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.11-MariaDB-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=65537
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4217 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x1cb70e8298
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
ha_rocksdb.dll!rocksdb::log::Writer::AddRecord()[log_writer.cc:44]
ha_rocksdb.dll!rocksdb::DBImpl::WriteToWAL()[db_impl_write.cc:705]
ha_rocksdb.dll!rocksdb::DBImpl::ConcurrentWriteToWAL()[db_impl_write.cc:805]
ha_rocksdb.dll!rocksdb::DBImpl::WriteImpl()[db_impl_write.cc:240]
ha_rocksdb.dll!rocksdb::DBImpl::Write()[db_impl_write.cc:49]
ha_rocksdb.dll!rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal()[pessimistic_transaction.cc:306]
ha_rocksdb.dll!rocksdb::PessimisticTransaction::Commit()[pessimistic_transaction.cc:264]
ha_rocksdb.dll!myrocks::Rdb_transaction_impl::commit_no_binlog()[ha_rocksdb.cc:2271]
ha_rocksdb.dll!myrocks::rocksdb_commit()[ha_rocksdb.cc:3145]
mysqld.exe!commit_one_phase_2()[handler.cc:1567]
mysqld.exe!ha_commit_one_phase()[handler.cc:1550]
mysqld.exe!ha_commit_trans()[handler.cc:1415]
mysqld.exe!trans_commit_stmt()[transaction.cc:511]
mysqld.exe!mysql_execute_command()[sql_parse.cc:6262]
mysqld.exe!mysql_parse()[sql_parse.cc:7892]
mysqld.exe!dispatch_command()[sql_parse.cc:1807]
mysqld.exe!do_command()[sql_parse.cc:1359]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:366]
mysqld.exe!tp_callback()[threadpool_common.cc:192]
ntdll.dll!RtlFreeUnicodeString()
ntdll.dll!RtlFreeUnicodeString()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1cb70f4cf0): INSERT INTO t1 VALUES (31771, 31771, rpad('a', 499, 'b'))
Connection ID (thread ID): 3
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
Minidump written to D:\winx64-packages\build\mysql-test\var\2\mysqld.1\data\mysqld.dmp
----------SERVER LOG END-------------



 Comments   
Comment by Sergei Petrunia [ 2018-04-20 ]

Happened again on the bb-10.2-mariarocks-merge tree.

rocksdb.truncate_table3                  w1 [ fail ]
        Test ended at 2018-04-19 14:48:30
 
CURRENT_TEST: rocksdb.truncate_table3
mysqltest: In included file "D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\drop_table3_repopulate_table.inc": 
included from D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\drop_table3.inc at line 31:
included from D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\t\truncate_table3.test at line 6:
At line 14: query '$insert' failed: 2013: Lost connection to MySQL server during query

...

Thread pointer: 0x975761ef08
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
ha_rocksdb.dll!rocksdb::log::Writer::AddRecord()[log_writer.cc:44]
ha_rocksdb.dll!rocksdb::DBImpl::WriteToWAL()[db_impl_write.cc:793]
ha_rocksdb.dll!rocksdb::DBImpl::ConcurrentWriteToWAL()[db_impl_write.cc:903]
ha_rocksdb.dll!rocksdb::DBImpl::WriteImpl()[db_impl_write.cc:274]
ha_rocksdb.dll!rocksdb::DBImpl::Write()[db_impl_write.cc:49]
ha_rocksdb.dll!rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal()[pessimistic_transaction.cc:308]
ha_rocksdb.dll!rocksdb::PessimisticTransaction::Commit()[pessimistic_transaction.cc:266]
ha_rocksdb.dll!myrocks::Rdb_transaction_impl::commit_no_binlog()[ha_rocksdb.cc:2493]
ha_rocksdb.dll!myrocks::rocksdb_commit()[ha_rocksdb.cc:3393]
mysqld.exe!commit_one_phase_2()[handler.cc:1567]
mysqld.exe!ha_commit_one_phase()[handler.cc:1550]
mysqld.exe!ha_commit_trans()[handler.cc:1415]
mysqld.exe!trans_commit_stmt()[transaction.cc:511]
mysqld.exe!mysql_execute_command()[sql_parse.cc:6267]
mysqld.exe!mysql_parse()[sql_parse.cc:7896]
mysqld.exe!dispatch_command()[sql_parse.cc:1808]
mysqld.exe!do_command()[sql_parse.cc:1359]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:366]
mysqld.exe!tp_callback()[threadpool_common.cc:192]
ntdll.dll!RtlFreeUnicodeString()
ntdll.dll!RtlFreeUnicodeString()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x975762bd90): INSERT INTO t1 VALUES (19649, 19649, rpad('a', 499, 'b'))

This seems to be some random issue - it started to happen after a totally unrelated change (different error logging).

Comment by Sergei Petrunia [ 2018-05-11 ]

Observed again on the current (post-merge) 10.2:
http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/8215/steps/test/logs/stdio

Version: '10.2.15-MariaDB-log'  socket: ''  port: 16100  mariadb.org binary distribution
180511 12:13:31 [ERROR] mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.15-MariaDB-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=65537
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4218 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x4a35b4e9b8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
ha_rocksdb.dll!rocksdb::log::Writer::AddRecord()[log_writer.cc:44]
ha_rocksdb.dll!rocksdb::DBImpl::WriteToWAL()[db_impl_write.cc:793]
ha_rocksdb.dll!rocksdb::DBImpl::ConcurrentWriteToWAL()[db_impl_write.cc:903]
ha_rocksdb.dll!rocksdb::DBImpl::WriteImpl()[db_impl_write.cc:274]
ha_rocksdb.dll!rocksdb::DBImpl::Write()[db_impl_write.cc:49]
ha_rocksdb.dll!rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal()[pessimistic_transaction.cc:308]
ha_rocksdb.dll!rocksdb::PessimisticTransaction::Commit()[pessimistic_transaction.cc:266]
ha_rocksdb.dll!myrocks::Rdb_transaction_impl::commit_no_binlog()[ha_rocksdb.cc:2557]
ha_rocksdb.dll!myrocks::rocksdb_commit()[ha_rocksdb.cc:3484]
mysqld.exe!commit_one_phase_2()[handler.cc:1569]
mysqld.exe!ha_commit_one_phase()[handler.cc:1552]
mysqld.exe!ha_commit_trans()[handler.cc:1417]
mysqld.exe!trans_commit_stmt()[transaction.cc:511]
mysqld.exe!mysql_execute_command()[sql_parse.cc:6294]
mysqld.exe!mysql_parse()[sql_parse.cc:7945]
mysqld.exe!dispatch_command()[sql_parse.cc:1822]
mysqld.exe!do_command()[sql_parse.cc:1373]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:366]
mysqld.exe!tp_callback()[threadpool_common.cc:192]
ntdll.dll!RtlFreeUnicodeString()
ntdll.dll!RtlFreeUnicodeString()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x4a35b5b420): INSERT INTO t1 VALUES (17618, 17618, rpad('a', 499, 'b'))
Connection ID (thread ID): 3
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file at D:\winx64-packages\build\mysql-test\var\2\mysqld.1\data\
Minidump written to D:\winx64-packages\build\mysql-test\var\2\mysqld.1\data\mysqld.dmp
----------SERVER LOG END-------------

Comment by Sergei Petrunia [ 2018-05-11 ]

More details from there:

FAULTING_SOURCE_CODE:  
    40:   size_t left = slice.size();
    41: 
    42:   // Header size varies depending on whether we are recycling or not.
    43:   const int header_size =
>   44:       recycle_log_files_ ? kRecyclableHeaderSize : kHeaderSize;
    45: 
    46:   // Fragment the record if necessary and emit it.  Note that if slice
    47:   // is empty, we still want to iterate once to emit a single
    48:   // zero-length record
    49:   Status s;

It is failing, because this == NULL:

ha_rocksdb!rocksdb::log::Writer::AddRecord [d:\winx64-packages\build\src\storage\rocksdb\rocksdb\db\log_writer.cc @ 44]
class rocksdb::log::Writer * this = 0x00000000`00000000
class rocksdb::Slice * slice = 0x0000004a`37ddcae0
char * ptr = 0x0000004a`35b6db80 "--- memory read error at address 0x0000004a`35b6db80 ---"
unsigned int64 left = 0x223
bool begin = <value unavailable>
unsigned int64 avail = <value unavailable>
unsigned int64 fragment_length = <value unavailable>
int64 leftover = <value unavailable>
rocksdb::log::RecordType type = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

further frames:

ha_rocksdb!rocksdb::DBImpl::WriteToWAL [d:\winx64-packages\build\src\storage\rocksdb\rocksdb\db\db_impl_write.cc @ 793]
class rocksdb::DBImpl * this = 0x0000004a`35a193c0
class rocksdb::WriteBatch * merged_batch = <value unavailable>
class rocksdb::log::Writer * log_writer = <value unavailable>
unsigned int64 * log_used = 0x00000000`00000000
unsigned int64 * log_size = 0x0000004a`37ddcb48
class rocksdb::Slice log_entry = class rocksdb::Slice
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ha_rocksdb!rocksdb::DBImpl::ConcurrentWriteToWAL [d:\winx64-packages\build\src\storage\rocksdb\rocksdb\db\db_impl_write.cc @ 903]
class rocksdb::DBImpl * this = 0x0000004a`35a193c0
struct rocksdb::WriteThread::WriteGroup * write_group = 0x0000004a`37ddcd28
unsigned int64 * log_used = 0x00000000`00000000
unsigned int64 * last_sequence = 0x0000004a`37ddccd0
unsigned int64 seq_inc = 2
class rocksdb::WriteBatch * merged_batch = 0x0000004a`35b6cd60
unsigned int64 write_with_wal = 1
unsigned int64 sequence = <value unavailable>
unsigned int64 log_size = 0x223
class rocksdb::WriteBatch * to_be_cached_state = 0x00000000`00000000
class rocksdb::WriteBatch tmp_batch = class rocksdb::WriteBatch
struct rocksdb::WriteThread::WriteGroup::Iterator <begin>$L0 = <value unavailable>
class rocksdb::InternalStats * stats = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Comment by Sergei Petrunia [ 2018-05-11 ]

it is not clear to me why rocksdb::DBImpl::WriteToWAL would have log_writer==NULL

Comment by Maysam Yabandeh [ 2018-05-15 ]

Sergei, does it occur persistently? I have an idea of what the bug is but cannot reproduce it with unit tests. https://github.com/facebook/rocksdb/issues/3852#issuecomment-389270014
I was wondering if it is a possibility to verify the patch some other way.

Comment by Sergei Petrunia [ 2018-05-21 ]

myabandeh, it is not easy to verify the patch alone (our build/compile setup doesn't allow patching RockDB easily).
But since it's now pushed into RocksDB's main repository, I can check if it helped. I've started tests.

Comment by Sergei Petrunia [ 2018-05-24 ]

Didn't observe the failure after one test pass: http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/8421

Comment by Sergei Petrunia [ 2018-09-27 ]

Last time this kind of failure happened in 10.2 was on 2018-08-31,
commit 6aa9219ae4ad4c689a1ab2f3bc3d9b9d. This is before the upstream merge.

Last time this kind of failure happened in 10.3 was 2018-08-03.

(This test also has failed once in bb-10.4-wlad, on 2018-09-24,
766b1a88146dc2f07a41fc844af64c3b. I think that one was a different kind of
failure, affecting multiple tests.)

Comment by Sergei Petrunia [ 2018-09-27 ]

Fixed by merge from MyRocks upstream (MDEV-17117).

Generated at Thu Feb 08 08:14:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.