[MDEV-20490] rocksdb.ttl_primary_read_filtering fails in BB Created: 2019-09-04  Updated: 2019-09-11  Resolved: 2019-09-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB, Tests
Affects Version/s: 10.2, 10.4
Fix Version/s: 10.2.28, 10.3.19, 10.4.9

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None
Environment:

kvm-rpm-centos74-amd64-debug


Issue Links:
Relates
relates to MDEV-16560 [counter] rocksdb.ttl_primary_read_fi... Closed

 Description   

The failure occurs fairly rarely

Examples:

CURRENT_TEST: rocksdb.ttl_primary_read_filtering
--- /usr/share/mysql-test/plugin/rocksdb/rocksdb/r/ttl_primary_read_filtering.result	2019-09-02 13:05:20.000000000 +0000
+++ /dev/shm/var/2/log/ttl_primary_read_filtering.reject	2019-09-02 16:59:00.342030558 +0000
@@ -120,7 +120,7 @@
 a
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c
-0
+8
 set global rocksdb_compact_cf='default';
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c

CURRENT_TEST: rocksdb.ttl_primary_read_filtering
--- /usr/share/mysql-test/plugin/rocksdb/rocksdb/r/ttl_primary_read_filtering.result	2019-09-02 05:57:13.000000000 +0000
+++ /dev/shm/var/2/log/ttl_primary_read_filtering.reject	2019-09-02 08:07:36.734956435 +0000
@@ -124,7 +124,7 @@
 set global rocksdb_compact_cf='default';
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c
-8
+5
 DROP TABLE t1;
 CREATE TABLE t1 (
 a int PRIMARY KEY

rocksdb.ttl_primary_read_filtering 'write_prepared' w2 [ fail ]
        Test ended at 2019-09-02 08:07:36
 
CURRENT_TEST: rocksdb.ttl_primary_read_filtering
--- /usr/share/mysql-test/plugin/rocksdb/rocksdb/r/ttl_primary_read_filtering.result	2019-09-02 05:57:13.000000000 +0000
+++ /dev/shm/var/2/log/ttl_primary_read_filtering.reject	2019-09-02 08:07:36.734956435 +0000
@@ -124,7 +124,7 @@
 set global rocksdb_compact_cf='default';
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c
-8
+5
 DROP TABLE t1;
 CREATE TABLE t1 (
 a int PRIMARY KEY



 Comments   
Comment by Sergei Petrunia [ 2019-09-05 ]

Another one

CURRENT_TEST: rocksdb.ttl_primary_read_filtering
--- /home/psergey/dev-git/10.4/storage/rocksdb/mysql-test/rocksdb/r/ttl_primary_read_filtering.result   2019-05-03 22:04:07.692432165 +0300
+++ /home/psergey/dev-git/10.4/storage/rocksdb/mysql-test/rocksdb/r/ttl_primary_read_filtering.reject   2019-09-05 10:23:57.068492290 +0300
@@ -124,7 +124,7 @@
 set global rocksdb_compact_cf='default';
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c
-8
+6
 DROP TABLE t1;
 CREATE TABLE t1 (
 a int PRIMARY KEY
 
mysqltest: Result content mismatch

Comment by Sergei Petrunia [ 2019-09-05 ]

Reading the failing part of the testcase:

CREATE TABLE t1 (
a int,
b int,
c int,
PRIMARY KEY (a,b,c)
) ENGINE=rocksdb
COMMENT='ttl_duration=1;';
set global rocksdb_debug_ttl_rec_ts = -100;
<INSERT 8 rows>;
set global rocksdb_debug_ttl_rec_ts = 0;

So, now the table has 8 rows, with timestamp NOW()-100 seconds. This means that
all rows are "expired" and will be removed on next compaction.

Then, the test notes the counter value, and flushes the rows from the memtable
(I assume the compaction filter is not applied and the expired records get into
the SST file at L0):

select variable_value into @c from information_schema.global_status where variable_name='rocksdb_rows_expired';
set global rocksdb_force_flush_memtable_now=1;

<a few selects are run>

Now, we check if the expired records were removed:

select variable_value-@c from information_schema.global_status 
where variable_name='rocksdb_rows_expired';
variable_value-@c
0

In failure #1, the value here is 8.
A guess: compaction happened for some reason, and removed the rows. I have no
idea what would cause the compaction, though.

set global rocksdb_compact_cf='default';
select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
variable_value-@c
8

Now, we've caused a compaction to happen and checked that it has removed all 8 rows.

In test failures 2,3,4, the number was less than 8. The causes could be:

  • Some of the records were not removed (that is, they are still in the database)
  • Some of the records were removed before the initial value of the counter was taken
Comment by Sergei Petrunia [ 2019-09-05 ]

Looking at a failing test run:

CURRENT_TEST: rocksdb.ttl_primary_read_filtering
--- /home/psergey/dev-git/10.4/storage/rocksdb/mysql-test/rocksdb/r/ttl_primary_read_filtering.result   2019-05-03 22:04:07.692432165 +0300
+++ /home/psergey/dev-git/10.4/storage/rocksdb/mysql-test/rocksdb/r/ttl_primary_read_filtering.reject   2019-09-05 13:05:16.108631444 +0300
@@ -124,7 +124,7 @@
 set global rocksdb_compact_cf='default';
 select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired';
 variable_value-@c
-8
+7
 DROP TABLE t1;
 CREATE TABLE t1 (
 a int PRIMARY KEY

Logging query names (at query start) and record expiration in the compaction filter. I see this:

AAAA query: CREATE TABLE t1 (
a int,
b int,
c int,
PRIMARY KEY (a,b,c)
) ENGINE=rocksdb
COMMENT='ttl_duration=1;'
AAAA query: set global rocksdb_debug_ttl_rec_ts = -100
AAAA query: INSERT INTO t1 values (0,0,0)
AAAA query: INSERT INTO t1 values (0,0,1)
AAAA: record expired
AAAA query: INSERT INTO t1 values (0,1,0)
AAAA query: INSERT INTO t1 values (0,1,1)
AAAA query: INSERT INTO t1 values (1,1,2)
AAAA query: INSERT INTO t1 values (1,2,1)
AAAA query: INSERT INTO t1 values (1,2,2)
AAAA query: INSERT INTO t1 values (1,2,3)
AAAA query: set global rocksdb_debug_ttl_rec_ts = 0
AAAA query: select variable_value into @c from information_schema.global_status where variable_name='rocksdb_rows_expired'
AAAA query: set global rocksdb_force_flush_memtable_now=1
2019-09-05 13:05:01 92 [Note] RocksDB: Manual memtable flush.
AAAA query: SELECT * FROM t1 WHERE a=1 AND b=2 AND c=2
AAAA query: SELECT * FROM t1 WHERE a = 1
AAAA query: SELECT max(a) from t1 where a < 3
AAAA query: SELECT max(a) from t1 where a < 2 AND b = 1 AND c < 3
AAAA query: SELECT min(a) from t1 where a >= 1
AAAA query: SELECT min(a) from t1 where a > 1
AAAA query: select * from t1 where a=1 and b in (1) order by c desc
AAAA query: select max(a) from t1 where a <=10
AAAA query: select a from t1 where a > 0 and a <= 2
AAAA query: select variable_value-@c from information_schema.global_status where variable_name='rocksdb_rows_expired'
AAAA query: set global rocksdb_compact_cf='default'
2019-09-05 13:05:01 92 [Note] RocksDB: Manual compaction of column family: default
 
2019-09-05 13:05:01 0 [Note] Manual Compaction id 80 cf default started.
AAAA: record expired
AAAA: record expired
AAAA: record expired
AAAA: record expired
AAAA: record expired
AAAA: record expired
AAAA: record expired
2019-09-05 13:05:02 0 [Note] Manual Compaction id 80 cf default ended.

Note that one record was removed during INSERTs.

Comment by Sergei Petrunia [ 2019-09-05 ]

A bit larger excerpt of the above log: https://gist.github.com/spetrunia/85932ef7bec8d0eb1308804a91737ea1

One can see that there was a memtable flush and a manual compaction shortly before the failing test case. So it is not clear why compaction filter was invoked in the middle of INSERT statements:

AAAA query: INSERT INTO t1 values (0,0,0)
AAAA query: INSERT INTO t1 values (0,0,1)
AAAA: record expired
AAAA query: INSERT INTO t1 values (0,1,0)
AAAA query: INSERT INTO t1 values (0,1,1)
AAAA query: INSERT INTO t1 values (1,1,2)
AAAA query: INSERT INTO t1 values (1,2,1)
AAAA query: INSERT INTO t1 values (1,2,2)
AAAA query: INSERT INTO t1 values (1,2,3)

Comment by Sergei Petrunia [ 2019-09-05 ]

Stack trace of when the record is marked as expired:

./rdb_compact_filter.h:105(myrocks::Rdb_compact_filter::Filter(int, rocksdb::Slice const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const)[0x7f7436c769f3]
rocksdb/compaction_filter.h:159(rocksdb::CompactionFilter::FilterV2(int, rocksdb::Slice const&, rocksdb::CompactionFilter::ValueType, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const)[0x7f7436c76166]
compaction/compaction_iterator.cc:177(rocksdb::CompactionIterator::InvokeFilterIfNeeded(bool*, rocksdb::Slice*))[0x7f7436fb96b0]
compaction/compaction_iterator.cc:300(rocksdb::CompactionIterator::NextFromInput())[0x7f7436fb9fff]
compaction/compaction_iterator.cc:110(rocksdb::CompactionIterator::SeekToFirst())[0x7f7436fb9340]
compaction/compaction_job.cc:893(rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*))[0x7f7436fc0506]
compaction/compaction_job.cc:593(rocksdb::CompactionJob::Run())[0x7f7436fbe660]
db_impl/db_impl_compaction_flush.cc:2689(rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority))[0x7f7436cf6a07]
db_impl/db_impl_compaction_flush.cc:2249(rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority))[0x7f7436cf3db8]
db_impl/db_impl_compaction_flush.cc:2026(rocksdb::DBImpl::BGWorkCompaction(void*))[0x7f7436cf25c7]
bits/invoke.h:60(void std::__invoke_impl<void, void (*&)(void*), void*&>(std::__invoke_other, void (*&)(void*), void*&))[0x7f743703ef41]
bits/invoke.h:95(std::__invoke_result<void (*&)(void*), void*&>::type std::__invoke<void (*&)(void*), void*&>(void (*&)(void*), void*&))[0x7f743703eb3d]
7/functional:469(void std::_Bind<void (*(void*))(void*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>))[0x7f743703e2ee]
7/functional:551(void std::_Bind<void (*(void*))(void*)>::operator()<, void>())[0x7f743703d682]
bits/std_function.h:318(std::_Function_handler<void (), std::_Bind<void (*(void*))(void*)> >::_M_invoke(std::_Any_data const&))[0x7f743703c8a2]
bits/std_function.h:706(std::function<void ()>::operator()() const)[0x7f7436be0d5c]
util/threadpool_imp.cc:223(rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long))[0x7f7437039214]
util/threadpool_imp.cc:309(rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*))[0x7f743703938d]
bits/invoke.h:60(void* std::__invoke_impl<void*, void* (*)(void*), rocksdb::BGThreadMetadata*>(std::__invoke_other, void* (*&&)(void*), rocksdb::BGThreadMetadata*&&))[0x7f743703c13f]
bits/invoke.h:96(std::__invoke_result<void* (*)(void*), rocksdb::BGThreadMetadata*>::type std::__invoke<void* (*)(void*), rocksdb::BGThreadMetadata*>(void* (*&&)(void*), rocksdb::BGThreadMetadata*&&))[0x7f743703b55e]
7/thread:234(decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void* (*)(void*), rocksdb::BGThreadMetadata*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>))[0x7f743703f5f7]
7/thread:243(std::thread::_Invoker<std::tuple<void* (*)(void*), rocksdb::BGThreadMetadata*> >::operator()())[0x7f743703f5ad]

So, it is a background compaction.

Comment by Sergei Petrunia [ 2019-09-05 ]

... which is odd, because there is storage/rocksdb/mysql-test/rocksdb/t/ttl_primary_read_filtering-master.opt with this:

--rocksdb_default_cf_options=disable_auto_compactions=true

Comment by Sergei Petrunia [ 2019-09-05 ]

A guess: the compaction was scheduled by a DROP TABLE operation.

Comment by Sergei Petrunia [ 2019-09-05 ]

(for the record: diff with debug printouts: https://gist.github.com/spetrunia/25a567ac3a59e01b675c41be80a263a3 )

Generated at Thu Feb 08 08:59:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.