[MDEV-494] Cassandra: terminate called after throwing an instance of 'apache::thrift::transport::TTransportException' or a phantom row after big INSERT or LOAD Created: 2012-08-28  Updated: 2012-09-14  Resolved: 2012-09-14

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 10.0.0

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-431 Cassandra storage engine Closed

 Description   

When the test case is executed for the first time after Cassandra is started, it passes, but shows that after the INSERT which was supposed to insert 10,000 rows the table contains 10,001 rows. When the very same test case is run for the second time without restarting Cassandra, SELECT COUNT causes an exception (not fatal) in Cassandra and a crash in server.

The second SELECT (with GROUP BY) is not necessary for the test case, it just shows that the extra record is for real.

Test case:

# Make sure your cassandra is running,
# and thrift is on LD_LIBRARY_PATH,
# and cqlsh is on the system PATH
 
--source include/have_cassandra.inc
 
--remove_files_wildcard $MYSQLTEST_VARDIR test.cql
--write_file $MYSQLTEST_VARDIR/test.cql
DROP KEYSPACE bug_ks;
EOF
--error 0,1,2
--system cqlsh -3 -f $MYSQLTEST_VARDIR/test.cql
 
--remove_file $MYSQLTEST_VARDIR/test.cql
--write_file $MYSQLTEST_VARDIR/test.cql
CREATE KEYSPACE bug_ks
  WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy'
   AND strategy_options:replication_factor='1';
USE bug_ks;
CREATE COLUMNFAMILY cf (id bigint PRIMARY KEY, k bigint);
EOF
 
--system cqlsh -3 -f $MYSQLTEST_VARDIR/test.cql
 
--disable_warnings
DROP TABLE IF EXISTS t_ca, t_my;
--enable_warnings
 
CREATE TABLE t_my ( i int not null auto_increment primary key, b int ) ENGINE=MyISAM;
INSERT INTO t_my (b) VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0);
INSERT INTO t_my (b) SELECT a1.b FROM t_my a1, t_my a2, t_my a3, t_my a4;
 
CREATE TABLE t_ca (rowkey int, k int, PRIMARY KEY (rowkey))
  ENGINE=CASSANDRA DEFAULT CHARSET=latin1
  thrift_host='localhost' keyspace='bug_ks' column_family = 'cf';
 
INSERT INTO t_ca SELECT * FROM t_my LIMIT 10000;
SELECT COUNT(*) FROM t_ca;
SELECT rowkey, COUNT(*) FROM t_ca GROUP BY rowkey HAVING COUNT(*) > 1;
 
# Cleanup
DROP TABLE t_ca, t_my;
 

First run result:

CREATE TABLE t_ca (rowkey int, k int, PRIMARY KEY (rowkey))
ENGINE=CASSANDRA DEFAULT CHARSET=latin1
thrift_host='localhost' keyspace='bug_ks' column_family = 'cf';
INSERT INTO t_ca SELECT * FROM t_my LIMIT 10000;
SELECT COUNT(*) FROM t_ca;
COUNT(*)
10001
SELECT rowkey, COUNT(*) FROM t_ca GROUP BY rowkey HAVING COUNT(*) > 1;
rowkey  COUNT(*)
2190    2
DROP TABLE t_ca, t_my;
 

Cassandra exception after the second run:

ERROR 02:08:14,301 Error occurred during processing of message.
java.lang.RuntimeException: corrupt sstable
        at org.apache.cassandra.io.sstable.SSTableScanner.seekTo(SSTableScanner.java:106)
        at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:74)
        at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1373)
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1433)
        at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:50)
        at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:867)
        at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:691)
        at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3008)
        at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:2996)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.EOFException: unable to seek to position 700116 in cassandra/data/bug_ks/cf/bug_ks-cf-he-1-Data.db (543086 bytes) in read-only mode
        at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:253)
        at org.apache.cassandra.io.sstable.SSTableScanner.seekTo(SSTableScanner.java:100)
        ... 14 more
 

Server stack trace:

#4  <signal handler called>
#5  0xffffe430 in __kernel_vsyscall ()
#6  0xb74697af in raise () from /lib/libc.so.6
#7  0xb746b0f0 in abort () from /lib/libc.so.6
#8  0xb76a77dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#9  0xb76a5533 in ?? () from /usr/lib/libstdc++.so.6
#10 0xb76a556f in std::terminate() () from /usr/lib/libstdc++.so.6
#11 0xb76a56d1 in __cxa_throw () from /usr/lib/libstdc++.so.6
#12 0xb77945d5 in apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=...,
    buf=0xaced8d5c "\364\217\307\b\350;}\t\364\217\307\b\270\215\355\254\315oZ\b\b?}\t\204\215\355\254\001", len=4) at ./src/transport/TTransport.h:41
#13 0xb77a9596 in readAll (this=0x97b77a8,
    buf=0xaced8d5c "\364\217\307\b\350;}\t\364\217\307\b\270\215\355\254\315oZ\b\b?}\t\204\215\355\254\001", len=4) at ./src/transport/TBufferTransports.h:82
#14 readAll (this=0x97b77a8,
    buf=0xaced8d5c "\364\217\307\b\350;}\t\364\217\307\b\270\215\355\254\315oZ\b\b?}\t\204\215\355\254\001", len=4) at ./src/transport/TBufferTransports.h:390
#15 apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TFramedTransport, apache::thrift::transport::TBufferBase>::readAll_virt (this=0x97b77a8,
    buf=0xaced8d5c "\364\217\307\b\350;}\t\364\217\307\b\270\215\355\254\315oZ\b\b?}\t\204\215\355\254\001", len=4) at ./src/transport/TVirtualTransport.h:99
#16 0x08595aa8 in apache::thrift::transport::TTransport::readAll (this=0x97b77a8,
    buf=0xaced8d5c "\364\217\307\b\350;}\t\364\217\307\b\270\215\355\254\315oZ\b\b?}\t\204\215\355\254\001", len=4)
    at /home/elenst/cassandra/thrift/include/thrift/transport/TTransport.h:126
#17 0x085a7791 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readI32 (this=0x97d3f08, i32=@0xaced8d84)
    at /home/elenst/cassandra/thrift/include/thrift/protocol/TBinaryProtocol.tcc:372
#18 0x085a6fcd in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin (this=0x97d3f08, name="", messageType=@0xaced8e4c,
    seqid=@0xaced8e54)
    at /home/elenst/cassandra/thrift/include/thrift/protocol/TBinaryProtocol.tcc:203
#19 0x085a6772 in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>, apache::thrift::protocol::TProtocolDefaults>::readMessageBegin_virt (this=0x97d3f08, name="", messageType=@0xaced8e4c,
    seqid=@0xaced8e54)
    at /home/elenst/cassandra/thrift/include/thrift/protocol/TVirtualProtocol.h:432
#20 0x085daa97 in apache::thrift::protocol::TProtocol::readMessageBegin(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, apache::thrift::protocol::TMessageType&, int&) ()
#21 0x085c04b8 in org::apache::cassandra::CassandraClient::recv_get_range_slices (
    this=0x97d3fb0, _return=std::vector of length 10000, capacity 10000 = {...})
    at storage/cassandra/gen-cpp/Cassandra.cpp:8854
#22 0x085c02b5 in org::apache::cassandra::CassandraClient::get_range_slices (
    this=0x97d3fb0, _return=std::vector of length 10000, capacity 10000 = {...},
    column_parent=..., predicate=..., range=...,
    consistency_level=org::apache::cassandra::ConsistencyLevel::ONE)
    at storage/cassandra/gen-cpp/Cassandra.cpp:8827
#23 0x08594343 in Cassandra_se_impl::get_range_slices (this=0x97d3870,
    last_key_as_start_key=true)
    at storage/cassandra/cassandra_se.cc:420
#24 0x085945d4 in Cassandra_se_impl::get_next_range_slice_row (this=0x97d3870,
    eof=0xaced8feb) at storage/cassandra/cassandra_se.cc:456
#25 0x08591962 in ha_cassandra::rnd_next (this=0x97d27d0, buf=0x97d2bf0 "\375\216\b")
    at storage/cassandra/ha_cassandra.cc:872
#26 0x082177e5 in handler::ha_rnd_next (this=0x97d27d0, buf=0x97d2bf0 "\375\216\b")
    at sql/sql_class.h:4263
#27 0x085198c1 in rr_sequential (info=0x97bef10)
    at sql/records.cc:452
#28 0x082c7f66 in sub_select (join=0x97be510, join_tab=0x97beeb8, end_of_records=false)
    at sql/sql_select.cc:16117
#29 0x082c77a9 in do_select (join=0x97be510, fields=0x97be754, table=0x0, procedure=0x0)
    at sql/sql_select.cc:15768
#30 0x082aa380 in JOIN::exec (this=0x97be510)
    at sql/sql_select.cc:2824
#31 0x082aab5a in mysql_select (thd=0x970d978, rref_pointer_array=0x970f548,
    tables=0x97be070, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0,
    having=0x0, proc_param=0x0, select_options=2147748608, result=0x97be500,
    unit=0x970ef58, select_lex=0x970f408)
    at sql/sql_select.cc:3044
#32 0x082a25de in handle_select (thd=0x970d978, lex=0x970eef4, result=0x97be500,
    setup_tables_done_option=0) at sql/sql_select.cc:313
#33 0x0827e1c1 in execute_sqlcom_select (thd=0x970d978, all_tables=0x97be070)
    at sql/sql_parse.cc:4621
#34 0x0827715b in mysql_execute_command (thd=0x970d978)
    at sql/sql_parse.cc:2189
#35 0x082807b3 in mysql_parse (thd=0x970d978,
    rawbuf=0x97bde78 "SELECT COUNT(*) FROM t_ca", length=25, parser_state=0xaced9da4)
    at sql/sql_parse.cc:5736
#36 0x08274bfc in dispatch_command (command=COM_QUERY, thd=0x970d978, packet=0x97b8099 "",
    packet_length=25) at sql/sql_parse.cc:1055
#37 0x082740c1 in do_command (thd=0x970d978)
    at sql/sql_parse.cc:794
#38 0x0835fe71 in do_handle_one_connection (thd_arg=0x970d978)
    at sql/sql_connect.cc:1253
#39 0x0835f9bc in handle_one_connection (arg=0x970d978)
    at sql/sql_connect.cc:1168
#40 0x0858c278 in pfs_spawn_thread (arg=0x9812850)
    at storage/perfschema/pfs.cc:1015
#41 0xb77dbb25 in start_thread () from /lib/libpthread.so.0
 

bzr version-info

revision-id: psergey@askmonty.org-20120827044458-y0x419xfr7n22q1q
date: 2012-08-27 08:44:58 +0400
build-date: 2012-08-28 02:07:14 +0400
revno: 3496



 Comments   
Comment by Sergei Petrunia [ 2012-08-28 ]

It seems to be easy to get an extra row. I've compared outputs and see that the extra row returned has the same rowkey value as some other row. This means that:

  • the data in Cassandra is correct
  • we make some error when reading it. I suspect full-scan batching.
Comment by Sergei Petrunia [ 2012-08-28 ]

Pushed the fix for the part with phantom row.

Comment by Sergei Petrunia [ 2012-08-29 ]

Assuming closed.

Comment by Sergei Petrunia [ 2012-08-29 ]

We need to also look at why does throwing Exception from Thrift crashed the server. This should not have happened.

Comment by Elena Stepanova [ 2012-08-29 ]

In regard to the thrift exception itself, I installed the same versions of thrift, cassandra and server on another machine, also openSUSE 11.3, mimicked the command line, but still didn't get the crash, which is reliably reproducible on the first machine; so I guess it's limited to the configuration.
Obvious difference between two machines is the hardware (32 bit / 1.25 Gb / 1x1.5 GHz vs 64 bit / 8 Gb / 4x2x3.2 GHz), although of course there can be different library versions behind.

For the record, it was thrift 0.8.0, cassandra 1.1.4, command line

/usr/lib/jvm/java/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms629M -Xmx629M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.4.jar:bin/../lib/apache-cassandra-clientutil-1.1.4.jar:bin/../lib/apache-cassandra-thrift-1.1.4.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar org.apache.cassandra.thrift.CassandraDaemon

Comment by Sergei Petrunia [ 2012-09-14 ]

Added handling for generic Thrift exceptions, and arbitrary exceptions that might be thrown by Thrift API calls.

Generated at Thu Feb 08 06:29:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.