Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 10.0.1
    • Component/s: None
    • Labels: None

    Description

      Implement HBase storage for Cassandra instead.

      See http://kb.askmonty.org/en/cassandra-storage-engine/ for user-level description

    Activity

            psergei Sergei Petrunia created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            Ok, column validators do not seem to be mandatory: cassandra-cli allows inserting any (rowid, column_name, column_value) regardless of which validators are present.
            This capability seems to be missing from CQL, where you can only use column names that were defined.
            SELECTs in CQL ignore (i.e., do not produce) rows that do not have the required columns.

            psergei Sergei Petrunia added a comment -

            Mapping between CQL type names and ColumnDef::validation_class values:

            blob, "org.apache.cassandra.db.marshal.BytesType"
            ascii, "org.apache.cassandra.db.marshal.AsciiType"
            text, "org.apache.cassandra.db.marshal.UTF8Type"
            varint, "org.apache.cassandra.db.marshal.IntegerType"
            int, "org.apache.cassandra.db.marshal.Int32Type"
            bigint, "org.apache.cassandra.db.marshal.LongType"
            uuid, "org.apache.cassandra.db.marshal.UUIDType"
            timestamp, "org.apache.cassandra.db.marshal.DateType"
            boolean, "org.apache.cassandra.db.marshal.BooleanType"
            float, "org.apache.cassandra.db.marshal.FloatType"
            double, "org.apache.cassandra.db.marshal.DoubleType"
            decimal, "org.apache.cassandra.db.marshal.DecimalType"

            psergei Sergei Petrunia added a comment -
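The mapping above can be captured as a simple lookup table. A minimal sketch (the validator class names are copied verbatim from the comment; the helper function is hypothetical):

```python
# CQL type name -> ColumnDef::validation_class, per the mapping listed above.
CQL_TO_VALIDATOR = {
    "blob":      "org.apache.cassandra.db.marshal.BytesType",
    "ascii":     "org.apache.cassandra.db.marshal.AsciiType",
    "text":      "org.apache.cassandra.db.marshal.UTF8Type",
    "varint":    "org.apache.cassandra.db.marshal.IntegerType",
    "int":       "org.apache.cassandra.db.marshal.Int32Type",
    "bigint":    "org.apache.cassandra.db.marshal.LongType",
    "uuid":      "org.apache.cassandra.db.marshal.UUIDType",
    "timestamp": "org.apache.cassandra.db.marshal.DateType",
    "boolean":   "org.apache.cassandra.db.marshal.BooleanType",
    "float":     "org.apache.cassandra.db.marshal.FloatType",
    "double":    "org.apache.cassandra.db.marshal.DoubleType",
    "decimal":   "org.apache.cassandra.db.marshal.DecimalType",
}

def validator_for(cql_type: str) -> str:
    """Return the validation_class for a CQL type name; raises KeyError if unknown."""
    return CQL_TO_VALIDATOR[cql_type]
```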

            ...,
            counter, "org.apache.cassandra.db.marshal.CounterColumnType"

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia made changes -
            Description (original value): Implement HBase storage for Cassandra instead.
            Description (new value): Implement HBase storage for Cassandra instead.
            See http://kb.askmonty.org/en/cassandra-storage-engine/ for user-level description

            Cassandra schema definition has evolved a bit. The old way to define a "wide row" CF (i.e., representing a partition of data, clustered on the comparator) was to define comparator and default_validation_class, and leave column names implicit in your application code. That is, the Cassandra "column name" would really be the value of an unnamed column that would be part of the primary key in the partition.

            Since columnfamily definition defaulted to no column names, this was the default behavior, but mixing this with "static" column definitions is very bad practice. (But Cassandra will not ignore validators that are correctly declared, you are mistaken on that point.)

            We cleaned this up for Cassandra 1.1 with CQL3, as outlined here: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

            Old-style schema will be supported indefinitely for backwards compatibility, but CQL schema is far more straightforward to use correctly.

            jbellis Jonathan Ellis added a comment -
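The "wide row" point above can be illustrated with a toy data model. A sketch with made-up names: the same physical data viewed the old way (implicit, comparator-ordered column names per row key) and the CQL3 way, where the former "column name" becomes a clustering column of a composite PRIMARY KEY:

```python
# Old-style wide row: one row key, column names carry data (here, dates).
wide_row = {
    "rowkey1": {"2012-08-01": "10.5", "2012-08-02": "11.0"},
}

# CQL3 view: PRIMARY KEY (rowkey, event_date). Each (row key, clustering
# value) pair becomes its own CQL row; the old column value becomes a column.
cql3_rows = [
    (rowkey, clustering, value)
    for rowkey, cols in wide_row.items()
    for clustering, value in sorted(cols.items())
]
```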

            Yes, I've repeated my experiment and see that the defined validators are indeed enforced.

            Thanks a lot for pointing out what CQL's composite PRIMARY KEYs are! I was overwhelmed by all the new things I was learning about Cassandra, and dismissed composite PKs as "ok, these are probably just like non-composite ones, except that they are tuples". Apparently I was wrong; they play a more important role. I'll need to think more before I understand what this means for this project, though. At the least, we shouldn't ignore them.

            psergei Sergei Petrunia added a comment -

            Testing todo task

            psergei Sergei Petrunia added a comment -
            psergei Sergei Petrunia made changes -
            elenst Elena Stepanova made changes -
            psergei Sergei Petrunia made changes -
            • There is now read-only support for the counter datatype.
            • Started benchmarks on Amazon. First results for data-load operations:
              = ha_cassandra fails to utilize the available network bandwidth
              = ha_cassandra occupies about 50% of one CPU, and seems to be the bottleneck.

            Possible directions for speedup:

            • Use an async API and multiple connections to Cassandra
            • Optimize the ha_cassandra code to be less CPU-intensive.
            psergei Sergei Petrunia added a comment -
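The "multiple connections" direction above can be sketched as round-robin batching across parallel writers. A toy illustration (the `send_batch` function is a stand-in for a real per-connection Thrift batch write; all names are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(conn_id, rows):
    # Stand-in for a blocking batch write over one Cassandra connection.
    return (conn_id, len(rows))

def parallel_load(rows, n_connections=4):
    # Round-robin the rows across connections, then flush each slice
    # in parallel, one worker thread per connection.
    slices = [rows[i::n_connections] for i in range(n_connections)]
    with ThreadPoolExecutor(max_workers=n_connections) as pool:
        return list(pool.map(send_batch, range(n_connections), slices))

results = parallel_load(list(range(10)), n_connections=4)
```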

            Tried profiling ha_cassandra on a home setup and on EC2. Results from EC2 (% numbers are cumulative time):
            mysqld - 99.9%

            • start_thread 66.41%
            • mysql_load 66.34%
            • read_sep_field 65.77%
            • write_record 55.81%
            • ha_cassandra::write_row 54.75%
            • (the next big one is "No map [/home/ubuntu/5.5-cassandra/sql/mysqld]") with 11.23%
            • then an assortment of libc/libgcc locations, many of them pointing to std::string members.

            This means at least 54% of the time is spent in ha_cassandra::write_row(). Some of the other time should probably be blamed on ha_cassandra as well, because no other part of the server uses std::string.

            psergei Sergei Petrunia added a comment -
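A quick Amdahl's-law estimate shows what the profile above implies: even if write_row's 54.75% cumulative share were optimized away entirely, overall load throughput could at most roughly double. A sketch of the arithmetic:

```python
def amdahl(optimized_fraction, speedup_factor):
    # Overall speedup when only `optimized_fraction` of the work is
    # accelerated by `speedup_factor`.
    return 1.0 / ((1.0 - optimized_fraction)
                  + optimized_fraction / speedup_factor)

p = 0.5475  # ha_cassandra::write_row's cumulative share in the EC2 profile

# Limiting case: write_row made infinitely fast.
bound = 1.0 / (1.0 - p)   # upper bound on total speedup, about 2.2x
```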
            psergei Sergei Petrunia made changes -

            Did some more benchmarks; results are summarized here: https://lists.launchpad.net/maria-developers/msg04889.html. It seems CPU usage of the SQL node is not actually a problem: a release build and a better CPU take care of it. The lack of ability to use multiple connections IS a problem.

            psergei Sergei Petrunia added a comment -

            Started to think about how we could use multiple Thrift API connections. The Thrift library doesn't support asynchronous clients. We could use a thread-per-connection model, but think of all the effort (both development and run-time) we'll need to synchronize the threads.

            There is a patch for Thrift that allows using an async client: https://issues.apache.org/jira/browse/THRIFT-579. It's been maintained across a few years, which probably means it's not just something that "barely compiles". I'm going to try using it.

            psergei Sergei Petrunia added a comment -
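The thread-per-connection model weighed above can be sketched with a shared work queue: one worker thread per (simulated) connection, fed rows and shut down by sentinels. The queue, sentinel, and join machinery is exactly the synchronization effort the comment worries about. All names here are hypothetical; the append stands in for a blocking Thrift insert:

```python
import threading, queue

def connection_worker(conn_id, jobs, results):
    # One thread per Cassandra connection, draining a shared job queue.
    while True:
        row = jobs.get()
        if row is None:                    # sentinel: close this connection
            break
        results.append((conn_id, row))     # stand-in for client.insert(row)

jobs, results = queue.Queue(), []
threads = [threading.Thread(target=connection_worker, args=(i, jobs, results))
           for i in range(2)]
for t in threads:
    t.start()
for row in range(6):                       # enqueue the rows to write
    jobs.put(row)
for _ in threads:                          # one sentinel per worker
    jobs.put(None)
for t in threads:
    t.join()
```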

            It compiles and works (btw, figured out how to get the Cassandra.thrift output to compile with fewer edits to the generated code).

            Problems:

            • this thing relies heavily on the boost library, which is not debugger-friendly or newbie-friendly.
            • thrift-trunk/lib/cpp/src/async/TAsioAsync.cpp: TAsioClient::handleConnect() has "Todo: call user-provided errback" for error handling.
            • I don't know what I should call in the "on_connect" callback to have the code return from io_service.run().
            psergei Sergei Petrunia added a comment -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.1 [ 11400 ]
            Fix Version/s 10.0.0 [ 10000 ]

            You might want to use the Cassandra native protocol introduced in 1.2 (trunk): https://github.com/apache/cassandra/blob/trunk/doc/native_protocol.spec

            jbellis Jonathan Ellis added a comment -

            Thanks for the note. I guess the protocol has been pushed fairly recently? IIRC, when I checked for it, there was only a Jira entry.
            I think I won't be able to implement the protocol before my nearest delivery; we'll consider implementing it for the milestone after that.

            psergei Sergei Petrunia added a comment -

            Added support for 'varint' type.

            psergei Sergei Petrunia added a comment -
            elenst Elena Stepanova made changes -

            Saw that you announced a preview release. Congrats!

            Wanted to make sure you guys were aware of a couple changes we're making:

            1. Cassandra is moving away from "dynamic columns" per se. Although supporting that is nice for legacy purposes, in CQL3 (opt-in in C* 1.1, default in C* 1.2, although fallback to CQL2 is still available) columns must be defined before use.
            2. Cassandra is adding collections (maps, lists, and sets) to CQL3 in 1.2. (This is available in our recent beta1 release.) Not sure how you'd want to expose that, tbh... I don't think dynamic columns is necessarily a good fit for Maps for instance since you can have key conflicts.
            jbellis Jonathan Ellis added a comment -
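The key-conflict concern about exposing CQL3 maps through dynamic columns can be shown with a toy example. A sketch with hypothetical names: naively flattening a map collection into the flat dynamic-column namespace collides with any statically defined column of the same name:

```python
def flatten_into_dynamic(static_cols, cql_map):
    # Naively merge a CQL3 map collection into a flat dynamic-column
    # namespace; keys shared with static columns are exactly the conflict
    # case raised above (here the map value silently wins).
    conflicts = set(static_cols) & set(cql_map)
    merged = {**static_cols, **cql_map}
    return merged, conflicts

static_cols = {"name": "alice", "city": "Oslo"}
cql_map     = {"city": "Bergen", "zip": "5003"}   # "city" collides
merged, conflicts = flatten_into_dynamic(static_cols, cql_map)
```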

            JFYI: our dynamic columns collect not only Cassandra dynamic columns, but also all columns not mentioned in the MariaDB table definition.

            sanja Oleksandr Byelkin added a comment -
            serg Sergei Golubchik made changes -
            elenst Elena Stepanova made changes -
            serg Sergei Golubchik made changes -

            pushed in 10.0-base

            serg Sergei Golubchik added a comment -
            serg Sergei Golubchik made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow defaullt [ 13668 ] MariaDB v2 [ 44224 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 44224 ] MariaDB v3 [ 63513 ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 63513 ] MariaDB v4 [ 131958 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
