[MDEV-431] Cassandra storage engine Created: 2012-08-04 Updated: 2020-07-14 Resolved: 2013-01-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | 10.0.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
Implement a Cassandra storage engine. See http://kb.askmonty.org/en/cassandra-storage-engine/ for the user-level description |
| Comments |
| Comment by Sergei Petrunia [ 2012-08-15 ] |
|
OK, column validators do not seem to be mandatory: cassandra-cli lets you insert any (rowid, column_name, column_value) triple regardless of which validators are present. |
| Comment by Sergei Petrunia [ 2012-08-17 ] |
|
Mapping between CQL type names and ColumnDef::validation_class values: blob, "org.apache.cassandra.db.marshal.BytesType" |
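The comment above lists only the `blob` entry. A fuller lookup table might look like the sketch below. It assumes the Cassandra 1.1-era class names from the `org.apache.cassandra.db.marshal` package (e.g. `timestamp` maps to `DateType` in that era); these names do not come from this issue, and `cql_to_validator` is a hypothetical helper, not part of ha_cassandra.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: map a CQL type name to the fully qualified
// ColumnDef::validation_class string (assumed Cassandra 1.1-era names).
std::optional<std::string> cql_to_validator(const std::string& cql_type) {
    static const std::string kPrefix = "org.apache.cassandra.db.marshal.";
    static const std::map<std::string, std::string> kTable = {
        {"ascii", "AsciiType"},
        {"bigint", "LongType"},
        {"blob", "BytesType"},
        {"boolean", "BooleanType"},
        {"counter", "CounterColumnType"},
        {"decimal", "DecimalType"},
        {"double", "DoubleType"},
        {"float", "FloatType"},
        {"int", "Int32Type"},
        {"text", "UTF8Type"},
        {"timestamp", "DateType"},
        {"uuid", "UUIDType"},
        {"timeuuid", "TimeUUIDType"},
        {"varchar", "UTF8Type"},  // alias of text
        {"varint", "IntegerType"},
    };
    auto it = kTable.find(cql_type);
    if (it == kTable.end()) return std::nullopt;  // unknown CQL type name
    return kPrefix + it->second;
}
```

Note that the mapping is not injective (`text` and `varchar` share `UTF8Type`), so going from a validator back to a CQL name needs a preferred choice.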
| Comment by Sergei Petrunia [ 2012-08-17 ] |
|
..., |
| Comment by Jonathan Ellis [ 2012-08-20 ] |
|
Cassandra schema definition has evolved a bit. The old way to define a "wide row" CF (i.e., one representing a partition of data, clustered on the comparator) was to define comparator and default_validation_class, and leave column names implicit in your application code. That is, the Cassandra "column name" would really be the value of an unnamed column that would be part of the primary key in the partition. Since column family definitions defaulted to having no column names, this was the default behavior, but mixing this with "static" column definitions is very bad practice. (But Cassandra will not ignore validators that are correctly declared; you are mistaken on that point.) We cleaned this up for Cassandra 1.1 with CQL3, as outlined here: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 Old-style schema will be supported indefinitely for backwards compatibility, but CQL schema is far more straightforward to use correctly. |
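To make the "wide row" idea concrete, here is a minimal sketch that assumes a hypothetical CQL3 table `events(user text, ts bigint, payload text, PRIMARY KEY (user, ts)) WITH COMPACT STORAGE`. In that layout the partition key `user` becomes the Cassandra row key, the clustering column `ts` becomes the Cassandra column name, and `payload` becomes the column value. The table, struct, and function names are illustrative only; `std::map` stands in for the comparator's ordering.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One CQL3 row of the hypothetical table events(user, ts, payload).
struct CqlRow {
    std::string user;     // partition key     -> Cassandra row key
    int64_t ts;           // clustering column -> Cassandra column name
    std::string payload;  // remaining column  -> Cassandra column value
};

// Storage-level view: row key -> (column name -> column value).
// The inner std::map keeps column names sorted, like the CF comparator.
using WideRows = std::map<std::string, std::map<int64_t, std::string>>;

WideRows to_wide_rows(const std::vector<CqlRow>& rows) {
    WideRows out;
    for (const auto& r : rows) out[r.user][r.ts] = r.payload;
    return out;
}

// Inverse direction: every (row key, column) pair is one CQL3 row,
// so the "column name" reappears as the value of the ts column.
std::vector<CqlRow> to_cql_rows(const WideRows& wide) {
    std::vector<CqlRow> out;
    for (const auto& [user, cols] : wide)
        for (const auto& [ts, payload] : cols)
            out.push_back({user, ts, payload});
    return out;
}
```

The round trip shows why a composite PRIMARY KEY is more than "a tuple": its second component decides how rows are laid out and ordered inside a partition.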
| Comment by Sergei Petrunia [ 2012-08-22 ] |
|
Yes, I've repeated my experiment and see that the defined validators are indeed enforced. Thanks a lot for pointing out what CQL's composite PRIMARY KEYs are! I was overwhelmed by all the new things I was learning about Cassandra, and dismissed composite PKs as "OK, these are probably just like non-composite ones, except that they are tuples". Apparently I was wrong: they play a more important role. I'll need to think more before I understand what this means for this project, though. At the very least, we shouldn't ignore them. |
| Comment by Sergei Petrunia [ 2012-08-22 ] |
|
Testing todo task |
| Comment by Sergei Petrunia [ 2012-09-12 ] |
Possible directions for speedup:
|
| Comment by Sergei Petrunia [ 2012-09-12 ] |
|
Tried profiling ha_cassandra on home setup and on EC2. Results from EC2 (% numbers are cumulative-time)
This means that at least 54% of the time is spent in ha_cassandra::write_row(). Some of the remaining time should probably be attributed to ha_cassandra as well, because no other part of the server uses std::string. |
| Comment by Sergei Petrunia [ 2012-09-14 ] |
|
Did some more benchmarks; results are summarized here: https://lists.launchpad.net/maria-developers/msg04889.html. It seems that CPU usage on the SQL node is not actually a problem: use a release build and a better CPU. The inability to use multiple connections IS a problem. |
| Comment by Sergei Petrunia [ 2012-09-18 ] |
|
Started thinking about how we could use multiple Thrift API connections. The Thrift library doesn't support asynchronous clients. We could use a thread-per-connection model, but consider all the effort (both development and run-time) we'd need to synchronize the threads. There is a patch for Thrift that adds an asynchronous client: https://issues.apache.org/jira/browse/THRIFT-579. It has been maintained for a few years, which probably means it's not just something that "barely compiles". I'm going to try using it. |
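For comparison, the thread-per-connection alternative in its simplest form might look like the sketch below. `Connection` is a hypothetical stand-in for a blocking Thrift client (a real one would issue a call such as `batch_mutate()` instead of recording rows), and error handling, retries, and statement-level semantics, which are where most of the synchronization effort would go, are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one blocking client connection.
struct Connection {
    std::vector<std::string> sent;
    void send_batch(const std::vector<std::string>& rows) {
        sent.insert(sent.end(), rows.begin(), rows.end());
    }
};

// Deal rows round-robin into one batch per connection, send each batch
// from its own thread, and wait for all of them. Each thread touches only
// its own Connection and batch, so join() is the only synchronization.
void write_rows_parallel(std::vector<Connection>& conns,
                         const std::vector<std::string>& rows) {
    const std::size_t n = conns.size();
    if (n == 0) return;
    std::vector<std::vector<std::string>> batches(n);
    for (std::size_t i = 0; i < rows.size(); ++i)
        batches[i % n].push_back(rows[i]);

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t c = 0; c < n; ++c)
        workers.emplace_back([&conns, &batches, c] {
            conns[c].send_batch(batches[c]);
        });
    for (auto& t : workers) t.join();
}
```

Even this toy version shows the cost: the caller blocks until the slowest connection finishes, and a failure on one connection would need to be reported back and reconciled with the others.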
| Comment by Sergei Petrunia [ 2012-09-18 ] |
|
It compiles and works (by the way, I figured out how to get the Cassandra.thrift output to compile with fewer edits to the generated code). Problems:
|
| Comment by Jonathan Ellis [ 2012-09-18 ] |
|
You might want to use the Cassandra native protocol introduced in 1.2 (trunk): https://github.com/apache/cassandra/blob/trunk/doc/native_protocol.spec |
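One property of the native protocol relevant to the multiple-connections problem is that every frame carries a stream id, so several requests can be in flight on a single connection and replies matched back to them. A sketch of encoding the 8-byte version-1 frame header (version, flags, stream, opcode, 4-byte big-endian body length) follows. The constant values reflect our reading of native_protocol.spec for version 1 and should be checked against the spec before use; `encode_header` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Version-1 frame header layout (8 bytes):
//   version | flags | stream | opcode | length (4 bytes, big-endian)
constexpr uint8_t kRequestV1 = 0x01;  // 0x81 would mark a v1 response
constexpr uint8_t kOpStartup = 0x01;
constexpr uint8_t kOpQuery   = 0x07;

std::array<uint8_t, 8> encode_header(int8_t stream, uint8_t opcode,
                                     uint32_t body_len, uint8_t flags = 0) {
    return {kRequestV1,
            flags,
            static_cast<uint8_t>(stream),  // echoed in the reply for matching
            opcode,
            static_cast<uint8_t>(body_len >> 24),
            static_cast<uint8_t>(body_len >> 16),
            static_cast<uint8_t>(body_len >> 8),
            static_cast<uint8_t>(body_len)};
}
```

With distinct stream ids, a client could write several QUERY frames back to back and dispatch replies by their stream byte, instead of opening one connection per outstanding request.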
| Comment by Sergei Petrunia [ 2012-09-19 ] |
|
Thanks for the note. I guess the protocol has been pushed fairly recently? IIRC when I checked for it, there was only a Jira entry. |
| Comment by Sergei Petrunia [ 2012-09-25 ] |
|
Added support for 'varint' type. |
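Cassandra's `varint` (validator `IntegerType`) stores an arbitrary-precision integer as big-endian two's-complement bytes of variable length. A minimal decoding sketch, assuming the value is converted to a 64-bit integer when it fits; how ha_cassandra actually maps `varint` to a MariaDB column type is not stated here, and `decode_varint` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Decode a big-endian two's-complement varint into int64_t.
// Returns nullopt for empty input or values longer than 8 bytes,
// which a real engine would have to store in a wider type or reject.
std::optional<int64_t> decode_varint(const std::vector<uint8_t>& bytes) {
    if (bytes.empty() || bytes.size() > 8) return std::nullopt;
    // Sign extension: start from all ones if the top bit is set.
    uint64_t v = (bytes[0] & 0x80) ? ~uint64_t{0} : 0;
    for (uint8_t b : bytes) v = (v << 8) | b;
    return static_cast<int64_t>(v);
}
```

The leading `0x00` in an encoding such as `00 80` is what keeps 128 from being read as -128, which is why the length alone does not tell you the sign.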
| Comment by Jonathan Ellis [ 2012-09-30 ] |
|
Saw that you announced a preview release. Congrats! Wanted to make sure you guys were aware of a couple of changes we're making:
|
| Comment by Oleksandr Byelkin [ 2012-10-01 ] |
|
JFYI: our dynamic columns collect not only Cassandra's dynamic columns but all columns that are not mentioned in the MariaDB table definition. |
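The rule in the comment above can be sketched as a simple split: columns named in the MariaDB table definition go to their own fields, and everything else, whether or not Cassandra declares it, goes into the dynamic-columns part. Rows are modeled here as plain name-to-value maps, and `SplitRow` and `split_columns` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct SplitRow {
    std::map<std::string, std::string> declared;  // mapped to named table columns
    std::map<std::string, std::string> dynamic;   // packed into dynamic columns
};

// Every Cassandra column whose name is not in the MariaDB table definition
// ends up in `dynamic`, even if Cassandra has a column definition for it.
SplitRow split_columns(const std::map<std::string, std::string>& cass_row,
                       const std::set<std::string>& table_columns) {
    SplitRow out;
    for (const auto& [name, value] : cass_row) {
        if (table_columns.count(name))
            out.declared[name] = value;
        else
            out.dynamic[name] = value;
    }
    return out;
}
```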
| Comment by Sergei Golubchik [ 2013-01-25 ] |
|
pushed in 10.0-base |