[MDEV-6784] OQGRAPH: External Connection Crashes MariaDB with "InnoDB: Assertion failure" Created: 2014-09-25  Updated: 2015-05-03  Due: 2014-11-05  Resolved: 2015-05-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - OQGRAPH
Affects Version/s: 10.0.10, 10.0.11, 10.0.12, 10.0.13
Fix Version/s: 10.0.18

Type: Bug Priority: Major
Reporter: Brian LaVallee Assignee: Andrew McDonnell
Resolution: Fixed Votes: 0
Labels: oqgraph
Environment:

CentOS 6.5: 2.6.32-431.29.2.el6.i686: CentOS-6.5-i386-minimal.iso: VMware ESXi = 1vCPU/8192MB


Attachments: File oqgraph.base_config.mysql     File sp.current_path.mysql     Text File syslog.messages.txt    
Issue Links:
Relates
relates to MDEV-6282 OQGraph crashes under load Closed
relates to MDEV-6345 mariadb with oq_graph server crash wh... Closed

 Description   

I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.

Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.

OK = localhost> SELECT * FROM oss.path;
OK = remote_host> SELECT * FROM oss.path;
NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'

Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.

OK = localhost> CALL oss.current_path(3);
OK = remote_host> CALL oss.current_path(3);
NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'

This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.

Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.



 Comments   
Comment by Elena Stepanova [ 2014-09-26 ]

Hi Brian,

Could you please clarify some points regarding the problem description?

I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed?
In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)?
In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure?

There is an open bug MDEV-6345 which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.

Comment by Brian LaVallee [ 2014-09-29 ]

Hi Elena,

OK = Okay
NG = Not Good (Subsequent query caused an assertion failure)

Anyway... Yes the issue looks similar to MDEV-6345, but it's not about multiple connections. What I've seen is, subsequent queries fail outside of the MariaDB CLI. In the troubleshooting process, "remote host" vs "localhost" was simply an attempt to isolate the issue.

I originally noticed the issue from a remote PHP connection, but narrowed it down to subsequent queries outside of the CLI. Even batch mode on the localhost, so "external connection" might be the wrong phrase. Especially since a CLI connection from a "remote host" DOES NOT encounter the issue. "non-CLI" is probably the most accurate way to describe what I mean by "external connections".

The steps to reproduce are below. I've actually followed these steps multiple times (10.0.11/10.0.12/10.0.13) with the same results. Even found the same issue with Fedora20 (similar steps).

Install CentOS 6.5 (or CentOS 6.3)
disable selinux
disable iptables
rpm -Uvh epel-release-6-8.noarch.rpm
add yum.mariadb.repo for 10.0
yum install OQGRAPH
yum update
DEFAULT my.cnf // No Changes 
mysql_secure_installation

UPDATE root user Host='%'; FLUSH PRIVILEGES;  // For external box
INSTALL SONAME 'ha_oqgraph';
CREATE TABLE(s) ...
INSERT some_data
mysql -e 'SELECT * FROM oq_graph_table;'  // Results OK
mysql -e 'SELECT * FROM oq_graph_table;'  // No Results Returned
mysql -e 'SELECT * FROM oq_graph_table;'  // No Results Returned

– When using a simple 'stored procedures' the results were slightly different –

oq_graph() == "SELECT * FROM oq_graph_table;"
 
mysql -e 'CALL oq_graph();'  // Results OK
mysql -e 'CALL oq_graph();'  // No Results Returned
mysql -e 'CALL oq_graph();'  // Results OK
mysql -e 'CALL oq_graph();'  // No Results Returned

– Control / Regular InnoDB tables do not have any issues –

mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK

~~~~~
Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
~~~~~

Personally, I just discovered oqgraph and testing with the CLI looks like it will meet my needs. While I am moving to MariaDB 5.5 (oqgraph v2), I do look forward to using v3. So let me know if you need any additional information, I can even provide some server resources if you need them.

Sincerely,
Brian LaVallee

Comment by Elena Stepanova [ 2014-10-22 ]

I installed CentOS 6.5 2.6.32-431.29.2.el6.i686, MariaDB 10.0.13, and went through the steps as described above (except for the minimal image which I didn't find for 6.5), but still didn't get the failure.

My assumptions about pseudo-code above were as follows:

  • yum install OQGRAPH means yum install MariaDB-oqgraph-engine (which, in turn, pulls MariaDB server/client as dependencies;
  • mysql_secure_installation – exact settings shouldn't be important, but for what it's worth, I set a root password, allowed remote root connections, disabled anonymous users, allowed 'test' database, flushed privileges;
  • UPDATE root user Host='%' means UPDATE mysql.user SET Host = '%' WHERE user='root' LIMIT 1 (or any other way to update only one 'root' record)
  • CREATE TABLE(s) ... means create tables as given in the attached file oqgraph.base_config.mysql
  • INSERT some_data means run INSERTs from sp.current_path.mysql
  • mysql -e everywhere means mysql -uroot, connect via socket
  • SELECT * FROM oq_graph_table means SELECT * FROM oss.path
  • No Results Returned means an assertion failure? or wrong result? In any case, I was getting identical results from all mysql -e "SELECT ..." attempts.

[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password: 
+-------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq  | linkid |
+-------+--------+--------+--------+------+--------+
| NULL  |      0 |      1 |      0 | NULL |   NULL |
| NULL  |      1 |      2 |      0 | NULL |   NULL |
| NULL  |      2 |      3 |      0 | NULL |   NULL |
+-------+--------+--------+--------+------+--------+
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password: 
+-------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq  | linkid |
+-------+--------+--------+--------+------+--------+
| NULL  |      0 |      1 |      0 | NULL |   NULL |
| NULL  |      1 |      2 |      0 | NULL |   NULL |
| NULL  |      2 |      3 |      0 | NULL |   NULL |
+-------+--------+--------+--------+------+--------+
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password: 
+-------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq  | linkid |
+-------+--------+--------+--------+------+--------+
| NULL  |      0 |      1 |      0 | NULL |   NULL |
| NULL  |      1 |      2 |      0 | NULL |   NULL |
| NULL  |      2 |      3 |      0 | NULL |   NULL |
+-------+--------+--------+--------+------+--------+

Please confirm that the assumptions above are correct, then I'll pass it over to our InnoDB expert to evaluate your theory about shim.

Comment by Brian LaVallee [ 2014-10-23 ]

Hi Elena,

Yes, your assumptions about my psudo-code are correct. I followed those steps many (many many) times.

I am actually disheartened to see your results did not encounter the issue.
I could always reproduce the issue. Strange to see it working.

One test that you did not try, was from an external host.
But I have a feeling it will work for you without encountering the issue I had.

Sincerely,
Brian LaVallee

Comment by Elena Stepanova [ 2014-10-23 ]

b.lavallee@globaltank.jp,
Do you happen to know if it's still reproducible on 10.0.14?

jplindst, andymc73,
It seems we need your expert help in here.

andymc73, maybe you've seen anything similar on your side?

jplindst, see Brian's guess about shim – I'm clueless about it, do you think whether it might be true, and if so, what should I do to exploit the alleged problem?
Re-quoting for your convenience:

Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.

Comment by Jan Lindström (Inactive) [ 2014-10-23 ]

Can you repeat the problem using debug build and using gdb/ddd ? If yes, could you provide following output when assertion is hit:

thread apply all bt

R: Jan

Comment by Andrew McDonnell [ 2014-10-23 ]

I suspect that this is the same problem discussed at length in MDEV-6282.
I also suspect that MDEV-6345 is caused by the same underlying problem.

Essentially, I managed to trace the problem to a flaw in the way oqgraph handles connections from different 'contexts' (for want of a better word), but at the time I was unable to see an easy fix. My best effort description of the problem is this comment at the end of MDEV-6282.
Unfortunately I have never yet had enough time to dig into the internals of mariadb and really understand what is going on on the other side of the engine interface.

It will involve looking at how other engines that use a shim-like mechanism resolve the same problem and working the fix into oqgraph.

I will send the link to this bug Arjen and Antony (the original designer) in case they have some ideas... at the time Antony suggested that something called 'toku' might be worth looking at, but at this stage I never got that far

Unfortunately I have had other committments that have prevented me from looking at this the last few months. I am hoping to get back and spend some time on this across November, with some luck. I suspect I will need some help though; With work and conference commitments I dont think I will be able to find enough of the time I believe it would take me to learn enough about the internals of things and fix things properly, if nothing obvious rears its head from the other engines, before Christmas

Comment by Elena Stepanova [ 2014-10-24 ]

andymc73,

Since you have more ideas about the issue than anyone of us, is it okay with you if I assign it to you for now, so that it doesn't get forgotten and lost?

Comment by Andrew McDonnell [ 2014-10-24 ]

ok

Comment by Andrew McDonnell [ 2014-10-30 ]

Can you please try testing this building MariaDB using the branch:

https://github.com/pastcompute/server/tree/oqgraph_mdev_6282_6345_5748

git clone https://github.com/pastcompute/server.git
git checkout oqgraph_mdev_6282_6345_5748
... build and test as normal ...

Comment by Andrew McDonnell [ 2015-02-27 ]

I just rebased my work on latest 10.1 and intending to submit pull request after re-testing

Comment by Andrew McDonnell [ 2015-02-27 ]

10.1 version pull request: https://github.com/MariaDB/server/pull/16

I'll be trying to get 10.0 sorted soon

Comment by Andrew McDonnell [ 2015-02-28 ]

https://github.com/MariaDB/server/pull/17

Generated at Thu Feb 08 07:14:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.