I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.
Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.
OK = localhost> SELECT * FROM oss.path;
OK = remote_host> SELECT * FROM oss.path;
NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'
Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.
OK = localhost> CALL oss.current_path(3);
OK = remote_host> CALL oss.current_path(3);
NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'
–
This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.
–
Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.
Could you please clarify some points regarding the problem description?
I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed?
In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)?
In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure?
There is an open bug MDEV-6345 which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.
Elena Stepanova
added a comment - Hi Brian,
Could you please clarify some points regarding the problem description?
I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed?
In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)?
In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure?
There is an open bug MDEV-6345 which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.
OK = Okay
NG = Not Good (Subsequent query caused an assertion failure)
Anyway... Yes the issue looks similar to MDEV-6345, but it's not about multiple connections. What I've seen is, subsequent queries fail outside of the MariaDB CLI. In the troubleshooting process, "remote host" vs "localhost" was simply an attempt to isolate the issue.
I originally noticed the issue from a remote PHP connection, but narrowed it down to subsequent queries outside of the CLI. Even batch mode on the localhost, so "external connection" might be the wrong phrase. Especially since a CLI connection from a "remote host" DOES NOT encounter the issue. "non-CLI" is probably the most accurate way to describe what I mean by "external connections".
The steps to reproduce are below. I've actually followed these steps multiple times (10.0.11/10.0.12/10.0.13) with the same results. Even found the same issue with Fedora20 (similar steps).
Install CentOS 6.5 (or CentOS 6.3)
disable selinux
disable iptables
rpm -Uvh epel-release-6-8.noarch.rpm
add yum.mariadb.repo for 10.0
yum install OQGRAPH
yum update
DEFAULT my.cnf // No Changes
mysql_secure_installation
UPDATE root user Host='%'; FLUSH PRIVILEGES; // For external box
INSTALL SONAME 'ha_oqgraph';
CREATETABLE(s) ...
INSERT some_data
mysql -e 'SELECT * FROM oq_graph_table;' // Results OK
mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned
mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned
– When using a simple 'stored procedures' the results were slightly different –
oq_graph() == "SELECT * FROM oq_graph_table;"
mysql -e 'CALL oq_graph();' // Results OK
mysql -e 'CALL oq_graph();' // No Results Returned
mysql -e 'CALL oq_graph();' // Results OK
mysql -e 'CALL oq_graph();' // No Results Returned
– Control / Regular InnoDB tables do not have any issues –
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
~~~~~
Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
~~~~~
Personally, I just discovered oqgraph and testing with the CLI looks like it will meet my needs. While I am moving to MariaDB 5.5 (oqgraph v2), I do look forward to using v3. So let me know if you need any additional information, I can even provide some server resources if you need them.
Sincerely,
Brian LaVallee
Brian LaVallee
added a comment - - edited Hi Elena,
OK = Okay
NG = Not Good (Subsequent query caused an assertion failure)
Anyway... Yes the issue looks similar to MDEV-6345 , but it's not about multiple connections. What I've seen is, subsequent queries fail outside of the MariaDB CLI. In the troubleshooting process, "remote host" vs "localhost" was simply an attempt to isolate the issue.
I originally noticed the issue from a remote PHP connection, but narrowed it down to subsequent queries outside of the CLI. Even batch mode on the localhost, so "external connection" might be the wrong phrase. Especially since a CLI connection from a "remote host" DOES NOT encounter the issue. "non-CLI" is probably the most accurate way to describe what I mean by "external connections".
The steps to reproduce are below. I've actually followed these steps multiple times (10.0.11/10.0.12/10.0.13) with the same results. Even found the same issue with Fedora20 (similar steps).
Install CentOS 6.5 (or CentOS 6.3)
disable selinux
disable iptables
rpm -Uvh epel-release-6-8.noarch.rpm
add yum.mariadb.repo for 10.0
yum install OQGRAPH
yum update
DEFAULT my.cnf // No Changes
mysql_secure_installation
UPDATE root user Host= '%' ; FLUSH PRIVILEGES ; // For external box
INSTALL SONAME 'ha_oqgraph' ;
CREATE TABLE (s) ...
INSERT some_data
mysql -e 'SELECT * FROM oq_graph_table;' // Results OK
mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned
mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned
– When using a simple 'stored procedures' the results were slightly different –
oq_graph() == "SELECT * FROM oq_graph_table;"
mysql -e 'CALL oq_graph();' // Results OK
mysql -e 'CALL oq_graph();' // No Results Returned
mysql -e 'CALL oq_graph();' // Results OK
mysql -e 'CALL oq_graph();' // No Results Returned
– Control / Regular InnoDB tables do not have any issues –
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK
~~~~~
Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
~~~~~
Personally, I just discovered oqgraph and testing with the CLI looks like it will meet my needs. While I am moving to MariaDB 5.5 (oqgraph v2), I do look forward to using v3. So let me know if you need any additional information, I can even provide some server resources if you need them.
Sincerely,
Brian LaVallee
I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.
Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.
OK = localhost> SELECT * FROM oss.path;
OK = remote_host> SELECT * FROM oss.path;
NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'
Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.
OK = localhost> CALL oss.current_path(3);
OK = remote_host> CALL oss.current_path(3);
NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'
--
This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.
--
Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.
I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.
Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.
{code:sql}
OK = localhost> SELECT * FROM oss.path;
OK = remote_host> SELECT * FROM oss.path;
NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'
{code}
Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.
{code:sql}
OK = localhost> CALL oss.current_path(3);
OK = remote_host> CALL oss.current_path(3);
NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'
{code}
--
This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.
--
Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.
I installed CentOS 6.5 2.6.32-431.29.2.el6.i686, MariaDB 10.0.13, and went through the steps as described above (except for the minimal image which I didn't find for 6.5), but still didn't get the failure.
My assumptions about pseudo-code above were as follows:
yum install OQGRAPH means yum install MariaDB-oqgraph-engine (which, in turn, pulls MariaDB server/client as dependencies;
mysql_secure_installation – exact settings shouldn't be important, but for what it's worth, I set a root password, allowed remote root connections, disabled anonymous users, allowed 'test' database, flushed privileges;
UPDATE root user Host='%' means UPDATE mysql.user SET Host = '%' WHERE user='root' LIMIT 1 (or any other way to update only one 'root' record)
CREATE TABLE(s) ... means create tables as given in the attached file oqgraph.base_config.mysql
INSERT some_data means run INSERTs from sp.current_path.mysql
mysql -e everywhere means mysql -uroot, connect via socket
SELECT * FROM oq_graph_table means SELECT * FROM oss.path
No Results Returned means an assertion failure? or wrong result? In any case, I was getting identical results from all mysql -e "SELECT ..." attempts.
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Please confirm that the assumptions above are correct, then I'll pass it over to our InnoDB expert to evaluate your theory about shim.
Elena Stepanova
added a comment - - edited I installed CentOS 6.5 2.6.32-431.29.2.el6.i686, MariaDB 10.0.13, and went through the steps as described above (except for the minimal image which I didn't find for 6.5), but still didn't get the failure.
My assumptions about pseudo-code above were as follows:
yum install OQGRAPH means yum install MariaDB-oqgraph-engine (which, in turn, pulls MariaDB server/client as dependencies;
mysql_secure_installation – exact settings shouldn't be important, but for what it's worth, I set a root password, allowed remote root connections, disabled anonymous users, allowed 'test' database, flushed privileges;
UPDATE root user Host='%' means UPDATE mysql.user SET Host = '%' WHERE user='root' LIMIT 1 (or any other way to update only one 'root' record)
CREATE TABLE(s) ... means create tables as given in the attached file oqgraph.base_config.mysql
INSERT some_data means run INSERTs from sp.current_path.mysql
mysql -e everywhere means mysql -uroot, connect via socket
SELECT * FROM oq_graph_table means SELECT * FROM oss.path
No Results Returned means an assertion failure? or wrong result? In any case, I was getting identical results from all mysql -e "SELECT ..." attempts.
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password :
+ -------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq | linkid |
+ -------+--------+--------+--------+------+--------+
| NULL | 0 | 1 | 0 | NULL | NULL |
| NULL | 1 | 2 | 0 | NULL | NULL |
| NULL | 2 | 3 | 0 | NULL | NULL |
+ -------+--------+--------+--------+------+--------+
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password :
+ -------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq | linkid |
+ -------+--------+--------+--------+------+--------+
| NULL | 0 | 1 | 0 | NULL | NULL |
| NULL | 1 | 2 | 0 | NULL | NULL |
| NULL | 2 | 3 | 0 | NULL | NULL |
+ -------+--------+--------+--------+------+--------+
[elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
Enter password :
+ -------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq | linkid |
+ -------+--------+--------+--------+------+--------+
| NULL | 0 | 1 | 0 | NULL | NULL |
| NULL | 1 | 2 | 0 | NULL | NULL |
| NULL | 2 | 3 | 0 | NULL | NULL |
+ -------+--------+--------+--------+------+--------+
Please confirm that the assumptions above are correct, then I'll pass it over to our InnoDB expert to evaluate your theory about shim.
Yes, your assumptions about my psudo-code are correct. I followed those steps many (many many) times.
I am actually disheartened to see your results did not encounter the issue.
I could always reproduce the issue. Strange to see it working.
One test that you did not try, was from an external host.
But I have a feeling it will work for you without encountering the issue I had.
Sincerely,
Brian LaVallee
Brian LaVallee
added a comment - Hi Elena,
Yes, your assumptions about my psudo-code are correct. I followed those steps many (many many) times.
I am actually disheartened to see your results did not encounter the issue.
I could always reproduce the issue. Strange to see it working.
One test that you did not try, was from an external host.
But I have a feeling it will work for you without encountering the issue I had.
Sincerely,
Brian LaVallee
andymc73, maybe you've seen anything similar on your side?
jplindst, see Brian's guess about shim – I'm clueless about it, do you think whether it might be true, and if so, what should I do to exploit the alleged problem?
Re-quoting for your convenience:
Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
Elena Stepanova
added a comment - b.lavallee@globaltank.jp ,
Do you happen to know if it's still reproducible on 10.0.14?
jplindst , andymc73 ,
It seems we need your expert help in here.
andymc73 , maybe you've seen anything similar on your side?
jplindst , see Brian's guess about shim – I'm clueless about it, do you think whether it might be true, and if so, what should I do to exploit the alleged problem?
Re-quoting for your convenience:
Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
Can you repeat the problem using debug build and using gdb/ddd ? If yes, could you provide following output when assertion is hit:
thread apply all bt
R: Jan
Jan Lindström (Inactive)
added a comment - Can you repeat the problem using debug build and using gdb/ddd ? If yes, could you provide following output when assertion is hit:
thread apply all bt
R: Jan
I suspect that this is the same problem discussed at length in MDEV-6282.
I also suspect that MDEV-6345 is caused by the same underlying problem.
Essentially, I managed to trace the problem to a flaw in the way oqgraph handles connections from different 'contexts' (for want of a better word), but at the time I was unable to see an easy fix. My best effort description of the problem is this comment at the end of MDEV-6282.
Unfortunately I have never yet had enough time to dig into the internals of mariadb and really understand what is going on on the other side of the engine interface.
It will involve looking at how other engines that use a shim-like mechanism resolve the same problem and working the fix into oqgraph.
I will send the link to this bug Arjen and Antony (the original designer) in case they have some ideas... at the time Antony suggested that something called 'toku' might be worth looking at, but at this stage I never got that far
Unfortunately I have had other committments that have prevented me from looking at this the last few months. I am hoping to get back and spend some time on this across November, with some luck. I suspect I will need some help though; With work and conference commitments I dont think I will be able to find enough of the time I believe it would take me to learn enough about the internals of things and fix things properly, if nothing obvious rears its head from the other engines, before Christmas
Andrew McDonnell
added a comment - I suspect that this is the same problem discussed at length in MDEV-6282 .
I also suspect that MDEV-6345 is caused by the same underlying problem.
Essentially, I managed to trace the problem to a flaw in the way oqgraph handles connections from different 'contexts' (for want of a better word), but at the time I was unable to see an easy fix. My best effort description of the problem is this comment at the end of MDEV-6282 .
Unfortunately I have never yet had enough time to dig into the internals of mariadb and really understand what is going on on the other side of the engine interface.
It will involve looking at how other engines that use a shim-like mechanism resolve the same problem and working the fix into oqgraph.
I will send the link to this bug Arjen and Antony (the original designer) in case they have some ideas... at the time Antony suggested that something called 'toku' might be worth looking at , but at this stage I never got that far
Unfortunately I have had other committments that have prevented me from looking at this the last few months. I am hoping to get back and spend some time on this across November, with some luck. I suspect I will need some help though; With work and conference commitments I dont think I will be able to find enough of the time I believe it would take me to learn enough about the internals of things and fix things properly, if nothing obvious rears its head from the other engines, before Christmas
Since you have more ideas about the issue than anyone of us, is it okay with you if I assign it to you for now, so that it doesn't get forgotten and lost?
Elena Stepanova
added a comment - andymc73 ,
Since you have more ideas about the issue than anyone of us, is it okay with you if I assign it to you for now, so that it doesn't get forgotten and lost?
Andrew McDonnell
added a comment - Can you please try testing this building MariaDB using the branch:
https://github.com/pastcompute/server/tree/oqgraph_mdev_6282_6345_5748
git clone https://github.com/pastcompute/server.git
git checkout oqgraph_mdev_6282_6345_5748
... build and test as normal ...
Hi Brian,
Could you please clarify some points regarding the problem description?
I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed?
In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)?
In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure?
There is an open bug
MDEV-6345which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.