Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6784

OQGRAPH: External Connection Crashes MariaDB with "InnoDB: Assertion failure"

Details

    Description

      I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.

      Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.

      OK = localhost> SELECT * FROM oss.path;
      OK = remote_host> SELECT * FROM oss.path;
      NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
      NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'

      Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.

      OK = localhost> CALL oss.current_path(3);
      OK = remote_host> CALL oss.current_path(3);
      NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
      NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'

      This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.

      Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.

      Attachments

        Issue Links

          Activity

            b.lavallee@globaltank.jp Brian LaVallee created issue -

            Hi Brian,

            Could you please clarify some points regarding the problem description?

            I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed?
            In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)?
            In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure?

            There is an open bug MDEV-6345 which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.

            elenst Elena Stepanova added a comment - Hi Brian, Could you please clarify some points regarding the problem description? I assume lines starting with "OK" mean that the query went well, and "NG" – that the query failed? In the summary, you say that external connection causes a crash – do you mean external as one coming from a remote host, or external as one executed in the batch mode (mysql -e ..)? In the latter case, what is the significance of "remote host" vs "localhost"? It looks like both local and remote queries fail for you, do remote/local queries need to be executed in turn to trigger the assertion failure? There is an open bug MDEV-6345 which looks similar, except that it required 2 connections to cause the failure, while in your case there seems to be only one; so it would be good to understand what exactly it takes to reproduce the problem you described.
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Due Date 2014-10-26
            b.lavallee@globaltank.jp Brian LaVallee added a comment - - edited

            Hi Elena,

            OK = Okay
            NG = Not Good (Subsequent query caused an assertion failure)

            Anyway... Yes the issue looks similar to MDEV-6345, but it's not about multiple connections. What I've seen is, subsequent queries fail outside of the MariaDB CLI. In the troubleshooting process, "remote host" vs "localhost" was simply an attempt to isolate the issue.

            I originally noticed the issue from a remote PHP connection, but narrowed it down to subsequent queries outside of the CLI. Even batch mode on the localhost, so "external connection" might be the wrong phrase. Especially since a CLI connection from a "remote host" DOES NOT encounter the issue. "non-CLI" is probably the most accurate way to describe what I mean by "external connections".

            The steps to reproduce are below. I've actually followed these steps multiple times (10.0.11/10.0.12/10.0.13) with the same results. Even found the same issue with Fedora20 (similar steps).

            Install CentOS 6.5 (or CentOS 6.3)
            disable selinux
            disable iptables
            rpm -Uvh epel-release-6-8.noarch.rpm
            add yum.mariadb.repo for 10.0
            yum install OQGRAPH
            yum update
            DEFAULT my.cnf // No Changes 
            mysql_secure_installation

            UPDATE root user Host='%'; FLUSH PRIVILEGES;  // For external box
            INSTALL SONAME 'ha_oqgraph';
            CREATE TABLE(s) ...
            INSERT some_data
            mysql -e 'SELECT * FROM oq_graph_table;'  // Results OK
            mysql -e 'SELECT * FROM oq_graph_table;'  // No Results Returned
            mysql -e 'SELECT * FROM oq_graph_table;'  // No Results Returned

            – When using a simple 'stored procedures' the results were slightly different –

            oq_graph() == "SELECT * FROM oq_graph_table;"
             
            mysql -e 'CALL oq_graph();'  // Results OK
            mysql -e 'CALL oq_graph();'  // No Results Returned
            mysql -e 'CALL oq_graph();'  // Results OK
            mysql -e 'CALL oq_graph();'  // No Results Returned

            – Control / Regular InnoDB tables do not have any issues –

            mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
            mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
            mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK
            mysql -e 'SELECT * FROM oq2_backing_table;'  // Results OK

            ~~~~~
            Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.
            ~~~~~

            Personally, I just discovered oqgraph and testing with the CLI looks like it will meet my needs. While I am moving to MariaDB 5.5 (oqgraph v2), I do look forward to using v3. So let me know if you need any additional information, I can even provide some server resources if you need them.

            Sincerely,
            Brian LaVallee

            b.lavallee@globaltank.jp Brian LaVallee added a comment - - edited Hi Elena, OK = Okay NG = Not Good (Subsequent query caused an assertion failure) Anyway... Yes the issue looks similar to MDEV-6345 , but it's not about multiple connections. What I've seen is, subsequent queries fail outside of the MariaDB CLI. In the troubleshooting process, "remote host" vs "localhost" was simply an attempt to isolate the issue. I originally noticed the issue from a remote PHP connection, but narrowed it down to subsequent queries outside of the CLI. Even batch mode on the localhost, so "external connection" might be the wrong phrase. Especially since a CLI connection from a "remote host" DOES NOT encounter the issue. "non-CLI" is probably the most accurate way to describe what I mean by "external connections". The steps to reproduce are below. I've actually followed these steps multiple times (10.0.11/10.0.12/10.0.13) with the same results. Even found the same issue with Fedora20 (similar steps). Install CentOS 6.5 (or CentOS 6.3) disable selinux disable iptables rpm -Uvh epel-release-6-8.noarch.rpm add yum.mariadb.repo for 10.0 yum install OQGRAPH yum update DEFAULT my.cnf // No Changes mysql_secure_installation UPDATE root user Host= '%' ; FLUSH PRIVILEGES ; // For external box INSTALL SONAME 'ha_oqgraph' ; CREATE TABLE (s) ... INSERT some_data mysql -e 'SELECT * FROM oq_graph_table;' // Results OK mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned mysql -e 'SELECT * FROM oq_graph_table;' // No Results Returned – When using a simple 'stored procedures' the results were slightly different – oq_graph() == "SELECT * FROM oq_graph_table;"   mysql -e 'CALL oq_graph();' // Results OK mysql -e 'CALL oq_graph();' // No Results Returned mysql -e 'CALL oq_graph();' // Results OK mysql -e 'CALL oq_graph();' // No Results Returned – Control / Regular InnoDB tables do not have any issues – mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK mysql -e 'SELECT * FROM oq2_backing_table;' // Results OK ~~~~~ Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state. ~~~~~ Personally, I just discovered oqgraph and testing with the CLI looks like it will meet my needs. While I am moving to MariaDB 5.5 (oqgraph v2), I do look forward to using v3. So let me know if you need any additional information, I can even provide some server resources if you need them. Sincerely, Brian LaVallee
            elenst Elena Stepanova made changes -
            Due Date 2014-10-26
            elenst Elena Stepanova made changes -
            Description I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.

            Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.

            OK = localhost> SELECT * FROM oss.path;
            OK = remote_host> SELECT * FROM oss.path;
            NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
            NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'

            Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.

            OK = localhost> CALL oss.current_path(3);
            OK = remote_host> CALL oss.current_path(3);
            NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
            NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'

            --

            This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.

            --

            Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.


            I've spent about a week trying to build a stable instance of OQGRAPH for use by a webserver.

            Finding that almost every-other external query results in InnoDB: Assertion failure. But no issues when using the CLI on localhost or a remote_host.

            {code:sql}
            OK = localhost> SELECT * FROM oss.path;
            OK = remote_host> SELECT * FROM oss.path;
            NG = [localhost ~]# mysql -e 'SELECT * FROM oss.path;'
            NG = [remote_host ~]# mysql -h oqgrah_host -e 'SELECT * FROM oss.path;'
            {code}

            Note: The first query normally works, and subsequent queries fail. I've been using a stored procedure that normally restarts MariaDB, so every-other-query works.

            {code:sql}
            OK = localhost> CALL oss.current_path(3);
            OK = remote_host> CALL oss.current_path(3);
            NG = [localhost ~]# mysql -e 'CALL oss.current_path(3);'
            NG = [remote_host ~]# mysql -h oqgrah_host -e 'CALL oss.current_path(3);'
            {code}
            --

            This is a clean build, minimal CentOS, EPEL, & MariaDB from the repo. I've tried multiple MariaDB (default config) builds, even reverted to CentOS 6.3 to see if that would make a difference. I originally suspected the problem might with PHP, but no difference between mysql ' mysqli. Then I found the issue was reproducible from bash.

            --

            Attached you will find the syslog message, stored procedure, plus the DB schema and quick setup I have been working with. If you have any questions, please let me know.


            elenst Elena Stepanova added a comment - - edited

            I installed CentOS 6.5 2.6.32-431.29.2.el6.i686, MariaDB 10.0.13, and went through the steps as described above (except for the minimal image which I didn't find for 6.5), but still didn't get the failure.

            My assumptions about pseudo-code above were as follows:

            • yum install OQGRAPH means yum install MariaDB-oqgraph-engine (which, in turn, pulls MariaDB server/client as dependencies;
            • mysql_secure_installation – exact settings shouldn't be important, but for what it's worth, I set a root password, allowed remote root connections, disabled anonymous users, allowed 'test' database, flushed privileges;
            • UPDATE root user Host='%' means UPDATE mysql.user SET Host = '%' WHERE user='root' LIMIT 1 (or any other way to update only one 'root' record)
            • CREATE TABLE(s) ... means create tables as given in the attached file oqgraph.base_config.mysql
            • INSERT some_data means run INSERTs from sp.current_path.mysql
            • mysql -e everywhere means mysql -uroot, connect via socket
            • SELECT * FROM oq_graph_table means SELECT * FROM oss.path
            • No Results Returned means an assertion failure? or wrong result? In any case, I was getting identical results from all mysql -e "SELECT ..." attempts.

            [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
            Enter password: 
            +-------+--------+--------+--------+------+--------+
            | latch | origid | destid | weight | seq  | linkid |
            +-------+--------+--------+--------+------+--------+
            | NULL  |      0 |      1 |      0 | NULL |   NULL |
            | NULL  |      1 |      2 |      0 | NULL |   NULL |
            | NULL  |      2 |      3 |      0 | NULL |   NULL |
            +-------+--------+--------+--------+------+--------+
            [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
            Enter password: 
            +-------+--------+--------+--------+------+--------+
            | latch | origid | destid | weight | seq  | linkid |
            +-------+--------+--------+--------+------+--------+
            | NULL  |      0 |      1 |      0 | NULL |   NULL |
            | NULL  |      1 |      2 |      0 | NULL |   NULL |
            | NULL  |      2 |      3 |      0 | NULL |   NULL |
            +-------+--------+--------+--------+------+--------+
            [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path"
            Enter password: 
            +-------+--------+--------+--------+------+--------+
            | latch | origid | destid | weight | seq  | linkid |
            +-------+--------+--------+--------+------+--------+
            | NULL  |      0 |      1 |      0 | NULL |   NULL |
            | NULL  |      1 |      2 |      0 | NULL |   NULL |
            | NULL  |      2 |      3 |      0 | NULL |   NULL |
            +-------+--------+--------+--------+------+--------+

            Please confirm that the assumptions above are correct, then I'll pass it over to our InnoDB expert to evaluate your theory about shim.

            elenst Elena Stepanova added a comment - - edited I installed CentOS 6.5 2.6.32-431.29.2.el6.i686, MariaDB 10.0.13, and went through the steps as described above (except for the minimal image which I didn't find for 6.5), but still didn't get the failure. My assumptions about pseudo-code above were as follows: yum install OQGRAPH means yum install MariaDB-oqgraph-engine (which, in turn, pulls MariaDB server/client as dependencies; mysql_secure_installation – exact settings shouldn't be important, but for what it's worth, I set a root password, allowed remote root connections, disabled anonymous users, allowed 'test' database, flushed privileges; UPDATE root user Host='%' means UPDATE mysql.user SET Host = '%' WHERE user='root' LIMIT 1 (or any other way to update only one 'root' record) CREATE TABLE(s) ... means create tables as given in the attached file oqgraph.base_config.mysql INSERT some_data means run INSERTs from sp.current_path.mysql mysql -e everywhere means mysql -uroot, connect via socket SELECT * FROM oq_graph_table means SELECT * FROM oss.path No Results Returned means an assertion failure? or wrong result? In any case, I was getting identical results from all mysql -e "SELECT ..." attempts. [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path" Enter password : + -------+--------+--------+--------+------+--------+ | latch | origid | destid | weight | seq | linkid | + -------+--------+--------+--------+------+--------+ | NULL | 0 | 1 | 0 | NULL | NULL | | NULL | 1 | 2 | 0 | NULL | NULL | | NULL | 2 | 3 | 0 | NULL | NULL | + -------+--------+--------+--------+------+--------+ [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path" Enter password : + -------+--------+--------+--------+------+--------+ | latch | origid | destid | weight | seq | linkid | + -------+--------+--------+--------+------+--------+ | NULL | 0 | 1 | 0 | NULL | NULL | | NULL | 1 | 2 | 0 | NULL | NULL | | NULL | 2 | 3 | 0 | NULL | NULL | + -------+--------+--------+--------+------+--------+ [elenst@localhost ~]$ mysql -uroot -p -e "select * from oss.path" Enter password : + -------+--------+--------+--------+------+--------+ | latch | origid | destid | weight | seq | linkid | + -------+--------+--------+--------+------+--------+ | NULL | 0 | 1 | 0 | NULL | NULL | | NULL | 1 | 2 | 0 | NULL | NULL | | NULL | 2 | 3 | 0 | NULL | NULL | + -------+--------+--------+--------+------+--------+ Please confirm that the assumptions above are correct, then I'll pass it over to our InnoDB expert to evaluate your theory about shim.
            elenst Elena Stepanova made changes -
            Due Date 2014-11-05

            Hi Elena,

            Yes, your assumptions about my psudo-code are correct. I followed those steps many (many many) times.

            I am actually disheartened to see your results did not encounter the issue.
            I could always reproduce the issue. Strange to see it working.

            One test that you did not try, was from an external host.
            But I have a feeling it will work for you without encountering the issue I had.

            Sincerely,
            Brian LaVallee

            b.lavallee@globaltank.jp Brian LaVallee added a comment - Hi Elena, Yes, your assumptions about my psudo-code are correct. I followed those steps many (many many) times. I am actually disheartened to see your results did not encounter the issue. I could always reproduce the issue. Strange to see it working. One test that you did not try, was from an external host. But I have a feeling it will work for you without encountering the issue I had. Sincerely, Brian LaVallee

            b.lavallee@globaltank.jp,
            Do you happen to know if it's still reproducible on 10.0.14?

            jplindst, andymc73,
            It seems we need your expert help in here.

            andymc73, maybe you've seen anything similar on your side?

            jplindst, see Brian's guess about shim – I'm clueless about it, do you think whether it might be true, and if so, what should I do to exploit the alleged problem?
            Re-quoting for your convenience:

            Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.

            elenst Elena Stepanova added a comment - b.lavallee@globaltank.jp , Do you happen to know if it's still reproducible on 10.0.14? jplindst , andymc73 , It seems we need your expert help in here. andymc73 , maybe you've seen anything similar on your side? jplindst , see Brian's guess about shim – I'm clueless about it, do you think whether it might be true, and if so, what should I do to exploit the alleged problem? Re-quoting for your convenience: Just speculation - The InnoDB error might point to a problem with the 'shim' mechanisms interaction with non-CLI connections. "Assertion failure" typically points to a data corruption, the 'shim' mechanism could be leaving InnoDB in an odd state.

            Can you repeat the problem using debug build and using gdb/ddd ? If yes, could you provide following output when assertion is hit:

            thread apply all bt

            R: Jan

            jplindst Jan Lindström (Inactive) added a comment - Can you repeat the problem using debug build and using gdb/ddd ? If yes, could you provide following output when assertion is hit: thread apply all bt R: Jan

            I suspect that this is the same problem discussed at length in MDEV-6282.
            I also suspect that MDEV-6345 is caused by the same underlying problem.

            Essentially, I managed to trace the problem to a flaw in the way oqgraph handles connections from different 'contexts' (for want of a better word), but at the time I was unable to see an easy fix. My best effort description of the problem is this comment at the end of MDEV-6282.
            Unfortunately I have never yet had enough time to dig into the internals of mariadb and really understand what is going on on the other side of the engine interface.

            It will involve looking at how other engines that use a shim-like mechanism resolve the same problem and working the fix into oqgraph.

            I will send the link to this bug Arjen and Antony (the original designer) in case they have some ideas... at the time Antony suggested that something called 'toku' might be worth looking at, but at this stage I never got that far

            Unfortunately I have had other committments that have prevented me from looking at this the last few months. I am hoping to get back and spend some time on this across November, with some luck. I suspect I will need some help though; With work and conference commitments I dont think I will be able to find enough of the time I believe it would take me to learn enough about the internals of things and fix things properly, if nothing obvious rears its head from the other engines, before Christmas

            andymc73 Andrew McDonnell added a comment - I suspect that this is the same problem discussed at length in MDEV-6282 . I also suspect that MDEV-6345 is caused by the same underlying problem. Essentially, I managed to trace the problem to a flaw in the way oqgraph handles connections from different 'contexts' (for want of a better word), but at the time I was unable to see an easy fix. My best effort description of the problem is this comment at the end of MDEV-6282 . Unfortunately I have never yet had enough time to dig into the internals of mariadb and really understand what is going on on the other side of the engine interface. It will involve looking at how other engines that use a shim-like mechanism resolve the same problem and working the fix into oqgraph. I will send the link to this bug Arjen and Antony (the original designer) in case they have some ideas... at the time Antony suggested that something called 'toku' might be worth looking at , but at this stage I never got that far Unfortunately I have had other committments that have prevented me from looking at this the last few months. I am hoping to get back and spend some time on this across November, with some luck. I suspect I will need some help though; With work and conference commitments I dont think I will be able to find enough of the time I believe it would take me to learn enough about the internals of things and fix things properly, if nothing obvious rears its head from the other engines, before Christmas

            andymc73,

            Since you have more ideas about the issue than anyone of us, is it okay with you if I assign it to you for now, so that it doesn't get forgotten and lost?

            elenst Elena Stepanova added a comment - andymc73 , Since you have more ideas about the issue than anyone of us, is it okay with you if I assign it to you for now, so that it doesn't get forgotten and lost?

            ok

            andymc73 Andrew McDonnell added a comment - ok
            andymc73 Andrew McDonnell made changes -
            andymc73 Andrew McDonnell made changes -
            elenst Elena Stepanova made changes -
            Assignee Andrew McDonnell [ andymc73 ]

            Can you please try testing this building MariaDB using the branch:

            https://github.com/pastcompute/server/tree/oqgraph_mdev_6282_6345_5748

            git clone https://github.com/pastcompute/server.git
            git checkout oqgraph_mdev_6282_6345_5748
            ... build and test as normal ...

            andymc73 Andrew McDonnell added a comment - Can you please try testing this building MariaDB using the branch: https://github.com/pastcompute/server/tree/oqgraph_mdev_6282_6345_5748 git clone https://github.com/pastcompute/server.git git checkout oqgraph_mdev_6282_6345_5748 ... build and test as normal ...

            I just rebased my work on latest 10.1 and intending to submit pull request after re-testing

            andymc73 Andrew McDonnell added a comment - I just rebased my work on latest 10.1 and intending to submit pull request after re-testing

            10.1 version pull request: https://github.com/MariaDB/server/pull/16

            I'll be trying to get 10.0 sorted soon

            andymc73 Andrew McDonnell added a comment - 10.1 version pull request: https://github.com/MariaDB/server/pull/16 I'll be trying to get 10.0 sorted soon
            andymc73 Andrew McDonnell added a comment - https://github.com/MariaDB/server/pull/17
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.18 [ 18702 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 55021 ] MariaDB v3 [ 66140 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 66140 ] MariaDB v4 [ 148254 ]

            People

              andymc73 Andrew McDonnell
              b.lavallee@globaltank.jp Brian LaVallee
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.