[CONJ-491] Crossed Queries and "Packets out of order when reading field packets, expected was EOF stream." Created: 2017-06-01  Updated: 2019-03-13  Resolved: 2019-03-13

Status: Closed
Project: MariaDB Connector/J
Component/s: Other
Affects Version/s: 1.2.3, 1.5.9
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Jamie Jackson Assignee: Diego Dupin
Resolution: Won't Do Votes: 1
Labels: None
Environment:

Red Hat Enterprise Linux Server release 6.9 (Santiago)
Linux 2.6.32-696.1.1.el6.x86_64 #1 SMP Tue Mar 21 12:19:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Relates
relates to CONJ-504 java.lang.ArrayIndexOutOfBoundsExcept... Closed

 Description   

I'm getting issues lately with my web app:

"ERROR","http-nio-8888-exec-30","05/31/2017","10:15:07",""," The specific sequence of files included or processed is: /home/mysite/public_html/wwwroot/requirements/mura/content/contentGatewayAdobe.cfc, line: 731 "
(conn:47726) Packets out of order when reading field packets, expected was EOF stream. Packet contents (hex) = 03
Query is:
 
SELECT
 
title, releasedate, menuTitle, tcontent.lastupdate,summary, tags,tcontent.filename, type,subType, tcontent.siteid,
tcontent.contentid, tcontent.contentHistID, target, targetParams,
restricted, restrictgroups, displaystart, displaystop, orderno,sortBy,sortDirection,
tcontent.fileid, tcontent.credits, tcontent.remoteSource, tcontent.remoteSourceURL, tcontent.remoteURL,
tfiles.fileSize,tfiles.fileExt, audience, keypoints
,tcontentstats.rating,tcontentstats.totalVotes,tcontentstats.downVotes,tcontentstats.upVotes
,tcontentstats.comments, '' as parentType, null as kids,tcontent.path, tcontent.created, tcontent.nextn,
tcontent.majorVersion, tcontent.minorVersion, tcontentstats.lockID, tcontentstats.lockType, tcontent.expires,
tfiles.filename as AssocFilename,tcontent.displayInterval,tcontent.display,tcontentfilemetadata.altText as fileAltText
 
FROM tcontent
 
Left Join tfiles ON (tcontent.fileID=tfiles.fileID)
left Join tco...;
at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150):150
at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101):101
at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77):77
at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226):226
at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeInternal(MariaDbClientPreparedStatement.java:233):233
at org.mariadb.jdbc.MariaDbClientPreparedStatement.execute(MariaDbClientPreparedStatement.java:163):163
at com.intergral.fusionreactor.jdbc.jdbc42.PreparedStatementSurrogate2.execute(PreparedStatementSurrogate2.java:200):200
at lucee.runtime.db.driver.state.StateUtil.execute(StateUtil.java:145):145
at lucee.runtime.db.driver.state.StatePreparedStatement.execute(StatePreparedStatement.java:135):135
at lucee.runtime.type.util.QueryUtil.execute(QueryUtil.java:240):240
at lucee.runtime.type.QueryImpl.<init>(QueryImpl.java:252):252
at lucee.runtime.tag.Query.executeDatasoure(Query.java:815):815
at lucee.runtime.tag.Query.doEndTag(Query.java:610):610
at content.contentgatewayadobe_cfc$cf.udfCall1(/mura/content/contentGatewayAdobe.cfc:731):731
at content.contentgatewayadobe_cfc$cf.udfCall(/mura/content/contentGatewayAdobe.cfc):-1

These explicit, fatal errors are one of the better types of outcomes during these overall incidents. The most disturbing type of problem that I get is when I don't get an error, but I get crossed queries: Say my app queries, "select a, b, c from mytable", but it ends up with a completely unrelated result (e.g., with columns x, y, and z from some other table), probably from some concurrent connection.

At a glance, it looked like a networking issue to me, so I engaged my CIT department to do some troubleshooting on the network, and although they were monitoring during one of the incidents, they saw nothing wrong.

There is a section of my application that is slow (due to slow queries), and the problem is almost always precipitated by slow requests to this section; however, we haven't been able to reproduce the issue outside of production.

For what it's worth, I've found that the minimum steps to recover from the problem are to:

  • Restart the MariaDB service
  • Re-initialize the web app (seems to clear out the issues resulting from crossed queries)

This problem is beyond my troubleshooting abilities, since it seems to be low-level (i.e., not related to my application code), but rather related to MariaDB (5.5.56), the JDBC driver (1.2.3 & 1.5.9 – I upgraded to try to fix it, but no luck), or my app server (Lucee--an abstraction layer on top of Java). For what it's worth I also filed a ticket with Lucee, but it's languishing: https://luceeserver.atlassian.net/browse/LDEV-1316

data source definition, fwiw

<data-source allow="127" blob="false" class="org.mariadb.jdbc.Driver" clob="false" connectionLimit="40" 
connectionTimeout="20" custom="" database="" dbdriver="Other"
dsn="jdbc:mysql://myhost:3306/my_schema?connectTimeout=60000&amp;socketTimeout=180000" host=""
metaCacheTimeout="60000" name="my_datasource" password="encrypted:mypass" port="-1" storage="false"
username="my_user" validate="false"/>



 Comments   
Comment by Diego Dupin [ 2017-07-12 ]

Query cache is normally disable by default, but this correspond to an issue detected today : MDEV-13300 in query cache.
Could you verify that your server configuration has query cache enable (SELECT @@have_query_cache, @@query_cache_type, @@query_cache_size) ?

Comment by Jamie Jackson [ 2017-07-12 ]

Yup:

MariaDB [(none)]> SELECT @@have_query_cache, @@query_cache_type, @@query_cache_size\G;
*************************** 1. row ***************************
@@have_query_cache: YES
@@query_cache_type: ON
@@query_cache_size: 67108864
1 row in set (0.03 sec)
 
ERROR: No query specified

Comment by Diego Dupin [ 2017-07-13 ]

closing since it's a server issue : this will be handled with MDEV-13300.
workaround is to disable query cache or using only drivers that have CLIENT_DEPRECATE_EOF (or driver's that haven't CLIENT_DEPRECATE_EOF)

Comment by Jamie Jackson [ 2017-07-13 ]

Thanks, Diego, but could you try explaining that workaround again? (It's confusing, as written.) Maybe write it as pseudocode with if statements?

Comment by Diego Dupin [ 2017-07-13 ]

i've changed MDEV-13300 description, but the best is to disable query cache :

SET global query_cache_type=0
RESET QUERY CACHE

that will disable immediatly the query cache.

Server configuration must be changed accordingly (query_cache_type=0) in configuration file, to avoid reenabling it is server is rebooted.

Comment by Diego Dupin [ 2017-07-17 ]

Sorry, I forgot that this was about this version of server (5.5). Since this error corresponded perfectly to the MDEV-13300 error, I was too quick to assign this to this error.

I reopen this issue.

Comment by Jamie Jackson [ 2017-07-17 ]

Yes, it's suspicious that my symptoms are also packet errors and/or getting results back from unrelated queries. For what it's worth, I disabled the query cache on 7/13 (when I'd thought MDEV-13300 was the culprit). I don't know whether it's had any effect yet.

Comment by Diego Dupin [ 2017-07-17 ]

The thing is that's very weird : error message with "Packet contents (hex) = 03" means that packet content first byte is 03.
0x03 is the first byte of a column definition packet that describe column.
driver is waiting for an EOF packet then throw an exception, because protocol is not respected.

Before sending column definition packets, server indicate the number of column. This error would indicate that there server send more column than expected. I don't see how that can occur.

So i have a few question :

  • Do you use java 8 ? if so, you use driver 2.0.3 and activate option "enablePacketDebug" (i.e. use connection string jdbc:mysql://myhost:3306/my_schema?connectTimeout=60000&socketTimeout=180000*&enablePacketDebug=true*) That won't correct the problem, but at least, stacktrace will contain the last hexadecimal exchanges with server. That would permit to identify this issue.
  • do you use any proxy (HAProxy / maxscale ? )
  • please join server configuration
Comment by Jamie Jackson [ 2017-07-17 ]
  • I'm pushing the 2.0.3 driver update & enablePacketDebug=true through our tiers. I'll have it up this evening.
  • We don't have any intermediate proxy; the connections are direct.
  • Config:

    /etc/my.cnf.d/server.cnf

    #
    # These groups are read by MariaDB server.
    # Use it for options that only the server (but not clients) should see
    #
    # See the examples of server my.cnf files in /usr/share/mysql/
    #
     
    # this is read by the standalone daemon and embedded servers
    [server]
     
    # this is only for the mysqld standalone daemon
    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql
    group=mysql
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    max_allowed_packet=16384000
    group_concat_max_len=4294967295
     
    innodb_buffer_pool_size=2560M
    innodb_file_format = Barracuda
    innodb_file_per_table = 1
     
    innodb_open_files = 4096
     
     
    # disable query cache 7-13-2017
    query_cache_type=0
    query_cache_size=0
     
    tmp_table_size = 16M
    max_heap_table_size = 16M
     
    max_connections = 1000
    open_files_limit = 4096
    table_cache = 4096
     
    #changes 03-23-2017
    #changes 03-23-2017
    slow_query_log = 0
    slow_query_log_file = /var/log/mysql/redactedhostname-slow.log
    log_output = FILE
    long_query_time = 30
    log_queries_not_using_indexes = 1
    local-infile = 0
    ignore-db-dir=lost+found
    ignore-db-dir=lost@002bfound
    log_warning = 2
    skip-symbolic-links
    log-error=/var/log/mysql/redactedhostname.err
    secure_auth=1
    expire_logs_days=3
    character-set-server=utf8
    collation-server=utf8_general_ci
    init_connect='SET collation_connection = utf8_general_ci'
    init_connect='SET NAMES utf8'
    skip-character-set-client-handshake
     
     
    # this is only for embedded server
    [embedded]
     
    # This group is only read by MariaDB-5.5 servers.
    # If you use the same .cnf file for MariaDB of different versions,
    # use this group for options that older servers don't understand
    [mysqld-5.5]
     
    # These two groups are only read by MariaDB servers, not by MySQL.
    # If you use the same .cnf file for MySQL and MariaDB,
    # you can put MariaDB-only options here
    [mariadb]
     
    [mariadb-5.5]
    

For what it's worth, we've been running without query caching since 7/13 (when we'd thought this was caused by MDEV-13300), but I see more incidents of the error* from my app since then.

* (conn:3425) Packets out of order when reading field packets, expected was EOF stream. Packet contents (hex) = 03

Comment by Jamie Jackson [ 2017-07-18 ]

I pushed the 2.0.3 driver update & enablePacketDebug=true to production last night.

Twenty minutes ago, we suffered another outage due to this bug. Unfortunately, we universally got the "crossed queries" manifestation this time, versus the "EOF" error. When we get crossed queries, we don't get JDBC errors, we just see a slew of downstream effects (e.g., "column not found, `blah`) of the wrong query being returned (because the wrong query's results are returned).

Unfortunately, this manifestation doesn't generate its own exception, so I didn't get anything out of the enablePacketDebug=true setting.

Comment by Jamie Jackson [ 2017-07-25 ]

As I mentioned previously, I'd been having a mixture of two kinds of problems:

  1. Crossed queries (that don't throw JDBC exceptions, but which cause problems in my app)
  2. Packet errors (which do throw exceptions)

Since upgrading the driver, I only seem to get the first problem, which isn't helpful, since there's no exception now for enablePacketDebug to expose.

I need a plan B. Any ideas?

Comment by Diego Dupin [ 2017-11-21 ]

closing as requested

Comment by Diego Dupin [ 2017-11-21 ]

sorry closed wrong task

Comment by Diego Dupin [ 2019-03-13 ]

closing since not reproduced and no news since a long time

Generated at Thu Feb 08 03:16:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.