[MDEV-10287] mariadb server crashes for no apparent reason Created: 2016-06-25  Updated: 2020-10-01  Resolved: 2020-10-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.1.14
Fix Version/s: 10.1.32

Type: Bug Priority: Critical
Reporter: Robert Dyas Assignee: Anel Husakovic
Resolution: Incomplete Votes: 0
Labels: None
Environment:

centOS7 on Google Compute Engine


Attachments: Text File log.txt     File my.cnf    

 Description   

Server crashes frequently now (once or twice per day) for no apparent reason. A previous report looked like it was linked to CONNECT engine or unixODBC issues, but this one I can't tell.

Jun 25 10:19:04 si-mariadb-3 mysqld: 160625 10:19:04 [ERROR] mysqld got signal 11 ;
Jun 25 10:19:04 si-mariadb-3 mysqld: This could be because you hit a bug. It is also possible that this binary
Jun 25 10:19:04 si-mariadb-3 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Jun 25 10:19:04 si-mariadb-3 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Jun 25 10:19:04 si-mariadb-3 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jun 25 10:19:04 si-mariadb-3 mysqld: We will try our best to scrape up some info that will hopefully help
Jun 25 10:19:04 si-mariadb-3 mysqld: diagnose the problem, but since we have already crashed,
Jun 25 10:19:04 si-mariadb-3 mysqld: something is definitely wrong and this may fail.
Jun 25 10:19:04 si-mariadb-3 mysqld: Server version: 10.1.14-MariaDB
Jun 25 10:19:04 si-mariadb-3 mysqld: key_buffer_size=134217728
Jun 25 10:19:04 si-mariadb-3 mysqld: read_buffer_size=131072
Jun 25 10:19:04 si-mariadb-3 mysqld: max_used_connections=1
Jun 25 10:19:04 si-mariadb-3 mysqld: max_threads=153
Jun 25 10:19:04 si-mariadb-3 mysqld: thread_count=2
Jun 25 10:19:04 si-mariadb-3 mysqld: It is possible that mysqld could use up to
Jun 25 10:19:04 si-mariadb-3 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467113 K  bytes of memory
Jun 25 10:19:04 si-mariadb-3 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Jun 25 10:19:04 si-mariadb-3 mysqld: Thread pointer: 0x0x7f137ee70008
Jun 25 10:19:04 si-mariadb-3 mysqld: Attempting backtrace. You can use the following information to find out
Jun 25 10:19:04 si-mariadb-3 mysqld: where mysqld died. If you see no messages after this, something went
Jun 25 10:19:04 si-mariadb-3 mysqld: terribly wrong...
Jun 25 10:19:04 si-mariadb-3 mysqld: stack_bottom = 0x7f13ae025130 thread_stack 0x48400
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f13aebc71be]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f13ae6f2f0d]
Jun 25 10:19:04 si-mariadb-3 mysqld: /lib64/libpthread.so.0(+0xf100)[0x7f13add11100]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(+0x82dcb3)[0x7f13ae96fcb3]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(+0x83220d)[0x7f13ae97420d]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(+0x83254b)[0x7f13ae97454b]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(+0x71708b)[0x7f13ae85908b]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(+0x5b1ad1)[0x7f13ae6f3ad1]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x206)[0x7f13ae582906]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_ZN3THDD1Ev+0x20e)[0x7f13ae547bae]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_ZN3THDD0Ev+0x11)[0x7f13ae548181]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_Z10unlink_thdP3THD+0x172)[0x7f13ae4dc7c2]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x20)[0x7f13ae4dc820]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x113)[0x7f13ae6415b3]
Jun 25 10:19:04 si-mariadb-3 mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f13ae641800]
Jun 25 10:19:04 si-mariadb-3 mysqld: /lib64/libpthread.so.0(+0x7dc5)[0x7f13add09dc5]
Jun 25 10:19:04 si-mariadb-3 mysqld: /lib64/libc.so.6(clone+0x6d)[0x7f13ac12bced]
Jun 25 10:19:04 si-mariadb-3 mysqld: Trying to get some variables.
Jun 25 10:19:04 si-mariadb-3 mysqld: Some pointers may be invalid and cause the dump to abort.
Jun 25 10:19:04 si-mariadb-3 mysqld: Query (0x0):
Jun 25 10:19:04 si-mariadb-3 mysqld: Connection ID (thread ID): 164
Jun 25 10:19:04 si-mariadb-3 mysqld: Status: KILL_CONNECTION
Jun 25 10:19:04 si-mariadb-3 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_
pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_
table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimi
ze_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
Jun 25 10:19:04 si-mariadb-3 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jun 25 10:19:04 si-mariadb-3 mysqld: information that should help you find out what is causing the crash.



 Comments   
Comment by Elena Stepanova [ 2016-06-25 ]

Could you please provide a bigger part of the error log, preferably from the server startup and till the end of the crash?
Please also attach your cnf file(s) and the output of SHOW PLUGINS.

Thanks.

Comment by Robert Dyas [ 2016-06-26 ]

Resuts of SHOW PLUGINS below:

binlog	ACTIVE	STORAGE ENGINE		GPL
mysql_native_password	ACTIVE	AUTHENTICATION		GPL
mysql_old_password	ACTIVE	AUTHENTICATION		GPL
wsrep	ACTIVE	STORAGE ENGINE		GPL
CSV	ACTIVE	STORAGE ENGINE		GPL
MEMORY	ACTIVE	STORAGE ENGINE		GPL
MyISAM	ACTIVE	STORAGE ENGINE		GPL
MRG_MyISAM	ACTIVE	STORAGE ENGINE		GPL
CLIENT_STATISTICS	ACTIVE	INFORMATION SCHEMA		GPL
INDEX_STATISTICS	ACTIVE	INFORMATION SCHEMA		GPL
TABLE_STATISTICS	ACTIVE	INFORMATION SCHEMA		GPL
USER_STATISTICS	ACTIVE	INFORMATION SCHEMA		GPL
InnoDB	ACTIVE	STORAGE ENGINE		GPL
XTRADB_READ_VIEW	ACTIVE	INFORMATION SCHEMA		GPL
XTRADB_INTERNAL_HASH_TABLES	ACTIVE	INFORMATION SCHEMA		GPL
XTRADB_RSEG	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_TRX	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_LOCKS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_LOCK_WAITS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMP	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMP_RESET	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMPMEM	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMPMEM_RESET	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMP_PER_INDEX	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CMP_PER_INDEX_RESET	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_BUFFER_PAGE	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_BUFFER_PAGE_LRU	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_BUFFER_POOL_STATS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_METRICS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_DEFAULT_STOPWORD	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_DELETED	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_BEING_DELETED	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_CONFIG	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_INDEX_CACHE	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_FT_INDEX_TABLE	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_TABLES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_TABLESTATS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_INDEXES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_COLUMNS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_FIELDS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_FOREIGN	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_FOREIGN_COLS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_TABLESPACES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_DATAFILES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_CHANGED_PAGES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_MUTEXES	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_SYS_SEMAPHORE_WAITS	ACTIVE	INFORMATION SCHEMA		GPL
INNODB_TABLESPACES_ENCRYPTION	ACTIVE	INFORMATION SCHEMA		BSD
INNODB_TABLESPACES_SCRUBBING	ACTIVE	INFORMATION SCHEMA		BSD
CHANGED_PAGE_BITMAPS	ACTIVE	INFORMATION SCHEMA		GPL
SEQUENCE	ACTIVE	STORAGE ENGINE		GPL
PERFORMANCE_SCHEMA	ACTIVE	STORAGE ENGINE		GPL
Aria	ACTIVE	STORAGE ENGINE		GPL
FEEDBACK	DISABLED	INFORMATION SCHEMA		GPL
partition	ACTIVE	STORAGE ENGINE		GPL
CONNECT	ACTIVE	STORAGE ENGINE	ha_connect.so	GPL

Comment by Robert Dyas [ 2016-06-26 ]

Log file from last week or so attached. You will see several crashes, most with different back traces. When we look at the SQL statements before this occurs (logs I can't share as has customer info in it) the statements before the crash are not similar.

Log file attached.
log.txt

Comment by Robert Dyas [ 2016-06-26 ]

My.cnf file attached.
my.cnf

Comment by Robert Dyas [ 2016-06-28 ]

Some additional info we have found in the last two days of testing and trying to track this down:

When we modified our app to use only local tables (no CONNECT tables) but made absolutely zero other changes, the server is now up for 48 hours and counting under load. Before it would go down after 6 to 14 hours at most. So it is looking like the root cause is either in CONNECT engine, unixODBC, or the Simba Salesforce ODBC driver itself.

How does one track down the root cause in this situation?

Comment by Elena Stepanova [ 2016-06-29 ]

Since you mentioned before that you actually have an SQL log (which you cannot share), it might be useful to check what the allegedly guilty connection was doing before the crash happened.
For example, if you look at the last crash in the error log, the one from Jun 26 04:38:31, it happened when connection no. 83 was killed.

The crash happened not upon a query execution, but upon certain plugin-related actions a connection is supposed to perform when it exits.

You can check your SQL log for the activity that the connection was performing during its life time. It must have done something related to the CONNECT engine that triggered a delayed problem which popped up when the connecting was being closed.

If it so happens that the connection didn't do too many different things, it might help to pinpoint the point of failure in CONNECT (or maybe not even CONNECT specifically, but plugins in general).

We have similar reports, but no known root cause so far.

Comment by Robert Dyas [ 2016-06-29 ]

RE "It must have done something related to the CONNECT engine that triggered a delayed problem which popped up when the connecting was being closed."

I think you are on the right path. During the connection's lifetime it performs a large number of queries, many join a CONNECT table with a normal table (innodb). This seems to work fine. The statements immediately leading up to the crash are often different (not necessarily same tables involved). Note that we are using unixODBC pooled connections, so possibly when those connections are cleaned up it crashes? I wonder if you can find out if the people having similar problems are using pooled connections?

Comment by Elena Stepanova [ 2016-07-01 ]

rdyas,

Can you try to check in your logs whether the failed connection was killed by some other thread (by a KILL <connection_id> command), or was it exiting on its own (<connection_id> Quit record is present in the general log)?

Comment by Robert Dyas [ 2016-07-02 ]

Elena,

I don't think I have this log enabled... the queries are logged by our own software on google cloud with some additional context info, not via the built in maria db logging. Where would the general log file be by default? If I don't have it enabled, what would be the best way to config it in my.cnf so we can track this in the future with whatever info would be most helpful to you?

Comment by Elena Stepanova [ 2016-07-03 ]

The question about the connection quitting vs being killed might still apply to your custom logging. If it logs queries, it will log the KILL query as well; and quite probably it also records connect/disconnect.

If not, then, to enable general log temporarily till the first crash, I think the best way is just to run SET GLOBAL general_log=1. It will save you server restart, and also after the server is crashed and restarted, the general log will be disabled automatically again, which is probably what you want.

Please note however that enabling general log can come at the cost of some performance on a busy server; so, if you have high traffic and performance is of the essence, maybe you shouldn't do it (at least not just yet).

Comment by Robert Dyas [ 2016-07-25 ]

So I have some more info that (might) help in finding the problem.

If we leave everything the same BUT turn off connection pooling in UnixODBC we (so far) have had zero problems.

Also, slightly off topic, is the new JDBC connect table type ready to test in 10.1.16? And if so, does it catch exceptions thrown by the driver and just report them back as normal errors? The reason I ask is if so this would eliminate in the future the possibility that a driver bug brings down the entire database server.

Comment by Elena Stepanova [ 2016-07-25 ]

rdyas, thanks for the update.

Taking into account the new information, I'm assigning it to bertrandop for further investigation. bertrandop should also be able to answer the question about the new connect table type.

Comment by Olivier Bertrand [ 2016-07-25 ]

Yes the JDBC table type should be available in version 10.1.16.

Unfortunately, the version that is included in MariaDB 10.1.15 is outdated (and perhaps buggy)

Yes, all exception thrown by the drivers are handled in the java wrappers (interfacing via JNI between MariaDB and the JDBC drivers). The error messages are retrieved by CONNECT as normal errors and should not crash the server.

By the way, in the above crash report, I don't see anything involving CONNECT. So what make you think that it is due to ODBC tables? It would be interresting to know if such crashes have been observed on other machines.

Comment by Robert Dyas [ 2016-10-13 ]

This appears to be the same issue as MDEV-10273.

The root cause appears to be something related to ODBC connection pooling, as once the connection pooling was turned off this problem is 100% gone for several months in production with zero crashes.

Also, it appears to be related closing (or attempting to close and it fails) the ODBC connection.
This is no longer a concern for me as I am switching to the JDBC implementation for everything which so far has caused zero server crashes.

Generated at Thu Feb 08 07:41:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.