[MDEV-19794] Spider crash with XA (10.4.x only) Created: 2019-06-18  Updated: 2020-10-06  Resolved: 2020-08-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.4.5
Fix Version/s: 10.4.16, 10.5.7, 10.6.0

Type: Bug Priority: Critical
Reporter: Gert van Dijk Assignee: Kentoku Shiba (Inactive)
Resolution: Fixed Votes: 1
Labels: crash
Environment:

MariaDB 10.4.5 with Spider plugin, Ubuntu 18.04/bionic.



 Description   

Using Spider with sharding as in the documentation use cases, a simple INSERT in an XA transaction reliably crashes the server.

It only affects 10.4.x; it is not reproducible with 10.3.x.

2019-06-18  9:19:11 0 [Note] mysqld: ready for connections.
Version: '10.4.5-MariaDB-1:10.4.5+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
190618  9:19:57 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.4.5-MariaDB-1:10.4.5+maria~bionic
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=1
max_threads=102
thread_count=27
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760240 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f5800000c08
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f58cc11add8 thread_stack 0x49000
mysqld(my_print_stacktrace+0x2e)[0x5627474462ce]
mysqld(handle_fatal_signal+0x515)[0x562746ec1275]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f58d3611890]
mysqld(_Z16xid_cache_deleteP3THDP9XID_STATE+0x16)[0x562746e48e26]
/usr/lib/mysql/plugin/ha_spider.so(_Z25spider_internal_xa_commitP3THDP21st_spider_transactionP5xid_tP5TABLES6_+0x537)[0x7f58cd2ae637]
/usr/lib/mysql/plugin/ha_spider.so(_Z13spider_commitP10handlertonP3THDb+0xa4)[0x7f58cd2b1554]
mysqld(+0x851ef5)[0x562746ec1ef5]
mysqld(_Z15trans_xa_commitP3THD+0x34d)[0x562746e495cd]
mysqld(_Z21mysql_execute_commandP3THD+0x3980)[0x562746cb6510]
mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x22a)[0x562746cbbcca]
mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x166d)[0x562746cbe48d]
mysqld(_Z10do_commandP3THD+0x148)[0x562746cbf8f8]
mysqld(_Z24do_handle_one_connectionP7CONNECT+0x2c2)[0x562746d994c2]
mysqld(handle_one_connection+0x3d)[0x562746d9958d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f58d36066db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f58d2a0488f]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f5800011f80): XA COMMIT '_sa_70fbd0dd1c31f43bb9a16f18b670eaa6'
Connection ID (thread ID): 29
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files     
Max locked memory         16777216             16777216             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       62717                62717                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/share/apport/apport %p %s %c %d %P

Please note that spider_internal_xa is set to OFF (default), but the above traceback shows internal_xa function names.

Backend schema:

CREATE TABLE `mytable_shard0` (
  id INT unsigned NOT NULL,
  some_field TEXT(100) DEFAULT NULL, 
  PRIMARY KEY (`id`)
) ENGINE=InnoDB ROW_FORMAT=COMPACT

Spider schema:

CREATE OR REPLACE SERVER srvA
FOREIGN DATA WRAPPER mysql 
OPTIONS (USER 'spider', PASSWORD 'spider', PORT 3306, HOST '192.168.199.10', DATABASE 'testdb');
 
CREATE OR REPLACE SERVER srvB
FOREIGN DATA WRAPPER mysql 
OPTIONS (USER 'spider', PASSWORD 'spider', PORT 3306, HOST '192.168.199.11', DATABASE 'testdb');
 
create table mytable (
  id INT unsigned NOT NULL,
  some_field TEXT(100) DEFAULT NULL, 
  PRIMARY KEY (id)
)
ENGINE=spider
partition by list (mod(id, 2)) (
    partition p00 values in(0)  comment = 'server "srvA", table "mytable_shard0"',
    partition p01 values in(1)  comment = 'server "srvB", table "mytable_shard1"'
);

Query against Spider:

XA START 'test';
INSERT INTO mytable (id, some_field) VALUES (0, "foo"), (1, "bar");
XA END 'test';
XA PREPARE 'test';
XA COMMIT 'test';

Upon the last statement (XA COMMIT 'test';), Spider crashes. I'm performing XA and not a regular transaction against Spider because of MDEV-19754.

Observations on the network:

  • The XA transaction is handled correctly by the backend servers, and they report their final commit without errors.
  • Spider sometimes, but not always, returns a response to the client. This looks like a race between the reply being sent and the network socket being closed as the process dies. Most of the time the client sees Lost connection to MySQL server during query on the last statement.


 Comments   
Comment by Leandro Pacheco (Inactive) [ 2019-06-26 ]

I've also hit this bug and investigated it a bit. The problem seems to be related to the following:
Spider uses the xid_cache to keep track of its "internal xa" transactions.
In 10.3, doing a xid_cache_delete was conditional on the entry being present.
In 10.4, that became an assertion: DBUG_ASSERT(xid_state->is_explicit_XA());.
It seems that Spider calls xid_cache_delete even for non-internal XA transactions, as in the reporter's case.
That was not a problem in 10.3 (where the call was guarded by a conditional), but in 10.4 it is (the assertion no longer holds).
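
The difference described above can be illustrated with a small toy model. This is only a sketch based on the description in this comment, not the server's actual code: ToyXidCache, delete_guarded and delete_strict are hypothetical names, and the real xid_cache_delete works on server structures that are not modelled here.

```python
class ToyXidCache:
    """Hypothetical stand-in for the server's XID cache (illustration only)."""

    def __init__(self):
        self.entries = set()

    def insert(self, xid):
        # Register an XA transaction id in the cache.
        self.entries.add(xid)

    def delete_guarded(self, xid):
        # 10.3-like behaviour as described above: deleting an entry that
        # was never registered is silently skipped.
        if xid in self.entries:
            self.entries.remove(xid)
            return True
        return False

    def delete_strict(self, xid):
        # 10.4-like behaviour as described above: the caller is expected
        # to guarantee that the entry exists, so a stray call fails.
        if xid not in self.entries:
            raise AssertionError("delete called for an unregistered xid")
        self.entries.remove(xid)


if __name__ == "__main__":
    cache = ToyXidCache()
    # The caller never registered "test" but still asks for a delete.
    print(cache.delete_guarded("test"))  # guarded variant tolerates this
    try:
        cache.delete_strict("test")      # strict variant does not
    except AssertionError as exc:
        print("strict delete failed:", exc)
```

In this model, a caller that deletes unconditionally is harmless with the guarded variant and fails with the strict one, which matches the 10.3 versus 10.4 behaviour reported in this issue.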

Comment by Gert van Dijk [ 2019-06-26 ]

Interesting.
By the way, I am wondering why it performs an internal XA in the first place, because all my backends can do XA in the storage engine, and as per the documentation it should not have to do internal XA in such a case (I was not using Galera in this test, but pure InnoDB). Moreover, internal_xa is turned OFF; see MDEV-19754, which uses the same setup, for more details.

Comment by Leandro Pacheco (Inactive) [ 2019-06-26 ]

Right, as I mentioned, I don't think your case triggers an internal XA, but Spider's code still seems to call xid_cache_delete anyway (probably an oversight, given that in 10.3 that was not a problem).

Comment by Kentoku Shiba (Inactive) [ 2020-08-15 ]

Reproduce, investigate, fix, test, commit, push to 10.4, 10.5, 10.6

Generated at Thu Feb 08 08:54:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.