[MCOL-605] Crash when running table_usage Created: 2017-03-04  Updated: 2017-03-22  Resolved: 2017-03-22

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.0.7
Fix Version/s: 1.0.8, 1.1.0

Type: Bug Priority: Major
Reporter: Alexander Rubin Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 16.04.1 LTS, single node install


Sprint: 2017-5, 2017-6

 Description   

{{MariaDB [o1543]> call columnstore_info.table_usage('o1543', 'query_class_metrics');
ERROR 2013 (HY000): Lost connection to MySQL server during query

Error log:

2017-03-04 16:13:23 140515920578368 [Note] /usr/local/mariadb/columnstore/mysql//bin/mysqld: ready for connections.
Version: '10.1.21-MariaDB' socket: '/usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock' port: 3308 Columnstore 1.0.7-1
terminate called after throwing an instance of 'std::runtime_error'
what(): InetStreamSocket::connect: connect() error: Cannot assign requested address to: InetStreamSocket: sd: 64 inet: 127.0.0.1 port: 8630
170304 16:13:45 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.21-MariaDB
key_buffer_size=536870912
read_buffer_size=4194304
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1780777 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x564639e78c88
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fcc30270ed8 thread_stack 0x80000
/usr/local/mariadb/columnstore/mysql//bin/mysqld(my_print_stacktrace+0x29)[0x5646380eccc9]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(handle_fatal_signal+0x2f5)[0x564637cdd7a5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fcc69012390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fcc67d31428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fcc67d3302a]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x16d)[0x7fcc6845d84d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7fcc6845b6b6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7fcc6845b701]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x49)[0x7fcc6845b969]
/usr/local/mariadb/columnstore/lib/libmessageqcpp.so.1(_ZNK11messageqcpp18MessageQueueClient5writeERKNS_10ByteStreamEPK8timespecPNS_5StatsE+0x716)[0x7fcc3e303fa6]
/usr/local/mariadb/columnstore/mysql/lib/plugin/is_columnstore_files.so(+0xcc2f)[0x7fcc39a22c2f]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x287)[0x564637bef9a7]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN4JOIN10exec_innerEv+0x6aa)[0x564637bd75ca]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN4JOIN4execEv+0x44)[0x564637bd9504]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa24)[0x564637bd6834]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x130)[0x564637bd6b50]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(+0x3f57d8)[0x564637b7c7d8]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z21mysql_execute_commandP3THD+0x584f)[0x564637b8872f]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x14)[0x564637dc6b24]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN13sp_lex_keeper23reset_lex_and_exec_coreEP3THDPjbP8sp_instr+0x7d)[0x564637dcd38d]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x204)[0x564637dcd934]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN7sp_head7executeEP3THDb+0x757)[0x564637dc9b07]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x72a)[0x564637dcb13a]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(+0x3f5dbe)[0x564637b7cdbe]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z21mysql_execute_commandP3THD+0x1bf4)[0x564637b84ad4]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x2ba)[0x564637b8a2da]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z18idb_vtable_processP3THDyP9Statement+0x1fc4)[0x564637b8c584]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1881)[0x564637b8f561]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z10do_commandP3THD+0xd1)[0x564637b901f1]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_Z24do_handle_one_connectionP3THD+0x1b4)[0x564637c47c84]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(handle_one_connection+0x37)[0x564637c47d17]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fcc690086ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcc67e0282d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fcbf4041818): is an invalid pointer
Connection ID (thread ID): 3
Status: NOT_KILLED

Optimizer switch: index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=off,derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=off,in_to_exists=on,semijoin=off,partial_match_rowid_merge=off,partial_match_table_scan=off,subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off,semijoin_with_cache=off,join_cache_incremental=off,join_cache_hashed=off,join_cache_bka=off,optimize_join_buffer_size=off,table_elimination=off,extended_keys=off,exists_to_in=on,orderby_uses_equalities=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.
Query: SELECT TABLE_SCHEMA, TABLE_NAME, format_filesize(sum(cf.file_size)) DATA_DISK_USAGE, format_filesize(sum(IFNULL(ccf.file_size, 0))) DICT_DISK_USAGE, format_filesize(sum(cf.file_size) + sum(IFNULL(ccf.file_size, 0))) TOTAL_USAGE FROM INFORMATION_SCHEMA.COLUMNSTORE_COLUMNS cc
JOIN INFORMATION_SCHEMA.COLUMNSTORE_FILES cf ON cc.object_id = cf.object_id
LEFT JOIN INFORMATION_SCHEMA.COLUMNSTORE_FILES ccf ON cc.dictionary_object_id = ccf.object_id
WHERE table_name = NAME_CONST('t_name',_latin1'query_class_metrics' COLLATE 'latin1_swedish_ci') and (table_schema = NAME_CONST('t_schema',_latin1'o1543' COLLATE 'latin1_swedish_ci') or NAME_CONST('t_schema',_latin1'o1543' COLLATE 'latin1_swedish_ci') IS NULL) GROUP BY table_schema, table_name

170304 16:13:45 mysqld_safe Number of processes running now: 0
170304 16:13:45 mysqld_safe mysqld restarted}}



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-03-05 ]

This is due to port exhaustion due to ports going into a TIME_WAIT state. The code needs modifying to reuse connections rather than binning them and reopening. It also needs to handle this as an error better.

As a workaround for now you can increase the available ports and/or decrease the port TIME_WAIT state for your OS:

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
# echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range

Comment by Andrew Hutchings (Inactive) [ 2017-03-05 ]

The best way I can see to test the new behaviour in this patch is to watch the tcp-tw output of this whilst querying I_S.COLUMNSTORE_FILES:

sudo sar -n SOCK 1

It should only increase slightly instead of a large jump.

Comment by Alexander Rubin [ 2017-03-05 ]

Andrew, thank you, workaround worked.

Comment by David Thompson (Inactive) [ 2017-03-07 ]

Reopening to consider for 1.0.8 backport

Comment by Andrew Hutchings (Inactive) [ 2017-03-07 ]

New review for 1.0 backport

Comment by Daniel Lee (Inactive) [ 2017-03-21 ]

Build tested: 1.0.8-1 1um2pm stack

MariaDB [(none)]> select * from information_schema.COLUMNSTORE_FILES;
ERROR 1105 (HY000): Unknown error
MariaDB [(none)]> use information_schema
Database changed
MariaDB [information_schema]> desc COLUMNSTORE_FILES;
------------------------------------------------------+

Field Type Null Key Default Extra

------------------------------------------------------+

OBJECT_ID int(11) NO   0  
SEGMENT_ID int(11) NO   0  
PARTITION_ID int(11) NO   0  
FILENAME varchar(1024) NO      
FILE_SIZE bigint(19) YES   NULL  
COMPRESSED_DATA_SIZE bigint(19) YES   NULL  

------------------------------------------------------+
6 rows in set (0.00 sec)

MariaDB [information_schema]> select * from COLUMNSTORE_FILES;
ERROR 1105 (HY000): Unknown error

No crit.log and err.log in all three nodes. No errors have been logged yet.

Comment by Andrew Hutchings (Inactive) [ 2017-03-22 ]

2 new branches (develop-1.0 and develop). Fixes problem found by Daniel

Comment by Daniel Lee (Inactive) [ 2017-03-22 ]

Build verified: 1.0.8-1

mcsadmin> getsoftware
getsoftwareinfo Wed Mar 22 21:53:39 2017

Name : mariadb-columnstore-platform
Version : 1.0.8
Release : 1
Architecture: x86_64
Install Date: Wed 22 Mar 2017 09:37:39 PM UTC
Group : Applications/Databases
Size : 10012539
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)
Source RPM : mariadb-columnstore-platform-1.0.8-1.src.rpm
Build Date : Wed 22 Mar 2017 08:41:24 PM UTC

Loaded a 10g DBT3 database and executed the following query:

MariaDB [(none)]> select * from information_schema.COLUMNSTORE_FILES;
------------------------------------------------------------------------------------------------------------------------------------------------------+

OBJECT_ID SEGMENT_ID PARTITION_ID FILENAME FILE_SIZE COMPRESSED_DATA_SIZE

------------------------------------------------------------------------------------------------------------------------------------------------------+

3005 0 0 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/011.dir/189.dir/000.dir/FILE000.cdf 33562624 3080192
3005 1 0 /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/189.dir/000.dir/FILE001.cdf 33562624 3080192
3006 0 0 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/011.dir/190.dir/000.dir/FILE000.cdf 33562624 3080192
3006 1 0 /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/190.dir/000.dir/FILE001.cdf 33562624 3080192

.
.
.

Hear is the sar output:

09:50:32 PM totsck tcpsck udpsck rawsck ip-frag tcp-tw
09:50:33 PM 239 35 3 0 0 15
09:50:34 PM 239 35 3 0 0 15
09:50:35 PM 239 35 3 0 0 15
09:50:36 PM 239 35 3 0 0 15
09:50:37 PM 239 35 3 0 0 15
09:50:38 PM 239 35 3 0 0 15
09:50:39 PM 239 35 3 0 0 15
09:50:40 PM 239 35 3 0 0 15
09:50:41 PM 239 35 3 0 0 15
09:50:42 PM 239 35 3 0 0 15
09:50:43 PM 241 35 3 0 0 15
09:50:44 PM 241 35 3 0 0 15
09:50:45 PM 241 35 3 0 0 10
09:50:46 PM 241 35 3 0 0 10
09:50:47 PM 241 35 3 0 0 10
09:50:48 PM 241 35 3 0 0 10
09:50:49 PM 241 35 3 0 0 10
09:50:50 PM 245 39 3 0 0 10
09:50:51 PM 243 37 3 0 0 12
09:50:52 PM 243 37 3 0 0 12
09:50:53 PM 243 37 3 0 0 12
09:50:54 PM 243 37 3 0 0 12
09:50:55 PM 243 37 3 0 0 12
09:50:56 PM 243 37 3 0 0 12
09:50:57 PM 243 37 3 0 0 12
09:50:58 PM 243 37 3 0 0 12
09:50:59 PM 243 37 3 0 0 12
09:51:00 PM 243 37 3 0 0 12

Still need to test for 1.1.0

Comment by Daniel Lee (Inactive) [ 2017-03-22 ]

Build tested: 1.1.0 GitHub source

[root@localhost mariadb-columnstore-server]# git show
commit 24f5065dba978ef4ec6ea8cb4bfd9728321941bd
Author: david hill <david.hill@mariadb.com>
Date: Mon Mar 20 09:57:25 2017 -0500

[root@localhost mariadb-columnstore-engine]# git show
commit ff36e8e774fd961625d42b45070f56bead83a5bc
Merge: c1d38e0 6ab434e
Author: dhall-InfiniDB <david.hall@mariadb.com>
Date: Wed Mar 22 15:16:53 2017 -0500

Merge pull request #139 from mariadb-corporation/MCOL-605-hotfix

MCOL-605 Fix multi-node "Unkown error"

Execute the same test for 1.0.8-1

MariaDB [(none)]> select * from information_schema.COLUMNSTORE_FILES;
------------------------------------------------------------------------------------------------------------------------------------------------------+

OBJECT_ID SEGMENT_ID PARTITION_ID FILENAME FILE_SIZE COMPRESSED_DATA_SIZE

------------------------------------------------------------------------------------------------------------------------------------------------------+

3005 0 0 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/011.dir/189.dir/000.dir/FILE000.cdf 33562624 3080192
3005 1 0 /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/189.dir/000.dir/FILE001.cdf 33562624 3080192
3006 0 0 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/011.dir/190.dir/000.dir/FILE000.cdf 33562624 3080192

Linux 3.10.0-327.36.1.el7.x86_64 (localhost.localdomain) 03/22/2017 x86_64 (4 CPU)

10:07:16 PM totsck tcpsck udpsck rawsck ip-frag tcp-tw
10:07:17 PM 239 31 3 0 0 30
10:07:18 PM 239 31 3 0 0 24
10:07:19 PM 239 31 3 0 0 24
10:07:20 PM 239 31 3 0 0 24
10:07:21 PM 239 31 3 0 0 24
10:07:22 PM 239 31 3 0 0 24
10:07:23 PM 239 31 3 0 0 24
10:07:24 PM 243 35 3 0 0 24
10:07:25 PM 243 35 3 0 0 24
10:07:26 PM 243 35 3 0 0 22
10:07:27 PM 241 33 3 0 0 24
10:07:28 PM 241 33 3 0 0 24
10:07:29 PM 241 33 3 0 0 24
10:07:30 PM 241 33 3 0 0 24
10:07:31 PM 241 33 3 0 0 24
10:07:32 PM 241 33 3 0 0 24

Generated at Thu Feb 08 02:22:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.