[MCOL-1008] LDI and INSERT...SELECT causes mysqld to crash with long VARCHAR entries Created: 2017-11-03  Updated: 2017-11-13  Resolved: 2017-11-13

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: 1.0.12, 1.1.1
Fix Version/s: 1.0.12, 1.1.2

Type: Bug Priority: Major
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: relnote

Sprint: 2017-22, 2017-23

 Description   

Build tested: 1.1.1-1

Testing stack: Single server, with 1 dbroot, using local VM.

This issue was identified by the Autopilot test concurrency.concurDML. The same test worked for 1.1.0-1 beta, not failed on 1.1.1-1.

The test suite run's the same test for update, delete, and ldi. For update and delete tests, dbt3 tables were loaded with 1gb data before test. ldi test starts with empty tables. update and delete tests worked fine, but ldi tests failed.

The test basically does the following:

1) run ldi, which uses cpimport, concurrently for all 8 dbt3 tables loading 1gb data
2) concurrently, run select count queries on the tables being tested

The purpose of this test is to ensure data consistency, commands should commit none or all changes

During ldi tests, mysqld crashed, leaving tables locks from cpimport processes. Sometimes, queries also return "missing data block" errors.

I spent hours to debug this issue and this is what I found:

1) instead of testing with all 8 tables, testing nation and region went fine
2) lineitem and order also fine
3) When testing lineitem, orders, and partsupp, it failed
4) when testing nation, region, customer, it worked some times. Nation and region tables are small and could finish quickly.

I thing this could be related to persisting data blocks, update or invalidating cache, updating extent map, or timing of doing these operations, etc.



 Comments   
Comment by Daniel Lee (Inactive) [ 2017-11-11 ]

mysqld log file content after mysqld died.

Version: '10.2.10-MariaDB-log' socket: '/usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock' port: 3306 Columnstore 1.1.2-1
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::assign
171111 2:39:48 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.10-MariaDB-log
key_buffer_size=536870912
read_buffer_size=4194304
max_used_connections=9
max_threads=153
thread_count=16
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1780877 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fa27c0009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa2e5564dd8 thread_stack 0x80000
/usr/local/mariadb/columnstore/mysql//bin/mysqld(my_print_stacktrace+0x29)[0x7fa320fbdec9]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(handle_fatal_signal+0x30d)[0x7fa320ae6ecd]
/lib64/libpthread.so.0(+0xf100)[0x7fa32008e100]
/lib64/libc.so.6(gsignal+0x37)[0x7fa31e1fb5f7]
/lib64/libc.so.6(abort+0x148)[0x7fa31e1fcce8]
/lib64/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x165)[0x7fa31eb009d5]
/lib64/libstdc++.so.6(+0x5e946)[0x7fa31eafe946]
/lib64/libstdc++.so.6(+0x5e973)[0x7fa31eafe973]
/lib64/libstdc++.so.6(+0x5eb93)[0x7fa31eafeb93]
/lib64/libstdc++.so.6(ZSt20_throw_length_errorPKc+0x77)[0x7fa31eb53967]
/lib64/libstdc++.so.6(_ZNSs6assignEPKcm+0xd6)[0x7fa31eb5e556]
mysys/stacktrace.c:268(my_print_stacktrace)[0x7fa2f7d62972]
sql/signal_handler.cc:168(handle_fatal_signal)[0x7fa2f7d454fd]
/usr/local/mariadb/columnstore/mysql//bin/mysqld(_ZN7handler12ha_write_rowEPh+0x1ef)[0x7fa320af0c3f]
sql/handler.cc:6016(handler::ha_write_row(unsigned char*))[0x7fa320968d2b]
sql/sql_load.cc:1219(mysql_load(THD*, sql_exchange*, TABLE_LIST*, List<Item>&, List<Item>&, List<Item>&, enum_duplicates, bool, bool))[0x7fa320c1857a]
sql/sql_parse.cc:4847(mysql_execute_command(THD*))[0x7fa320986227]
sql/sql_parse.cc:7904(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x7fa320988c20]
sql/sql_parse.cc:1834(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x7fa32098e03c]
sql/sql_parse.cc:1414(do_command(THD*))[0x7fa32098e884]
sql/sql_connect.cc:1354(do_handle_one_connection(CONNECT*))[0x7fa320a47224]
sql/sql_connect.cc:1262(handle_one_connection)[0x7fa320a472e4]
/lib64/libpthread.so.0(+0x7dc5)[0x7fa320086dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fa31e2bcced]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fa27c00efe0): load data infile '/data/qa/autopilot/data/source/dbt3/1g/partsupp.tbl' into table partsupp fields terminated by '|'

Connection ID (thread ID): 27
Status: NOT_KILLED

Optimizer switch: index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=off,derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=off,in_to_exists=on,semijoin=off,partial_match_rowid_merge=off,partial_match_table_scan=off,subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off,semijoin_with_cache=off,join_cache_incremental=off,join_cache_hashed=off,join_cache_bka=off,optimize_join_buffer_size=off,table_elimination=off,extended_keys=off,exists_to_in=on,orderby_uses_equalities=off,condition_pushdown_for_derived=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
171111 02:39:49 mysqld_safe Number of processes running now: 0
171111 02:39:49 mysqld_safe mysqld restarted
2017-11-11 2:39:49 140167351113792 [Note] /usr/local/mariadb/columnstore/mysql//bin/mysqld (mysqld 10.2.10-MariaDB-log) starting as process 10222 ...
171111 2:39:49 Columnstore: Started; Version: 1.0.2-1
171111 2:39:49 InfiniDB: Started; Version: 1.0.2-1
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Uses event mutexes
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Compressed tables use zlib 1.2.7
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Using Linux native AIO
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Number of pools: 1
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Using SSE2 crc32 instructions
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Completed initialization of buffer pool
2017-11-11 2:39:49 140166276192000 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Highest supported file format is Barracuda.
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1634236
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Last binlog file '/usr/local/mariadb/columnstore/mysql/db/mysql-bin.000001', position 6930
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: 128 out of 128 rollback segments are active.
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2017-11-11 2:39:49 140167351113792 [Note] InnoDB: 5.7.20 started; log sequence number 1634245
2017-11-11 2:39:49 140165896980224 [Note] InnoDB: Loading buffer pool(s) from /usr/local/mariadb/columnstore/mysql/db/ib_buffer_pool
2017-11-11 2:39:49 140165896980224 [Note] InnoDB: Buffer pool(s) load completed at 171111 2:39:49
2017-11-11 2:39:49 140167351113792 [Note] Plugin 'FEEDBACK' is disabled.
2017-11-11 2:39:49 140167351113792 [Warning] /usr/local/mariadb/columnstore/mysql//bin/mysqld: unknown variable 'loose-server_audit_syslog_info=cols1umnstore-1'
2017-11-11 2:39:49 140167351113792 [Note] Recovering after a crash using /usr/local/mariadb/columnstore/mysql/db/mysql-bin
2017-11-11 2:39:49 140167351113792 [Note] Starting crash recovery...
2017-11-11 2:39:49 140167351113792 [Note] Crash recovery finished.
2017-11-11 2:39:49 140167351113792 [Note] Server socket created on IP: '::'.
2017-11-11 2:39:49 140167351113792 [ERROR] mysqld: Incorrect information in file: '/tmp/#sql1adb_21_5.frm'
2017-11-11 2:39:49 140167351113792 [Note] Reading of all Master_info entries succeded
2017-11-11 2:39:49 140167351113792 [Note] Added new Master_info '' to hash table
2017-11-11 2:39:49 140167351113792 [Note] /usr/local/mariadb/columnstore/mysql//bin/mysqld: ready for connections.
Version: '10.2.10-MariaDB-log' socket: '/usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock' port: 3306 Columnstore 1.1.2-1

Comment by Andrew Hutchings (Inactive) [ 2017-11-11 ]

I believe this is a regression triggered by MCOL-877 because I think we aren't handling the MariaDB VARCHAR binary length data correctly (MCOL-877 didn't break it, but it caused a crash instead of corruption).

Comment by Andrew Hutchings (Inactive) [ 2017-11-11 ]

Pull request for 1.0 and 1.1

Problem was with length casting. We cast a single byte length to int8_t so a length of greater than 127 turned into a negative. We then did an assign() on a std::string with that length which if successful would have eaten around 8-16EB of RAM (roughly as much RAM as in all computers on earth combined right now).

For QA: it breaks when LDI or INSERT...SELECT insert some text into a VARCHAR > 127 chars when the field is defined with a length of less than 255. So just doing a single LDI for partsupp would cause the crash without concurrent.

Comment by Daniel Lee (Inactive) [ 2017-11-13 ]

Builds verified: Github source for 1.0.12 and 1.1.2

1.0.12

/root/columnstore/mariadb-columnstore-server
commit a42eb6d1e74e44c9e8fd9bb8290e6ce7dbf909f5
Merge: 2965fc8 6a14ced
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Oct 3 10:12:33 2017 -0500

Merge pull request #69 from mariadb-corporation/MCOL-940

MCOL-940

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 478fb0df927b65e5c6ec0a91efef59f15f50baa8
Merge: 024c96a 766f0b8
Author: David.Hall <david.hall@mariadb.com>
Date: Mon Nov 13 08:35:21 2017 -0600

Merge pull request #318 from mariadb-corporation/MCOL-1008-1.0

MCOL-1008 Fix bad length on bulk DML insert

1.1.12

/root/columnstore/mariadb-columnstore-server
commit ed21e674cfd70db421957d0212ff7fc1835d06d5
Author: david hill <david.hill@mariadb.com>
Date: Mon Nov 6 08:41:53 2017 -0600

Update README.md

updated version

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 969b6d282616acebbea1e4a1baf20e3c2d6318c6
Merge: fda4495 854d325
Author: David.Hall <david.hall@mariadb.com>
Date: Mon Nov 13 08:35:46 2017 -0600

Merge pull request #317 from mariadb-corporation/MCOL-1008

MCOL-1008 Fix bad length on bulk DML insert

Generated at Thu Feb 08 02:25:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.