[MDEV-12477] MariaDB can't connect 100MB JSON file Created: 2017-04-10  Updated: 2017-04-12

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.1.21
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Simon Tulett Assignee: Olivier Bertrand
Resolution: Unresolved Votes: 1
Labels: connect-engine, json, memory
Environment:

Windows 10, 64 Bit, 16GB RAM



 Description   

When trying to load a JSON file produce by mongodump in the format where each row is a separate JSON document as opposed to a single array I hit an out of memory error.

I am using the following syntax;

CREATE TABLE test
ENGINE=CONNECT
table_type=JSON
file_name='test.json'
lrecl=128
option_list='level=2, PRETTY=0'
catfunc= COLUMNS;

I have experimented with lrecl sizes of up to 4096 but it made no difference and eventually crashed the database engine. Since crashing the database engine I now get a different error that I can't get rid of even with smaller files now.

Below is some configuration detail and the stack trace from the error logs;

Server version: 10.1.21-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=1001
thread_count=1

ha_connect.dll!_setjmp()[setjmp.asm:95]
ha_connect.dll!ParseJson()[json.cpp:62]
ha_connect.dll!TDBJSN::ReadDB()[tabjson.cpp:745]
ha_connect.dll!JSONColumns()[tabjson.cpp:180]
ha_connect.dll!TDBCAT::Initialize()[table.cpp:436]
ha_connect.dll!TDBCAT::OpenDB()[table.cpp:493]
ha_connect.dll!CntOpenTable()[connect.cc:360]
ha_connect.dll!ha_connect::OpenTable()[ha_connect.cc:1877]
ha_connect.dll!ha_connect::rnd_init()[ha_connect.cc:3761]
mysqld.exe!handler::ha_rnd_init_with_error()[handler.cc:2754]
mysqld.exe!init_read_record()[records.cc:281]
mysqld.exe!join_init_read_record()[sql_select.cc:19222]
mysqld.exe!sub_select()[sql_select.cc:18309]
mysqld.exe!do_select()[sql_select.cc:17967]
mysqld.exe!JOIN::exec_inner()[sql_select.cc:3220]
mysqld.exe!JOIN::exec()[sql_select.cc:2511]
mysqld.exe!mysql_select()[sql_select.cc:3446]
mysqld.exe!handle_select()[sql_select.cc:384]
mysqld.exe!execute_sqlcom_select()[sql_parse.cc:5896]
mysqld.exe!mysql_execute_command()[sql_parse.cc:2971]
mysqld.exe!mysql_parse()[sql_parse.cc:7324]
mysqld.exe!dispatch_command()[sql_parse.cc:1491]
mysqld.exe!do_command()[sql_parse.cc:1109]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:271]
mysqld.exe!io_completion_callback()[threadpool_win.cc:568]
KERNEL32.DLL!TermsrvOpenRegEntry()
ntdll.dll!RtlReleasePebLock()
ntdll.dll!RtlReleaseSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()



 Comments   
Comment by Simon Tulett [ 2017-04-10 ]

I decided to try a simplified version, not using the catfunc option and just importing two columns from my collection but now I get an unexpected EOF error thrown. the JSON file I am trying to connect is coming straight from mongoexport and has been untouched by me.

CREATE TABLE test
(
firstName char(32) field_format='clients:[x]:firstName',
lastName char(32) field_format='clients:[x]:lastName'
)
engine=CONNECT
table_type=JSON
file_name='test.json'
lrecl=128
option_list='pretty=0, level=2';

select
*
from test

The exact error message is.

SQL Error (1296): Got error 122 'Unexpected EOF in String' from CONNECT

This doesn't appear to have triggered a stack trace in the .err file

Comment by Simon Tulett [ 2017-04-10 ]

One further update on this, the EOF issues appear to be something to do with the format of the file. I tried using the mongoexport command line parameters --skip and --limit to create a smaller pretty=0 file and attempt an import and I hit an error

Unexpected character near f near d":f which appears to be referring to the first character of the value false which is stored in one of my fields? Can MariaDB not handle JSON with true and false unquoted?

Comment by Simon Tulett [ 2017-04-11 ]

I've been playing around with a number of different parameters today to try and narrow this issue down.

I have discovered that I can successfully create a table when using a PRETTY=2 json file, but can't load more than a few 100 rows of data without hitting an out of memory problem as expected, but when I use the same create table specification for pretty=0 or pretty=1 format files, when trying to select data from the table no matter how big the file is or what mongo collection I have exported from the source database or how small a column set I limit the create table statement to I always get the same result,

ERROR 1296 (HY000): Got error 122 'Unexpected EOF in String' from CONNECT.

My current create statement below;

CREATE TABLE test
(
firstName char(150) field_format='clients:[x]:firstName',
lastName char(150) field_format='clients:[x]:lastName'
)
engine=CONNECT
DATA_CHARSET='UTF8'
HUGE=1
table_type=JSON
file_name='test.json'
lrecl=512
option_list='pretty=0';

I have also updated some config parameters in an attempt to get this working, they are as follows;

[mysqld]
connect_work_size=134217728
connect_use_tempfile=YES
connect_json_grp_size=1000

Comment by Simon Tulett [ 2017-04-11 ]

Hopefully my final update on this. In the end I managed to get it to work by increasing the lrecl size to 2097152. The collections I was trying to load had images stored in them.

Although this now does work it seems to be working intermittently. Sometimes the select statement will work and return the results other times mysqld will crash with the following stack trace.

ha_connect.dll!_setjmp()[setjmp.asm:95]
ha_connect.dll!ParseJson()[json.cpp:62]
ha_connect.dll!TDBJSN::ReadDB()[tabjson.cpp:745]
ha_connect.dll!JSONColumns()[tabjson.cpp:180]
ha_connect.dll!TDBCAT::Initialize()[table.cpp:436]
ha_connect.dll!TDBCAT::OpenDB()[table.cpp:493]
ha_connect.dll!CntOpenTable()[connect.cc:360]
ha_connect.dll!ha_connect::OpenTable()[ha_connect.cc:1877]
ha_connect.dll!ha_connect::rnd_init()[ha_connect.cc:3761]
mysqld.exe!handler::ha_rnd_init_with_error()[handler.cc:2754]
mysqld.exe!init_read_record()[records.cc:281]
mysqld.exe!join_init_read_record()[sql_select.cc:19222]
mysqld.exe!sub_select()[sql_select.cc:18309]
mysqld.exe!do_select()[sql_select.cc:17967]
mysqld.exe!JOIN::exec_inner()[sql_select.cc:3220]
mysqld.exe!JOIN::exec()[sql_select.cc:2511]
mysqld.exe!mysql_select()[sql_select.cc:3446]
mysqld.exe!handle_select()[sql_select.cc:384]
mysqld.exe!execute_sqlcom_select()[sql_parse.cc:5896]
mysqld.exe!mysql_execute_command()[sql_parse.cc:2971]
mysqld.exe!mysql_parse()[sql_parse.cc:7324]
mysqld.exe!dispatch_command()[sql_parse.cc:1491]
mysqld.exe!do_command()[sql_parse.cc:1109]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:271]
mysqld.exe!io_completion_callback()[threadpool_win.cc:568]
KERNEL32.DLL!TermsrvOpenRegEntry()
ntdll.dll!RtlReleasePebLock()
ntdll.dll!RtlReleaseSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()

Comment by Olivier Bertrand [ 2017-04-11 ]

Indeed, 'Unexpected EOF in String' refers to a LRECL that is smaller than one MongoDB document.

About the error on "ha_connect.dll!_setjmp()[setjmp.asm:95]" this is a strange error that I experienced recently, meaning since I use visual studio 15. It may be a compiler error. By the way, which compiler are you using?

Finally I fixed it by getting rid of all these setjmp/longjmp. CONNECT is now using try/catch and throw to achieve the same goal.

I am not sure of which MariaDB versions include or will include that change.

Anyway, could you attach the json files you exported from MongoDB so I can do some tests with them?

For your information: I am working on a new version of CONNECT that will directly work on MongoDB collections via their MongoDB C Driver. This is not distributed yet, and as all new additional features, will probably be distributed only with MariaDB 10.2 and 10.3.

Comment by Simon Tulett [ 2017-04-11 ]

I'm on Windows and am using the pre compiled msi and dll.
Good news on the direct connection to MongoDB, I look forward to that feature.

I can't really pass on the exact json I'm using without anonymizing it 1st but will see what I can do and upload when I can.

Out of interest I downloaded the most recent msi 10.2.5 using connect version 1.05.0003 and am hitting a similar but different error, so it looks like the latest pre compiled version of the plugin doesn't have your try catch fix in it yet.

ha_connect.dll!__intrinsic_setjmp()
ha_connect.dll!ParseJson()[json.cpp:97]
ha_connect.dll!TDBJSN::ReadDB()[tabjson.cpp:760]
ha_connect.dll!CntReadNext()[connect.cc:464]
ha_connect.dll!ha_connect::rnd_next()[ha_connect.cc:3884]
mysqld.exe!handler::ha_rnd_next()[handler.cc:2584]
mysqld.exe!rr_sequential()[records.cc:484]
mysqld.exe!sub_select()[sql_select.cc:18454]
mysqld.exe!do_select()[sql_select.cc:18001]
mysqld.exe!JOIN::exec_inner()[sql_select.cc:3465]
mysqld.exe!JOIN::exec()[sql_select.cc:3267]
mysqld.exe!mysql_select()[sql_select.cc:3660]
mysqld.exe!handle_select()[sql_select.cc:361]
mysqld.exe!execute_sqlcom_select()[sql_parse.cc:6431]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3448]
mysqld.exe!mysql_parse()[sql_parse.cc:7879]
mysqld.exe!dispatch_command()[sql_parse.cc:1814]
mysqld.exe!do_command()[sql_parse.cc:1361]
mysqld.exe!threadpool_process_request()[threadpool_common.cc:346]
mysqld.exe!tp_callback()[threadpool_common.cc:192]
ntdll.dll!RtlReleaseSRWLockExclusive()
ntdll.dll!RtlReleaseSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()

Do you have any indication at all when this may be available in beta?

Comment by Olivier Bertrand [ 2017-04-12 ]

Do you have any indication at all when this may be available in beta
This does not depend on me.

About direct connection to MongoDB: there is little chance that the machines generating the pre-compiled MariaDB versions have the MongoDB C Driver installed. To use this feature, it will be probably necessary to download and compile MariaDB from source.

Generated at Thu Feb 08 07:58:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.