[MDEV-9228] CONNECT: Can not load JSON files over 12MB (roughly) Created: 2015-12-02 Updated: 2015-12-10 Resolved: 2015-12-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - Connect |
| Affects Version/s: | 10.1.9, 10.0, 10.1 |
| Fix Version/s: | 10.0.23, 10.1.10 |
| Type: | Bug | Priority: | Major |
| Reporter: | Nick Fries | Assignee: | Olivier Bertrand |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows Server 2012 R2 (x64) with 16GB of memory |
||
| Attachments: |
|
| Description |
|
CONNECT is not parsing JSON files over roughly 12MB. I receive the following error when I try to select from the table:

_Error Code: 1296. Got error 174 'ParseString: Out of memory' from CONNECT 0.406 sec_

I've tried every option available, and I have also tried increasing the amount of memory available. I am not sure whether there is a way to increase the memory blocks available to 'Sarea' through either a patch or a server variable. I have tried each option, including:

_SET GLOBAL connect_work_size = 4199999999;_

I also tried using multiple files with a wildcard (this is what I really need: I have 2GB of files, the smallest being under 1MB and the largest being over 100MB in size). My use case is parsing some operations data from a custom application and pulling some counts; CONNECT seemed like a fit for this. My data is an array of objects. My table creation statement looks like the following (I tried everything, from the simplest autodetection to manual mapping, and I also tried forcing the row size to see if I could get it to allocate more memory; nothing works):

CREATE TABLE `raw_sr` (

It fails loading multiple files, and it also fails loading a single large file. None of the options make a difference. Is CONNECT JSON designed for this use case? Is there a way to get it to allocate more memory for JSON parsing? I have 16GB of memory; MySQL usage is around ~2GB. I don't care if it uses all the memory, I just want the data to be exposed in the database engine. Please advise. Thank you. |
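The reporter's actual CREATE TABLE statement was truncated in this export. As context, a minimal sketch of the kind of CONNECT JSON definition being described, using a wildcard over multiple files; the column names, paths, and file pattern here are hypothetical placeholders, not the reporter's schema:

```sql
-- Hypothetical CONNECT JSON table over several files matched by a wildcard.
-- Column names and FIELD_FORMAT paths are illustrative only.
CREATE TABLE raw_sr_sketch (
  id INT FIELD_FORMAT='id',
  status VARCHAR(32) FIELD_FORMAT='status'
) ENGINE=CONNECT TABLE_TYPE=JSON
  FILE_NAME='ops_*.json'       -- wildcard over the application's data files
  OPTION_LIST='Multiple=1';    -- treat all matching files as one table
```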
| Comments |
| Comment by Elena Stepanova [ 2015-12-02 ] | |
|
Attached a sample JSON file. Create a table as described but with this file name instead, run SELECT COUNT(*) on raw_sr, and observe the error. | |
| Comment by Olivier Bertrand [ 2015-12-03 ] | |
|
Thanks for raising this issue, which shows that CONNECT incorrectly handles memory for JSON tables. While handling big files in PRETTY=2 format does require a lot of memory, because the whole file must be parsed at once, PRETTY=0 and PRETTY=1 tables should not have this problem, because parsing is done only for the currently read record. Unfortunately, CONNECT presently does not recover the storage used for parsing between records (the same applies to multiple files). This is a bug that must and will be fixed in the next MariaDB releases. Meanwhile, the only workaround is to increase the memory available to CONNECT by raising connect_work_size.
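The exact statement Olivier used was lost in this export; a hedged sketch of the kind of setting meant, where the value is an assumption sized per the rule of thumb in the next comment (roughly 6 to 7 times the JSON file size):

```sql
-- Illustrative value only; size the work area to several times the JSON file.
SET GLOBAL connect_work_size = 1073741824;  -- e.g. a 1GB work area
```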
With that workaround I was able to handle the attached file. Count on a memory footprint approximately 6 to 7 times bigger than the JSON file size. Note: a table based on the attached file should specify PRETTY=1 and JMODE=0. | |
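A minimal sketch of the table options Olivier recommends for the attached sample (the file name and column are placeholders, not taken from the attachment):

```sql
-- Hypothetical definition for the sample file: PRETTY=1 and JMODE=0,
-- as recommended above. Only the option_list values come from the comment.
CREATE TABLE raw_sr_sample (
  line VARCHAR(255)
) ENGINE=CONNECT TABLE_TYPE=JSON
  FILE_NAME='sample.json'
  OPTION_LIST='Pretty=1,Jmode=0';
```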
| Comment by Nick Fries [ 2015-12-03 ] | |
|
Hi Elena / Olivier, Thanks for confirming the issue. I did not realize that human-readable JSON would be a poor design choice for my application; that's something I can fix on my side. My use of CONNECT isn't mission critical, it just seemed like it could work in a pinch to parse some JSON and summarize the data without having to read / transform / insert. Looking forward to the fix, and I will keep in mind not to use human-readable JSON formatting in the future. | |
| Comment by Olivier Bertrand [ 2015-12-10 ] | |
|
Hi Nick, After this fix, all JSON tests pass on buildbot. However, I could not test on really big files because I don't have any. I would appreciate it if you could spend some spare time testing this fix with your data files and let me know the result. BTW, I would not be so drastic about avoiding readable JSON: it is a big gain for most JSON data of reasonable size. Of course, your case is different because of the size of your application data. |