[MCOL-1091] crash with large writes on java binding of write sdk Created: 2017-12-09  Updated: 2023-10-26  Resolved: 2018-01-30

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.2
Fix Version/s: 1.1.3

Type: Bug Priority: Major
Reporter: David Thompson (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: debug.zip, hs_err_pid226487.log
Sprint: 2017-25, 2018-01, 2018-02, 2018-03

 Description   

I got round to porting the million-row test to Java, and it crashes consistently on both 1.1.2 and develop-1.1. It seems to be fine up to about 700-800k rows, after which it crashes. I will attach the Java crash file, but the stack is:

Stack: [0x00007fdec2a8d000,0x00007fdec2b8e000],  sp=0x00007fdec2b8a780,  free space=1013k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libmcsapi.so.1+0xa1b3]  mcsapi::ColumnStoreSystemCatalogColumn::ColumnStoreSystemCatalogColumn(mcsapi::ColumnStoreSystemCatalogColumn const&)+0xb3
C  [libmcsapi.so.1+0xe6b7]  mcsapi::ColumnStoreBulkInsert::setColumn(unsigned short, long, mcsapi::columnstore_data_convert_status_t*)+0x67
J 703  com.mariadb.columnstore.api.javamcsapiJNI.ColumnStoreBulkInsert_setColumn__SWIG_9(JLcom/mariadb/columnstore/api/ColumnStoreBulkInsert;II)J (0 bytes) @ 0x00007fded92d1b95 [0x00007fded92d1ac0+0xd5]
J 743% C2 com.mariadb.columnstore.api.MillionRowTest.testLoadMillionRows()V (310 bytes) @ 0x00007fded92fc9c4 [0x00007fded92fc840+0x184]

Use branch java_crash to reproduce, then run ./gradlew --info test from the java directory.



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2017-12-09 ]

I got the following error when executing ./gradlew --info test on Debian 8.9:

> Task :test
Putting task artifact state for task ':test' into context took 0.0 secs.
Executing task ':test' (up-to-date check took 0.024 secs) due to:
Task has failed previously.
Starting process 'Gradle Test Executor 4'. Working directory: /home/jens/mariadb-columnstore-api/java Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.library.path=/home/jens/mariadb-columnstore-api/java -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager -Dorg.gradle.native=false -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant -ea -cp /home/jens/.gradle/caches/4.2.1/workerMain/gradle-worker.jar worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test Executor 4'
Successfully started process 'Gradle Test Executor 4'
terminate called after throwing an instance of 'mcsapi::ColumnStoreUsageError'
what(): Not all the columns for this row have been filled

:test (Thread[Daemon worker Thread 4,5,main]) completed. Took 2.216 secs.

Comment by David Thompson (Inactive) [ 2017-12-09 ]

This must be tied to the Java binding, perhaps some sort of visibility / GC issue. I tried breaking the logic into 2 smaller commits and it works. It also continued to work with a larger commit size, so I modified the outer loop to run just once, which should be functionally equivalent to the old code, and it still works. However, with the outer j loop commented out, it crashes again:
https://github.com/mariadb-corporation/mariadb-columnstore-api/commit/bb4ec76c60961d7ec8e73e239cd4a949f1e7db60#diff-49ad0d3aee5329d4768a3a8a4c9f67e2

Comment by David Thompson (Inactive) [ 2017-12-12 ]

Giving my VM more memory makes the problem go away. With JNI, any C++ heap allocation is in addition to the Java heap, so this is likely contending for memory with the single-server ColumnStore instance running on my dev VM. Still, it is concerning that Python handles this just fine, so I think this is still tied up with Java GC and how SWIG objects are mapped; SWIG handles Java differently from most other target languages. On combined deployments, reducing the ColumnStore memory settings would probably also work.

Comment by Jens Röwekamp (Inactive) [ 2018-01-22 ]

The bug isn't JDK-specific: it is reproducible on both Oracle JDK and OpenJDK.

Comment by Jens Röwekamp (Inactive) [ 2018-01-23 ]

This seems to be garbage-collection related: after manually prompting the JVM to collect garbage every 1000 rows [System.gc()], the bug also occurs at row counts where it previously didn't. It seems the GC frees a C++ object that is still used internally. cf. http://www.swig.org/Doc3.0/Java.html#Java_memory_management_member_variables

Relevant code to look at seems to be mcsapi::ColumnStoreSystemCatalogTable and mcsapi::ColumnStoreSystemCatalogColumn.
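A minimal, self-contained sketch of the suspected failure mode, assuming the usual SWIG 3 ownership model in which a Java proxy that owns a native object frees it via `delete()` from its `finalize()` once the GC finds the proxy unreachable. None of this is javamcsapi code: `FakeDriver`, `FakeBulkInsert` and `NATIVE_HEAP` are hypothetical stand-ins, and an explicit `delete()` call replaces the non-deterministic GC so the effect is reproducible. Where the real library crashes the JVM with a use-after-free, the sketch throws an exception instead.

```java
import java.util.HashMap;
import java.util.Map;

public class SwigOwnershipSketch {
    // Hypothetical stand-in for the C++ heap: "pointer" -> native catalog data.
    static final Map<Long, String> NATIVE_HEAP = new HashMap<>();
    static long nextPtr = 1;

    // Mimics a SWIG proxy that owns a native driver object.
    static class FakeDriver {
        long swigCPtr;

        FakeDriver() {
            swigCPtr = nextPtr++;
            NATIVE_HEAP.put(swigCPtr, "system catalog");
        }

        // SWIG-generated proxies call delete() from finalize(), i.e. when the
        // GC decides the Java object is unreachable.
        void delete() {
            NATIVE_HEAP.remove(swigCPtr);
            swigCPtr = 0;
        }

        FakeBulkInsert createBulkInsert() {
            // The child keeps only the raw native pointer, not the Java proxy,
            // so nothing on the Java side keeps the driver alive.
            return new FakeBulkInsert(swigCPtr);
        }
    }

    static class FakeBulkInsert {
        private final long catalogPtr;

        FakeBulkInsert(long catalogPtr) {
            this.catalogPtr = catalogPtr;
        }

        // Stands in for setColumn(), which copies catalog column metadata.
        String setColumn(int column) {
            String catalog = NATIVE_HEAP.get(catalogPtr);
            if (catalog == null) {
                // In real native code this would be a use-after-free and a JVM crash.
                throw new IllegalStateException("catalog freed: use-after-free");
            }
            return catalog + " column " + column;
        }
    }

    public static void main(String[] args) {
        FakeDriver driver = new FakeDriver();
        FakeBulkInsert bulk = driver.createBulkInsert();
        System.out.println(bulk.setColumn(0));
        driver.delete(); // what the GC effectively does once 'driver' is no longer used
        try {
            bulk.setColumn(1);
        } catch (IllegalStateException e) {
            System.out.println("crash: " + e.getMessage());
        }
    }
}
```

Running `main` prints the first column successfully and then reports the simulated crash, because the bulk insert outlives the object that owns its catalog data.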

Comment by Jens Röwekamp (Inactive) [ 2018-01-23 ]

I was able to narrow it down to premature GC of ColumnStoreDriver.
ColumnStoreDriver seems to store an object that is still needed but can no longer be accessed once the driver is garbage collected.

[Dynamic-linking native method com.mariadb.columnstore.api.javamcsapiJNI.new_ColumnStoreDriver__SWIG_1 ... JNI]
[Dynamic-linking native method com.mariadb.columnstore.api.javamcsapiJNI.ColumnStoreDriver_createBulkInsert ... JNI]
[Dynamic-linking native method com.mariadb.columnstore.api.javamcsapiJNI.ColumnStoreBulkInsert_setColumn__SWIG_9 ... JNI]
[Dynamic-linking native method com.mariadb.columnstore.api.javamcsapiJNI.ColumnStoreBulkInsert_writeRow ... JNI]
[GC (Allocation Failure)  8704K->5158K(31744K), 0.0100784 secs]
[GC (Allocation Failure)  13862K->13430K(40448K), 0.0114439 secs]
[GC (Allocation Failure)  30838K->30558K(48128K), 0.0418677 secs]
[Full GC (Ergonomics)  30558K->29929K(81408K), 0.2503217 secs]
[Dynamic-linking native method com.mariadb.columnstore.api.javamcsapiJNI.delete_ColumnStoreDriver ... JNI]

Temporary ad hoc fix is to call ColumnStoreDriver.getVersion() at the end of your program to keep it in memory.
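The workaround works because a later call on the driver keeps its proxy strongly reachable up to that point; without it, the JIT may treat a local variable as dead after its last use and let the GC collect the object while the variable is still in scope. The sketch below illustrates this with hypothetical `FakeDriver`/`FakeBulkInsert` classes (not the javamcsapi API), using a `java.lang.ref.Cleaner` as a stand-in for SWIG's finalizer. It assumes Java 9+ and uses `Reference.reachabilityFence` as the explicit form of the trailing `getVersion()` call; on Java 8, which the ticket's logs use, the trailing method call is the available option. It forces `System.gc()` every 1000 rows, as in the reproduction above.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

public class KeepDriverAlive {
    static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical driver proxy: once the GC finds it unreachable, the Cleaner
    // marks its "native" state as freed, similar to SWIG's finalize() -> delete().
    static class FakeDriver {
        final AtomicBoolean nativeFreed = new AtomicBoolean(false);

        FakeDriver() {
            // The cleanup action must not capture 'this', or the driver could never become unreachable.
            AtomicBoolean flag = nativeFreed;
            CLEANER.register(this, () -> flag.set(true));
        }

        FakeBulkInsert createBulkInsert() {
            // The child shares the driver's internal state but holds no reference to the proxy.
            return new FakeBulkInsert(nativeFreed);
        }
    }

    static class FakeBulkInsert {
        private final AtomicBoolean driverFreed;
        int rows;

        FakeBulkInsert(AtomicBoolean driverFreed) {
            this.driverFreed = driverFreed;
        }

        void writeRow() {
            if (driverFreed.get()) {
                throw new IllegalStateException("driver freed while bulk insert in use");
            }
            rows++;
        }
    }

    static int load(int rowCount) {
        FakeDriver driver = new FakeDriver();
        FakeBulkInsert bulk = driver.createBulkInsert();
        // 'driver' is not read inside the loop, so without the fence below the
        // JIT may consider it unreachable even though it is still in scope.
        for (int i = 0; i < rowCount; i++) {
            bulk.writeRow();
            if (i % 1000 == 0) {
                System.gc(); // provoke collection periodically
            }
        }
        // Keeps 'driver' strongly reachable until this point (Java 9+).
        Reference.reachabilityFence(driver);
        return bulk.rows;
    }

    public static void main(String[] args) {
        System.out.println("rows written: " + load(5000));
    }
}
```

The fence is purely a reachability guarantee and costs nothing at runtime; it does not fix the underlying ownership problem, which is why it counts only as a temporary workaround.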

Comment by Jens Röwekamp (Inactive) [ 2018-01-24 ]

I fixed the premature garbage collection of ColumnStoreDriver by adding a reference to ColumnStoreDriver in ColumnStoreBulkInsert and ColumnStoreSystemCatalog.

This required changing the SWIG build configuration in javamcsapi.i.

To check that garbage collection no longer collects ColumnStoreDriver, unzip the attached debug.zip into mariadb-columnstore-api/example/debug and build the test program using ./gradlew build.

Afterwards go to build/libs and execute java -verbose:gc -verbose:jni -jar debug-all.jar and verify.
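The ticket does not show the javamcsapi.i change itself. The SWIG Java documentation section linked earlier describes a pattern along these lines: a `javacode` typemap adds a reference field and an `addReference()` method to the child proxy class, and a `javaout` typemap on the factory method calls `ret.addReference(this)`. The sketch below shows only the resulting Java-side behavior with hypothetical `FakeDriver`/`FakeBulkInsert` classes; whether javamcsapi.i uses these exact names is an assumption. A `WeakReference` is used purely to observe that the driver survives GC while the bulk insert is still reachable.

```java
import java.lang.ref.WeakReference;

public class ParentReferenceSketch {
    // Hypothetical stand-in for the ColumnStoreDriver proxy.
    static class FakeDriver {
        FakeBulkInsert createBulkInsert() {
            FakeBulkInsert bulk = new FakeBulkInsert();
            // Mirrors the fix: the factory stores its own proxy in the child.
            bulk.addReference(this);
            return bulk;
        }
    }

    // Hypothetical stand-in for the ColumnStoreBulkInsert proxy.
    static class FakeBulkInsert {
        // Strong reference: while this proxy is reachable, so is the driver.
        private FakeDriver driverReference;

        void addReference(FakeDriver driver) {
            driverReference = driver;
        }

        FakeDriver driver() {
            return driverReference;
        }
    }

    // Bundles the child proxy with a weak handle used only to observe the driver.
    static final class Setup {
        final FakeBulkInsert bulk;
        final WeakReference<FakeDriver> driverRef;

        Setup(FakeBulkInsert bulk, WeakReference<FakeDriver> driverRef) {
            this.bulk = bulk;
            this.driverRef = driverRef;
        }
    }

    // The local 'driver' is gone after return, so the only strong path to it
    // is bulk.driverReference.
    static Setup createAndDropDriver() {
        FakeDriver driver = new FakeDriver();
        return new Setup(driver.createBulkInsert(), new WeakReference<>(driver));
    }

    public static void main(String[] args) {
        Setup s = createAndDropDriver();
        for (int i = 0; i < 3; i++) {
            System.gc();
        }
        // The WeakReference is not cleared because the bulk insert still refers to the driver.
        System.out.println("driver alive: " + (s.driverRef.get() == s.bulk.driver()));
    }
}
```

Without the `addReference` call nothing would keep the driver strongly reachable, and a GC could clear the `WeakReference` (and, in the real binding, free the native driver) while the bulk insert is still in use.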

Comment by Daniel Lee (Inactive) [ 2018-01-30 ]

Build verified: GitHub source for ColumnStore and API

[root@localhost ~]# cat mariadb-columnstore-1.1.3-1-centos7.x86_64.bin.tar.txt
/root/columnstore/mariadb-columnstore-server
commit f56e806556ff344effbc8fa1c9378e7c0d4a58f7
Merge: 9c81c64 446e85f
Author: david hill <david.hill@mariadb.com>
Date: Mon Jan 29 10:57:32 2018 -0600

Merge pull request #90 from mariadb-corporation/cpackFix

Change how fix to cpack for provides/requires is done.

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 1e56c7f41add94d024a20ca8f13ee55ed2b7a7f3
Merge: c03c4bc bd5daf2
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Wed Jan 24 17:39:32 2018 -0600

Merge pull request #385 from mariadb-corporation/MCOL-1137

Mcol 1137

[root@localhost mariadb-columnstore-api]# git show
commit 55c2adb225900c9d17c5f7b95eafbed5e3108bfd
Merge: 38ed340 e73b0f9
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Fri Jan 26 10:07:07 2018 +0200

Merge pull request #43 from mariadb-corporation/MCOL-1177

Mcol 1177

Verified API tests

Generated at Thu Feb 08 02:26:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.