[MCOL-1091] crash with large writes on java binding of write sdk Created: 2017-12-09 Updated: 2023-10-26 Resolved: 2018-01-30 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.2 |
| Fix Version/s: | 1.1.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Thompson (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Sprint: | 2017-25, 2018-01, 2018-02, 2018-03 |
| Description |
|
I got round to porting the million-row test to Java, and it crashes consistently on both 1.1.2 and develop-1.1. It runs fine up to about 700-800k rows and crashes after that. Will attach the Java crash file, but the stack is:
To reproduce, use branch java_crash and then run ./gradlew --info test from the java directory. |
| Comments |
| Comment by Jens Röwekamp (Inactive) [ 2017-12-09 ] | |||||||||
|
I got the following error when executing ./gradlew --info test on Debian 8.9: > Task :test :test (Thread[Daemon worker Thread 4,5,main]) completed. Took 2.216 secs. | |||||||||
| Comment by David Thompson (Inactive) [ 2017-12-09 ] | |||||||||
|
This must be tied to the Java binding, possibly some sort of visibility / GC issue. I tried breaking the logic into 2 smaller commits and it works. It continued to work with a larger commit size, so I modified the outer loop to run just once, which should be functionally equivalent to the old code, and it still works. However, with the outer j loop commented out it crashes again: | |||||||||
| Comment by David Thompson (Inactive) [ 2017-12-12 ] | |||||||||
|
Giving my VM more memory makes the problem go away. With JNI, any C++ heap allocation is in addition to the Java heap, so the process is likely contending for memory with the single-server ColumnStore running on my dev VM. Still, it is concerning that Python handles this just fine, so I think this is likely still tied to Java GC and how SWIG objects are mapped; SWIG handles Java differently from most other target languages. On combined deployments, reducing the ColumnStore memory settings would likely also work. | |||||||||
| Comment by Jens Röwekamp (Inactive) [ 2018-01-22 ] | |||||||||
|
Bug isn't JDK specific. Reproducible on OracleJDK and OpenJDK. | |||||||||
| Comment by Jens Röwekamp (Inactive) [ 2018-01-23 ] | |||||||||
|
Seems to be garbage collection related: after manually suggesting a GC to the JVM every 1000 rows [System.gc()], the bug also occurs at row counts where it previously didn't. It seems that GC frees a native C++ object that is still used internally. Cf. http://www.swig.org/Doc3.0/Java.html#Java_memory_management_member_variables Relevant code to look at seems to be mcsapi::ColumnStoreSystemCatalogTable and mcsapi::ColumnStoreSystemCatalogColumn. | |||||||||
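The stress technique described in this comment (hinting a GC every 1000 rows to provoke the crash earlier) can be sketched roughly as follows. This is a minimal illustration, not the actual test from the java_crash branch: the class and method names are hypothetical, and the per-row write is only indicated by a comment.

```java
// Hypothetical sketch of a GC-stress loop; not the project's real test code.
class GcStressSketch {
    // Runs totalRows iterations and hints a GC every gcEvery rows.
    // Returns how many GC hints were issued.
    static int insertWithGcHints(int totalRows, int gcEvery) {
        int hints = 0;
        for (int row = 1; row <= totalRows; row++) {
            // A real test would write one row through the bulk insert API here.
            if (row % gcEvery == 0) {
                System.gc(); // only a suggestion; the JVM may ignore it
                hints++;
            }
        }
        return hints;
    }

    public static void main(String[] args) {
        System.out.println("gc hints issued: " + insertWithGcHints(5000, 1000));
    }
}
```

Because System.gc() is only a hint, a run that does not crash is not proof that the objects are safe; the loop merely makes premature collection more likely to show up.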
| Comment by Jens Röwekamp (Inactive) [ 2018-01-23 ] | |||||||||
|
Was able to narrow it down: premature GC of ColumnStoreDriver is what causes the trouble.
A temporary ad hoc fix is to call ColumnStoreDriver.getVersion() at the end of your program to keep the driver in memory. | |||||||||
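A caller-side version of this workaround might look like the sketch below. DriverStandIn is a hypothetical placeholder for the SWIG-generated ColumnStoreDriver proxy, and its getVersion() return value is made up. The sketch also shows Reference.reachabilityFence (standard library, Java 9 and later) as a more explicit way of expressing the same intent; the ticket itself only mentions the getVersion() call.

```java
import java.lang.ref.Reference;

// Hypothetical stand-in for the SWIG-generated ColumnStoreDriver proxy.
class DriverStandIn {
    String getVersion() { return "sketch"; } // placeholder value, not a real version
}

class ReachabilitySketch {
    // Simulates a long bulk write while keeping the driver reachable until the end.
    static int writeRows(int totalRows) {
        DriverStandIn driver = new DriverStandIn();
        int written = 0;
        try {
            for (int i = 0; i < totalRows; i++) {
                written++; // a real program would write one row through the API here
            }
        } finally {
            // Option A (from the ticket): touch the driver after the last write.
            driver.getVersion();
            // Option B (Java 9+): declare explicitly that driver must stay reachable up to here.
            Reference.reachabilityFence(driver);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println("rows written: " + writeRows(1_000_000));
    }
}
```

Either option only protects call sites that remember to use it, which is presumably why it is described as temporary.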
| Comment by Jens Röwekamp (Inactive) [ 2018-01-24 ] | |||||||||
|
Fixed the premature garbage collection of ColumnStoreDriver by adding a reference to ColumnStoreDriver in ColumnStoreBulkInsert and ColumnStoreSystemCatalog. To do so, changed the SWIG build configuration in javamcsapi.i. To test that ColumnStoreDriver is no longer garbage collected, unzip the attached debug.zip into mariadb-columnstore-api/example/debug and build the test file using ./gradlew build. Afterwards, go to build/libs, execute java -verbose:gc -verbose:jni -jar debug-all.jar, and verify. | |||||||||
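The idea behind this fix, a child proxy holding a strong reference to the object that created it, can be illustrated with plain Java. The classes below are hypothetical stand-ins, not the SWIG-generated code and not the contents of javamcsapi.i; a WeakReference is used only as a probe to observe whether the driver is still alive after a GC hint.

```java
import java.lang.ref.WeakReference;

// Hypothetical stand-in for the ColumnStoreDriver proxy.
class DriverSketch {
    BulkInsertSketch createBulkInsert() {
        return new BulkInsertSketch(this);
    }
}

// Hypothetical stand-in for the ColumnStoreBulkInsert proxy.
class BulkInsertSketch {
    // Strong reference: while this object is reachable, the driver is too.
    private final DriverSketch driverRef;
    private int rows = 0;

    BulkInsertSketch(DriverSketch driver) {
        this.driverRef = driver;
    }

    void writeRow() { rows++; }

    int rowCount() { return rows; }
}

class KeepAliveSketch {
    // Returns true if the driver is still alive after the caller drops
    // its own reference and a GC has been suggested.
    static boolean driverSurvivesGc() {
        DriverSketch driver = new DriverSketch();
        WeakReference<DriverSketch> probe = new WeakReference<>(driver);
        BulkInsertSketch bulk = driver.createBulkInsert();
        driver = null;   // the caller no longer references the driver
        System.gc();     // hint only; bulk still holds the driver strongly
        boolean alive = probe.get() != null;
        bulk.writeRow(); // bulk is used after the GC hint, so it stays reachable
        return alive && bulk.rowCount() == 1;
    }

    public static void main(String[] args) {
        System.out.println("driver survives gc: " + driverSurvivesGc());
    }
}
```

With the back-reference in place, callers need no workaround: the driver's lifetime is tied to the objects that depend on it.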
| Comment by Daniel Lee (Inactive) [ 2018-01-30 ] | |||||||||
|
Build verified: GitHub source for ColumnStore and API
[root@localhost ~]# cat mariadb-columnstore-1.1.3-1-centos7.x86_64.bin.tar.txt
Merge pull request #90 from mariadb-corporation/cpackFix
Change how fix to cpack for provides/requires is done.
/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
Merge pull request #385 from mariadb-corporation/ Mcol 1137
[root@localhost mariadb-columnstore-api]# git show
Merge pull request #43 from mariadb-corporation/ Mcol 1177
Verified API tests |