[MCOL-1995] javamcsapi - MillionRow test fails on Ubuntu 16.04 and 18.04 Created: 2018-11-30  Updated: 2023-10-26  Resolved: 2019-03-21

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.2
Fix Version/s: 1.2.3

Type: Bug Priority: Major
Reporter: Jens Röwekamp (Inactive) Assignee: Andrew Hutchings (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 16.04, Ubuntu 18.04


Attachments: Text File javamcsapi Ubuntu18 MillionRowError.txt    
Issue Links:
Relates
relates to MCOL-2252 million-row java test is not included... Closed
Sprint: 2018-21, 2019-01, 2019-02, 2019-03

 Description   

Javamcsapi's MillionRow test fails consistently on Ubuntu. Debian, CentOS and Windows don't have this issue. I suspect some kind of premature garbage collection, because the failure appears after roughly the same number of writeRow() calls on different Ubuntu versions.

See the attached detailed test execution log from Ubuntu 18.04.
Regression test suite output excerpt:

32: Test command: /home/jens/mariadb-columnstore-api/java/gradlew "-p" "/home/jens/mariadb-columnstore-api/java" "-Pversion=1.2.2" "-Pjava.library.path=/home/jens/mariadb-columnstore-api/build/java" "test"
32: Test timeout computed to be: 1500
32: Starting a Gradle Daemon (subsequent builds will be faster)
32: > Task :compileJava UP-TO-DATE
32: > Task :processResources NO-SOURCE
32: > Task :classes UP-TO-DATE
32: Download https://jcenter.bintray.com/junit/junit/4.12/junit-4.12.pom
32: Download https://jcenter.bintray.com/org/mariadb/jdbc/mariadb-java-client/2.1.2/mariadb-java-client-2.1.2.pom
32: Download https://jcenter.bintray.com/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.pom
32: Download https://jcenter.bintray.com/org/hamcrest/hamcrest-parent/1.3/hamcrest-parent-1.3.pom
32: Download https://jcenter.bintray.com/junit/junit/4.12/junit-4.12.jar
32: Download https://jcenter.bintray.com/org/mariadb/jdbc/mariadb-java-client/2.1.2/mariadb-java-client-2.1.2.jar
32: Download https://jcenter.bintray.com/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
32: > Task :compileTestJava
32: > Task :processTestResources NO-SOURCE
32: > Task :testClasses
32:
32: > Task :test
32: terminate called after throwing an instance of 'mcsapi::ColumnStoreServerError'
32:   what():  Error rolling back BRM
32:
32: > Task :test FAILED
32:
32: FAILURE: Build failed with an exception.
32:
32: * What went wrong:
32: Execution failed for task ':test'.
32: > Process 'Gradle Test Executor 1' finished with non-zero exit value 134
32:   This problem might be caused by incorrect test process configuration.
32:   Please refer to the test execution section in the user guide at https://docs.gradle.org/4.8/userguide/java_plugin.html#sec:test_execution
32:
32: * Try:
32: Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
32:
32: * Get more help at https://help.gradle.org
32:
32: BUILD FAILED in 27s
32: 3 actionable tasks: 2 executed, 1 up-to-date
32/36 Test #32: Java_BasicTest .........................***Failed   27.69 sec

Strangely, though, a manually compiled million-row test using the following example succeeds without errors on both Ubuntu versions:

import com.mariadb.columnstore.api.*;

public class MCSAPITest {

    public static void main(String[] args) {
        ColumnStoreDriver d = new ColumnStoreDriver();
        // Open a bulk insert into table test.t1
        ColumnStoreBulkInsert b = d.createBulkInsert("test", "t1", (short)0, 0);
        long rows = 100000000; // 100 million rows
        try {
            for (long i = 0; i < rows; i++) {
                b.setColumn(0, i);
                b.setColumn(1, rows - i);
                // Print progress every 1000 rows
                if (i % 1000 == 0) {
                    System.out.println(i);
                }
                b.writeRow();
            }
            b.commit();
        }
        catch (ColumnStoreException e) {
            // Roll back the uncommitted rows before reporting the error
            b.rollback();
            e.printStackTrace();
        }
    }
}



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2019-01-04 ]

I was able to identify the root cause of the bug. It is not very severe: it is not caused by premature garbage collection as in MCOL-1091, and it only affects our testing pipeline.

MCOL-1094 introduced functionality to view and clear table locks, together with tests to verify it. Under certain conditions, one of these tests causes the error encountered here.

Specifically, the test testClearTableLock() in ``java/src/test/java/com/mariadb/columnstore/api/LockTest.java`` is responsible.

This test checks whether the clearTableLock() function works properly. To do so, it initializes two ColumnStoreBulkInsert objects, b1 and b2, and a ColumnStoreDriver d. The test works in roughly four steps:

  • In step 1, b1 locks a table and writes some junk data to it without committing it
  • In step 2, d clears the table lock held by b1 and rolls back b1's changes
  • In step 3, b2 writes some reference data to the table and commits it
  • Finally, in step 4, the test verifies that the table contains only the data written by b2 and that it is in an unlocked state
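The four steps can be sketched as follows. This is a simplified, hypothetical model: the real test drives javamcsapi's ColumnStoreBulkInsert and ColumnStoreDriver against a live ColumnStore instance, whereas FakeTable, FakeBulkInsert and FakeDriver below are made-up in-memory stand-ins and the row values are arbitrary. The point to notice is that b1 is neither committed nor rolled back by its own code; it is simply left for the garbage collector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Walks through the four steps of the test against hypothetical in-memory fakes.
class ClearTableLockSketch {
    static FakeTable run() {
        FakeTable table = new FakeTable();
        FakeDriver d = new FakeDriver();

        // Step 1: b1 locks the table and writes junk rows without committing.
        FakeBulkInsert b1 = new FakeBulkInsert(table, 1);
        b1.writeRow(-1);
        b1.writeRow(-2);

        // Step 2: d clears b1's table lock and discards b1's pending rows.
        d.clearTableLock(table);

        // Step 3: b2 writes reference rows and commits them.
        FakeBulkInsert b2 = new FakeBulkInsert(table, 2);
        b2.writeRow(1);
        b2.writeRow(2);
        b2.commit();

        // Step 4 is verified by the caller. b1 was never committed or
        // rolled back by its own code; it is just left for the GC.
        return table;
    }

    public static void main(String[] args) {
        FakeTable t = run();
        System.out.println(t.committed + ", locked=" + (t.lockOwner != 0));
    }
}

// Hypothetical table: committed rows, uncommitted rows per writer, and the id
// of the writer holding the table lock (0 means unlocked).
class FakeTable {
    final List<Long> committed = new ArrayList<>();
    final Map<Integer, List<Long>> pending = new HashMap<>();
    int lockOwner = 0;
}

// Hypothetical stand-in for ColumnStoreBulkInsert: locks the table on creation.
class FakeBulkInsert {
    private final FakeTable table;
    private final int id;

    FakeBulkInsert(FakeTable table, int id) {
        if (table.lockOwner != 0) {
            throw new IllegalStateException("table is locked by " + table.lockOwner);
        }
        this.table = table;
        this.id = id;
        table.lockOwner = id;
        table.pending.put(id, new ArrayList<>());
    }

    void writeRow(long value) {
        table.pending.get(id).add(value);
    }

    // Moves this writer's pending rows into the table and releases the lock.
    void commit() {
        table.committed.addAll(table.pending.remove(id));
        table.lockOwner = 0;
    }
}

// Hypothetical stand-in for the driver's clearTableLock(): discards the lock
// holder's pending rows and releases the lock.
class FakeDriver {
    void clearTableLock(FakeTable table) {
        table.pending.remove(table.lockOwner);
        table.lockOwner = 0;
    }
}
```

Running main prints `[1, 2], locked=false`.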

This works fine on its own, as the test is small enough not to trigger the garbage collector.

But Gradle executes the tests sequentially in a single JVM, in a file order determined by the operating system, so MillionRowTest can run after testClearTableLock(). In that case MillionRowTest runs long enough to trigger Java's garbage collector, and the bug occurs: the garbage collector collects b1, which is no longer referenced, and thereby triggers its C++ destructor. Because b1 had written data to the table without committing it, the destructor tries to roll that data back. However, d has already performed the rollback and cleared b1's lock on the table, so the rollback fails and the test process crashes ungracefully.
Because the pipeline is already running MillionRowTest at that point, it looks as if MillionRowTest failed rather than testClearTableLock().
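This failure mode can be sketched in Java with java.lang.ref.Cleaner (Java 9+), whose cleanup action stands in for the C++ destructor that runs when the proxy object for b1 is collected. All class names are hypothetical, SWIG-generated proxies do not necessarily use a Cleaner, and because real garbage collection timing is nondeterministic, the sketch calls clean() explicitly at the point where the collector would run during MillionRowTest.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.List;

class DeferredDestructorSketch {
    static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical stand-in for the BRM's record of which transaction holds the table lock.
    static final class Brm {
        Integer lockHolder = null;

        // Rolling back a transaction whose lock is already gone fails, as in the log.
        void rollback(int txn) {
            if (lockHolder == null || lockHolder != txn) {
                throw new IllegalStateException("Error rolling back BRM");
            }
            lockHolder = null;
        }
    }

    // Cleanup action playing the role of the native destructor. It must not
    // reference the proxy, otherwise the proxy could never become unreachable.
    static final class NativeDestructor implements Runnable {
        private final Brm brm;
        private final int txn;

        NativeDestructor(Brm brm, int txn) {
            this.brm = brm;
            this.txn = txn;
        }

        // b1 never commits, so the destructor always attempts a rollback.
        @Override
        public void run() {
            brm.rollback(txn);
        }
    }

    // Hypothetical proxy object whose cleanup is registered with the Cleaner.
    static final class BulkInsertProxy {
        final Cleaner.Cleanable cleanable;

        BulkInsertProxy(Brm brm, int txn) {
            brm.lockHolder = txn; // take the table lock
            cleanable = CLEANER.register(this, new NativeDestructor(brm, txn));
        }
    }

    static List<String> runSuite() {
        List<String> report = new ArrayList<>();
        Brm brm = new Brm();

        // testClearTableLock(): b1 takes the lock and is never committed;
        // the driver then clears the lock and rolls back b1's work itself.
        BulkInsertProxy b1 = new BulkInsertProxy(brm, 1);
        brm.lockHolder = null;
        report.add("testClearTableLock: passed");

        // MillionRowTest: runs long enough for the GC to collect b1. The
        // cleanup is invoked explicitly so that this sketch is deterministic.
        try {
            b1.cleanable.clean();
            report.add("MillionRowTest: passed");
        } catch (IllegalStateException e) {
            report.add("MillionRowTest: failed (" + e.getMessage() + ")");
        }
        return report;
    }

    public static void main(String[] args) {
        runSuite().forEach(System.out::println);
    }
}
```

The failure is reported against MillionRowTest even though the object was created by testClearTableLock(). One difference from the real crash: an exception thrown by a cleanup action that the Cleaner's own thread runs is simply ignored in Java, while a C++ exception that escapes uncaught ends in std::terminate, which aborts the process. That matches the `terminate called after throwing` line in the log, and the exit value 134 is 128 + 6, i.e. the process was killed by SIGABRT.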

To deal with this bug, we have several options:

  • make sure that the clearTableLock() test is always executed last, so that no later garbage collection can trigger the error. (This seems hacky, and I haven't found any documentation on whether Gradle supports it.)
  • change the destructor of ColumnStoreBulkInsert to fail more gracefully. (Could be a viable option.)
  • create b1 in a different SWIG memory mode so that it won't be garbage collected automatically. (I tried that without success: one can only change the memory mode as a whole, which we don't want.)
  • lock the table through a different command (which we would have to implement) instead of misusing ColumnStoreBulkInsert for it. (Could be a viable option.)
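The second option could look roughly like this. The sketch is in Java for consistency with the rest of the ticket, but the real change would belong in the C++ destructor of ColumnStoreBulkInsert; Brm, destroy() and the log messages are hypothetical. It skips the rollback when the lock is no longer held (assuming, as in testClearTableLock(), that whoever cleared the lock also rolled the changes back) and records any other rollback failure instead of letting it escape.

```java
import java.util.ArrayList;
import java.util.List;

class GracefulCleanupSketch {
    // Hypothetical stand-in for the BRM's table-lock bookkeeping.
    static final class Brm {
        Integer lockHolder = null;

        void rollback(int txn) {
            if (lockHolder == null || lockHolder != txn) {
                throw new IllegalStateException("Error rolling back BRM");
            }
            lockHolder = null;
        }
    }

    // Destructor-like cleanup that reports problems instead of propagating them.
    static void destroy(Brm brm, int txn, boolean committed, List<String> log) {
        if (committed) {
            return; // nothing to undo
        }
        if (brm.lockHolder == null || brm.lockHolder != txn) {
            // Assumption: whoever cleared the lock also rolled back the data.
            log.add("txn " + txn + ": lock no longer held, skipping rollback");
            return;
        }
        try {
            brm.rollback(txn);
            log.add("txn " + txn + ": rolled back");
        } catch (RuntimeException e) {
            // Never let an exception escape from destructor-like cleanup.
            log.add("txn " + txn + ": rollback failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Brm brm = new Brm();
        destroy(brm, 1, false, log); // b1 after clearTableLock(): lock already gone
        brm.lockHolder = 2;
        destroy(brm, 2, false, log); // an ordinary abandoned bulk insert
        log.forEach(System.out::println);
    }
}
```

Running main prints `txn 1: lock no longer held, skipping rollback` followed by `txn 2: rolled back`.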
Comment by Jens Röwekamp (Inactive) [ 2019-01-07 ]

Moved the Java test testClearTableLock() from LockTest.java into a separate Gradle test, so that its first ColumnStoreBulkInsert object can no longer be garbage collected during a later test and trigger an error there.

Also added debug printouts for constructor and destructor calls of ColumnStoreDriver and ColumnStoreBulkInsert, so that errors of this kind can be detected more easily from a debug log.
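The value of such printouts is that a destructor running during MillionRowTest can be traced back to an object created in testClearTableLock(). A minimal Java sketch of the idea follows; the ticket does not show the actual message format or where exactly the printouts live, so the Traced class, the id scheme and the log format are hypothetical, and destroy() is called explicitly to stand in for the garbage collector.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleTraceSketch {
    static final List<String> LOG = new ArrayList<>();
    static String currentTest = "";
    private static int nextId = 1;

    // Hypothetical traced object: logs construction and destruction together
    // with the test that was running at the time.
    static final class Traced {
        final String name;

        Traced(String type) {
            name = type + "#" + nextId++;
            LOG.add("[" + currentTest + "] " + name + " constructed");
        }

        void destroy() {
            LOG.add("[" + currentTest + "] " + name + " destructed");
        }
    }

    static List<String> run() {
        LOG.clear();
        nextId = 1;

        currentTest = "testClearTableLock";
        Traced d = new Traced("ColumnStoreDriver");
        Traced b1 = new Traced("ColumnStoreBulkInsert");

        currentTest = "MillionRowTest";
        b1.destroy(); // stands in for the GC collecting b1 much later
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The last line of the output, `[MillionRowTest] ColumnStoreBulkInsert#2 destructed`, points back to the object constructed under `[testClearTableLock]`.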

Java test suite successfully executed on CentOS 7 and Windows 10.

For QA

  • execute mcsapi's regression test suite on Debian, Ubuntu, CentOS and Windows.
Generated at Thu Feb 08 02:32:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.