[CONJ-421] Spark error: java.sql.SQLException: Out of range value for column 'i' : value i is not in Integer range Created: 2017-02-01  Updated: 2018-08-03

Status: Open
Project: MariaDB Connector/J
Component/s: None
Affects Version/s: 1.5.7
Fix Version/s: None

Type: Task Priority: Minor
Reporter: David Thompson (Inactive) Assignee: Diego Dupin
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
PartOf
includes CONJ-423 Permit to have MySQL driver and Maria... Closed

 Description   

Install Spark 2.0 using Docker for simplicity, following this image:
https://hub.docker.com/r/singularities/spark/

Copy the sample docker-compose.yml locally and run:
docker-compose up

After a while both the master and worker start and logging stops. Then:

docker exec -it sparkdocker_master_1 bash
curl -O https://downloads.mariadb.com/Connectors/java/connector-java-1.5.7/mariadb-java-client-1.5.7.jar
pyspark --driver-class-path mariadb-java-client-1.5.7.jar --jars  mariadb-java-client-1.5.7.jar
 
from pyspark.sql import DataFrameReader
url = 'jdbc:mariadb://172.21.21.2:3306/test?useServerPrepStmts=false'
properties = {'user': 'root', 'driver': 'org.mariadb.jdbc.Driver', 'useServerPrepStmts':'false'}
df = DataFrameReader(sqlContext).jdbc(url='%s' % url, table='tmp1', properties=properties)
df.show()

The df.show() call results in a stack trace with the error:

Caused by: java.sql.SQLException: Out of range value for column 'i' : value i is not in Integer range
        at org.mariadb.jdbc.internal.queryresults.resultset.MariaSelectResultSet.parseInt(MariaSelectResultSet.java:3233)
        at org.mariadb.jdbc.internal.queryresults.resultset.MariaSelectResultSet.getInt(MariaSelectResultSet.java:992)
        at org.mariadb.jdbc.internal.queryresults.resultset.MariaSelectResultSet.getInt(MariaSelectResultSet.java:969)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:446)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:544)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        ... 1 more

If I use the MySQL Connector/J driver (mysql-connector-java-5.1.40.tar.gz) instead, it works.

This Docker image uses Java 8; however, I rebuilt it with Java 7 and the error still occurs, so it is not a Java 8-specific issue.

Table definition is very simple:
create table tmp1 (i int, ip int);
insert into tmp1 values (1,1);



 Comments   
Comment by David Thompson (Inactive) [ 2017-02-01 ]

This also happens with spark 2.1 which is the latest version of spark.

Comment by Diego Dupin [ 2017-02-01 ]

Reproduced, so I'll be able to debug the issue.

Comment by Diego Dupin [ 2017-02-03 ]

Back on it: somehow, Spark executes the query 'SELECT "i","ip" FROM tmp1', not 'SELECT i,ip FROM tmp1'.
Trying to see why.

Comment by Diego Dupin [ 2017-02-03 ]

Everything works with Spark when the connection string uses "jdbc:mysql:...", but not with "jdbc:mariadb:...", because Spark's MySQL dialect is then not used.

When it is not used, the default identifier quote is ", not `.

So an internal query generated by Spark like SELECT `i`,`ip` FROM tmp1 is instead executed as SELECT "i","ip" FROM tmp1, while the data types retrieved earlier still say integer; MySQL treats "i" as a string literal, causing the exception.

I'll make a pull request to Spark so that "jdbc:mariadb:" connection strings can be handled.
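The quoting mismatch described above can be sketched without Spark (hypothetical helper names, not Spark's actual API): with no MySQL dialect selected, Spark quotes identifiers with ", which MySQL in its default sql_mode parses as a string literal rather than a column name.

```python
# Minimal sketch of JDBC-dialect identifier quoting (hypothetical helpers,
# mirroring what a dialect's quoteIdentifier step does).
def quote_identifier(name, quote_char):
    return quote_char + name + quote_char

def build_select(columns, table, quote_char):
    cols = ",".join(quote_identifier(c, quote_char) for c in columns)
    return "SELECT %s FROM %s" % (cols, table)

# The MySQL dialect (picked for jdbc:mysql: URLs) quotes with backticks:
print(build_select(["i", "ip"], "tmp1", "`"))   # SELECT `i`,`ip` FROM tmp1

# The fallback dialect (used for jdbc:mariadb: URLs) quotes with ":
print(build_select(["i", "ip"], "tmp1", '"'))   # SELECT "i","ip" FROM tmp1
```

In MySQL's default mode the second query returns the literal strings 'i' and 'ip' for every row, and the driver's getInt() then fails with the "not in Integer range" error seen in the stack trace.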

Comment by Diego Dupin [ 2017-02-03 ]

(The pull request is dependent on CONJ-423; waiting for CONJ-423.)
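Until a Spark-side fix lands, one possible workaround (an assumption, not verified in this issue) is to enable MySQL's ANSI_QUOTES sql_mode for the session via Connector/J's sessionVariables URL option, so the server accepts Spark's double-quoted identifiers:

```python
# Hypothetical workaround sketch: ANSI_QUOTES makes the server treat " as an
# identifier quote instead of a string delimiter. sessionVariables is a
# MariaDB Connector/J URL option; this approach is untested here.
base_url = 'jdbc:mariadb://172.21.21.2:3306/test'
params = [
    ('useServerPrepStmts', 'false'),
    ('sessionVariables', 'sql_mode=ANSI_QUOTES'),
]
url = base_url + '?' + '&'.join('%s=%s' % kv for kv in params)
print(url)
```

The resulting URL would then be passed to DataFrameReader.jdbc() exactly as in the reproduction above.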

Comment by Russell Spitzer [ 2017-05-25 ]

I couldn't find a corresponding Spark JIRA for this. Has one been created? If so, can you please link it here?

Comment by Dieter Vekeman [ 2018-08-03 ]

I ran into this problem recently.

I have created one now:
https://issues.apache.org/jira/browse/SPARK-25013

I'm not sure whether the patch has been submitted and is stuck somewhere?

Generated at Thu Feb 08 03:15:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.