[CONJ-369] Encoding on clob column Created: 2016-10-13  Updated: 2016-10-25  Resolved: 2016-10-20

Status: Closed
Project: MariaDB Connector/J
Component/s: Other
Affects Version/s: 1.4.6, 1.5.3, 1.5.4
Fix Version/s: 1.5.5

Type: Bug Priority: Major
Reporter: Peter Rosendahl Assignee: Diego Dupin
Resolution: Fixed Votes: 0
Labels: None
Environment:

$ mysql -V
mysql Ver 15.1 Distrib 10.0.27-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2


Issue Links:
Relates
relates to CONJ-253 Text encoding concorrect on blob column Closed

 Description   

We seem to have a similar problem to CONJ-253. When writing CLOBs (longtext) to a latin1 table, the driver doesn't use latin1 encoding.

The following test works with 1.2.3 and fails with 1.4.6, 1.5.3, 1.5.4.

Using useUnicode=false&characterEncoding=latin1 in the jdbc url makes no difference.

package de.kdo.mariadbtest;
 
import static org.junit.Assert.assertEquals;
 
import java.sql.Clob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.stream.Collectors;
 
import org.junit.Test;
 
public class Latin1ClobTest {
 
   /**
    * 
    * Writes and reads a clob (longtext) of a latin1 table.
    * 
    * @throws SQLException
    *
    */
   @Test
   public void insertAndSelectShouldBothUseLatin1Encoding() throws SQLException {
      // German Umlaute (ÄÖÜ) U+00C4, U+00D6, U+00DC
      final String latin1String = "\u00c4\u00d6\u00dc";
 
      try (Connection connection = DriverManager.getConnection("jdbc:mariadb://localhost:3306/testlatin1?user=mft&password=mft")) {
 
         connection.prepareStatement("CREATE TABLE foo (x longtext) DEFAULT CHARSET=latin1").execute();
 
         final Clob insertClob = connection.createClob();
 
         insertClob.setString(1, latin1String);
         final String insertSQL = "INSERT INTO foo " + "VALUES(?)";
         PreparedStatement pstmt = connection.prepareStatement(insertSQL);
         pstmt.setClob(1, insertClob);
         pstmt.executeUpdate();
 
         final String selectSQL = "select x " + "from foo";
         pstmt = connection.prepareStatement(selectSQL);
         ResultSet rs = pstmt.executeQuery();
         rs.next();
         final Clob retrieveClob = rs.getClob(1);
 
         final String stringFromSelect = retrieveClob.getSubString(1, (int) retrieveClob.length());
 
         System.out.printf("INSERT: %s %s\n", latin1String,
               latin1String.codePoints().mapToObj(Integer::toHexString).collect(Collectors.toList()));
         System.out.printf("SELECT: %s %s\n", stringFromSelect, stringFromSelect.codePoints().mapToObj(Integer::toHexString)
               .collect(Collectors.toList()));
 
         assertEquals(latin1String, stringFromSelect);
 
      } catch (SQLException e) {
         throw e;
      } finally {
      }
   }
}

Output:

INSERT: ÄÖÜ [c4, d6, dc]
SELECT: ÄÖÜ [c3, 201e, c3, 2013, c3, 153]

This is what's actually written to the table:

MariaDB [testlatin1]> select x, hex(x) from foo;
+----------------+--------------+
| x              | hex(x)       |
+----------------+--------------+
| ÄÖÜ         | C384C396C39C |
+----------------+--------------+
1 row in set (0.00 sec)

"C384C396C39C" is the UTF-8 encoded form of "ÄÖÜ", so the INSERT seems to write UTF-8 instead of latin1.



 Comments   
Comment by Diego Dupin [ 2016-10-19 ]

issue confirmed.
This occur when useServerPrepStmts is set to true (and so using binary protocol).
This work on 1.2.3 since useServerPrepStmts default value was false.

In binary format, Clob data are send in UTF-8 format and not encoded according to Table/column charset.
(MySQL Driver has the same issue when option useServerPrepStmts is set to true).

Generated at Thu Feb 08 03:15:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.