End to end UTF-8 encoding usage with MySql and Spring

4 min read >

End to end UTF-8 encoding usage with MySql and Spring

Engineering Insights & Enterprise solutions

Setting up a solution to store, manage and display UTF-8 data using MySql was quite a challenge. There are a few steps that you have to get right, otherwise, garbage data will come out at various stages in your application.

Inserting the data

First, make sure your data is UTF-8. You can easily check this with a text editor or maybe your favorite IDE. Next, create the database and tables specifying the character encoding is UTF-8. Here is a sample.

CREATE TABLE `word` ( `Id` int(11) NOT NULL auto_increment, `Name` varchar(30) NOT NULL default ”, `Dname` varchar(40) NOT NULL default ”, UNIQUE KEY `counter` (`Id`), KEY `alfabetic` (`Name`), KEY `dname` (`Dname`), ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Unfortunately, I have not found a Mysql tool to handle UTF-8 files properly. Opening such a file would just result in unreadable characters in the tool display and also in the database. Still, using the command line console would result in the same unreadable characters. The solution: make sure you are setting the character set of the imported data. You can do that using:

SET NAMES UTF-8;

Selecting inserted data

The Mysql JDBC driver allows setting several parameters as a connection string parameter. The connection URL would look like this:

connection.url=jdbc:mysql://localhost/t1?useUnicode=true&characterEncoding=UTF-8

Still, setting up Spring using default BasicDataSource would do affect character encoding settings.

<bean id=”dataSource” class=” org.apache.commons.dbcp.BasicDataSource”>
<property name=”driverClassName” value=”${connection.driver_class}”/>
<property name=”url” value=”${connection.url }”/>
<property name=”username” value=”${connection.username}”/>
<property name=”password” value=”${connection.password}”/>
</bean>

In order to check used parameters for a particular connection, check using a statement like: “SHOW VARIABLES LIKE ‘char%’” Most variables should be UTF-8, at least the client connection property must be UTF-8. In order to have it working, I’ve added a custom data source that would directly add parameters to the driver when creating the connection:

public class CustomDataSource extends BasicDataSource {
public void init() {
addConnectionProperty(“useUnicode”, “true”);
addConnectionProperty(“characterEncoding”, “UTF-8”);
}
}

That can be easily wired in Spring same as the data source above, only with an “init-method” definition that would add the desired connection properties. The really hard part, that took me a while to figure out, was selecting data using a UTF-8 string in the WHERE clause. It all comes down to the collation Mysql is using in comparing the data. Without any collation specified, searching a UTF-8 string would result in ignoring the accents or special characters and match more results.

The solution, specify the collation when you are doing the query: select Id, Name, Dname from the word where w.Dname=#param# COLLATE utf8_bin;

Display data in a web page

Spring MVC proves a simple solution for implementing the web layer. There are several things to be done: – set proper encodings in JSP pages. This can be easily done with a filter or by including the following in all your pages.

<%@ page language=”java” pageEncoding=”UTF-8″ contentType=”text/html;charset=utf-8″ isELIgnored=”false”%>

– set meta-information in the “head” section of the JSP file <META http-equiv=”Content-Type” content=”text/html; Submit and handle data from a web page

UTF-8 characters have to be URL encoded before submission, this is easily done in three steps.

First, set the URIEncoding property in the Connector definition:

<Connector port=”8080″ URIEncoding=”UTF-8″ useBodyEncodingForURI=”true” maxThreads=”150″ minSpareThreads=”25″ maxSpareThreads=”75″ enableLookups=”false” redirectPort=”8443″ acceptCount=”100″ debug=”0″ connectionTimeout=”20000″ disableUploadTimeout=”true” />

That allows the conversion of URLs to UTF-8 and correct submission of parameters using the GET method. Second, use encodeURIComponent() in your JavaScript code to encode all submitted data. Without this, even if your page is UTF-8, your parameters will not be handled properly (at least not by IE, Firefox seems to have no problem with this). Third, do not read any parameters in your Controller classes unless you’ve set the right encoding for your request. This can be done using a filter or simply by setting:

request.setCharacterEncoding(“UTF-8”);

Versions: Mysql 5.0.19/ Mysql Connector 3.1.13/ Spring 1.2.7