Setting up a solution to store, manage and display UTF-8 data using MySql was quite a challenge.There are some a few steps that you have to get right, otherwise garbage data will come out at various stages in your application.

Inserting the data

First, make sure your data is UTF-8. You can easily check this with a text editor or maybe your favourite IDE. Next, create the database and tables specifying the character encoding as UTF-8. Here is a sample.

CREATE TABLE `word` ( `Id` int(11) NOT NULL auto_increment, `Name` varchar(30) NOT NULL default ”, `Dname` varchar(40) NOT NULL default ”, UNIQUE KEY `counter` (`Id`), KEY `alfabetic` (`Name`), KEY `dname` (`Dname`), ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Unfortunately, I have not found a Mysql tool to handle UTF-8 files properly. Opening such file would just result in unreadable characters in tool display and also in database. Still, using command line console would result in the same unreadable charaters. The solution: make sure you are setting the character set of the imported data. You can to that using:

SET NAMES UTF-8;

Selecting inserted data

The Mysql JDBC driver allows setting several parameters as connection string parameter. The connection URL would look like:

connection.url=jdbc:mysql://localhost/t1?useUnicode=true&characterEncoding=UTF-8

Still, setting up Spring using default BasicDataSource would do affect character encoding settings.

<bean id=”dataSource” class=” org.apache.commons.dbcp.BasicDataSource”>
<property name=”driverClassName” value=”${connection.driver_class}”/>
<property name=”url” value=”${connection.url }”/>
<property name=”username” value=”${connection.username}”/>
<property name=”password” value=”${connection.password}”/>
</bean>

In order to check used parameters for a particular connection, check using a statement like: “SHOW VARIABLES LIKE ‘char%'” Most variables should be UTF-8, at least the client connection property must be UTF-8. In order to have it working, I’ve added a custom data source that would directly add parameters to the driver when creating the connection:

public class CustomDataSource extends BasicDataSource {
public void init() {
addConnectionProperty(“useUnicode”, “true”);
addConnectionProperty(“characterEncoding”, “UTF-8”);
}
}

That can be easily wired in Spring same as the data source above, only with a “init-method” definition that would add the desired connection properties. The really hard part, that took me a while to figure out, was selecting data using a UTF-8 string in the WHERE clause. It all comes down to the collation Mysql is using in comparing the data. Without any collation specified, searching a UTF-8 string would result in ignoring the accents or special characters and match more results.

The solution, specify the collation when you are doing the query: select Id, Name, Dname from word where w.Dname=#param# COLLATE utf8_bin;

Display data in a web page

Spring MVC proves a simple solution for implementing the web layer. There are several things to be done: – set proper encodings in JSP pages. This can be easily done with a filter or including the following in all your pages. <%@ page language=”java” pageEncoding=”UTF-8″ contentType=”text/html;charset=utf-8″ isELIgnored=”false”%>

– set meta information in the “head” section of the JSP file <META http-equiv=”Content-Type” content=”text/html; Submit and handle data from a web page

UTF-8 characters have to be URL encoded before submission, this is easily done in three steps.

First, set the URIEncoding property in the Connector definition:

<Connector port=”8080″ URIEncoding=”UTF-8″ useBodyEncodingForURI=”true” maxThreads=”150″ minSpareThreads=”25″ maxSpareThreads=”75″ enableLookups=”false” redirectPort=”8443″ acceptCount=”100″ debug=”0″ connectionTimeout=”20000″ disableUploadTimeout=”true” />

That allows the conversion of URLs to UTF-8 and correct submission of parameters using GET method. Second, use encodeURIComponent() in your JavaScript code to encode all submitted data. Without this, even if your page is UTF-8, your parameters will not be handled properly (at least not by IE, Firefox seems to have no problem with this). Third, do not read any parameters in your Controller classes unless you’ve set the right encoding for your request. This can be done using a filter or simply by setting:

request.setCharacterEncoding(“UTF-8”);

Versions: Mysql 5.0.19/ Mysql Connector 3.1.13/ Spring 1.2.7

3 responses to “End to end UTF-8 encoding usage with MySql and Spring

  1. Good suggestions. I found that using Spring’s CharacterEncodingFilter was a vital part of getting this process right. I’m sure I could have explicitly set the character encoding on the request object, but where do you do it when using something like a SimpleFormController? You have to make sure you set the encoding before doing anything with the request.

Leave a Comment:

Your email address will not be published. Required fields are marked *