How to set the default charset to utf-8 for create table when using hibernate with java persistence annotations

August 14th, 2007 by spostelnicu in Java, General

Yesterday I encountered a problem when trying to persist a String value into a MySQL column of type ‘text‘ (the problem also occurs for column types ‘tinytext‘, ‘mediumtext‘ etc.)

The first confusing thing was that the error message returned by mysql was
java.sql.BatchUpdateException: Data truncation: Data too long for column 'my_column'

The confusing thing was that the column was of type ‘text‘ (length=65536), and I then changed it to type ‘mediumtext‘ (length=16777215), but the value that I was trying to persist had only around 6000 characters, so it was not a problem with the data length.
Instead the one thing that was noticeable about my String value was that it contained non-ASCII characters (particularly Romanian characters with diacritics).

After a little search on the web, I found the following mysql bug description http://bugs.mysql.com/bug.php?id=17872
A comment on that page ([8 Mar 2006 9:53] [ name withheld ]) also contains some more links to similar bug descriptions.

Now it was a little clearer that the problem was caused by the Romanian characters (and the error message was just stupid), so the next thing to do was to figure out why the database table didn’t store UTF-8 characters.

Although I created the database with
create database mydb character set utf8 collate utf8_general_ci;
still the tables created inside it had the collation latin1_swedish_ci.

My application uses Hibernate for persistence and uses java persistence annotations to specify the hibernate mappings.

In my case the code is something like:


import javax.persistence.*;

@Entity
@Table(name = "my_table")
public class MyClass implements Serializable {
    private String myValue;

    @Column(name = "my_column", columnDefinition = "mediumtext", length = 16777215)
    public String getMyValue() {
        return this.myValue;
    }

    public void setMyValue(String value) {
        this.myValue = value;
    }
}

The database schema is automatically generated based on the classes and hibernate mappings, by using the following ant target:


    <taskdef name="hibernatetool"
             classname="org.hibernate.tool.ant.HibernateToolTask"
             classpathref="project.classpath"/>

    <target name="hbm2ddl-schema"
            description="Generates the database schema from hibernate mappings">
        <hibernatetool destdir="">
            <classpath refid="project.classpath"/>
            <annotationconfiguration configurationfile="${classes.dir}/hibernate.cfg.xml"/>
            <hbm2ddl export="true" drop="false" create="true" haltonerror="true"/>
        </hibernatetool>
    </target>

If I had created the database tables by writing the SQL DDL by hand, I would have used the following script:


  create table `my_table` (
      `Id` int(11) NOT NULL auto_increment,
      `my_column` mediumtext NOT NULL default ”,
      ......
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

But in my case the database schema is automatically generated by HibernateTools, so I have to specify the default charset for the table somewhere in the @Table annotation, or hibernate configuration, or hibernatetool parameters.

After trying a few options (and I must admit I did not search thoroughly, so if you know of some better way to specify the default charset, please leave a comment), I chose the following solution:
I implemented my own custom org.hibernate.dialect.Dialect, by subclassing org.hibernate.dialect.MySQL5InnoDBDialect:


import org.hibernate.dialect.MySQL5InnoDBDialect;

/**
 * Extends MySQL5InnoDBDialect and sets the default charset to be UTF-8
 * @author Sorin Postelnicu
 * @since Aug 13, 2007
 */
public class CustomMysqlDialect extends MySQL5InnoDBDialect {

    public String getTableTypeString() {
        return " ENGINE=InnoDB DEFAULT CHARSET=utf8";
    }
}

and used it in my hibernate.cfg.xml:


<hibernate-configuration>
    <session-factory>
        <property name="hibernate.dialect">my.package.CustomMysqlDialect</property>
        .....
    </session-factory>
</hibernate-configuration>

You might also like

Hibernate annotations – default value If you wanted to set the default value using hibernate annotations, you've probably had some difficulties,...
Avoiding SQL joins with java enums Lets say you have a Coffee object and 3 sizes for coffee (small, medium and large).You can create a Coffee...
Hibernate Annotations 3.2.1 – Lucene sort Keeping database and Lucene index in sync is easily done using the new Lucene-Hibernate integration. Just...
Serializable java objects in MySQL How to save a complex java object in a MySQL table If you want to save complex java objects to MySQL...
.

10 Responses

  1. chris Says:

    I’ve tried a few other options, but I had no luck and I dont know why…

    I tried changing the con.url

    i tried adding a hibernate charset property

    the only sollition that actually changed my tables to UTF-8 was yours..

  2. chris Says:

    Ik kicked out my tags *smile*..

    First try:
    hibernate.connection.url = jdbc:mysql://localhost:3306/kaartenbalie_test2?characterEncoding=UTF-8

    Second try:
    hibernate.connection.charSet = UTF-8

    no luck..

  3. Nirav Patel Says:

    I too was facing similar problem [though a bit different].

    For me, supplying character encoding to connection URL resolved the issue!

    For details – http://neoexpedition.wordpress.com/2007/11/01/persisting-japanese-characters-in-db-using-hibernate/

    Thanks guys!

  4. Max Says:

    If your application is the only application on DB, than you may try to find the property for default charset in DB server config.

    For example in MySQL it is: “default-character-set=utf8”

  5. Shawn Hartsock Says:

    You should consider submitting this as a patch.

  6. Shervin Asgari Says:

    This is the way you do it.

    In your hibernate.cfg.xml you use another dialect.

    or

    depending on if you use mysql 4.x

    Then to get utf8 on the tables you have to create the database as utf8.
    create database someDatabase default charset utf8;

    Then hibernate will automatically set the charset for the tables to utf8.
    Hope this helps

    Shervin

  7. Shervin Asgari Says:

    Ops, I see my properties didnt appear. Here it is:

    property name=”hibernate.dialect” value=”org.hibernate.dialect.MySQL5InnoDBDialect”

    or

    property name=”hibernate.dialect” value=”org.hibernate.dialect.MySQLInnoDBDialect”

  8. Mura Andrei Petru Says:

    I have also used your technic, because I encountered the same problem, and it works very well.

  9. posi Says:

    Easier than using your own dialact might be switching the database to UTF-8 e.g.

    ALTER DATABASE my_table DEFAULT CHARACTER SET = ‘UTF-8’

  10. Alvaro Says:

    That works fine. Thanks you!

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.