Convert MySQL Database Character Encoding to UTF8

Create a Database

To create a database that will default to UTF8 without modifying the server defaults run the following command

CREATE DATABASE dbName DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci

Broken down, it means:

CREATE DATABASE dbName - create a database named dbName (best name, ever)

DEFAULT CHARACTER SET utf8 - this database will use utf8 as its default character encoding. All tables created, will be configured to use utf8 for character columns.

DEFAULT COLLATE utf8_general_ci -  when comparing characters, ignore case (ci=case insensitive)

Convert an Existing Database

If you already have a database in some other character set, such as latin1, or ISO 8859-1, you can easily convert it to UTF8 by walking through a few simple steps.

1. Export your database

$> mysqldump -u -root -p \
       --complete-insert \
       --extended-insert \
       --default-character-set=utf8 \
       --single-transaction \
       --add-drop-table \
       --skip-set-charset \
 > dump.sql

This will include column names for your insert statements, efficient insert statements that don't repeat column names, export in utf, not include setting the existing charset, and include drop table statements prior to create table statements.

2. Edit the Dump to Use UTF-8

cat dump.sql | \
    sed '/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8/' | \
    sed '/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/'
  > dump-utf8.sql

This will just swap out latin1 for utf8. You can accomplish however you want to. I'm a fan of sed and awk and all things command line but the goal is clear; make latin1 stuff, utf8 stuff. Just be careful to qualify your  replacements. You don't want to accidentally change data.

3. Drop Your Database

From the mysql prompt, run

> drop database dbName;

4. Import UTF8 Version

mysql -u root -p < dump-utf8.sql

 

That's it. You could easily make a little shell script to do this and when need to convert a database from one encoding to another, just run it. Perhaps the command would look something like:

swap-encoding dbName fromEncoding toEncoding