Now a days, UTF-8 is the most used data encoding format, and the fact that your database is not using UTF8 encoding is really annoying, specially additionally when it comes to integrating different systems, that has no one unified encoding format.
So if you think it’s time to change your data encoding to utf8 format, then here what this post is all about.
I’ll list here the steps to do so, i just have to clarify that the main data encoding here is windows-1256 (which is the main Arabic encoding used in web applications), but it’s saved in latin1 encoding in the database (mydata ->windows-1256 -> latin1) ,also note that i’m using Mysql database.
Here are the steps:
- Export (only) the schema of the db,without “set Names” phrase in the outputted sql file, this will bring you back the data in the original encoding (windows-1256)
Shell1mysqldump --default-character-set=latin1 --skip-set-charset -d -uusername -ppassword db_name > db_name_schema.sql
- Export the data of the db without “set Names” phrase in the outputted sql file, this will bring you back the data in the original encoding (windows-1256):
Shell1mysqldump --default-character-set=latin1 --skip-set-charset -t -uusername -ppassword db_name > db_name_data.sql
- Change the encoding of both files from arabic to utf8 -check the notes if you r using windows
Shell12iconv -f windows-1256 -t utf8 db_name_schema.sql > db_name_schema_utf8.sqliconv -f windows-1256 -t utf8 db_name_data.sql > db_name_data_utf8.sql
- Open the file ‘db_name_schema.sql’ with any editor and replace each “DEFAULT CHARSET=latin1” phrase with “DEFAULT CHARSET=utf8” one
- Make a new db ,encoded in utf8:
Shell1mysql -uusername -ppassword -e 'CREATE DATABASE new_db DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci'
- Import the schema and data in utf8 encoding:
Shell12mysql --default-character-set=utf8 -uusername -ppassword new_db < db_name_schema_utf8.sqlmysql --default-character-set=utf8 -uusername -ppassword new_db < db_name_data_utf8.sql
If you are wondering why to separate schema from data upon exporting , then the answer is that the operation of replacing “DEFAULT CHARSET=latin1” phrase with “DEFAULT CHARSET=utf8” one , is taking place only on schema files, so it’s recommended to separate them so that you dont stuck when loading the big data files.
If you are a windows user and can’t use iconv , then u can use any editor to do the job for u, try scite or note++ or even dreamweaver