Random Ramblings: MySQL Delete Duplicate Records with Alter Table

So your application has been humming along live for a couple of months and suddenly you find that one of your database tables contains duplicate records. You review the table structure and find that it does not have a unique constraint defined on a field (or group of fields) that should be unique. So how can these duplicates be eliminated?

I've encountered many solutions for joining a table to itself to identify and delete groups of duplicates, but on large (1GB+) tables this will cause a full table scan (unless you run it in batches) and could be very difficult on a production server. Another option is process the deletions through a temporary table and then rename the tables. So after you've eliminated the duplicates you still need to add the unique constraint to the table; wouldn't it be better if you could handle everything in one shot?

MySQL includes an option for that in the ALTER TABLE syntax; ALTER IGNORE TABLE. The IGNORE option works a bit like 'INSERT IGNORE INTO table1 SELECT * FROM table2...' in that the first record remains in the table but subsequent records are silently dropped.

Here's a "fun" example of how this all works:

mysql> create table tp_test(
  a int(11) auto_increment primary key,
  b varchar(25) not null);
Query OK, 0 rows affected (0.05 sec)

mysql> show create table tp_test;
+---------+-----------------------------------------------+
+---------+-----------------------------------------------+
| tp_test | CREATE TABLE `tp_test` (
`a` int(11) NOT NULL auto_increment,
`b` varchar(25) NOT NULL,
PRIMARY KEY  (`a`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
+---------+-----------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into tp_test (b) 
values 
  ('Chickens'),('eat'),('corn'),('corn'),
  ('and'),('other'),('grains'),('grains');
Query OK, 4 rows affected (0.00 sec)

mysql> select * from tp_test;
+---+----------+
| a | b        |
+---+----------+
| 1 | Chickens |
| 2 | eat      |
| 3 | corn     |
| 5 | corn     |
| 6 | and      |
| 7 | other    |
| 8 | grains   |
| 9 | grains   |
+---+----------+
8 rows in set (0.00 sec)

mysql> ALTER IGNORE TABLE tp_test 
ADD UNIQUE INDEX `test1` (`b`);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 2  Warnings: 0

mysql> select * from tp_test;
+---+----------+
| a | b        |
+---+----------+
| 1 | Chickens |
| 2 | eat      |
| 3 | corn     |
| 6 | and      |
| 7 | other    |
| 8 | grains   |
+---+----------+
6 rows in set (0.00 sec)

Random Ramblings

Thursday, February 26, 2009

MySQL Delete Duplicate Records with Alter Table

No comments: