gilfster: Index

I got an email from Peter Mescalchin a few weeks ago asking me to check a problem he was having with a select statment. He has two fairly large tables one of 14,000 rows and another of 79,994 he is joining these two tables together using a column on the second which has an index. With the index in place the query executes quickly (0.19 seconds on my machine) but when the index is removed his machine hangs and on mine a P4 Hyperthread 2.8Mhz machine one of the CPU threads is kept at 70-80%, it finally completed 1 Hour, 16 Minutes, 2.16 seconds later. A similar thing happened on the Mac but the connect timed out before it returned.

I'm not a performance expert so I wasn't able to give Peter much of an answer other than to confirm what he was seeing. Anybody have any idea why there is such a dramatic difference?

For those looking to test here are the details Peter was using.


Two tables:

 CREATE TABLE `first` (
   `id` smallint(5) unsigned NOT NULL,
   `name` varchar(20) collate utf8_unicode_ci NOT NULL,
   PRIMARY KEY  (`id`)
 ) ENGINE=MyISAM DEFAULT 
   CHARSET=utf8 COLLATE=utf8_unicode_ci;

 CREATE TABLE `second` (
   `id` int(10) unsigned NOT NULL auto_increment,
   `firstid` smallint(5) unsigned NOT NULL,
   PRIMARY KEY  (`id`),
   KEY `firstid` (`firstid`)
 ) ENGINE=MyISAM DEFAULT 
   CHARSET=utf8 COLLATE=utf8_unicode_ci;

Table `first` holds 14,000 rows, table `second` holds 79994 with
`second.firstid` a forgien key to `first.id`. Now to execute a query
to return rows from `first` that are not referenced in `second`.


 SELECT `first`.*
 FROM `first` LEFT JOIN `second` 
 ON (`first`.id = `second`.firstid)
 WHERE (`second`.firstid IS NULL)

Then delete the index on second.firstid.

gilfster

Saturday, February 18, 2006

Index

1 Comments:

About Me

Previous Posts