gilfster: February 2006

Tuesday, February 28, 2006

The MySQL Prompt

Following on from my previous post about tips I have been picking up from the MySQL certification study guide I have another which I think is worth mentioning.

Personally I had never noticed that the MySQL prompt changes in relation to what went before it. How often have you done something similar to the following.


mysql> select * from emps where emp_name = 'john ;
    '> ;
    '> ';
Empty set (0.01 sec)

You type in a select statement only to open a quote but never close it, personally I always stick in another ; before I realise the mistake and terminate the quote and end the statement correctly. But what I didn't realise before reading the study guide (and making me very unobservant) was that the mysql> prompt changes in response to the currently open quote. In the above example this isn't obvious because it just looks like a continuation but in fact the prompt has been changed to show us that it is expecting a closing single quote. Take for example the following.


mysql> select * from emps
    -> where emp_name = 'john';

In this case it gives us the standard -> continuation prompt. Firstly I never even noticed this and secondly even if I did I doubt I would have seen the significance, generally I'm more focused on why the five semi-colons I entered have failed to be recognised. The certification study guide points out that there are in fact 5 prompts.


mysql> 
    ->
    '>
    ">
    `>

This becomes much more relevant when we are entering large complex SQL statements where it might not be obvious where the mistake has occurred. Take this SQL for example.


mysql> select e.emp_id, e.emp_name, d.dept_name
    -> from emps e, dept d
    -> where e.dept_id = d.dept_id
    -> and e.emp_name = 'john
    '> and e.dept_id = 1;
    '>

While this isn't as complex as it could be it’s easy to see in this instance where the problem occurred because our prompt changes from -> to '> on line 5.

We can use this in combination with the \c tip I mentioned earlier.


mysql> select e.emp_id, e.emp_name, d.dept_name
    -> from emps e, dept d
    -> where e.dept_id = d.dept_id
    -> and e.emp_name = 'john
     '> '\c

Study and You Shall Learn

Originally this blog started as a reference for people moving from Oracle to MySQL, over the course of the 5 or so months it's been going I have at times moved away from that and on to other topics. I have tended however to keep it technical and avoided commenting on anything related to the business side of MySQL. Since Christmas my MySQL activities have decreased somewhat but over the last few weeks I've been doing more and more, be that attempting to update this blog or work on new content on www.mysqldevelopment.com. One of the things I have decided to do is take the MySQL certification exams, this isn't some great announcement and I'm a little wary of doing so publicly because I don't want to set a time limit on doing so.

Anyway on to the point, I have been developing with databases from the very first day of my IT career, mostly this has been with Oracle but also with Microsoft SQL Server, FoxPro, DB2 and Ingress. This has meant that I have picked up a lot of knowledge about databases and also SQL. On the whole this has meant that moving to MySQL has been painless, the few things that I needed to know specifically for MySQL have been easy to pick up. But having started to read the certification study guide I can see that there are gaps in my knowledge. Most of these are just small things that while are good to know are not a problem in the day to day operation of MySQL. However there are some really useful things that I have picked up, things which I just wouldn't have known or thought about had I not read the guide or been told by somebody in the know. I'm sure I'll be posting about more this in the future but here is a small tip I picked up.

Firstly it's \c, that may be something you use all the time but having worked with Oracle for so long I just got used to entering some rubbish to make the client give me a fresh line. If you like me are unfamiliar with \c then all it does is cancel and clear the command line if you make a mistake while entering a statement. In the old days I would have done something like this....


mysql> selec t * from emps
    -> where ;
ERROR 1064 (42000): You have an error in your SQL 
syntax; check the manual that corresponds to 
your MySQL server version for the right syntax to 
use near 'selec t * from emps where' at line 1
mysql>

While I was on the second line I noticed that I had incorrectly typed the SELECT keyword so I terminated the statement by simply entering the ; delimiter. While this doesn't do any harm it does produce an error. Using \c we can do this gracefully like so.

mysql> selec t * from emps
    -> where \c
mysql>

Wednesday, February 22, 2006

MySQL Getting More Popular?

My web hosting company provide some great statistics for me to look at with relation to my site www.mysqldevelopment.com. It's been interesting watching visitor and hit numbers grow over the last year since the site went live. A year ago for the month of February we had just 251 visits, this year however we have had 3446 in February as of about 5 minutes ago, and there are still 6 days to go in the month.

But this figure is up drastically from January 2006 and late last year. Each month last year was better than the previous one but growth was steady and we peaked in terms of traffic in November when the site had a lot of activity due to the release of 5.0 and me winning the first week of the 5.0 challenge. But visitor numbers only reached 2847 for November and were around 2500 for the months either side.

So I was just wondering if this was something thats happened just on www.mysqldevelopment.com or if it was a trend other people in the community were seeing.

Saturday, February 18, 2006

WiFi Woes.

I have a small network at home running a Windows XP laptop, a Windows desk top and a Mac Mini. I recently move the Mac over to the TV so I could connect it up and watch TV programs I saved using a digital TV gizmo I purchased for my Mac, it's great because it has pretty much made the video machine redundant and recording is a breeze.

However this meant that the Mac was too far away from the ADSL gateway to be connected via a cable. Given that it was a first generation Mac Mini it didn't come with a wireless card installed. I decided that given that an Airport card would have cost almost twice as much as a windows wireless card and that the Mac Mini at least seems difficult to upgrade yourself, I choose to move the gateway closer to the Mac and get a wireless card for the PC.

This seemed great for a while, but I have been having plenty of problems with the wireless connection going down which needs a reboot to fix, this as you can imagine is a real pain especially given that my CVS server and MySQL database are on the Mac. This seems to only be a problem with the desk top as the laptop doesn't seem to have the same issues.

I was hoping to write a fair amount of content for www.mysqldevelopment.com today as I have been a little lax at updating it of late. But the connection has dropped a total of 3 times already. So as I sit here typing this I have the gateway back over near the Windows desktop connected via a cable. This means I'm not able to connect to the Mac but does allow me to vent spleen.

It's tempting to think that the more complex things become the more problems we have, but then I think back to when I had a spectrum as a kid and you would have to sit through 8 minutes of yellow and blue lines and binary sounds coming out of the tape recorder only to find that at the end it didn't recognise a section of that 48K of code and failed to load that game of Daley Thompsons Decathlon, or the cat decided sitting on the break button was a good idea when you left the thing on all night because you didn't have the energy to finish Jet Set Willy the night before.

Index

I got an email from Peter Mescalchin a few weeks ago asking me to check a problem he was having with a select statment. He has two fairly large tables one of 14,000 rows and another of 79,994 he is joining these two tables together using a column on the second which has an index. With the index in place the query executes quickly (0.19 seconds on my machine) but when the index is removed his machine hangs and on mine a P4 Hyperthread 2.8Mhz machine one of the CPU threads is kept at 70-80%, it finally completed 1 Hour, 16 Minutes, 2.16 seconds later. A similar thing happened on the Mac but the connect timed out before it returned.

I'm not a performance expert so I wasn't able to give Peter much of an answer other than to confirm what he was seeing. Anybody have any idea why there is such a dramatic difference?

For those looking to test here are the details Peter was using.


Two tables:

 CREATE TABLE `first` (
   `id` smallint(5) unsigned NOT NULL,
   `name` varchar(20) collate utf8_unicode_ci NOT NULL,
   PRIMARY KEY  (`id`)
 ) ENGINE=MyISAM DEFAULT 
   CHARSET=utf8 COLLATE=utf8_unicode_ci;

 CREATE TABLE `second` (
   `id` int(10) unsigned NOT NULL auto_increment,
   `firstid` smallint(5) unsigned NOT NULL,
   PRIMARY KEY  (`id`),
   KEY `firstid` (`firstid`)
 ) ENGINE=MyISAM DEFAULT 
   CHARSET=utf8 COLLATE=utf8_unicode_ci;

Table `first` holds 14,000 rows, table `second` holds 79994 with
`second.firstid` a forgien key to `first.id`. Now to execute a query
to return rows from `first` that are not referenced in `second`.


 SELECT `first`.*
 FROM `first` LEFT JOIN `second` 
 ON (`first`.id = `second`.firstid)
 WHERE (`second`.firstid IS NULL)

Then delete the index on second.firstid.

5.1.6 On Mac OS X

I have finally found sometime this weekend to update www.mysqldevelopment.com. So in the spirit of the site I'm looking forward and adding content on some of the new features in 5.1, first up it's the event scheduler.

I have been using my Mac less and less recently simply because I have been using it as a media centre and it now lives over by the TV, I generally connect remotely via putty or VNC. So given that I was on my Windows PC this morning I decided to install the latest release of 5.1 on there.

The download and install went perfectly and I started up the MySQL command line ready to have a play with the event scheduler. So I ran


mysql> SHOW GLOBAL VARIABLES LIKE 'event%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| event_scheduler | OFF   |
+-----------------+-------+
1 row in set (0.01 sec)

Great off to a good start, next up I have to set the event_scheduler variable to ON....


ERROR 2013 (HY000): Lost connection to MySQL server during query

Oops.. memories of the early work I did on mysqldevelopment.com with 5.0.1 came flooding back, having worked with recent versions of MySQL 5.0 which are totally stable it's easy to forget that 5.1.6 is still in the development phase. I remembered Markus Popp mentioning that the event scheduler had problems on windows and it seems it does.

So I connected to the Mac via VNC and downloaded 5.1.6 for the Mac OS X 10.3 (yep behind the times at bit on that as well). In the past I have had plenty of problems upgrading MySQL on the Mac, I'll be honest and say that was more to do with my lack of experience of the Darwin command line than anything else, but it seems I must have picked something up as this time it was really simple. I tend to make things easy for myself in that I just to clear out the old system and install the new one right on top, I don't have anything worth keeping on the Mac's MySQL database so that makes things relatively easy.

So I rebooted, went into the terminal and typed MySQL....


~ $ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4 to server version: 5.1.6-alpha

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

Then I tried to turn on the event scheduler


mysql> set global event_scheduler = ON;
Query OK, 0 rows affected (0.00 sec)

So that’s it, I'm all set ready to rock and roll with 5.1.6 and ready to test the event scheduler.

Thursday, February 09, 2006

Cartesian Joins are Useful (Honest)

For those that don't know a cartesian join is one that produces a cartesian product. A cartesian product is the result of joining two sets of data in such a way that all rows in one set are joined with all rows in the other. In MySQL this happens when we have two tables in an SQL statement but no join statement. For example...


mysql> select * from emp_dept;
+--------+---------+
| emp_id | dept_id |
+--------+---------+
|      1 |       1 |
|      2 |       1 |
|      3 |       2 |
|      4 |       2 |
+--------+---------+
4 rows in set (0.00 sec)

mysql> select * from dept;
+---------+------+
| dept_id | name |
+---------+------+
|       1 | IT   |
|       2 | HR   |
+---------+------+
2 rows in set (0.00 sec)

Normally these two tables would be joined using a join condition on dept_id so that we return only the rows where a match is found.


mysql> select emp_id, name 
       from emp_dept e join dept d 
       on e.dept_id = d.dept_id;
+--------+------+
| emp_id | name |
+--------+------+
|      1 | IT   |
|      2 | IT   |
|      3 | HR   |
|      4 | HR   |
+--------+------+
4 rows in set (0.26 sec)

However if we remove the join each row in emp_dept is joined with dept if a match exists or not.


mysql> select emp_id, name from emp_dept, dept;
+--------+------+
| emp_id | name |
+--------+------+
|      1 | IT   |
|      1 | HR   |
|      2 | IT   |
|      2 | HR   |
|      3 | IT   |
|      3 | HR   |
|      4 | IT   |
|      4 | HR   |
+--------+------+
8 rows in set (0.01 sec)

This is a cartesian product. In most cases this isn't a great idea, I had a problem last year where the assumption was that there was a 1 to 1 relationship in a table join, the introduction of some code broke this rule and it resulted in a select returning 3 million rows where once it only returned 150 or so. The users were not happy when their application took 5 minutes to load when previously it took seconds.

Result sets can get large quickly because the amount of data in the select is the number of rows in Table A * the number of rows in Table B, if you have more than two tables this multiplies at an alarming rate.

But there are times when we can use this to our advantage. I'm working on a site which needs some combinations stored for a game engine, there are 6561 different combinations and I didn't fancy having to type them in by hand. So what I did was create a dummy table with all values I needed in each position, I then used a cartesian join by having a copy of the table for as many columns as I needed.

So lets take a simplified example, if we want all the combinations of 1,2 and 3 using 2 columns we can do this like so.


mysql> select * from dummy_table;
+------+
| nums |
+------+
| 1    |
| 2    |
| 3    |
+------+
3 rows in set (0.00 sec)

mysql> select concat(d1.nums,d2.nums) 
       from dummy_table d1, dummy_table d2;
+-------------------------+
| concat(d1.nums,d2.nums) |
+-------------------------+
| 11                      |
| 21                      |
| 31                      |
| 12                      |
| 22                      |
| 32                      |
| 13                      |
| 23                      |
| 33                      |
+-------------------------+
9 rows in set (0.00 sec)

Wednesday, February 08, 2006

More in More Out

I've been busy of late, most of my time seems to be taken up with fixing layout bugs in my web applications introduced when I spend all day testing in FireFox which works perfectly, then only to have a quick test on I.E. to notice that there is a tiny portion of the page which is out and in the most ugly way possible. I then tend to mess the whole page up trying to track down where it might have come from. I suppose I should have learnt to test in I.E. early in the process but that means leaving the debugger and CSS edit facilities of FireFox behind.

Anyway back to the point, this means I have less and less time to devote to MySQL these days, but that doesn't stop me popping into the MySQL forums a couple of times a day (for no other reason than to stop me going crazy messing around with I.E.). I've posted a few times here recently on the need to be as descriptive as possible but one thing that seems to be becoming very important when answering is version information.

Firstly there are many people keen to try the new features of MySQL such as stored procedures, triggers and views, as we know they are only available from version 5 (triggers only from 5.0.3) so if people included the version number when asking a question related to the subject it would be immediately obvious they were using the wrong version. If nothing else it may even prompt the person asking the question to consider if the feature they are using is available in their version.

Secondly there are plenty of questions which relate to fairly complex SQL statements, many of those can be quickly answered using a sub select but as you may know this is only available from version 4.1, many of the people using the forums are using MySQL via their web hosting account and many of those are still on version 4.0 and below. The obvious thing to do in such cases is to give an answer for both 4.0 and 4.1 versions but often I don't have enough time to answer for one version let alone two, especially when the query is complex and needs a bit of care to give a full answer with an example.

This leads to me having to filter the questions based on how much information the user has given or more of a problem the first answer is simply a request for version information, with the way the site operates this means people in different time zones can wait up to a day for a decent reply.

I can see one of two ways to deal with this, try and educate users of the forums to include as much information as possible including the version of MySQL they are using or add a selection box on the forum page to specify the version number prior to asking the question, make this mandatory (with a unknown option, and also a default option in your profile) so that it prompts people to at least consider the question may be version related.

The problem of course with the first is that many people asking questions only do so once, they get the answer to their question and they never come back. The problem with the second is that this is something only MySQL AB themselves can control.

Friday, February 03, 2006

Keeping it short.

A recent post in the MySQL Stored Procedure forum reminded me of the old days, the post was with regard to a problematic stored procedure which had a rather long name. The name had nothing to do with the problem but it got me thinking about naming conventions and documenting code to make it more readable.

I'm all for naming things with an appropriate name but there are times when it can be taken too far. One particular example was a system I worked on a few years ago, every table had two columns called unique_identifier and parent_identifier. Every record had a unique identifying number and the ability to store the unique identifier of it's parent, I'll ignore the fact this wasn't the greatest way to design the database for now.

The problem was that these columns were used on a regular basis, pretty much in every SQL statement that joined two tables in the system. This meant that the consulting team spent most of their programming time typing these things out in full (which also included typos on a regular basis). The development team could have used unique_id or even uid and everybody would still have know what the columns were used for but they went for the fully monty.

Don't get me wrong I'm definitely into using descriptive names but there comes a time when you need to think about how practical it's going to be when you have to type 60 extra characters per SQL statement.

I had a meeting with one of the original development team and he informed me the decision was taken because Visual FoxPro allowed 32 character column names where FoxPro 2.0 only ever allowed 8 characters, he said it was a decision that they wish the had never taken.

gilfster