Brendan Enrick

Daily Software Development

Sql Table Refactoring Solution

Steve Smith posted an interesting SQL Table Refactoring Challenge on his blog. I am prepared to go out on a limb and share my solution for how to make a table more efficient.

The first step I would take is to change the columns of the table a little bit.

  • Change CountryCode to a char(2) since they are all 2-character ISO standards. The varchar does limit itself based on the size of the data, but has an overhead of 2 bytes. This will also prevent anything longer from being entered into the table.
  • If you’re feeling very ambitious you can make a CountryId column that is of type smallint and uses the id of the country instead of the 2-characters. This will require an update to all of the existing data so I would say this isn’t worth the effort.
  • AboveFold can be changed to a tinyint. Note: these values are 0 to 255 so we will need to remap the values in the database.
  • A smallint can be used for the Clicks column, since it doesn’t need to get very large.
  • Period can change to a date. This will take up 3 bytes which is much smaller than the 8 for the datetime.

As an extra note I believe that int is required to get up to 50,000 for the tables which require it. If we could drop the limit to around 30000 we could use a smallint for those.

Once I have that table created I create 2 duplicate tables (duplicated schema not all of the data). [lq_ActivityLogLoad] and [lq_ActivityLogLoadNext] At any given time we will write to one of these two tables and not the huge table.

Then I create a job which will switch to which of these two tables we are writing. After switching the job will load all of this data directly into the large table using an Upsert (Update or Insert). After loading the data from the table it can empty out that load table and wait a minute and perform this action over again. Doing this loading pattern will make it so the writes will be on these small tables which aren’t being read from.

I would change the primary key to be based on the ID and put a unique non-clustered index on the columns currently used for the primary key preserving the safety from the unique constraint.

Here is the table I created for this.

CREATE TABLE [dbo].[lq_ActivityLog](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PlacementID] [int] NOT NULL,
[CreativeID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[CountryCode] [char](2) NOT NULL,
[RequestedZoneID] [int] NOT NULL,
[AboveFold] [tinyint] NOT NULL,
[Period] [date] NOT NULL,
[Clicks] [smallint] NOT NULL,
[Impressions] [int] NOT NULL,
CONSTRAINT [PK_lq_ActivityLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
 

Table Variables vs. Temporary Tables in SQL

Temporary tables and table variables can be used in pretty much the same way. One question which is asked, "Which one is better". To answer that it is important to understand a few of the differences between the two. Once we understand how these two types of tables are different we will better understand which we need to use when.

Creating a temporary table is very similar to creating a regular table. In order to do this we use the following code

create table #myTempTable (columnA typeA, columnB typeB, ...)

Table variables are created as regular variables are declared. The type of the variable is just defined as a table.

declare @myTableVariable table (columnA typeA, columnB typeB, ...)
Scope is another important factor to consider. With a table variable its scope is similar to other variables. A temporary table however does not share this scope, and can even be accessed in a stored procedure called by your code. This is one very nice feature of temporary tables over table variables.

Table variables do not record any entries in transaction logs. Temporary tables do store their transactions. Depending on what you are trying to accomplish this can be a benefit for either of the two types of tables. It is often very important to keep track of the transaction log, and in these instances a temporary table is far better for your needs, but other times the transaction log is simply unnecessary and a table variable would be the optimal choice.

I've heard of great performance gains in temporary tables because of their ability to be pre-compiled. I've also heard that table variables are faster than temporary tables in general. I believe this has to do with the nicely defined scope of table variables. Because of this they may use fewer resources than temporary tables. An example of this is the transaction log I mentioned earlier.

For the most part a table variable is just as flexible as a temporary table, and in most instances it also out performs temporary tables in my experience.

One thing to watch out for when using temporary tables is the possibility of causing your store procedure to recompile. Microsoft has a Knowledge Base Article about Troubleshooting stored procedure recompilation which discusses this danger. Table variables will not cause this problem.

I would recommend the use of table variables for everyday use, and only using temporary tables when they're required. Instances of this include the need for a transaction log. Or any time you feel like writing extra lines of code simply to perform cleanup on your temporary table. But that is just my $0.02 on this issue.

Happy SQL writing!

Using a Numbers Table to Iterate Over Dates

I recently wrote some SQL which would do some work for a single day, but I wanted my code to be able to run once for each day in a date range. Luckily for me, I read Gregg Stark's SQL Blog. He recently posted about how to Iterate Over a List of Dates using a Numbers table in SQL. The SQL he gives is pretty much a cookie cutter solution to the problem.

The basic idea of this trick is to have a cursor which is going to allow you to fetch each of the dates into a variable for use in your code.

You first need to have a Numbers table in your database. This is basically just a table which contains a single number for each row. Use the following short snippet to create your Numbers table.

 
CREATE TABLE dbo.Numbers
(
    Number int IDENTITY(1, 1) PRIMARY KEY
)
GO
 
declare @MaxNumber int
set @MaxNumber = 65535
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Numbers DEFAULT VALUES
    
    IF scope_identity() = @MaxNumber 
    BEGIN
        BREAK
    END
END
GO

Once you have this numbers table you can follow this cookie cutter solution Gregg created which is actually a very nicely written solution to the problem.

Declare @StartDate datetime
Declare @EndDate datetime
 
-- Note: This StartDate is Exclusive and this EndDate is Inclusive
set @StartDate = '10/1/2007'
set @EndDate = '10/9/2007'
 
declare @CurrentDate datetime
 
-- Create the cursor with the dates using the numbers table
declare datecursor cursor forward_only 
    for Select dateadd(d,Number,@StartDate) from Numbers
          where Number <= datediff(d, @StartDate, @EndDate)
          order by Number
open datecursor
 
-- Loop which will exit when we are out of dates to check
while (1=1)
begin
    fetch next from datecursor into @CurrentDate 
    if @@fetch_status <> 0
        break;
    
    -- This is the code which will run for each date
    select * from some_table where DateRecorded = @CurrentDate
 
end
 
-- Cursor Cleanup
close datecursor
deallocate datecursor

Happy SQL writing!

Numbering the Rows Returned from a SQL Query

Outside of SQL it would be quite easy to number the rows returned from query, but to do so inside of the SQL is a bit more difficult. I recently wanted to be able to number the rows returned from a query in a stored procedure. I am using SQL Server 2000, so I cannot use row_number().

I was going to ask my go to SQL guy, Gregg Stark, how to do this, but while in the process of asking him for a good method of doing this, I came up with the solution. My cool trick for numbering the rows returned from a query is useful because I don't actually want to return these rows to other code. I want to store them back into my database ordered differently than the numbering, and using a table variable will allow me to do exactly what I am trying to do.

First we need to create a table variable. We will define that table as having an identity column plus any other columns we happen to need. We then insert into that table all of the values returned from our query.

declare @t table (Rank int identity(1,1), UserName nvarchar(256), Points int)
insert into @t
select UserName, sum(PointsEarned) as Points
from UserRecognition
group by UserName

After running this part we have created a table variable which contains the information we want. We can then select data out of this as if it were a normal table. In this example I am able to obtain a ranking for my users based on their earned points. This lets me store that ranking without having them stored in ranking order.

It is obviously much easier to do this in SQL 2005, but this is a very cool trick for those of us using SQL Server 2000. Let me know what you think, and if you have any cool improvements.