Brendan Enrick

Daily Software Development

Sql Table Refactoring Solution

Steve Smith posted an interesting SQL Table Refactoring Challenge on his blog. I am prepared to go out on a limb and share my solution for how to make a table more efficient.

The first step I would take is to change the columns of the table a little bit.

  • Change CountryCode to a char(2) since they are all 2-character ISO standards. The varchar does limit itself based on the size of the data, but has an overhead of 2 bytes. This will also prevent anything longer from being entered into the table.
  • If you’re feeling very ambitious you can make a CountryId column that is of type smallint and uses the id of the country instead of the 2-characters. This will require an update to all of the existing data so I would say this isn’t worth the effort.
  • AboveFold can be changed to a tinyint. Note: these values are 0 to 255 so we will need to remap the values in the database.
  • A smallint can be used for the Clicks column, since it doesn’t need to get very large.
  • Period can change to a date. This will take up 3 bytes which is much smaller than the 8 for the datetime.

As an extra note I believe that int is required to get up to 50,000 for the tables which require it. If we could drop the limit to around 30000 we could use a smallint for those.

Once I have that table created I create 2 duplicate tables (duplicated schema not all of the data). [lq_ActivityLogLoad] and [lq_ActivityLogLoadNext] At any given time we will write to one of these two tables and not the huge table.

Then I create a job which will switch to which of these two tables we are writing. After switching the job will load all of this data directly into the large table using an Upsert (Update or Insert). After loading the data from the table it can empty out that load table and wait a minute and perform this action over again. Doing this loading pattern will make it so the writes will be on these small tables which aren’t being read from.

I would change the primary key to be based on the ID and put a unique non-clustered index on the columns currently used for the primary key preserving the safety from the unique constraint.

Here is the table I created for this.

CREATE TABLE [dbo].[lq_ActivityLog](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PlacementID] [int] NOT NULL,
[CreativeID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[CountryCode] [char](2) NOT NULL,
[RequestedZoneID] [int] NOT NULL,
[AboveFold] [tinyint] NOT NULL,
[Period] [date] NOT NULL,
[Clicks] [smallint] NOT NULL,
[Impressions] [int] NOT NULL,
CONSTRAINT [PK_lq_ActivityLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
 

Loading