Brendan Enrick

Daily Software Development

Sql Table Refactoring Solution

Steve Smith posted an interesting SQL Table Refactoring Challenge on his blog. I am prepared to go out on a limb and share my solution for how to make a table more efficient.

The first step I would take is to change the columns of the table a little bit.

  • Change CountryCode to a char(2) since they are all 2-character ISO standards. The varchar does limit itself based on the size of the data, but has an overhead of 2 bytes. This will also prevent anything longer from being entered into the table.
  • If you’re feeling very ambitious you can make a CountryId column that is of type smallint and uses the id of the country instead of the 2-characters. This will require an update to all of the existing data so I would say this isn’t worth the effort.
  • AboveFold can be changed to a tinyint. Note: these values are 0 to 255 so we will need to remap the values in the database.
  • A smallint can be used for the Clicks column, since it doesn’t need to get very large.
  • Period can change to a date. This will take up 3 bytes which is much smaller than the 8 for the datetime.

As an extra note I believe that int is required to get up to 50,000 for the tables which require it. If we could drop the limit to around 30000 we could use a smallint for those.

Once I have that table created I create 2 duplicate tables (duplicated schema not all of the data). [lq_ActivityLogLoad] and [lq_ActivityLogLoadNext] At any given time we will write to one of these two tables and not the huge table.

Then I create a job which will switch to which of these two tables we are writing. After switching the job will load all of this data directly into the large table using an Upsert (Update or Insert). After loading the data from the table it can empty out that load table and wait a minute and perform this action over again. Doing this loading pattern will make it so the writes will be on these small tables which aren’t being read from.

I would change the primary key to be based on the ID and put a unique non-clustered index on the columns currently used for the primary key preserving the safety from the unique constraint.

Here is the table I created for this.

CREATE TABLE [dbo].[lq_ActivityLog](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PlacementID] [int] NOT NULL,
[CreativeID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[CountryCode] [char](2) NOT NULL,
[RequestedZoneID] [int] NOT NULL,
[AboveFold] [tinyint] NOT NULL,
[Period] [date] NOT NULL,
[Clicks] [smallint] NOT NULL,
[Impressions] [int] NOT NULL,
CONSTRAINT [PK_lq_ActivityLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
 

Keep Binaries in Source Control

No, not all of them. Just the ones required for your build process. Doing this will make your life a lot easier. You want to reference all of your assemblies locally and keep them in source control. You also want all of your testing executables kept there. If you have anything extra your build process depends on make sure you include that as well. A great example is a post I did about building a library which uses OpenAccess on a build server. The build server of course doesn’t have OpenAccess installed. That would be crazy.

You get a lot of great benefits out of keeping these files local and in source control. Since libraries change infrequently, they will not require much source control storage space. This also means that when updating from source control you will rarely have to wait for an update. If you do have to wait you probably needed it anyway, and would have just been waiting on an Internet download instead.

Build Server Benefits

If you have a build server this will allow you to build your application without having to install your tools and libraries on the server. When the server gets the latest version of your project from your source control server, it will already have the current versions of all of your tools and libraries. This keeps things simple and easy to work with.

If you don’t go this route you will have to install everything on the build server and make sure that all of the paths still match. Ever dealt with the Program Files (x86) folder?

New Computer Benefits

Have a new developer joining the project or just a fresh install of Windows on your computer? Great! Now just go and find the 10 different libraries and tools that are required to build this project. Once you have those make sure that they’re in the correct folders… Wouldn’t that be annoying?

If you keep those tools in source control and always reference with relative paths, no one will have to hunt for all of the tools required for the build. If you were referencing libraries in their install locations you would have all kinds of problems if something installed in a bad place. For example I’ve seen libraries try to install in a user-specific directory. Yes, that is a worst case example, but it can happen.

Keep everything relative and you can move your code anywhere. I want to be able to copy everything in source control onto a freshly installed copy of Windows and have it build. If I can do that, I know the project has a good structure.

Old Versions of the Project

Ever want to go back to an old version of your project and look at something? I bet you had some trouble building if you’ve changed the versions of any libraries you’re using. If the specific version you need is with that code you never have to worry. At any snapshot in time the repository has the correct build scripts, libraries, tools, utilities, and source code. You’re all set to go back to any point in the history of the application and take a look.

You might be saying, “what if I have changed versions of my IDE?” It shouldn’t matter. If you’ve got build scripts in place you can use those to get that app running.

Some people try to keep track of which versions of which tools are used for which release of an application, but if you’ve got them in source control you don’t have to worry about it. Just get the version you want, and the required files come with it.

Now that’s what I call a clean, self-contained build.

Agile Story Estimation

One challenging part of software development is estimating the amount of time that tasks will take. Why is this a challenge? There are plenty of reasons: sometimes you’re dealing with legacy code, sometimes there are a great deal of unknowns, and sometimes you aren’t doing the whole story and can’t estimate the other work. There are plenty of other reasons why estimating stories is difficult, so how do you resolve the difficulties and get things estimated.

 

Improving Estimates

I find it is best to bring the whole group together to make estimates. With a group, you’re less likely to forget about different aspects of the project. You’ll have everyone there who will be doing the tasks. Someone there may know more about the legacy system or a certain part of it than you do.The group should help prevent you from making any unrealistic estimates.

Another great way of improving your story estimates is to do the estimating shortly before you’re going to be doing the work. The best estimates you can make can only be done immediately before doing the task. Any earlier and you don’t know exactly how the system will be architected. Estimate any earlier and the system may have become easier or harder to change before you start the task.

To help build group consensus on the estimates you want to limit the number of options available. With too many numbers it will be very hard to get everyone to agree. You also want to have less options as the numbers get larger. The reason being that larger estimates are less accurate anyway, so you really just want a magnitude for the estimate. Is this a day-long task? More?

Find some nice system to limit the numbers. A great way that I’ve seen used in the past is Fibonacci numbers only.

1, 2, 3, 5, 8, 13, 21…

That set allows for some accuracy at the bottom and some good general choices up higher. 8 is great because it is a full-day.

My favorite set of numbers for estimating is using the powers of 2.

1, 2, 4, 8, 16, 32…

I’ll explain later why I like this set of numbers.

Another easy set to use is Prime Numbers (with 1 included).

1, 2, 3, 5, 7, 11, 13

Whatever set you choose make sure that the group knows which ones you’re using and stick with it.

Implementing Estimates

Once you select your strategy for estimating you’ll need to implement it. Estimating should be done close to when the task will be done, so I recommend estimating on each Monday all of the stories for the week.

Bring everyone together and make sure they understand how the estimating will be done. You want to start by reading one of the stories. Let the team ask any questions about the story. Once everyone understands the task you get to estimate.

Make sure that everyone selects their estimate secretly this is important so that no one’s estimate is influenced by others’ estimates. I’ve seen a few interesting ways of doing this.

The cheapest approach is to have each person hold up a number of fingers equal to his estimate, and then all estimators reveal their estimates at the same time. This is kind of like a game of rock-paper-scissors. We can call this approach Rock-Paper-Estimate.

Another easy approach is to have everyone write down the number and reveal at the same time. You could use paper for this or little whiteboards.

A very cool trick that was brought to my attention recently came from someone handing me a deck of cards from the Visual Studio 2010 launch event in Las Vegas. Planning Poker. It is basically a deck of cards with estimates on the cards. There are 4 different colors, so 4 people could be estimating at any given time with a single deck.

IMG_0359

IMG_0360

Whatever method you use, make sure you work for the group's agreement and try to keep things fast-paced.

April What Day?

The day I’ve been waiting for has finally arrived. The Internet has made the first day of April into a fun day, which is different from the opinions of many who believe the day to be mean-spirited. Well I recommend you surf around today so you don’t miss these.

XKCD

One of the best comics I’ve ever read has a great little gag for today. The whole site is UNIX. You can use commands like “ls” and “clear” you can also “cat” files. Navigating between comics has commands. And if you want to get someone to make you a sandwich. You can use this command, “sudo make me a sandwich”.

UnixSudo

Go try some commands. I was impressed by what I found there. I don’t know what else will work. You can read the files on there. Who knows what other “programs” you can run. Check the comic archives maybe there is another command.

I wonder if it supports regular expressions…

Google

I think we all knew that Google was going to do something interesting. Luckily for them, Topeka, Kansas helped them come up with something. You may have heard that the city recently changed its name to “Google, Kansas”. So to honor that, Google has decided to change their name to “Topeka”.

topeka-hp

There is a nice article about it and everything. I guess now I can “Topeka it on Bing” instead of “Google it on Bing”. I know that “Bing” will never win the verb war. They didn’t factor in the lack of a past tense. If you type “Binged” you’re talking about something completely different and you seem to be a glutton.

Think Geek

If you’re looking for something to do with an Apple iPad I have just the thing. ThinkGeek is selling the iCade, which will let you turn your iPad into a mini arcade! Now you have a reason to get the flash-lacking device.

icade_main_zoom

More Google (of course)

Ever want to know what an animal is saying? Now you can know the answer. Just use Translate For Animals from Google Topeka. This is a beta application which will work on your Android phone.

infographic-sm

 

YouTube

Playing videos all day long leaves YouTube with some very expensive bandwidth costs. To help cut these costs, YouTube’s service can now be viewed in text. The claim is $1/second savings. Kind of hard to see though.

Youtube

Have a great day! I hope you enjoy all of the gags. Make sure you look for more of them. They're fun!