Brendan Enrick

Daily Software Development

Keep Databases in Source Control

Well, sort of. Your database can be rather large, so you probably don’t want to actually have the database file in source control. I do think you should have your database in source control, but really just your schema. Remember in my post about keeping your binaries in source control I mentioned that it is important to be able to go back to any point in time for your application. This means the database will need to match as well. How can we allow ourselves to do that? Keep databases in source control.

We want to be storing the subset of SQL usually defined as the Data Definition Language. These are all of the statements that create tables, define columns, create indexes, and make any alterations to our database schema. There are plenty of ways to store these. In fact I recommend after reading this post that you attend a local user group and offer this topic as a discussion for an open space. We’ve done this topic at HudsonSC and you get to find out a lot about what solutions other teams are using and why they’re using them.

Why store the database in source control?

We get a few advantages here. This is not an exhaustive list, but it does include some good reasons for following this practice.

  • You can set up your project on any computer and have a local database to test against.
  • Always be able to rebuild your database if anything happens to it.
  • Go back to any point in time and be able to create the database.
  • Incrementally change things and be able to see what changes have been made at any point.

How can I store my database in source control?

There are plenty of ways in which you can store your database in source control. I’ve used a few different techniques for doing this. There are some tools which can help you do this. Keep in mind that you can build your own as well. They’re not very difficult to create.

I have used Visual Studio Database projects before. These are nice because they’re a full package for handling the changes. This is the heavyweight approach to managing your database changes. This keeps each table, index, key, view, stored procedure, etc. as its own file. This built-in Visual Studio system allows you to update the project and use the project to update database.

The lightweight approach is to make each change using a sql script. You can write the alter, create, and drop statements by hand and store each one in a file. This approach makes it harder to see how each table has changed over time, but gives a better view of what changes were made at what points in time. In order to deploy the database updates you’ll either need to write your own system or use a tool like Tarantino to keep track of the changes.

There are of course other tools for handling this situation. What tools are you using for managing your database?

Other considerations

You will probably also want to have some code which can fill your test database with data to be used for locally running your application. I recommend having this code be compiled code (not SQL scripts). The reason for this is so that you can have the compiler assist you in ensuring that the scripts aren’t out of date. You also want this to be part of your build process, so that no one can forget to update it. It should be run by your continuous integration server, so that the build will break if there are any issues.

This will also make updating your production database a lot easier since you will have stored scripts which define exactly which operations need to be performed in order to update your database according to the changes you’ve made.

Sql Table Refactoring Solution

Steve Smith posted an interesting SQL Table Refactoring Challenge on his blog. I am prepared to go out on a limb and share my solution for how to make a table more efficient.

The first step I would take is to change the columns of the table a little bit.

  • Change CountryCode to a char(2) since they are all 2-character ISO standards. The varchar does limit itself based on the size of the data, but has an overhead of 2 bytes. This will also prevent anything longer from being entered into the table.
  • If you’re feeling very ambitious you can make a CountryId column that is of type smallint and uses the id of the country instead of the 2-characters. This will require an update to all of the existing data so I would say this isn’t worth the effort.
  • AboveFold can be changed to a tinyint. Note: these values are 0 to 255 so we will need to remap the values in the database.
  • A smallint can be used for the Clicks column, since it doesn’t need to get very large.
  • Period can change to a date. This will take up 3 bytes which is much smaller than the 8 for the datetime.

As an extra note I believe that int is required to get up to 50,000 for the tables which require it. If we could drop the limit to around 30000 we could use a smallint for those.

Once I have that table created I create 2 duplicate tables (duplicated schema not all of the data). [lq_ActivityLogLoad] and [lq_ActivityLogLoadNext] At any given time we will write to one of these two tables and not the huge table.

Then I create a job which will switch to which of these two tables we are writing. After switching the job will load all of this data directly into the large table using an Upsert (Update or Insert). After loading the data from the table it can empty out that load table and wait a minute and perform this action over again. Doing this loading pattern will make it so the writes will be on these small tables which aren’t being read from.

I would change the primary key to be based on the ID and put a unique non-clustered index on the columns currently used for the primary key preserving the safety from the unique constraint.

Here is the table I created for this.

CREATE TABLE [dbo].[lq_ActivityLog](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PlacementID] [int] NOT NULL,
[CreativeID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[CountryCode] [char](2) NOT NULL,
[RequestedZoneID] [int] NOT NULL,
[AboveFold] [tinyint] NOT NULL,
[Period] [date] NOT NULL,
[Clicks] [smallint] NOT NULL,
[Impressions] [int] NOT NULL,
CONSTRAINT [PK_lq_ActivityLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
 

Keep Binaries in Source Control

No, not all of them. Just the ones required for your build process. Doing this will make your life a lot easier. You want to reference all of your assemblies locally and keep them in source control. You also want all of your testing executables kept there. If you have anything extra your build process depends on make sure you include that as well. A great example is a post I did about building a library which uses OpenAccess on a build server. The build server of course doesn’t have OpenAccess installed. That would be crazy.

You get a lot of great benefits out of keeping these files local and in source control. Since libraries change infrequently, they will not require much source control storage space. This also means that when updating from source control you will rarely have to wait for an update. If you do have to wait you probably needed it anyway, and would have just been waiting on an Internet download instead.

Build Server Benefits

If you have a build server this will allow you to build your application without having to install your tools and libraries on the server. When the server gets the latest version of your project from your source control server, it will already have the current versions of all of your tools and libraries. This keeps things simple and easy to work with.

If you don’t go this route you will have to install everything on the build server and make sure that all of the paths still match. Ever dealt with the Program Files (x86) folder?

New Computer Benefits

Have a new developer joining the project or just a fresh install of Windows on your computer? Great! Now just go and find the 10 different libraries and tools that are required to build this project. Once you have those make sure that they’re in the correct folders… Wouldn’t that be annoying?

If you keep those tools in source control and always reference with relative paths, no one will have to hunt for all of the tools required for the build. If you were referencing libraries in their install locations you would have all kinds of problems if something installed in a bad place. For example I’ve seen libraries try to install in a user-specific directory. Yes, that is a worst case example, but it can happen.

Keep everything relative and you can move your code anywhere. I want to be able to copy everything in source control onto a freshly installed copy of Windows and have it build. If I can do that, I know the project has a good structure.

Old Versions of the Project

Ever want to go back to an old version of your project and look at something? I bet you had some trouble building if you’ve changed the versions of any libraries you’re using. If the specific version you need is with that code you never have to worry. At any snapshot in time the repository has the correct build scripts, libraries, tools, utilities, and source code. You’re all set to go back to any point in the history of the application and take a look.

You might be saying, “what if I have changed versions of my IDE?” It shouldn’t matter. If you’ve got build scripts in place you can use those to get that app running.

Some people try to keep track of which versions of which tools are used for which release of an application, but if you’ve got them in source control you don’t have to worry about it. Just get the version you want, and the required files come with it.

Now that’s what I call a clean, self-contained build.

Agile Story Estimation

One challenging part of software development is estimating the amount of time that tasks will take. Why is this a challenge? There are plenty of reasons: sometimes you’re dealing with legacy code, sometimes there are a great deal of unknowns, and sometimes you aren’t doing the whole story and can’t estimate the other work. There are plenty of other reasons why estimating stories is difficult, so how do you resolve the difficulties and get things estimated.

 

Improving Estimates

I find it is best to bring the whole group together to make estimates. With a group, you’re less likely to forget about different aspects of the project. You’ll have everyone there who will be doing the tasks. Someone there may know more about the legacy system or a certain part of it than you do.The group should help prevent you from making any unrealistic estimates.

Another great way of improving your story estimates is to do the estimating shortly before you’re going to be doing the work. The best estimates you can make can only be done immediately before doing the task. Any earlier and you don’t know exactly how the system will be architected. Estimate any earlier and the system may have become easier or harder to change before you start the task.

To help build group consensus on the estimates you want to limit the number of options available. With too many numbers it will be very hard to get everyone to agree. You also want to have less options as the numbers get larger. The reason being that larger estimates are less accurate anyway, so you really just want a magnitude for the estimate. Is this a day-long task? More?

Find some nice system to limit the numbers. A great way that I’ve seen used in the past is Fibonacci numbers only.

1, 2, 3, 5, 8, 13, 21…

That set allows for some accuracy at the bottom and some good general choices up higher. 8 is great because it is a full-day.

My favorite set of numbers for estimating is using the powers of 2.

1, 2, 4, 8, 16, 32…

I’ll explain later why I like this set of numbers.

Another easy set to use is Prime Numbers (with 1 included).

1, 2, 3, 5, 7, 11, 13

Whatever set you choose make sure that the group knows which ones you’re using and stick with it.

Implementing Estimates

Once you select your strategy for estimating you’ll need to implement it. Estimating should be done close to when the task will be done, so I recommend estimating on each Monday all of the stories for the week.

Bring everyone together and make sure they understand how the estimating will be done. You want to start by reading one of the stories. Let the team ask any questions about the story. Once everyone understands the task you get to estimate.

Make sure that everyone selects their estimate secretly this is important so that no one’s estimate is influenced by others’ estimates. I’ve seen a few interesting ways of doing this.

The cheapest approach is to have each person hold up a number of fingers equal to his estimate, and then all estimators reveal their estimates at the same time. This is kind of like a game of rock-paper-scissors. We can call this approach Rock-Paper-Estimate.

Another easy approach is to have everyone write down the number and reveal at the same time. You could use paper for this or little whiteboards.

A very cool trick that was brought to my attention recently came from someone handing me a deck of cards from the Visual Studio 2010 launch event in Las Vegas. Planning Poker. It is basically a deck of cards with estimates on the cards. There are 4 different colors, so 4 people could be estimating at any given time with a single deck.

IMG_0359

IMG_0360

Whatever method you use, make sure you work for the group's agreement and try to keep things fast-paced.

April What Day?

The day I’ve been waiting for has finally arrived. The Internet has made the first day of April into a fun day, which is different from the opinions of many who believe the day to be mean-spirited. Well I recommend you surf around today so you don’t miss these.

XKCD

One of the best comics I’ve ever read has a great little gag for today. The whole site is UNIX. You can use commands like “ls” and “clear” you can also “cat” files. Navigating between comics has commands. And if you want to get someone to make you a sandwich. You can use this command, “sudo make me a sandwich”.

UnixSudo

Go try some commands. I was impressed by what I found there. I don’t know what else will work. You can read the files on there. Who knows what other “programs” you can run. Check the comic archives maybe there is another command.

I wonder if it supports regular expressions…

Google

I think we all knew that Google was going to do something interesting. Luckily for them, Topeka, Kansas helped them come up with something. You may have heard that the city recently changed its name to “Google, Kansas”. So to honor that, Google has decided to change their name to “Topeka”.

topeka-hp

There is a nice article about it and everything. I guess now I can “Topeka it on Bing” instead of “Google it on Bing”. I know that “Bing” will never win the verb war. They didn’t factor in the lack of a past tense. If you type “Binged” you’re talking about something completely different and you seem to be a glutton.

Think Geek

If you’re looking for something to do with an Apple iPad I have just the thing. ThinkGeek is selling the iCade, which will let you turn your iPad into a mini arcade! Now you have a reason to get the flash-lacking device.

icade_main_zoom

More Google (of course)

Ever want to know what an animal is saying? Now you can know the answer. Just use Translate For Animals from Google Topeka. This is a beta application which will work on your Android phone.

infographic-sm

 

YouTube

Playing videos all day long leaves YouTube with some very expensive bandwidth costs. To help cut these costs, YouTube’s service can now be viewed in text. The claim is $1/second savings. Kind of hard to see though.

Youtube

Have a great day! I hope you enjoy all of the gags. Make sure you look for more of them. They're fun!

Agile Resources - The Agile Actors

Agile development has a few roles that people will be playing. Those roles define certain responsibilities that the parties involved will need to take on. Each role serves an integral role in the development process.

Product Owner

This is the representative of the customer’s interests. The product owner is responsible for knowing the details of all the work that needs to be done. When a developer has a question about how something should work in the system the product owner is the one with the answer.

The big picture is maintained by the product owner, who will be there to remind everyone of the big picture as the project is being worked on. This role is vital to the development without a project. If you’ve ever worked on a project where the product owner was brought in at the beginning once and once at the end I am sure you know that things didn’t go well.

Knowing where the project is going and making sure it isn’t deviating from what is important cannot be stressed enough, and these are vital for the product owner.

Team Lead

Agile teams work very fluidly. During development, most team members are the same. The team lead has some extra responsibilities including: helping to resolve blocking and impeding issues, guiding stand ups, guiding retrospectives, guiding iteration planning sessions. I like the term “team lead”, because this person isn’t really the boss. This role is here to help guide the project and the progress. The leader is the person making sure that the agile process stays on the tracks.

The leader doesn’t make decisions. The leader helps keep everyone moving by clearing the path and making sure everyone sticks together.

Team Member

These are the people creating the application. The team lead is also one of these people. They will be breaking stories into tasks, estimating the size and difficulty of the stories, and working on the stories which create the application. If something needs to be fixed, created, or modified, these are the people who will be working on it.

Fluidity is the word to describe the development team. Each member of an agile team must wear multiple hats. The developers will be designing, developing, and testing the application throughout the process.

Agile Resources - Development Process Recurring Steps

Different processes are used to create software applications. Agile software development is about responding and embracing change. It is about working closely with the customer instead of the customer just saying what they want up front. Agile teams follow processes during development, but they’re flexible, quick processes which allow the continue to continue making advances.

Most agile processes involve some sort of iterative development. Iterating allows for quick changes in the allocation of developer resources, which means responding to changes when the response is needed. The following is a list of each iterating piece of the development process grouped by how often the step occurs.

Months

Releases – if you’re working on an application end users will need to update when you release then your releases will likely be done in terms of months. Releases will include collections of features as well as bug fixes. Planning a release means selecting the features you’ll want to complete within the release time period. This is not specifics, but is only a large scale decision of what features will be worked on most of the iteration. (This plan needs to be flexible as it will likely change.)

Weeks

Iterations – an iteration usually lasts one or two weeks and is the primary development cycle for the application. During an iteration planning meeting, the team and the product owner will collaboratively decide which stories will be completed within the iteration. Everyone will decide and commit to achieving these goals. Iterations short, planning session as well as a short retrospective at the end. This communication keeps the team on the same page and allows the process to change as needed.

Minor Releases – if your application allows for it, which means that your application doesn’t require an update on everyone’s computer, an update should be released with every iteration. Keeping releases small makes them a lot easier and less likely to cause problems.

Days

Standing Meeting – during this meeting no one should sit down. It should take only a few minutes, so there is no need to be sitting. This is a great time for everyone to let others know of their progress as well as their plans for the rest of the day. This is a great time to inform everyone of any issues as well as to get questions answered which have quick answers. These help keep everyone in the loop, so no members are left out.

Hours

Change Pair Programmers – I believe highly in pair programming. From what I’ve seen, it produces much higher quality code. The system works best when pairs are changed in terms of hours. The pair needs to work together long enough to avoid the costs of context switching and often enough that knowledge and technique is spread thoroughly in the application and the team.

Minutes

Unit Tests – I think everyone should be unit testing. These should be written all the time and you should be writing your next unit test within minutes of your previous one. These should be written quickly and easily. The challenge of unit testing is getting to a point where the testing is easy. Once writing tests is easy, you’re testing “the right way”.

Code Commits – commit your changes frequently. A lot can happen to a code base in an hour. Keep checking in frequently. It shouldn’t be so often that it impairs your work, but you really only want to be impacting a handful of files. Long periods between commits mean they’re large and difficult to manage. If you have to write more than one thing in the description of your commit, you’re probably committing too much. (The Single Responsibility Principle should apply to code commits as well.)

Agile Resources – Velocity and Capacity Explained

Velocity and Capacity are two terms commonly used for determining how much work a team can perform in a given amount of time. Agile development is customer-focused development, and as such it is important to know for small periods of time how much a team can achieve.

Determining Velocity

A teams velocity is an estimate of how much work it usually completes in a given time period. The figure is based on how much work a team has completed recently. Velocity is usually measured in either hours of points. Agile teams determine how they will be measuring the time and difficulty of tasks, whether they use a point system or a number of hours is up to them.

Determining Capacity

A teams capacity is an estimate of how much work it can complete in a given time period. The figure is based on the ideal amount of time available within the team. This number is created by adding together the sum total of ideal hours each developer has for the project in the given time period. If there are two developers each working 24 hours/week then the weekly capacity is 48 hours. If you’re using points and not hours you can convert that into points.

In Iterations

An iteration is a period of time where an agile team is going to move through every stage of the development process all the way from planning to deploying an update. Agile teams tend to do one or two week long iterations. Velocity and capacity are used to help teams commit to a certain amount of work to be accomplished during an iteration. Since the team knows how much they’ve completed previously and their potential capacity for work, they are able to determine if the workload they’re agreeing to is reasonable.

Author’s Note: It is important to customize agile development based or your team and the needs of your team. The content here is a suggestion based on what has worked well for in the past.

Quick Silverlight Tip: Looking at the code

Silverlight is a great technology, and one thing that really makes it a treat to work with is the ease with which one can access the code inside the XAP file. Yes, this means that someone can look at your code,  SO DON’T PUT ANYTHING SECURE IN THERE! I took Jeff Blankenburg’s Click the Button contest.

So for a while I though Jeff had tricked me on that puzzle until I opened up the XAP file and looked at the source code. On the last page of the Silverlight you see this screen.

Level 12

So what’s the big deal? Well, the level before it was titled “Level 10”, so I figured Jeff was trying to pull a fast one and there was more to this seemingly innocent page. So I tried to find how to get beyond this level 12 to the real one. I eventually took the smart route, and I opened .NET Reflector and pointed it at the DLL file for the contest.

How did I do that you ask? Here are some quick steps for looking at the source code of a Silverlight application.

Step 1: Download the .xap file. View the source of the HTML page or look at the NET tab in FireBug to get the .xap file and download it.

Step 2: Rename the file to .zip and open it. Yes, a .xap is really a .zip neat huh?

Step 3: Unzip the file and find the dll you’re looking for. In my case it is called “ClickTheButton.dll”

Step 4: Open up .NET Reflector and open up that dll file.

NewDllInReflector

Step 5: After examining the dll you’ll be able to see all of the code. In this case I can see that Jeff actually has 13 levels he created for this application.

You now have a choice to make. You can ask Jeff Blankenburg what happened to the other puzzles or just go and take a look at the code and see which levels are missing.

HudsonSC January 2010 Recap

Last night we had a great HudsonSC meeting. People started showing up around 5:30, and I had the opportunity to meet and talk with a few new attendees during that time.

Lightning Talks

At around 6:00 we got things started. Kevin Kuebler started us off with an interesting talk on BDD where he showed some interesting code using MSpec. As usual, he was unable to keep his talk under 10 minutes. We will not fault him for that though, since he discussed the topic well and showed some cool code.

Following Kevin’s talk, we had a well-timed talked from Yasir Drabu about the Unit of Work pattern, which I in no way pressured him into giving at the last minute. Yasir let us know that he was currently implementing this, and he would show us the eventual implementation at a later meeting.

Open Spaces

Our open space discussions start at the same time pizza arrives, so people can sit around eating pizza and discussing interesting topics. The selected topics of the evening were “How to promote agile practices at your workplace” and “Automated build processes”.

I’ll defer to someone else about how the agile practices open space went, but I do believe the build processes discussion was interesting. Continuous integration, build automation scripts, and database change management were the hot topics of discussion among that group.

Coding Exercises

As usual we finished up the meeting with some time spent pair programming through some coding exercises. This time we took our second group attempt through our customized version of KataPotter.

We had a few interesting solutions to the exercise, and each one handled the optimization for getting the price down differently. They all seem to work effectively according to the unit tests. I know my solution was very different from the previous one I used to solve this exercise.

In Northeast Ohio? Come to HudsonSC the third Wednesday of each month in Hudson, Ohio.