Brendan Enrick's Blog

Daily Software Development.


TODO Comments - Code Audits #9

by Brendan Enrick Friday, July 13 2012 10:00

When writing software, we often have to write comments to explain why we did something.  I try to write my code such that I don’t have to comment it. In fact, anything other than a temporary (less than a few minutes) comment I write, I consider to mean that I made a mistake.

We could get into endless discussions of when we should comment, what we should comment, and how much we should comment. I’ve had plenty of discussions about comments with people with various opinions on the subject.

An interesting feature of Visual Studio is that it lets you create a list of “tasks” by leaving yourself comments that begin with “TODO” or “HACK”. In some ways I like those comments, because they build into the code reasons for why the code is the way it is.

If you write “TODO”, it tells the reader that the code is incomplete and that wasn’t intentionally left the way it is. There just wasn’t enough time. If you write “HACK”, it admits that you wrote the code badly and it needs to be rewritten.

These types of comments fit in very well with what I tell people about comments. Each one I write and leave in the codebase means that I didn’t write the code as well as I wanted to. Each one is my mistake. The more comments I write, the worse the code is.

The Bad

One codebase that I was working on had a lot of these comments. It seemed they were all over the codebase. There was the other problem that copy-paste coding was rampant, so I am sure that many of these were copied and pasted from previous locations. Heck some of the even said to remove the duplication. The irony? Those comments about removing the duplication were copied with each new instance of the code that was copied.

I checked Visual Studio’s task list of these comments just to see how bad it was.

1131. 1131 tasks that were indicated with “TODO” and “HACK” comments. That takes more than just writing a lot of these comments. That requires writing them and never going back to fix them or complete them.

The Better

Check your codebase right now. Do you have any of these comments lying around? Is it only a handful? If so, you should go try to clean them up. That’s easy and will help you out down the road.

Anytime that you’re going to write a “TODO” comment, think about why you’re putting it there? Is it something that’s really needed? If so, make yourself a note somewhere else. Add a story to go back and make the change later. Get it in the system where you really track your work.

If you know that you’re not going to get to it any time soon, don’t bother with the comment. It will only clutter up your system. If it’s not important enough for you to come back and get it soon, why are you even recording it? If it’s important later, you will be working in that part of the codebase again and you can go back later and fix the code as part of your other work.

Certainly once you get that many of the comments, you should have long earlier realized that there is something that you need to work on. That’s just a ridiculous number of TODO and HACK comments for one codebase.

Has anyone ever see more than that? Anyone think I am crazy or wrong? Tell me about it in the comments!

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Locks and Keys - Code Audits #8

by Brendan Enrick Thursday, July 12 2012 10:00

In software development, it is common to have users authenticate using usernames and passwords. In order to handle the authentication process, we must store this information in some way. Obvious, passwords should never be stored in clear text. Toward that end, when we store them, we encrypt, hash, salt, etc. to keep them safe.

While these steps cannot protect against all forms of attack, they’re a good base of protection to have in place. We don’t want to just give this information away to anyone who wants it. Yes, some people could take the non-clear-text passwords and obtain the original password from it, but we’ve certainly set the barrier higher.

Security is all about setting more and larger hurdles in place for people to jump over. The only secure information is information that doesn’t exist, and that becomes inherently insecure as soon as it does exist. If information is accessible to someone, it means that it’s possible for someone else to access that information also.

We just try to make things more difficult and limit that damage. Or at least some of us do.

I’ve made plenty of my own mistakes in the past. Some of the applications I wrote in college have security holes, which seem glaring to me now, but I’d not even noticed at the time I wrote the code. I’m sure that most people reading this would say the same thing about their own code.

The Bad

One of the systems I had the pleasure of seeing the code for, at its core, was a customer portal. It even had its own custom ecommerce solution. (These details while talking about security issues will have some of you facepalming already.)

The developers knew that they needed to encrypt the passwords, so they wrote some sql, dbo.encrypt, to encrypt passwords before storing them in the database. Insert a password and out comes a jumbled mess that no one could read.

Users will often forget passwords, and will need some way of recovering from this situation. I recommend resetting passwords when this happens. This application took a different approach.

Also stored in the database was some other useful sql, dbo.decrypt, which would take any of the encrypted passwords and decrypt them. Well that’s obviously a security issue, but the worst thing about this isn’t just that it’s reversible. It’s that this is also in the database. That means that we have the lock and the key together. If someone gets read-only database access, but not access to the application’s source code, they still have everything.

It’s like have a lock and keeping the key inside the lock for convenience. I wouldn’t want to have to call a locksmith if I ever lost the key, so I just keep it here. It makes opening the lock faster too!

The Better

Don’t allow reversing of stored passwords. Use 1-way only when dealing with passwords. This way brute force techniques are required to figure out a password.

If the system can reverse the password back to the original, anyone who has obtained access to the system can retrieve all of the passwords.

This isn’t a new idea. You’re not the first developer to set up authentication in an application. There are plenty of existing authentication systems out there. Don’t reinvent the wheel if you don’t have to. Some systems are safer than others. Ask around. Talk with security experts (I am not one of them) about these if your application requires really high hurdles.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Security Policies - Code Audits #7

by Brendan Enrick Tuesday, July 10 2012 10:00

I think we’ve all heard about the great security policies that companies implement.

Companies want you changing your password every other day, without repeating any passwords you’ve ever used, without using any characters from your first name or last name, without having any numbers associated with your address, phone number, birth date, the current year, or your age, and they want your password to be exactly 12 characters.

All of these rules are really funny. I don’t think I need to tell too many people that there are better ways of creating passwords.

All of that aside, those rules are kind of interesting. I use a lot of different passwords. I change the important ones. I don’t change the unimportant ones.

The Bad

When companies implement these policies, they like to have software in place to enforce the policies. Now is a great time to discuss how not to manage these policies.

In this instance the company requires that passwords be changed every 2 months. The passwords cannot contain more than 2 consecutive characters from the person’s first and last name (challenging to avoid as a user creating a password). You cannot use any of your last 3 passwords.

I was going to be rewriting this logic in a new system. I asked a few questions about this. One important questions is, if the password needs to be updated if the person’s name changes. For example they could have gotten married and may not be violating the password not containing consecutive name characters issue. Since I can’t find out what their password is in code, I have to just make them change the password.

This company obviously cares about securing this information since it’s implementing this password policy.

The application is using salted, hashed passwords when storing them in the database. That’s good.

I later noticed how the existing system handled password history. Clear text passwords stored with the username all in a table in the database. Every single password every user had ever used on the site. This is a scary pile of passwords that likely get used all over the Internet by these users. It even included the current password being used by that user.

The Better

These should not be stored this way! If you have to keep passwords around, keep the hash only. Also, get a new salt with each new password. Treat these as passwords (they are!) Don’t be foolishly insecure with this stuff. If someone accessed that database, they would have usernames and passwords for people to try all over the Internet. That would be a huge disaster.

If you have to keep that kind of information around, at least don’t have it in clear text. It would also be a good idea to limit the information that is there by removing the old passwords that are no longer relevant. Since they only care about the last 3 in the history, there is no reason to keep any more than that. Remove the old ones.

To be honest, I also don’t think that those types of password policies are all that secure anyway. I am sure that there is a weak link somewhere in the human element that would render the policy useless anyway.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Static Overload - Code Audits #6

by Brendan Enrick Monday, July 9 2012 10:00

Static methods are very useful and powerful tools when writing code. As I am sure many people have pointed out, these tools should be used sparingly and cautiously.

Statics are dangerous if they have dependencies. This is because they’re hard to test. Or so I thought. Keep in mind, that just because you can mock statics does not mean that you should be using a lot of statics.

The Bad

I saw a great case of this misuse of statics in some cross-cutting code. It would be a great place to use Aspect Oriented Programming, but was instead a mess of statics and dependencies.

This cross-cutting code was being used throughout the application. There was barely a section of the site that did not use this code. Not once in the code had anyone unit tested any of the code that used these methods. It wasn’t being mocked, so there wasn’t really a way to unit test anything using that code.

Starting things off is a class. As you can probably have guessed from this post’s title, it has a static constructor. In the constructor, it accesses a database and creates an instance of the class. Each of the methods in this static class take in parameters and then write to the file system.

I was trying to write code in this code base, but I could not make any changes yet. I needed to get that monster abstracted so that I could work around it.

The Better

For cross-cutting concerns like this, it’s common to use statics. That’s OK. Not ideal, but OK. You just need to be careful. Try to limit the damage.

If done nicely, the Singleton isn’t horrible. Just keep in mind that tests will need to mock out that singleton when you’re using it.

For this class, I wrote a wrapper around it that implemented an interface. My code depended solely on that interface, and I mocked the interface in my tests. This let me minimize the impact of the change. In the long run, It will be important to make sure that the wrapper gets used throughout the codebase. Anywhere the static was used, it should not depend on the wrapper. This is a quick, easy solution that slowly improves the code.

In my opinion, that’s the best way to clean up any code base. Just fix a piece at a time. Make sure that when you touch any other part of the code that you use that same wrapper and eventually things will start looking a lot better.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Massive Multiplied Methods - Code Audits #5

by Brendan Enrick Friday, July 6 2012 10:00

I am coining a new term and have just googled to make sure that it’s not written anywhere yet. “Massive Multiplied Methods” did not appear in a Google search at all. I establish that I am likely the first person to post those 3 words together in that order on the Internet.

I am sure that everyone reading my blog knows that you should not repeat yourself and that your code should be organized in small cohesive parts.

Sometimes developers don’t do that. Occasionally, we can find places where people completely ignored those two.

The Bad

Imagine a site whose content comes from editors uploading lists of content stored in spreadsheets. Each spreadsheet, for some reason, has to have only 1 type of content (even though the data all has the same fields).

Now imagine that when the system reads each row from the spreadsheet, it needs to map each value onto a specific record from the database. The data is normalized, so what was a string in the excel file needs to link to the correct other table in the database.

The user selects which type of content is being uploaded and then the system will call the method for uploading that content. Yes, it’s just one method. Any other methods called are part of the .NET Framework. Making things worse, the methods are long. Very long. Each one was about 1,500 lines of code.

There were 5 different types. Each of these types has its own 1,500 line method. That’s about 7,500 lines of code for these.

And the best part about those 5 methods? They’re all identical except for 1 string. Literally, the only difference between these methods is the magic string literal used for the type.

The Better

I am not sure what to say here other than. Go learn about parameters in methods and how to create methods to break an algorithm into logical parts. Wow. Just. Wow.

It’s bad enough that the string could have just been in the spreadsheet to begin with, but did it really require 5 separate methods?

To top it off, you could have just accessed the value from the drop down list the user selected from to indicate the type. This was all in a code behind file anyway, so it could have just grabbed that value. (It shouldn’t have been there, but this would at least be slightly better.)

I don’t know how people let copy and paste operations get that bad. To be honest, I am amazed that the methods still matched. I guess people just avoided changing the code because it was too ugly. (That’s a mistake, because we need to clean up the scary code.)

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Hidden Inputs - Code Audits #4

by Brendan Enrick Thursday, July 5 2012 10:00

ManBehindCurtainNo user could ever find your “hidden” inputs. You’re a master of stealth and the HTML hidden fields are totally safe and secure for whatever you want to put there. That’s why they’re called “hidden”. Obviously.

Hidden Inputs

In simple terms, these are HTML controls that sit on a page and have no display for users. They’re only “hidden” by not being in plain sight. It’s like hiding something behind a curtain. It makes it easily ignored, but anyone can decide they want to pay attention to that man behind the curtain.

When a user submits a form to your server, it will also post the values of the hidden fields. These are occasionally useful for developers, but are not a place for secure information. You might have an unimportant value you want to maintain.

Never put any information in a hidden input that shouldn’t be seen or modified by a user.

The Bad

One project that I had the pleasure of looking over for someone was a homebrew ecommerce application… Yes, this one is going to get scary.

When you buy something from a site, it will often show you some prices and a total for your order before you checkout. This application had a rather genius location to store the total price that you would be paying. It kept the total in a HiddenInput, so that when the user confirmed the order it would send that up to the server for processing.

I think this was a form of making sure that the same value shown to the user was what they paid. I can only assume that the developer was trying to avoid the values changing in the system and the customer paying a price different from what was shown on the preview.

The danger, however, is that if you want to pay $0.01 for anything on the site, you can just change the hidden input value to 0.01 and then submit the page. Sure it takes someone with some web knowledge to do that, but anymore that’s not a rare thing to know about.

The really cool part is that the system used that value to send to the credit card processor and actually recorded the full detailed list when it logged it. That means that the site logged everything correctly and when it processed the order with the card company it used the amount you specified in the hidden input. Yes, you could totally have ripped them off and their own sites records wouldn’t even show it!

If you’re thinking about trying this on that site, I am sorry to burst your bubble. I told them about that issue and they fixed it.

Less cool than the above example, I’ve seen AccountIds stored in hidden inputs as well as other values that shouldn’t be kept in those locations. I’ve yet to see passwords stored there, but I am sure it’s been done somewhere by someone.

The Better

Make sure to keep all information that requires any level of security or control on the server. Never ever put something like that on the page. Information that should be shown to a user in a read-only fashion should be treated as such. When sent to the client, it should be displayed only. Those values should be left with the client and only information from the server should be trusted.

If you need to store that information somewhere, put it in server memory in some kind of state or store it in the server’s permanent storage.

User sessions are great for storing user-specific information. In the case of the shopping cart, you could use some of that in-memory storage like session or you could store a temporary record in the database of that cart. I am not going to get in the specifics here, but make sure that you get it stored somewhere that users don’t have access to it.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Massive Classes - Code Audits #3

by Brendan Enrick Wednesday, July 4 2012 10:00

Today we’re going to take a look at something that I’m sure most of you have seen before.

We’re talking about classes that have grown so large that they’ve earned the size category “massive”. These classes can sometimes house an entire layer of an application. Who would create these monstrosities? Usually it’s a team of malicious, evil developers who are trying to ruin your day. The devised a clever way to make your development experience terrible; they create a massive class and make sure that you end up having to maintain/repair/modify the code.

The real question everyone should be asking is “how big?”

The Bad

I could write all day long about different instances of these that I’ve seen, but instead of boring you, I will only talk about one of the worst offenders I’ve ever seen.

One fantastic code base we inherited was built using a “great” technique for organizing the code. It’s simple approach state that any time you needed to do any data access you could just go to the class named, “MasterDAL”. What made this so great? Well, you always knew where the method you needed would be. What is wrong with that? Well, it was quite common to have 2 or 3 methods that did the same thing in that massive class.

Sometimes you had a few methods that all did the same thing, but each one did it slightly different. The really fun part is that some of them worked and some of them didn’t. It was your job to make sure that you used one that had been updated and had its bugs fixed.

How big was MasterDAL you ask? That file alone was over 50,000 lines long. It was not auto-generated. It was written by developers.

OK, so something I told you so far about this wasn’t true. The code wasn’t really this nice. It gets better. I lied about there being 1 place for everything. MasterDAL was the master data access layer. There were two other data access layers.

Luckily those ones had much more specific names and were not as general or as large as the MasterDal. MiscellaneousDAL and OtherDAL were each only about 20,000 lines long. Obviously the miscellaneous data access and all the other data access go in those. That’s how they’re different from “Master”.

Oh and did I mention that not only did it have repeated code internally, but it also had some methods repeated in the MasterDAL. I know you’re surprised now.

The Better

Please always make sure that you’re organizing in smaller classes. Go read about the Single Responsibility Principle. It will explain that we need to organize our code. We also need to not repeat ourselves quite so much.

There are so many great ways of dealing with data access. Go get yourself an ORM or just use Entity Framework. I don’t care how you do it, but use one of those. Go read about some designs for organizing your data access code. There are so many good options out there. There’s no reason to create these big messes.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

Cookies - Code Audits #2

by Brendan Enrick Tuesday, July 3 2012 10:00

This is the second post in what I hope to make as a 30 post series of blogs about things I’ve seen in different code bases over my years as a software consultant. I will not be mentioning any client’s names or giving any easy way for people to identify the client whose code base has or did have any of these issues.

My team and I are often brought in on existing projects where the original development team failed to complete the code, created a big ball of mud that made bugs common and features slow to create, or simply where the previous team has completed the work and the customer just wants to know the quality of the code base.

Cookies

Cookies are a great way for developers to keep track of which client is communicating with them. They can store information in the browser. That sounds great doesn’t it?

The cookie gets an expiration date and is accessible from ASP.NET when the user visits your site. This gives you a chance to look at the cookie and get the information that has been stored in the cookie.

The cookie is able to expire when you want it to and will allow the user to close their browser, come back later, and still have the cookie being identified by your site.

These are most often used a means of identifying a user. When you log in to a site, you will get a cookie, which has a unique identifier allowing the site to identify you during that session.

What should be stored in a cookie?

  • Temporary GUIDs
  • Unimportant information
  • Tracking information that in no way identifies the user

Interesting things found in cookies

You can find all kinds of things getting stored in cookies. Now might be a good time to go looking through your own browser’s cookies to see what things sites have stored there.

Full Address Information

One site had an interesting way of making sure it always had access to your relevant information. It kept it all stored in a cookie. Yes, it had your name, address, phone number, email address, etc. stored in a cookie. I bet you’re really happy to know that the information is being stored in a cookie where pretty much any interested party can access it.

This means that anyone sniffing your traffic would see these values being passed when you navigate around the site. Also, it keeps everything nicely packaged up for consumption by spyware or other malware.

Username and Password Information

Ever wonder how a site lets you “Stay logged in for 30 days”? Well I hope the site you’re using doesn’t do it the same way as this one. One application used a cookie to keep your username and password… Where anyone can get it… and it wasn’t encrypted or anything… Clear text…

I think this one explains itself.

You account number and access rights

When you log in to a site, you get access to your stuff and not other people’s stuff. One site whose codebase I’ve seen kept track of this information by storing the account number in the cookie. (At least it wasn’t in the address bar.) Yes, that’s correct. If you just change the account in your cookie, you can see someone else’s account.

One might expect at least a GUID, so it would be hard to guess the other account number, but one would be wrong. These account numbers were integers starting at the number… 1. Each was incremented sequentially by one.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.

URL Parameters - Code Audits #1

by Brendan Enrick Monday, July 2 2012 10:00

This is the first post in a series that I am writing that focuses on some interesting finds from different code bases I’ve seen over my years looking at lots of different software projects. Code Audits are one place where lots of interesting bits of code can be found, since you're combing through an entire software system. I will be explaining what I’ve seen, why it’s wrong, and usually the better approach if it’s not obvious.

URL Parameters

URLs let you control the navigation as your user navigates around your site. Users may not know what URLs mean or how they work, but most people browsing the web know that URLs exist. A good number of them know that modifying that URL will take them to different pages. Even my mother knows that much about URLs.

It’s not just a tech savvy computer programmer or hacker who can figure out that changing the URL might get them to a page they’re trying to see. This is sometimes useful on site’s whose navigation is somewhat lacking. It’s also often possible to guess the page you’re looking for by adding “/login” or some other common word to find a page whose link you’re not seeing.

The Bad

On web sites that have user accounts associated with multiple accounts, it’s quite common for the current account that the user is viewing to be specified in the address bar using a query string parameter. Normally, this means that there will be something along the lines of “?AccountId=123456” on the end of the URL. It’s also very common for those to be built in as “subfolders” like this “/Account/123456/”. Either way, it’s easily visible, modifiable information presented directly to the user.

As I am sure most of you can predict, I am going to point out that I’ve seen some code that didn’t do the proper protection to make sure that only an allowed user can access any given AccountId. The site will only display the links to the allowed accounts, but the site itself does not protect against a user changing that AccountId.

If you want to get to another user’s account, you can just type in another AccountId into the URL and the site will take you there…

The circumstances where these mostly come up is when 1 user account is allowed to have access to more than one account. It’s on these sites that the user will need to be able to change accounts, and sometimes users keep multiple accounts open at the same time, so the URL is needed to keep both open at the same time. It’s important, however, that we maintain at least some level of security here.

The Better

When we build sites, it’s extremely important that we validate inputs given by users to make sure that they’re valid. It’s also important to make sure that users have access to the information they’re requesting.

The first thing you do when you receive user input is to validate that input. Obviously if you consider a URL to be user input, you would validate that immediately as well. This means that either directly or through some layer of abstraction, your code should be verifying a user has access to any information being requested.

Citigroup

There was a big fiasco last year related to Citigroup. (http://www.nytimes.com/2011/06/14/technology/14security.html) The rumor is that they had a security breach based on this type of issue. I have no real, credible information about this, but it makes for a good example even if it’s only rumor.

From the New York Times article we get these:

Once inside, they leapfrogged between the accounts of different Citi customers by inserting vari-ous account numbers into a string of text located in the browser’s address bar.

That sure sounds to me like what I just described. If it is what I described, then this security expert stretched the truth quite a bit based on what is in that same article.

One security expert familiar with the investigation wondered how the hackers could have known to breach security by focusing on the vulnerability in the browser. “It would have been hard to prepare for this type of vulnerability,” he said. The security expert insisted on anonymity because the inquiry was at an early stage.

It’s not hard to prepare for that kind of vulnerability. I guess if you have it, it’s hard to track and log the issue. What’s easy is not having the issue to begin with. It’s a simple policy of making sure that all inputs are validated for permissions before being executed. Also, that is not a “vulnerability in the browser”.

More Code Audit Nuggets

Keep watching for more interesting nuggets of stuff that I’ve seen in codebases.