1.3. Practices
As with the principles described in the previous section, there are many practices that you can employ to develop more secure applications. This list of practices is also small and focused, highlighting the ones that I consider most important.
Some of these practices are abstract, but each has practical applications, which are described to clarify the intended use and purpose of each.
1.3.1. Balance Risk and Usability
Although user friendliness and security safeguards are not mutually exclusive, steps taken to increase security often decrease usability. As you write your code, it's important to consider illegitimate uses of your applications, but it's equally important to be mindful of your legitimate users. The appropriate balance can be difficult to achieve, and it's something that you have to determine for yourself; no one else can determine the best balance for your applications.
Try to use safeguards that are transparent to the user. If this isn't possible, try to use safeguards that are already familiar to the user (or likely to be). For example, providing a username and password to gain access to restricted information or services is an expected procedure.
When you suspect foul play, realize that you might be mistaken and act accordingly. For example, it is a common practice to prompt users to enter their password again whenever their identity is in question. This is a minor hassle to legitimate users but a substantial obstacle to an attacker. Technically, this is almost identical to prompting users to authenticate themselves again entirely, but the user experience is much friendlier.
There is very little to gain by logging users out entirely or chiding them about an alleged attack. These approaches degrade usability substantially when you make a mistake, and mistakes happen.
In this book, I focus on providing safeguards that are either transparent or expected, and I encourage careful and sensible reactions to suspected attacks.
1.3.2. Track Data
The most important thing you can do as a security-conscious developer is keep track of data at all times: not only what it is and where it is, but also where it's from and where it's going. Sometimes this can be difficult, especially without a firm understanding of how the Web works. This is why developers who are new to the Web are prone to making mistakes that yield security vulnerabilities, even when they have experience developing applications in other environments.
Most people who use email are not easily fooled by spam with a subject of "Re: Hello". They recognize that the subject can be forged, and therefore the email isn't necessarily a reply to a previous email with a subject of "Hello." In short, people know not to place much trust in the subject. Far fewer people realize that the From header can also be forged; they mistakenly believe that it reliably indicates the email's origin.
The Web is very similar, and one of the things I want to teach you is how to distinguish between the data that you can trust and the data that you cannot. It's not always easy, but blind paranoia certainly isn't the answer.
PHP helps you identify the origin of most data: superglobal arrays such as $_GET, $_POST, and $_COOKIE clearly identify input from the user. A strict naming convention can help you keep track of the origin of all data throughout your code, and this is a technique that I frequently demonstrate and highly recommend.
While understanding where data enters your application is paramount, it is also very important to understand where data exits your application. When you use echo, for example, you are sending data to the client. When you use mysql_query( ), you are sending data to a MySQL database (even when the purpose of the query is to retrieve data).
When I audit a PHP application for security vulnerabilities, I focus on the code that interacts with remote systems. This code is the most likely to contain security vulnerabilities, and it therefore demands the most careful attention to detail during development and during peer reviews.
1.3.3. Filter Input
Filtering is one of the cornerstones of web application security. It is the process by which you prove the validity of data. By ensuring that all data is properly filtered on input, you can eliminate the risk that tainted (unfiltered) data is mistakenly trusted or misused in your application. The vast majority of security vulnerabilities in popular PHP applications can be traced to a failure to filter input.
When I refer to filtering input, I am really describing three different steps:
Identifying input
Filtering input
Distinguishing between filtered and tainted data
The first step is to identify input, because if you don't know what it is, you can't be sure to filter it. Input is any data that originates from a remote source. For example, anything sent by the client is input, although the client isn't the only remote source of data; other examples include database servers and RSS feeds.
Data that originates from the client is easy to identify: PHP provides this data in superglobal arrays such as $_GET and $_POST. Other input can be more difficult to identify. For example, $_SERVER contains many elements that can be manipulated by the client (HTTP_HOST and HTTP_USER_AGENT, for instance, are copied from request headers). It's not always easy to determine which elements in $_SERVER constitute input, so a best practice is to consider this entire array to be input.
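Treating $_SERVER as input means filtering its elements just like $_GET or $_POST. A minimal sketch, assuming a hypothetical list of host names your application serves and a hypothetical helper named filter_host( ), accepts the client-supplied Host header only when it matches a known value:

```php
<?php
// Hypothetical whitelist of host names this application expects to serve.
const ALLOWED_HOSTS = array('example.org', 'www.example.org');

// $server is normally $_SERVER. HTTP_HOST comes from the client's Host
// header, so it is accepted only if it matches an allowed value exactly.
function filter_host(array $server): ?string
{
    if (!isset($server['HTTP_HOST']) || !is_string($server['HTTP_HOST'])) {
        return null;
    }

    // Host names are case-insensitive, so compare in lowercase.
    $host = strtolower($server['HTTP_HOST']);

    return in_array($host, ALLOWED_HOSTS, true) ? $host : null;
}

// Usage: $clean['host'] = filter_host($_SERVER);
```

A value with an unexpected port or any other variation is rejected rather than corrected, which keeps the check conservative.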
What you consider to be input is a matter of opinion in some cases. For example, session data is stored on the server, and you might not consider the session data store to be a remote source. If you take this stance, you can consider the session data store to be an integral part of your application. It is wise to be mindful of the fact that this ties the security of your application to the security of the session data store. This same perspective can be applied to a database because the database can be considered a part of the application as well.
Generally speaking, it is more secure to consider data from session data stores and databases to be input, and this is the approach that I recommend for any critical PHP application.
Once you have identified input, you're ready to filter it. Filtering is a somewhat formal term that has many synonyms in common parlance: sanitizing, validating, cleaning, and scrubbing. Although some people differentiate slightly between these terms, they all refer to the same process: preventing invalid data from entering your application.
Various approaches are used to filter data, and some are more secure than others. The best approach is to treat filtering as an inspection process. Don't correct invalid data in order to be accommodating; force your users to play by your rules. History has shown that attempts to correct invalid data often create vulnerabilities. For example, consider the following method intended to prevent file traversal (ascending the directory tree):
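A sketch of such a correction-based method, assuming the filename arrives in $_POST['filename'] and that each .. is collapsed into a single . (the helper name naive_filename( ) is hypothetical), might look like this:

```php
<?php
// A flawed, correction-based filter, shown for illustration only:
// it tries to neutralize ".." by replacing it with a single ".".
function naive_filename(string $input): string
{
    return str_replace('..', '.', $input);
}

// Usage: $filename = naive_filename($_POST['filename']);
```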
Can you think of a value of $_POST['filename'] that causes $filename to be ../../etc/passwd? Consider the following:
.../.../etc/passwd
This particular error can be corrected by continuing to replace the string until it is no longer found:
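A sketch of that loop, again using a hypothetical helper name, keeps collapsing .. until none remain. It closes the particular hole above, but it is still a correction-based approach:

```php
<?php
// Repeat the replacement until the string no longer contains "..".
// Each pass shortens the string, so the loop always terminates.
function looping_filename(string $input): string
{
    $filename = $input;

    while (strpos($filename, '..') !== false) {
        $filename = str_replace('..', '.', $filename);
    }

    return $filename;
}

// Usage: $filename = looping_filename($_POST['filename']);
```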
Of course, the basename( ) function can replace this entire technique and is a safer way to achieve the desired goal. The important point is that any attempt to correct invalid data can potentially contain an error and allow invalid data to pass through. Inspection is a much safer alternative.
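Combining basename( ) with the inspection mindset, a sketch might reject any submitted name that basename( ) would alter, instead of rewriting it. The helper name inspect_filename( ) is hypothetical:

```php
<?php
// Inspection instead of correction: accept the name only if it is already
// a bare file name with no directory components; otherwise reject it.
function inspect_filename(string $input): ?string
{
    // "." and ".." survive basename() unchanged, so reject them explicitly.
    if ($input === '' || $input === '.' || $input === '..') {
        return null;
    }

    return basename($input) === $input ? $input : null;
}

// Usage: $clean['filename'] = inspect_filename($_POST['filename']);
```

Invalid input is never "fixed"; the caller gets null and can decide how to react.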
In addition to treating filtering as an inspection process, you want to use a whitelist approach whenever possible. This means that you want to assume the data that you're inspecting to be invalid unless you can prove that it is valid. In other words, you want to err on the side of caution. Using this approach, a mistake results in your considering valid data to be invalid. Although undesirable (as any mistake is), this is a much safer alternative than considering invalid data to be valid. By mitigating the damage caused by a mistake, you increase the security of your applications. Although this idea is theoretical in nature, history has proven it to be a very worthwhile approach.
If you can accurately and reliably identify and filter input, your job is almost done. The last step is to employ a naming convention or some other practice that can help you to accurately and reliably distinguish between filtered and tainted data. I recommend a simple naming convention because this can be used in both procedural and object-oriented paradigms. The convention that I use is to store all filtered data in an array called $clean. This allows you to take two important steps that help to prevent the injection of tainted data:
Always initialize $clean to be an empty array.
Add logic to detect and reject any variable named clean that comes from a remote source.
In truth, only the initialization is crucial, but it's good to adopt the habit of considering any variable named clean to be one thing: your array of filtered data. This step provides reasonable assurance that $clean contains only data that you knowingly store therein, and it leaves you with the responsibility of ensuring that you never store tainted data in $clean.
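Both steps can be sketched as follows. The helper has_clean_input( ) is hypothetical, and rejecting the request is only one possible reaction; the helper simply looks for a key named clean in each input array you pass it:

```php
<?php
// Hypothetical guard: report whether any input array (e.g. $_GET, $_POST,
// $_COOKIE) contains a variable named "clean".
function has_clean_input(array ...$sources): bool
{
    foreach ($sources as $source) {
        if (array_key_exists('clean', $source)) {
            return true;
        }
    }

    return false;
}

// Always start from an empty array so $clean holds only what you add.
$clean = array();

// Usage: if (has_clean_input($_GET, $_POST, $_COOKIE)) { /* reject */ }
```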
In order to solidify these concepts, consider a simple HTML form that allows a user to select among three colors:
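A sketch of such a form, generated from PHP so that all examples here share one language, assuming a hypothetical process.php handler and a POST field named color:

```php
<?php
// Hypothetical helper that renders the three-color form. The field name
// "color" is what the handler later reads from $_POST['color'].
function color_form(): string
{
    $options = '';

    foreach (array('red', 'green', 'blue') as $color) {
        $options .= "  <option value=\"$color\">$color</option>\n";
    }

    return "<form action=\"process.php\" method=\"POST\">\n"
         . "Please select a color:\n"
         . "<select name=\"color\">\n"
         . $options
         . "</select>\n"
         . "<input type=\"submit\" />\n"
         . "</form>\n";
}

// Usage: echo color_form();
```

Nothing about the form restricts what a client can actually submit; it only shapes what a well-behaved browser sends.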
In the programming logic that processes this form, it is easy to make the mistake of assuming that only one of the three choices can be provided. As you will learn in Chapter 2, the client can submit any data as the value of $_POST['color']. To properly filter this data, you can use a switch statement:
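A minimal sketch of that switch, wrapped in a hypothetical filter_color( ) function that receives the submitted array (normally $_POST) so it can be exercised without a web server:

```php
<?php
// Whitelist filter: only red, green, or blue can ever reach $clean.
function filter_color(array $post): array
{
    // Initialize so $clean cannot contain anything you did not put there.
    $clean = array();

    // Non-string values (such as an array submitted as color[]) are ignored.
    $color = (isset($post['color']) && is_string($post['color']))
        ? $post['color'] : null;

    switch ($color) {
        case 'red':
        case 'green':
        case 'blue':
            $clean['color'] = $color;
            break;
    }

    return $clean;
}

// Usage: $clean = filter_color($_POST);
```

The is_string( ) check keeps the loose comparisons performed by switch limited to string-to-string comparisons.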
This example first initializes $clean to an empty array in order to be certain that it cannot contain tainted data. Once it is proven that the value of $_POST['color'] is one of red, green, or blue, it is stored in $clean['color']. Therefore, you can use $clean['color'] elsewhere in your code with reasonable assurance that it is valid. Of course, you could add a default case to this switch statement to take a particular action in the case of invalid data. One possibility is to display the form again while noting the error; just be careful not to output the tainted data in an attempt to be friendly.
While this particular approach is useful for filtering data against a known set of valid values, it does not help you filter data against a known set of valid characters. For example, you might want to assert that a username may contain only alphanumeric characters:
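A sketch of that assertion using PHP's ctype_alnum( ), wrapped in a hypothetical filter_username( ) function. Note that ctype_alnum( ) respects the current locale; under the default C locale it accepts only ASCII letters and digits:

```php
<?php
// Accept the username only if every character is a letter or digit.
// ctype_alnum('') returns false, so an empty username is also rejected.
function filter_username(array $post): array
{
    $clean = array();

    if (isset($post['username'])
        && is_string($post['username'])
        && ctype_alnum($post['username'])) {
        $clean['username'] = $post['username'];
    }

    return $clean;
}

// Usage: $clean = filter_username($_POST);
```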
Although a regular expression can be used for this particular purpose, a native PHP function such as ctype_alnum( ) is preferable whenever one exists. Native functions are less likely to contain errors than code that you write yourself, and an error in your filtering logic is almost certain to result in a security vulnerability.
1.3.4. Escape Output
Another cornerstone of web application security is the practice of escaping output: escaping or encoding special characters so that their original meaning is preserved. For example, O'Reilly is represented as O\'Reilly when being sent to a MySQL database. The backslash before the apostrophe is there to preserve it; the apostrophe is part of the data and not meant to be interpreted by the database.
As with filtering input, when I refer to escaping output, I am really describing three different steps:
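The O'Reilly transformation can be reproduced with addslashes( ), which is used here only because it needs no database connection; a database-specific escaping function should be preferred in real code:

```php
<?php
// addslashes() inserts a backslash before the apostrophe, matching the
// O\'Reilly representation described above.
$name = "O'Reilly";
$escaped = addslashes($name);

echo $escaped, "\n"; // prints O\'Reilly
```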
Identifying output
Escaping output
Distinguishing between escaped and unescaped data
It is important to escape only filtered data. Although escaping alone can prevent many common security vulnerabilities, it should never be regarded as a substitute for filtering input. Tainted data must be first filtered and then escaped.
To escape output, you must first identify output. In general, this is much easier than identifying input because it relies on an action that you take. For example, to identify output being sent to the client, you can search for strings such as the following in your code:
echo
print
printf
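<?=
To escape data being sent to the client, use htmlentities( ), and store the escaped result in an array named $html so that escaped data is as easy to recognize as filtered data in $clean:
<?php

// $clean['username'] holds a username that has already been filtered.
$html = array( );
$html['username'] = htmlentities($clean['username'], ENT_QUOTES, 'UTF-8');

echo "<p>Welcome back, {$html['username']}.</p>";

?>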
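The htmlspecialchars( ) function is almost identical to htmlentities( ). It accepts the same arguments; the only difference is that it is less exhaustive, converting only the characters that have special meaning in HTML (&, <, >, and quotes) rather than every character that has a named HTML entity.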
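The difference is easiest to see with a non-ASCII character. In this sketch, both functions receive the same arguments; only htmlentities( ) converts the é:

```php
<?php
$text = "Café <b>\"bold\"</b> & 'quoted'";

// Converts every character with a named HTML entity, including é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Converts only &, <, >, " and ' (the latter because of ENT_QUOTES).
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $entities, "\n", $special, "\n";
```

Either function neutralizes the markup; htmlentities( ) simply does more work than is strictly required when the page is served with a correct UTF-8 character encoding.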
By using $html['username'] when sending the username to the client, you can be sure that special characters are not interpreted by the browser. If the username contains only alphanumeric characters, the escaping is not actually necessary, but it is a practice that adheres to Defense in Depth. Consistently escaping all output is a good habit that dramatically increases the security of your applications.
Another popular destination is a database. When possible, you should escape data used in an SQL query with an escaping function native to your database. For MySQL users, the best escaping function is mysql_real_escape_string( ) (or mysqli_real_escape_string( ) with the newer mysqli extension; the original mysql extension was removed in PHP 7). If there is no native escaping function for your database, addslashes( ) can be used as a last resort.
The following example demonstrates the proper escaping technique for a MySQL database:
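A sketch of the technique, assuming a hypothetical profile table and a hypothetical profile_query( ) helper. The escaping function is passed in as a parameter so that, in production, it can be mysqli_real_escape_string( ) bound to a live connection (or addslashes( ) as a last resort), and the escaped copy is kept in a $mysql array in the same spirit as $clean and $html:

```php
<?php
// Build a query from filtered data. Only escaped values stored in $mysql
// are interpolated into the SQL string.
function profile_query(array $clean, callable $escape): string
{
    $mysql = array();
    $mysql['username'] = $escape($clean['username']);

    return "SELECT * FROM profile WHERE username = '{$mysql['username']}'";
}

// Usage with mysqli ($link is a hypothetical mysqli_connect() result):
// $sql = profile_query($clean, function ($s) use ($link) {
//     return mysqli_real_escape_string($link, $s);
// });
// $result = mysqli_query($link, $sql);
```

Filtering still comes first: $clean['username'] is expected to have passed a check such as ctype_alnum( ) before it is escaped.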