Monday, March 12, 2007

Detecting Web Application Security Vulnerabilities

Web Application Vulnerability Detection with Code Review

Web application source code, independent of languages and platforms, is a major source of vulnerabilities. One of the CSI surveys on vulnerability distribution suggests that 64% of the time, a vulnerability crops up due to programming errors and 36% of the time, due to configuration issues. According to IBM labs, every 1,500 lines of code may contain at least one security issue. One of the challenges a security professional faces when assessing and auditing web applications is to identify vulnerabilities while simultaneously performing a source code review.

Problem Domain

Several languages are popular for web applications, including Active Server Pages (ASP), PHP, and Java Server Pages (JSP). Every programmer has his own way of implementing and writing objects. Each of these languages has exposed several APIs and directives to make a programmer's life easy. Unfortunately, a programming language cannot offer any guarantee on security. It is the programmer's responsibility to ensure that his own code is secure against various attack vectors, some of which may be malicious in nature.

On the other side, it is imperative to get the developed code assessed from a security standpoint, externally or in-house, prior to deploying the code on production systems. It's impossible to use only one tool to determine vulnerabilities residing in the source code, given the customized nature of applications and the many ways in which programmers can code. Source code review requires a combination of tools and intellectual analysis to determine exposure. The source code may be voluminous, running into thousands or millions of lines in some cases. It is not possible to go through each line of code manually in a short time span. This is where tools come into play. A tool can only help in determining information; it is the intellect--with a security mindset--that must link this information together. This dual approach is the one normally advocated for a source code review.

Assumption

To demonstrate automated review, I present a sample web application written in ASP.NET. I've produced a sample Python script as a tool for source code analysis. This approach can work to analyze any web application written in any language. It is also possible to write your own tool using any programming language.

Method and Approach

I've divided my method for approaching a code review exercise into several logical steps with specific objectives:

  • Dependency determination
  • Entry point identification
  • Threat mapping and vulnerability detection
  • Mitigation and countermeasures

Dependency determination

Prior to commencing a code review exercise, you must understand the entire architecture and dependencies of the code. This understanding provides a better overview and focus. One of the key objectives of this phase is to determine clear dependencies and to link them to the next phase. Figure 1 shows the overall architecture of a web shop in the case study under review.

Figure 1. Architecture for web application [webshop.example.com]

The application has several dependencies:

  • A database. The web application has MS-SQL Server running as the backend database. This interface must be examined when performing a code review.
  • The platform and web server. The application runs on the IIS web server with the .NET platform. This is helpful from two perspectives: 1) in securing deployment, and 2) in determining the source code type and language.
  • Web resources and languages. In this example, ASPX and ASMX are web resources. They are typical web applications and web services pages, written in the C# language. These resources help to determine patterns during a code review.
  • Authentication. The application authenticates users through an LDAP server. The authentication code is a critical component and needs analysis.
  • Firewall. The application layer firewall is in place and content filtering must be enabled.
  • Third-party components. Any third-party components being consumed by the application along with the integration code need analysis.
  • Information access from the internet. Other aspects that require consideration are RSS feeds and emails, information that an application may consume from the internet.

With this information in place, you are in a better position to understand the code. To reiterate, the entire application is coded in C# and is hosted on a web server running IIS. This is the target. The next step is to identify entry points to the application.

Entry point identification

The objective of this phase is to identify entry points to the web application. A web application can be accessed from various sources (Figure 2). It is important to evaluate every source; each has an associated risk.

Figure 2. Web application entry points

These entry points provide information to an application. These values hit the database, LDAP servers, processing engines, and other components in the application. If these values are not guarded, they can open up potential vulnerabilities in the application. The relevant entry points are:

  • HTTP variables. The browser or end-client sends information to the application. This set of requests contains several entry points such as form and query string data, cookies, and server variables (HTTP_REFERER, etc). The ASPX application consumes this data through the Request object. During a code review exercise, look for this object's usage.
  • SOAP messages. The application is accessible by web services over SOAP messages. SOAP messages are potential entry points to the web application.
  • RSS and Atom feeds. Many new applications consume third-party XML-based feeds and present the output in different formats to an end-user. RSS and Atom feeds have the potential to open up new vulnerabilities such as XSS or client-side script execution.
  • XML files from servers. The application may consume XML files from partners over the internet.
  • Mail system. The application may consume email from mailing systems.

These are the important entry points to the application in the case study. Running regular expressions across multiple source files makes it possible to pick out key patterns in how the submitted data is handled, and then to trace and analyze them.
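As a rough sketch of that idea, the following Python 3 snippet walks a source tree and reports each line that mentions the Request object. The function name find_request_usage, the list of file extensions, and the simplified pattern are assumptions made for illustration; they are not taken from scancode.py.

```python
import os
import re

# Simplified, hypothetical pattern: any mention of the ASP.NET Request object.
REQUEST_PATTERN = re.compile(r"\bRequest\b")


def find_request_usage(root, extensions=(".aspx", ".asmx", ".cs")):
    """Return (path, line number, text) for each line that uses Request."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Skip anything that is not one of the assumed source file types.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if REQUEST_PATTERN.search(line):
                        hits.append((path, number, line.strip()))
    return hits
```

Calling find_request_usage on the application's root folder would return one tuple per hit, which can then feed the tracing step described later.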

Scanning the code with Python

scancode.py is a source code-scanning utility. It is a simple Python script that automates the review process. This Python scanner has three functions with specific objectives:

  • The scanfile function scans the entire file for specific security-related regex patterns:

    ".*.[Rr]equest.*[^\n]\n" # Look for request object calls
    ".*.select .*?[^\n]\n|.*.SqlCommand.*?[^\n]\n" # Look for SQL execution points
    ".*.FileStream .*?[^\n]\n|.*.StreamReader.*?[^\n]\n" # Look for file system access
    ".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n" # Look for cookie and session information
    "" # Look for dependencies in the application
    ".*.[Rr]esponse.*[^\n]\n" # Look for response object calls
    ".*.write.*[^\n]\n" # Look for information going back to browser
    ".*catch.*[^\n]\n" # Look for exception handling
  • The scan4request function scans the file for entry points to the application using the ASP.NET Request object. Essentially, it runs the pattern ".*.[Rr]equest.*[^\n]\n".
  • The scan4trace function helps analyze the traversal of a variable in the file. Pass the name of a variable to this function and get the list of lines where it is used. This function is the key to detecting application-level vulnerabilities.
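One way a global scan along these lines might be organized is as a table of labeled patterns applied to every line. This is a minimal Python 3 sketch, not the code of scancode.py: the names PATTERNS and scan_lines are hypothetical, and the patterns are simplified versions of the ones listed above. The dependency pattern is left out because it is not reproduced in the listing.

```python
import re

# Hypothetical label -> pattern table, loosely modeled on the list above.
PATTERNS = [
    ("Request Object Entry", re.compile(r"[Rr]equest")),
    ("SQL Object Entry", re.compile(r"select |SqlCommand")),
    ("File System Entry", re.compile(r"FileStream|StreamReader")),
    ("Session Object Entry", re.compile(r"HttpCookie|[Ss]ession")),
    ("Response Object Entry", re.compile(r"[Rr]esponse")),
    ("Exception handling", re.compile(r"\bcatch\b")),
]


def scan_lines(lines):
    """Return (label, line number, text) for every pattern hit."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                findings.append((label, number, line.strip()))
    return findings


sample = [
    "NameValueCollection nvc=Request.QueryString;",
    'String qry="select * from items where product_id=" + pro_id;',
    "response.write(pro_id);",
    "catch(Exception ex)",
]
for label, number, text in scan_lines(sample):
    print("%s:\n%d : %s" % (label, number, text))
```

Keeping the patterns in one table makes it easy to add a new check without touching the scanning loop.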

Using the program is easy; it takes several switches to activate the previously described functions.

D:\PYTHON\scancode>scancode.py
Cannot parse the option string correctly
Usage:
scancode -
flag -sG : Global match
flag -sR : Entry points
flag -t : Variable tracing
Variable is only needed for -t option

Examples:

scancode.py -sG details.aspx
scancode.py -sR details.aspx
scancode.py -t details.aspx pro_id

D:\PYTHON\scancode>

The scanner script first imports Python's regex module:

import re

Importing this module makes it possible to run regular expressions against the target file:

p = re.compile(".*.[Rr]equest.*[^\n]\n")

This line defines a regular expression--in this case, a search for the Request object. The script then calls the match() method on each line of the file in turn, collecting every line that fits the pattern:

m = p.match(line)
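To make the pattern's behavior concrete, this short Python 3 check applies it to a few sample lines. The sample strings are made up for illustration. Because the pattern ends in a newline and needs at least one character before Request, a line without a trailing newline, or one that starts with Request at column zero, is not reported.

```python
import re

# The same pattern the article uses to find Request object calls.
p = re.compile(".*.[Rr]equest.*[^\n]\n")

with_newline = "    NameValueCollection nvc=Request.QueryString;\n"
without_newline = "    NameValueCollection nvc=Request.QueryString;"
at_column_zero = "Request.QueryString;\n"
unrelated = "    String pro_id=\"\";\n"

# match() anchors at the start of the string it is given, so it is
# called once per line, as readlines() would supply them.
results = {
    "with_newline": bool(p.match(with_newline)),
    "without_newline": bool(p.match(without_newline)),
    "at_column_zero": bool(p.match(at_column_zero)),
    "unrelated": bool(p.match(unrelated)),
}
print(results)
```

Only the first sample matches, which is worth keeping in mind when the last line of a file has no trailing newline.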

Looking for entry points

Now use scancode.py to scan the details.aspx file for possible entry points in the target code. Use the -sR switch to identify entry points. Running it on the details.aspx page produces the following results:

D:\PYTHON\scancode>scancode.py -sR details.aspx
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;

This is the entry point to the application, the place where the code stores QueryString information in a NameValueCollection.

Here is the function that grabs this information from the code:

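def scan4request(file):
    # Read every line of the target file
    infile = open(file, "r")
    s = infile.readlines()
    infile.close()
    linenum = 0
    print('Request Object Entry:')
    # Compile once: any line that mentions the Request object
    p = re.compile(".*.[Rr]equest.*[^\n]\n")
    for line in s:
        linenum += 1
        m = p.match(line)
        if m:
            # Report the line number and the matching source line
            print("%d : %s" % (linenum, m.group()))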

The code snippet shows the file being opened and the request object grabbed using a specific regex pattern. This same approach can capture all other entry points. For example, here's a snippet to identify cookie- and session-related entry points:

        # Look for cookie and session management
        p = re.compile(".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n")
        m = p.match(line)
        if m:
            print('Session Object Entry:')
            print("%d : %s" % (linenum, m.group()))

Threat mapping and vulnerability detection

Discovering entry points narrows the focus for threat mapping and vulnerability detection. An entry point is essential to a trace. It is important to unearth where the variable that receives the input goes (execution flow) and its impact on the application.

After locating these entry points to the application, you need to trace them and search for vulnerabilities.

The previous scan found a Request object entry in the application:

22 :    NameValueCollection nvc=Request.QueryString;

Running the script with the -t option will help to trace the variables. (For full coverage, trace it right through to the end, using all possible iterations.)

D:\PYTHON\scancode>scancode.py -t details.aspx nvc
Tracing variable:nvc
NameValueCollection nvc=Request.QueryString;
String[] arr1=nvc.AllKeys;
String[] sta2=nvc.GetValues(arr1[0]);

This assigned a value from nvc to sta2, so that also needs a trace:

D:\PYTHON\scancode>scancode.py -t details.aspx sta2
Tracing variable:sta2
String[] sta2=nvc.GetValues(arr1[0]);
pro_id=sta2[0];

Here's another iteration; tracing pro_id:

D:\PYTHON\scancode>scancode.py -t details.aspx pro_id
Tracing variable:pro_id
String pro_id="";
pro_id=sta2[0];
String qry="select * from items where product_id=" + pro_id;
response.write(pro_id);

Finally, this is the end of the trace. This example has shown multiple traces of a single page, but it is possible to traverse multiple pages across the application. Figure 3 shows the complete output.

Figure 3. Vulnerability detection with tracing
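The article does not list scan4trace itself. As a hedged sketch of the idea, the Python 3 code below returns every line that mentions a variable and then follows simple assignments to repeat the trace automatically, much like the manual nvc, sta2, and pro_id iterations above. The helper names trace_variable and trace_chain and the naive assignment pattern are assumptions, not the script's actual implementation.

```python
import re


def trace_variable(lines, name):
    """Return the lines that mention the variable as a whole word."""
    word = re.compile(r"\b%s\b" % re.escape(name))
    return [line.strip() for line in lines if word.search(line)]


def trace_chain(lines, start):
    """Follow simple 'target = ...source...' assignments from a start variable."""
    # Naive assumption: the assigned variable is the identifier just before '='.
    assign = re.compile(r"(\w+)\s*=(?!=)\s*(.*)")
    seen, queue = [], [start]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.append(name)
        word = re.compile(r"\b%s\b" % re.escape(name))
        for line in lines:
            m = assign.search(line)
            # Queue the target when the traced name appears on the right side.
            if m and word.search(m.group(2)) and m.group(1) not in seen:
                queue.append(m.group(1))
    return seen


source = [
    "NameValueCollection nvc=Request.QueryString;",
    "String[] arr1=nvc.AllKeys;",
    "String[] sta2=nvc.GetValues(arr1[0]);",
    'String pro_id="";',
    "pro_id=sta2[0];",
    'String qry="select * from items where product_id=" + pro_id;',
    "response.write(pro_id);",
]
print(trace_chain(source, "nvc"))
print(trace_variable(source, "pro_id"))
```

On this sample the chain also picks up arr1 and qry, since both are assigned from traced values. A reviewer still has to judge which of the reported lines are dangerous sinks.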

As the source code and figure show, there is no validation of input in the source. There is a SQL injection vulnerability:

String qry="select * from items where product_id=" + pro_id;

The application accepts pro_id and passes it as is to the SELECT statement. It is possible to manipulate this statement and inject a SQL payload.
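To illustrate, the Python 3 sketch below rebuilds the same concatenated query against an in-memory SQLite table. SQLite stands in for the MS-SQL backend purely for demonstration, and the table contents and the payload are made up.

```python
import sqlite3

# Throwaway in-memory database with a few sample rows (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("create table items (product_id integer, name text)")
conn.executemany("insert into items values (?, ?)",
                 [(1, "book"), (2, "pen"), (3, "lamp")])


def lookup(pro_id):
    # Same string concatenation as the vulnerable C# line.
    qry = "select * from items where product_id=" + pro_id
    return conn.execute(qry).fetchall()


print(lookup("1"))          # the intended use: one product
print(lookup("1 or 1=1"))   # the injected condition matches every row
```

The second call shows the core problem: the attacker's text becomes part of the SQL statement rather than a value inside it.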

Similarly, another line exposes a cross-site scripting (XSS) vulnerability:

response.write(pro_id);

Throwing back the (unvalidated) pro_id to the browser provides a position for an attacker to inject JavaScript to be executed in the victim's browser.

The script's -sG option executes the global search routine. This routine looks for file objects, cookies, exceptions, etc. Each has potential vulnerabilities, and this scan can help you to identify them and map them to the respective threats:

D:\shreeraj_docs\perlCR>scancode.py -sG details.aspx
Dependencies:
13 :

Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;

SQL Object Entry:
49 : String qry="select * from items where product_id=" + pro_id;

SQL Object Entry:
50 : SqlCommand mycmd=new SqlCommand(qry,conn);

Response Object Entry:
116 : response.write(pro_id);

XSS Check:
116 : response.write(pro_id);

Exception handling:
122 : catch(Exception ex)

This approach to code review detects entry points, traces variables, and surfaces vulnerabilities with minimal effort.

Mitigation and countermeasures

After you have identified a vulnerability, the next step is to mitigate the threat. There are various ways to do this, depending on your deployment. For example, it's possible to mitigate SQL injection by adding a rule to the web application firewall to block a certain set of characters such as single and double quotes. The best way to mitigate this issue, however, is by applying secure coding practices--providing proper input validation before consuming the variable at the code level. At the SQL level, it is important to use either prepared statements or stored procedures to avoid SQL SELECT statement injection. For mitigation of XSS vulnerabilities, it is imperative to filter out or encode characters such as greater than (>) and less than (<) prior to serving any content to the end-client. These steps provide threat mitigation to the overall web application.
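Here is a hedged sketch of the code-level fixes in Python 3: input validation, a parameterized query, and HTML-encoding of reflected output. SQLite and html.escape stand in for the article's MS-SQL and ASP.NET stack, and the function names safe_lookup and render_product_id are hypothetical.

```python
import html
import re
import sqlite3

# Throwaway in-memory database with sample rows (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("create table items (product_id integer, name text)")
conn.executemany("insert into items values (?, ?)",
                 [(1, "book"), (2, "pen")])


def safe_lookup(pro_id):
    # Validate first: this sketch only accepts plain ASCII digits.
    if not re.fullmatch(r"[0-9]+", pro_id):
        raise ValueError("product id must be numeric")
    # The driver binds the value; it is never spliced into the SQL text.
    qry = "select * from items where product_id=?"
    return conn.execute(qry, (int(pro_id),)).fetchall()


def render_product_id(pro_id):
    # Encode <, >, &, and quotes before echoing the value to the browser.
    return html.escape(pro_id, quote=True)
```

With these in place, a payload such as "1 or 1=1" is rejected before it reaches the database, and a value such as "<script>" is sent back as inert text.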

Conclusion

Code review is a very powerful tool for detecting vulnerabilities and getting to their actual source. This is the "whitebox" approach. Dependency determination, entry point identification, and threat mapping help detect vulnerabilities. All of these steps need architecture and code reviews. The nature of code is complex, so no single tool can meet all of your needs. As a professional, you need to write tools on the fly when doing code review and put them into action when the code base is very large. It is not feasible to go through each line of code.

In this scenario, one of the methods is to start with entry points, as discussed earlier in this article. You can build complex scripts or programs in any language to grab various patterns in voluminous source code and link them together. Tracing a variable or function is the key: it can reveal the entire traversal and greatly help in determining vulnerabilities.

http://www.oreillynet.com/pub/a/sysadmin/2006/11/02/webapp_security_scans.html?page=3