Monday, March 12, 2007

Detecting Web Application Security Vulnerabilities

Web Application Vulnerability Detection with Code Review

Web application source code, independent of languages and platforms, is a major source for vulnerabilities. One of the CSI surveys on vulnerability distribution suggests that 64% of the time, a vulnerability crops up due to programming errors and 36% of the time, due to configuration issues. According to IBM labs, there is a possibility of at least one security issue contained in every 1,500 lines of code. One of the challenges a security professional faces when assessing and auditing web applications is to identify vulnerabilities while simultaneously performing a source code review.

Problem Domain

Several languages are popular for web applications, including Active Server Pages (ASP), PHP, and Java Server Pages (JSP). Every programmer has his own way of implementing and writing objects. Each of these languages has exposed several APIs and directives to make a programmer's life easy. Unfortunately, a programming language cannot offer any guarantee on security. It is the programmer's responsibility to ensure that his own code is secure against various attack vectors, some of which may be malicious in nature.

On the other side, it is imperative to get the developed code assessed from a security standpoint, externally or in-house, prior to deploying the code on production systems. It's impossible to use only one tool to determine vulnerabilities residing in the source code, given the customized nature of applications and the many ways in which programmers can code. Source code review requires a combination of tools and intellectual analysis to determine exposure. The source code may be voluminous, running into thousands or millions of lines in some cases. It is not possible to go through each line of code manually in a short time span. This is where tools come into play. A tool can only help in determining information; it is the intellect--with a security mindset--that must link this information together. This dual approach is the one normally advocated for a source code review.


To demonstrate automated review, I present a sample web application written in ASP.NET. I've produced a sample Python script as a tool for source code analysis. This approach can work to analyze any web application written in any language. It is also possible to write your own tool using any programming language.

Method and Approach

I've divided my method for approaching a code review exercise into several logical steps with specific objectives:

  • Dependency determination
  • Entry point identification
  • Threat mapping and vulnerability detection
  • Mitigation and countermeasures

Dependency determination

Prior to commencing a code review exercise, you must understand the entire architecture and dependencies of the code. This understanding provides better overview and focus. One of the key objectives of this phase is to determine clear dependencies and to link them to the next phase. Figure 1 shows the overall architecture of a web shop in the case study under review.

architecture for the sample web application
Figure 1. Architecture for web application []

The application has several dependencies:

  • A database. The web application has MS-SQL Server running as the backend database. This interface must be examined when performing a code review.
  • The platform and web server. The application runs on the IIS web server with the .NET platform. This is helpful from two perspectives: 1) in securing deployment, and 2) in determining the source code type and language.
  • Web resources and languages. In this example, ASPX and ASMX are web resources. They are typical web applications and web services pages, written in the C# language. These resources help to determine patterns during a code review.
  • Authentication. The application authenticates users through an LDAP server. The authentication code is a critical component and needs analysis.
  • Firewall. The application layer firewall is in place and content filtering must be enabled.
  • Third-party components. Any third-party components being consumed by the application along with the integration code need analysis.
  • Information access from the internet. Other aspects that require considerations are RSS feeds and emails, information that an application may consume from the internet.

With this information in place, you are in a better position to understand the code. To reiterate, the entire application is coded in C# and is hosted on a web server running IIS. This is the target. The next step is to identify entry points to the application.

Entry point identification

The objective of this phase is to identify entry points to the web application. A web application can be accessed from various sources (Figure 2). It is important to evaluate every source; each has an associated risk.

web app entry points
Figure 2. Web application entry points

These entry points provide information to an application. These values hit the database, LDAP servers, processing engines, and other components in the application. If these values are not guarded, they can open up potential vulnerabilities in the application. The relevant entry points are:

  • HTTP variables. The browser or end-client sends information to the application. This set of requests contains several entry points such as form and query string data, cookies, and server variables (HTTP_REFERER, etc). The ASPX application consumes this data through the Request object. During a code review exercise, look for this object's usage.
  • SOAP messages. The application is accessible by web services over SOAP messages. SOAP messages are potential entry points to the web application.
  • RSS and Atom feeds. Many new applications consume third-party XML-based feeds and present the output in different formats to an end-user. RSS and Atom feeds have the potential to open up new vulnerabilities such as XSS or client-side script execution.
  • XML files from servers. The application may consume XML files from partners over the internet.
  • Mail system. The application may consume mails from mailing systems.

These are the important entry points to the application in the case study. It is possible to grab certain key patterns in the submitted data using regular expressions from multiple files to trace and analyze patterns.

Scanning the code with Python is a source code-scanning utility. It is simple Python script that automates the review process. This Python scanner has three functions with specific objectives:

  • The scanfile function scans the entire file for specific security-related regex patterns:

    ".*.[Rr]equest.*[^\n]\n" # Look for request object calls
    ".*.select .*?[^\n]\n|.*.SqlCommand.*?[^\n]\n" # Look for SQL execution points
    ".*.FileStream .*?[^\n]\n|.*.StreamReader.*?[^\n]\n" # Look for file system access
    ".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n" # Look for
    cookie and session information
    "" # Look for dependencies in the application
    ".*.[Rr]esponse.*[^\n]\n" # Look for response object calls
    ".*.write.*[^\n]\n" # Look for information going back to browser
    ".*catch.*[^\n]\n" # Look for exception handling
  • The scan4request function scans the file for entry points to the application using the ASP.NET Request object. Essentially, it runs the pattern ".*.[Rr]equest.*[^\n]\n".
  • The scan4trace function helps analyze the traversal of a variable in the file. Pass the name of a variable to this function and get the list of lines where it is used. This function is the key to detecting application-level vulnerabilities.

Using the program is easy; it takes several switches to activate the previously described functions.

Cannot parse the option string correctly
scancode -
flag -sG : Global match
flag -sR : Entry points
flag -t : Variable tracing
Variable is only needed for -t option

Examples: -sG details.aspx -sR details.aspx -t details.aspx pro_id


The scanner script first imports Python's regex module:

import re

Importing this module makes it possible to run regular expressions against the target file:

p = re.compile(".*.[Rr]equest.*[^\n]\n")

This line defines a regular expression--in this case, a search for the Request object. With this regex, the match() method collects all possible instances of regex patterns in the file:

m = p.match(line)

Looking for entry points

Now use to scan the details.aspx file for possible entry points in the target code. Use the -sR switch to identify entry points. Running it on the details.aspx page produces the following results:

D:\PYTHON\scancode> -sR details.aspx
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;

This is the entry point to the application, the place where the code stores QueryString information into the NameValue collection set.

Here is the function that grabs this information from the code:

def scan4request(file):
infile = open(file,"r")
s = infile.readlines()
linenum = 0
print 'Request Object Entry:'
for line in s:
linenum += 1
p = re.compile(".*.[Rr]equest.*[^\n]\n")
m = p.match(line)
if m:
print linenum,":",

The code snippet shows the file being opened and the request object grabbed using a specific regex pattern. This same approach can capture all other entry points. For example, here's a snippet to identify cookie- and session-related entry points:

# Look for cookie and session management
p = re.compile(".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n")
m = p.match(line)
if m:
print 'Session Object Entry:'

Threat mapping and vulnerability detection print linenum,":",

Discovering entry points narrows the focus for threat mapping and
vulnerability detection. An entry point is essential to a trace. It is
important to unearth where this variable goes (execution flow) and its
impact on the application.

After locating these entry points to the application, you need to trace them and search for vulnerabilities.

The previous scan found a Request object entry in the application:

22 :    NameValueCollection nvc=Request.QueryString;

Running the script with the -t option will help to trace the variables. (For full coverage, trace it right through to the end, using all possible iterations).

D:\PYTHON\scancode> -t details.aspx nvc
Tracing variable:nvc
NameValueCollection nvc=Request.QueryString;
String[] arr1=nvc.AllKeys;
String[] sta2=nvc.GetValues(arr1[0]);

This assigned a value from nvc to sta2, so that also needs a trace:

D:\PYTHON\scancode> -t details.aspx sta2
Tracing variable:sta2
String[] sta2=nvc.GetValues(arr1[0]);

Here's another iteration; tracing pro_id:

D:\PYTHON\scancode> -t details.aspx pro_id
Tracing variable:pro_id
String pro_id="";
String qry="select * from items where product_id=" + pro_id;

Finally, this is the end of the trace. This example has shown multiple traces of a single page, but it is possible to traverse multiple pages across the application. Figure 3 shows the complete output.

vulnerability detection with tracing
Figure 3. Vulnerability detection with tracing

As the source code and figure show, there is no validation of input in the source. There is a SQL injection vulnerability:

String qry="select * from items where product_id=" + pro_id;

The application accepts pro_id and passes it as is to the SELECT statement. It is possible to manipulate this statement and inject SQL payload.

Similarly, another line exposes a cross-site scripting (XSS) vulnerability:


Throwing back the (unvalidated) pro_id to the browser provides a position for an attacker to inject JavaScript to be executed in the victim's browser.

The scripts -sG option executes the global search routine. This routine looks for file objects, cookies, exceptions, etc. Each has potential vulnerabilities, and this scan can help you to identify them and map them to the respective threats:

D:\shreeraj_docs\perlCR> -sG details.aspx
13 :

Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;

SQL Object Entry:
49 : String qry="select * from items where product_id=" + pro_id;

SQL Object Entry:
50 : SqlCommand mycmd=new SqlCommand(qry,conn);

Response Object Entry:
116 : response.write(pro_id);

XSS Check:
116 : response.write(pro_id);

Exception handling:
122 : catch(Exception ex)

This code review approach takes minimal effort by detecting entry points, vulnerabilities, and variable tracing.

Mitigation and Countermeasure

After you have identified a vulnerability, the next step is to mitigate the threat. There are various ways to do this, depending on your deployment. For example, it's possible to mitigate SQL injection by adding a rule to the web application firewall to bypass a certain set of characters such as single and double quotes. The best way to mitigate this issue is by applying secure coding practices--providing proper input validation before consuming the variable at the code level. At the SQL level, it is important to use either prepared statements or stored procedures to avoid SQL SELECT statement injection. For mitigation of XSS vulnerabilities, it is imperative to filter out characters such as greater than (>) and less than (<) prior to serving any content to the end-client. These steps provide threat mitigation to the overall web application.


Code review is a very powerful tool for detecting vulnerabilities and getting to their actual source. This is the "whitebox" approach. Dependency determination, entry point identification, and threat mapping help detect vulnerability. All of these steps need architecture and code reviews. The nature of code is complex, so no single tool can meet all of your needs. As a professional, you need to write tools on the fly when doing code review and put them into action when the code base is very large. It is not feasible to go through each line of code.

In this scenario, one of the methods is to start with entry points, as discussed earlier in this article. You can build complex scripts or programs in any language to grab various patterns in voluminous source code and link them together. Tracing the variable or function is the key that can show up the entire traversal and greatly help in determining vulnerabilities.

Open Tools for MySQL Administrators

MySQL provides some tools to monitor and troubleshoot a MySQL server, but they don't always suit a MySQL developer or administrator's common needs, or may not work in some scenarios, such as remote or over-the-web monitoring. Fortunately, the MySQL community has created a variety of free tools to fill the gaps. On the other hand, many of these are hard to find via web searches. In fact, web searches can be frustrating because they uncover abandoned or special-purpose, not ready-to-use projects. You could spend hours trying to find tools for monitoring and troubleshooting your MySQL servers. What's a tool-seeker to do?

Relax! I've already done the work, so you won't have to. I'll point you to the tools I've actually found useful. At the end of this article I'll also list those I didn't find helpful.

This article is about tools to discover and monitor the state of your server, so I won't discuss programs for writing queries, designing tables, and the like. I'm also going to focus exclusively on free and open source software.

Tools to Monitor Queries and Transactions

The classic tool for monitoring queries is Jeremy Zawodny's mytop. It is a Perl program that runs in a terminal and displays information about all connections in a tabular layout, similar to the Unix top program's process display. Columns include the connection ID, the connection's status, and the text of the current query. From this display you can select a query to EXPLAIN, kill a query, and a few other tasks. A header at the top of the display gives information about the server, such as version, uptime, and some statistics like the number of queries per second. The program also has some other functions, but I never found myself using them much.

There are mytop packages for various GNU/Linux distributions, such as Gentoo and Fedora Core, or you can install one from Jeremy's website. It is very small and has minimal dependencies. On the downside, it hasn't been maintained actively for a while and doesn't work correctly with MySQL 5.x.

A similar tool is mtop. It has a tabular process display much like mytop, and although it lacks some features and adds others, the two programs are very similar. It is also a Perl script and there are installation packages for some operating systems, or you can download it from SourceForge. Unfortunately, it is not actively maintained and does not work correctly on newer versions of MySQL.

Some programmers have also created scripts to output MySQL's process list for easy consumption by other scripts. An example is this SHOW FULL PROCESSLIST script, available from the always-useful MySQL Forge.

My own contribution is innotop, a MySQL and InnoDB monitor. As MySQL has become increasingly popular, InnoDB has become the most widely used transactional MySQL storage engine. InnoDB has many differences from other MySQL storage engines, so it requires different monitoring methods. It exposes internal status by dumping a potentially huge amount of semi-formatted text in response to the SHOW INNODB STATUS command. There's a lot of raw data in this text, but it's unusable for real-time monitoring, so I wrote innotop to format and display it conveniently. It is the main monitoring tool at my current employer.

Innotop is much more capable than the other tools I've mentioned, and can replace them completely. It has a list of processes and status information, and offers the standard functions to kill and explain queries. It also offers many features that are not in any other tool, including being able to list current transactions, lock waits, deadlock information, foreign key errors, I/O and log statistics, InnoDB row operation and semaphore statistics, and information on the InnoDB buffer pool, memory usage, insert buffer, and adaptive hash index. It also displays more standard MySQL information than mytop and its clones, such as compact, tabular displays of current and past status information snapshots. It is very configurable and has interactive help.

Installation is simple, because innotop is a ready-to-run Perl script, but there are no installation packages yet, so you must download it from my website.

There are also some web-based tools. There are two web-based mytop clones, phpMyTop and ajaxMyTop. These are useful when you don't have shell access and can't connect remotely to your database server, but can connect from a web server. ajaxMyTop is more recent and seems to be more actively developed. It also feels more like a traditional GUI program, because thanks to Ajax, the entire page does not constantly refresh itself.

Another web-based tool is the popular phpMyAdmin package. phpMyAdmin is a Swiss Army Knife, with features to design tables, run queries, manage users and more. Its focus isn't on monitoring queries and processes, but it has some of the features I've mentioned earlier, such as showing a process list.

Finally, if you need to monitor what's happening inside a MySQL server and don't care to--or can't--use a third-party tool, MySQL's own mysqladmin command-line program works. For example, to watch incremental changes to the query cache, run the command:

$ mysqladmin extended -r -i 10 | grep Qcache

Of course, innotop can do that for you too, only better. Take a look at its "V" mode. Still, this can be handy when you don't have any way to run innotop.

Tools to Monitor a MySQL Server

Sometimes, rather than monitoring the queries running in a MySQL server, you need to analyze other aspects of the system's performance. You could use standard command-line utilities to monitor the resources used by the MySQL process on GNU/Linux, or you could run Giuseppe Maxia's helpful script to measure MySQL resource consumption. This tool recursively examines the processes associated with the MySQL server's process ID, and prints a report on what it finds. For more information, read Giuseppe's own article on the O'Reilly Databases blog.

The MySQL Forge website is an excellent place to discover tips, tricks, scripts, and code snippets for daily MySQL administration and programming tasks. For example, there's an entry to help you measure replication speed, a "poor man's query profiler" to capture queries as they fly by on the network interface, and much more.

Another excellent resource is mysqlreport, a well-designed program that turns MySQL status information into knowledge. It prints out a report of relevant variables, sensibly arranged for an experienced MySQL user. I find this tool indispensable when I have to troubleshoot a server without knowing anything about it in advance. For example, if someone asks me to help reduce load on a MySQL server that's running at 100 percent CPU, the first thing I do is to run mysqlreport. I can get more information by glancing at its output than I could in 10 minutes of talking to the customer. It immediately tells me where to focus my efforts. If I see a high key read ratio and a high percentage of index scans, I can immediately look for large indexes and a key buffer that's too small. That intuition could take many minutes to develop just by examining SHOW STATUS.

The mysqlreport website has full information on how to install and use the program, but better yet, there are excellent tutorials on how to interpret its output, with real examples. Some of these go into detail on MySQL internals, and I recommend them to any MySQL developer.

Another common task is setting up automated systems to monitor your server and let you know if it's alive. You could write your own monitor, or you could just plug in a ready-made one. According to a MySQL poll, Nagios is the most popular tool for doing this. There's also a Watchdog mysql monitor plugin for mon, the Linux scheduling and alert management tool. We currently use a home-grown system at my employer, but we're looking at using Nagios soon.

Tools I Didn't Find Useful

The Quicomm MySQL Monitor is a web-based administration tool similar to phpMyAdmin, not a monitor in the same sense as mytop or innotop. It offers relatively few features compared to phpMyAdmin.

Another web-based tool is MySysop, which is billed as a "MySQL system optimizer", though it certainly doesn't do anything on its own to optimize a MySQL system. It offers recommendations I would not trust without doing enough investigation to arrive at the same conclusions. By the time I could install and run this system, I'd have long since run mysqlreport.

Finally, I've never understood how to even use the Google mMaim (MySQL Monitoring And Investigation Module). It is part of Google's open source code contributions, and Google probably uses it internally to monitor its servers. However, it's not obvious to the rest of the world how to do this, as evidenced by the mailing list. The mailing list also reveals that Google released the code simply for the sake of releasing it. While I appreciate the gesture, I can't find any use for the code.


If you're trying to find tools for your own work, I recommend innotop and mysqlreport, and a healthy dose of command-line competence. I used to rely on mytop for my routine monitoring, but now I use innotop, because it shows much more information, including all-important details about transactions. When I need to analyze a server to discover what's wrong with it, it's impossible to match mysqlreport's instant snapshot of server health and activity. When I need to know about MySQL's resource consumption and performance, I augment standard command-line utilities with scripts, such as Giuseppe Maxia's.

There are certainly other tools, but the ones mentioned here are free and open source, have nearly every feature you can find in other tools, and do a lot you can't find elsewhere at all.

VOIP on the Nokia 770 Internet Tablet

I ended my previous article (Linux on the Nokia 770 Internet Tablet) by saying that the release of the OS 2006 prepared the way for some serious VOIP work. The 770 can now make SIP-based VOIP phone calls and is more like what you'd expect from Nokia--a phone!

What does it take to upgrade the machine, and how difficult is it? As it happens, not much and not very, but when you're at risk of bricking the machine, there's always a certain level of anxiety.

The first step in the upgrade is to visit the Nokia 770 support site for a Windows download or Maemo's 770 download page for Linux and Mac OS X. Download the new OS. You need to provide the machine number of your 770; the download pages provide instructions on how to find it.

The next step is to do it! On Linux and Mac OS X, connect the 770 to the host machine with the USB cable and run a script while holding down the home button (and possibly your breath, as well). I flubbed my first attempt by letting go of the button too soon. The good news was that the only result was a failure notice on the host machine console:

SW version in image: SU-18_2006SE_1.2006.26-8_PR_MR0
Image '2nd', size 8704 bytes
Image 'secondary', size 87040 bytes
Image 'xloader', size 13824 bytes
Image 'initfs', size 1890304 bytes
Image 'kernel', size 1266560 bytes
Image 'rootfs', size 60030976 bytes
Suitable USB device not found, waiting
USB device found at bus 002, device address 002-0421-0105-02-00
Sending request 0x01 failed: Unknown error: 0
NOLO_REQ_GET_STATUS: Invalid argument
Device status query failed

Holding down the button for the whole operation was the way forward. Here is my success:

SW version in image: SU-18_2006SE_1.2006.26-8_PR_MR0
Image '2nd', size 8704 bytes
Image 'secondary', size 87040 bytes
Image 'xloader', size 13824 bytes
Image 'initfs', size 1890304 bytes
Image 'kernel', size 1266560 bytes
Image 'rootfs', size 60030976 bytes
Suitable USB device not found, waiting
USB device found at bus 002, device address 002-0421-0105-02-00
Found board Nokia 770 (F5)
NOLO version 0.9.0
Sending xloader image (13 kB)...
100% (13 of 13 kB, avg. 385 kB/s)
Sending secondary image (85 kB)...
100% (85 of 85 kB, avg. 765 kB/s)
Flashing bootloader... done.
Sending kernel image (1236 kB)...
100% (1236 of 1236 kB, avg. 796 kB/s)
Flashing kernel... done.
Sending initfs image (1846 kB)...
100% (1846 of 1846 kB, avg. 795 kB/s)
Flashing initfs... done.
Sending and flashing rootfs image (58624 kB)...
100% (58624 of 58624 kB, avg. 598 kB/s)
Finishing flashing... done

Looks, etc.

What you get is an updated interface with more operations available from the desktop.

This is the same process I demonstrated in my previous article if you want to add to the basic Linux install by importing more apps such as the terminal. I'm using it and Joe to write this report (Emacs keystrokes just didn't work out for me on this machine, and I didn't get the hang of the double escapes with Vi either). There is a version of Vim that works quite well, though.

The catalog of apps is fairly similar, except there are some that haven't made it across yet, and some new ones as well.

I should put in a warning here about a theme called LCARS. It's a Star Trek thing, which looks pretty cool. The minus side starts with hard-to-see fonts in daylight. From there, it grew significantly worse on my configuration, with corrupted data files and various apps refusing to start. The problem, I think, is that this theme is very weighty for this machine, and the OS doesn't so far degrade very nicely when it runs out of memory. This only affected runtime files, so an uninstall followed by a couple of reboots seemed to fix everything.

At least, that was true for me on release 1 of OS 2006. The recently released update cured all those problems on my machine. LCARS now runs like a charm and looks pretty good as well.

Another tangent is email. The bundled client is quite OK for dealing with a few emails but it gets old very quickly if you get lots. For example, you can't tag emails so deleting quite a few is a major pain. It won't handle groups at all and GMail isn't all that great, either.

Pine to the rescue! I used to prefer Mutt but it isn't available for this platform, and I'm on the road and don't have a suitable machine to do it myself. Anyway, running Pine on the 770 is way cool. The easiest way to get it is to add mistral user to your repositories list, update available packages, and get Pine.

If you're new to Pine, the best way to edit the config file .pinerc is through the internal setup within the program. Be sure to enable the mouse in xterm, as this allows you to tap options on the screen rather than having to drop down the menu item in the improved Xterm that will send a Ctrl signal. Another note: as initially configured, the emails you send will come from User. This is easy to fix. See Jimc's Nokia 770 page for details.


Without importing any apps, the limit of your VOIP calling is to fellow Gmailers. You're not a Gmailer, you say? Well, as a 770 owner, you already have an account. It's just a pity that the Opera browser shipped with OS 2006 can't fully cope with Gmail. Opera tells me that the next version is better.

Another alternative is to download a client from the Gizmo Project. Once you open the app, you receive 25 cents of free calls if you register. At 1 cent per minute to quite a few places, the rates are quite competitive. Calls to fellow Gizmo users are free. You can also register a normal phone number for your device at Gizmo for $12 for three months. Calling is very straightforward. You put in the number, put the 770 up to your ear, and talk away. Top up your minutes by clicking on "add credit" in the "home" section.

There's also Tapioca, which is "a GoogleTalk client with VoIP and instant messaging capabilities, with a simple user interface. It can be installed on the device without any conflict with the product's built-in Gtalk client."

Another project called Minisip comes from the postgrad students at the Royal Institute of Technology in Stockholm, Sweden. It's quite advanced, but there are no downloads at the moment due to code rewrites.


Finally there's a port of the well-known Asterisk that will do VOIP as well as PABX duties. Getting this on a 770 isn't, at the moment, for the faint of heart though...but if you're a long-term Asterisk user, you won't be faint of heart.

This is what you want, I'm sure. Here's what I did to get a working (as in "non-crashing") version of Asterisk 1.2.1 (the latest release from Digium) on the Nokia 770.

If you're in a hurry or you don't want to mess with compiling and Scratchbox (or you simply don't know what those are), just skip to the binaries.

  • Start Scratchbox.
  • From within Scratchbox, run wget to download the latest Asterisk sources.
  • Unarchive the sources with tar xvfz asterisk-1.2.1.tar.gz. This will give you an asterisk-1.2.1 folder. Change to that folder (cd asterisk-1.2.1).
  • Patch the main Makefile and the one for the GSM codec in order to make them compile for the 770. Download both diffs with wget and wget
  • Patch the main Makefile with patch Makefile Makefile.diff.

There are eight steps to go; read more at Installing Asterisk on the Nokia 770.

Note: A point of interest here is that the linked Asterisk Nokia 770 binary includes a SIP client for OS 2005, which might be useful if you don't want to upgrade for other reasons.

The Scratchbox reference means that you first need to install the Maemo SDK. Otherwise, you can pick up toward the end of the instructions and get a ready-made binary, which needs some work to install...

  • You're ready to move the binaries to your Nokia 770. Go to /tmp/ast121/ and type tar cvfz asterisk-1.2.1-nokia770-arm-binary.tar.gz *. You can also download the Nokia 770 Asterisk binary directly. Drop the files on your memory card or scp them from your machine--your choice.

    Another note: As I write this, the binary for OS2006 does not work due to missing libraries. I imagine the fix is on its way, though.

  • On the 770, start an XTerm and become root.
  • Go to the folder where you dropped the asterisk-1.2.1-nokia770-arm-binary.tar.gz file and (as root) type tar -zvx -C / -f asterisk-1.2.1-nokia770-arm-binary.tar.gz.

    Note: The easiest way to become root is to get Becomeroot from the's application list. With that on board, sudo su gives you a passwordless root.

  • That's all. To run Asterisk, edit the configuration files at /etc/asterisk, then type asterisk -vvvvvc to start the program and get a console prompt.

Other Things

There is some interesting stuff coming up with handwriting recognition. At a recent Symbian Smartphone show, I saw both Symbian and 770 demos of vastly improved systems. The one from MyScript recognized whole lines of cursive linked writing rather than just one letter at a time. XT9 also showed an improved version of the current system.

Some people call the 770 "the new Zaurus" but really the only comparison is Linux and the degree of enthusiasm around. Nokia seems fully aware of what it has, which is more than Sharp ever demonstrated, at least in markets other than Japan. Nokia also has the advantage of having much wider distribution channels.

Very special thanks to Gala's fourth-year computer science students at Simferopol University for showing me where to get a WLAN connection for this article. Special thanks as well to Ciaron Linstead in Berlin for extensive use of his network, which allowed me to get Pine working, among other things.

A New Visualization for Web Server Logs

There are well over a hundred web server log analyzers (Google Directory for Log Analysis) or web statistics tools ranging from commercial offerings such as WebTrends to open source ones such as AWStats. These take web server logfiles and display numbers such as page views, visits, and visitors, as well as graphs over various time ranges. This article presents the same data in those logfiles in a very different way: as a 3D plot. By the end of this article, I hope you will agree with me that the visualization described herein is a novel and useful way to view the content of logfiles.

The logfiles of web servers record information on each HTTP request they receive, such as the time, the sender's IP address, the request URL, and the status code. The items in each request are fairly orthogonal to one another. The IP address of a client has no relation to the URL that it requests, nor does the status code of the request to the time of the request. If that is the case, what could be a better way to display these n columns from the logfiles than an n-dimensional plot?

When an administrator observes anomalous behavior on a web server, she reaches out for web statistics reports, as they are usually all there is as a record of past activity. These often prove fruitless, mainly because web statistics is primarily a marketing-oriented view of web server activity. The next step is to take the raw logfiles apart with ad hoc scripts. The sheer mass of data makes it difficult to reduce it to a few numbers that reveal the cause of the problem. Another complication is that you may not quite know what you are looking for other than that it is abnormal behavior. The path this article takes is to provide a visualization of raw data such that cause or causes make themselves visible. This comes from the real-life experience of a client, where crippling performance problems appeared out of nowhere.

The Plot

The scatter plot in Figure 1 shows more than half a million HTTP page requests (each request is a dot) in 3D space. The axes are:

  • X, the time axis--a full day from midnight to midnight of November 16.
  • Y, the requester's IP address, with the conventional dotted decimal format sorted and given an ordinal number between 1 and 120,000, representing the number of clients that accessed the web server.
  • Z, the URL (or content) sorted by popularity. Of the approximately 60,000 distinct pages on the site, the most popular URLs are near the zero point of the Z-axis and the least popular ones at the top.

3D scatter plot of a good day
Figure 1. Scatter plot showing HTTP requests

If the plotted parameters were truly orthogonal, you could expect a random distribution: a flat featureless plot. The parameters, however, are not completely independent of one another. For example, the IP ranges for Italy may prefer the Italian pages on the website. Therefore instead of a random plot, there are clusters in the 3D space. If you think about it, that does not seem unreasonable: the home page is probably the most visited page on a website. Studies (especially Jakob Nielsen on website popularity and Jakob Nielsen on traffic log patterns) argue convincingly that popularity closely follows Zipf's law: a log curve with a long tail. Hence the dense horizontal layer at the bottom in Figure 1. The vertical rectangular planes are search crawlers. They request pages over the whole content space from a small number of IP addresses and do that over the whole day. Therefore, clustering along each of the three dimensions is common.

The Case Study

The website of a client grew inexplicably sluggish one day. Since the web server, CMS, and auxiliary servers had run well for the preceding months, the only rational explanation pointed to an unusual request pattern. The web log-analysis reports showed nothing out of the ordinary. Command-line scripts (with awk, sort, grep, and friends) running over the logfiles also revealed no anomalies. I used Gnuplot to graph the requests in 3D space. (See also an excellent Gnuplot introduction) Some time later, the 3D plot made the culprit evident.

3D scatter plot of a bad day
Figure 2. Scatter plot of a bad day.

The thick pillar in the plot stands out like a sore thumb. This is a dense set of requests in a short time (about 100 minutes on the X-axis, which represents 24 hours) from a single IP address (Y-axis) and going over the whole content space (Z-axis). Why should it cause trouble? Large-scale CMS servers generate content on-the-fly from a database. Caches usually handle most requests, so only the small number of requests that are not currently in the cache should require database activity. On this particular CMS, the caches keep content for 15 minutes. When the client requested all of the pages in a short time, the high number of cache misses placed a heavy load on the database. This resulted in deteriorated performance. Search crawlers such as Yahoo Slurp and Googlebot do pretty much the same thing, but they spread the load over a much longer period.

The Process

Now that you have seen the output, here's how to generate it. The input is, of course, an access logfile that has lines of data, one per HTTP request. A typical line from an Apache server conforms to the NCSA combined access logfile standard. (See the Combined Log Format description at Note that I've wrapped the long line: - - [15/Jan/2006:21:12:29 +0100] "GET
/index.php?level=2 HTTP/1.1" 200 5854 ""
"Mozilla/5.0 (X11; U; Linux i6 86; en-US; rv:1.7.3)

The Perl script at the end of the article takes sequences of these lines and condenses them to just what Gnuplot needs. Run it with an access logfile and redirect it to an output file, such as gnuplot.input, from the command line:

$ perl access_log > gnuplot.input

The output will be a series of lines matching those of the access logfile. For the previous line from the access log, the corresponding output is:

15/Jan/2006:21:12:29 906 41 200

The fields in gnuplot.input, the output file of the Perl script, are date/time, ip rank (906), url rank (41), and status code.

To display the sequence of lines in Gnuplot, give it the commands:

$ gnuplot
set style data dots
set xdata time
set timefmt "%d/%b/%Y:%H:%M:%S"
set zlabel "Content"
set ylabel "IP address"
splot "gnuplot.input" using 1:2:3


If the plot is too dense--as was the case for me--thin it down by
telling Gnuplot to only use every nth data point. For example, I
thinned Figure 1 by plotting every tenth point with the Gnuplot splot command:

splot "gnuplot.input" using 1:2:3 every 10

Figure 3 shows the corresponding scatter plot.

Thinned 3D scatter plot of a good day

Figure 3. Thinned scatter plot

Gnuplot makes it easy to focus on a part of the plot by setting the
axes ranges. Figure 4 shows a small part of the Y- and Z-axes. The
almost continuous lines that run parallel to the time axis are
monitoring probes that regularly request the same page. Four of them
should be clearly visible. In addition, I changed the eye position.

Monitoring probes visible after reducing the Y and Z ranges.

Figure 4. Reduced Y and Z ranges showing monitoring probes

Because real people need sleep, it should be possible to make out
the diurnal rhythms that rule our lives. This is evident in Figure 4.
The requests are denser from 08:00 to about 17:00 and quite sparse in
the early hours of the morning.

Changing the viewing angle can give you a new point of view. Gnuplot lets you do it in one of two ways: with the command line set view or interactively with a click and drag of the mouse.

The Pièce de Résistance

Because a display of 3D plots is difficult to see in three
dimensions without stereoscopic glasses, I used a few more
manipulations to "jitter" the image such that the depth in the picture
is visible. The plot in Figure 5 is an example of this. It was easy to
generate with more Gnuplot commands followed by GIF animation with ImageMagick.

An animated scatter plot

Figure 5. A animated GIF of the scatter plot that hints at the 3D structure

Further Work

With Gnuplot 4.2, which is still in beta, it is now possible to draw
scatter plots in glorious color. Initial tests show that using color
for the status code dimension makes the plots even more informative.
Stay tuned.


Though the 3D plots present no hard numbers or trend lines, the
scatter plot as described and illustrated above may give a more
intuitive view of web server requests. Especially when diagnosing
problems, this alternative way of presenting logfile data can be more
useful than the charts and reports of a standard log analyzer tool.

Code Listings

The Perl script:

# convert access log files to gnuplot input
# Raju Varghese. 2007-02-03

use strict;

my $tempFilename = "/tmp/temp.dat";
my $ipListFilename = "/tmp/iplist.dat";
my $urlListFilename = "/tmp/urllist.dat";

my (%ipList, %urlList);

sub ip2int {
my ($ip) = @_;
my @ipOctet = split (/\./, $ip);
my $n = 0;
foreach (@ipOctet) {
$n = $n*256 + $_;
return $n;

# prepare temp file to store log lines temporarily
open (TEMP, ">$tempFilename");

# reads log lines from stdin or files specified on command line

while (<>) {
my ($ip, undef, undef, $time, undef, undef, $url, undef) = split;
$time =~ s/\[//;
next if ($url =~ /(gif|jpg|png|js|css)$/);
print TEMP "$time $ip $url $sc\n";

# process IP addresses

my @sortedIpList = sort {ip2int($a) <=> ip2int($b)} keys %ipList;
my $n = 0;
open (IPLIST, ">$ipListFilename");
foreach (@sortedIpList) {
print IPLIST "$n $ipList{$_} $_\n";
$ipList{$_} = $n;
close (IPLIST);

# process URLs

my @sortedUrlList = sort {$urlList {$b} <=> $urlList {$a}} keys %urlList;
$n = 0;
open (URLLIST, ">$urlListFilename");
foreach (@sortedUrlList) {
print URLLIST "$n $urlList{$_} $_\n";
$urlList{$_} = $n;
close (URLLIST);

close (TEMP); open (TEMP, $tempFilename);
while () {
my ($time, $ip, $url, $sc) = split;
print "$time $ipList{$ip} $urlList{$url} $sc\n";
close (TEMP);

How Linux and open-source development could change the way we get things done

An army of disheveled computer programmers has built an operating system called Linux based on a business model that seems to have been written with everything but business in mind. Instead of charging customers as much as the market can bear, Linux is given away for free; instead of hiding information from competitors, Linux programmers share their work with the world; instead of working for money, Linux developers are motivated primarily by adrenaline, altruism, and the respect of their peers.

Despite this unusual foundation, Linux is booming and even beginning to challenge Microsoft's control of the operating system industry. Linux may eventually pull the rug out from under the richest company in the world. It may not. But no matter what happens, it has already shown that money doesn't have to make the world, even the business world, go round. In fact, as technology improves and computers connect and create even more of our society, the principles of cooperation and collaboration that drive Linux may well spread to other fields: from computers, to medicine, to the law.

The Source

The Linux movement kick-started in 1991 when Linus Torvalds, a puckish graduate student at the University of Helsinki, got frustrated with his rickety computer. Refusing to buy another one, he wrote a new operating system--the core programs by which applications (like Microsoft Word) talk to hardware (like microprocessors). When finished, instead of running down to the patent office, he posted his code on the Internet and urged other programmers to download it and work with him to improve it. A few emailed back suggestions, some of which Torvalds took. A few more wrote the next day and a couple more the day after that. Torvalds worked constantly with these new colleagues, publicly posting each improvement and delegating responsibility to more and more programmers as the system grew. By 1994, Linux (a combination of "Linus" and "Unix," another operating system) had 100,000 users. Today, it has between 10 and 20 million users and is the fastest growing operating system in the world.

But Linux (rhymes with 'cynics') is different from almost every other operating system available. For one thing, it's downloadable for free straight off the Web. It's also open source, meaning that the source code, the program's all-important DNA, is open for anyone to look at, test, and modify. Most software is developed so that only the original authors can examine and change the code; with open-source models, however, anyone can do it if they have a computer and the right intuition.

To see the power of this model, consider what happens when you're running Microsoft Windows or Macintosh OS and your computer crashes: You stamp your feet and poke a twisted paper clip into a tiny reset button. You probably don't know what happened and it's probably going to happen again. Since you've never seen the source code, it probably doesn't even occur to you that you could fix the problem at its root. With Linux, everything's transparent and, even if you aren't an expert, you can simply post your question on a Linux-help Web page and other users can usually find solutions within hours, if not minutes. (The amorphous Linux community recently won InfoWorld's Product of the Year award for Best Technical Support.) It's also entirely possible that someone--perhaps you--will write some new code that fixes the problem permanently and that Linux developers, led by Torvalds, will incorporate into the next release. Presto, that problem's fixed and no one will need paper clips to fix it again.

To make another analogy, fixing an error caused by a normal software product is like trying to fix a car with the hood welded shut. With Linux, not only can you easily pop the hood open, there is extensive documentation telling you how everything works and how it all was developed; there's also a community of thousands of mechanics who will help you put in a new fuel pump if asked. In fact, the whole car was built by mechanics piecing it together in their spare time while emailing back and forth across the Web.

The obvious threat to this type of open development is appropriation. What if someone lifts all the clever code off the Web, copyrights it, and sells it? What if someone takes the code that you wrote to fix your crashed computer (or busted fuel pump), copyrights it, and markets it for $19.95? Well, they can't. When Torvalds created Linux, he protected it under the GNU General Public License, an intriguing form of copyright commonly known as copyleft. Under copyleft, anyone who redistributes the program, with or without changes, must pass along the freedom to further copy, change, and distribute it. Theoretically one can download Linux off the Web, add a string of useful features, and try to sell it for $50. But anyone who buys this new version can just copy it, give it away, or sell it for a dollar, thus destroying the incentive for piracy. An equivalent would be if I were to write the following at the end of this article: "Verbatim copying and redistribution of this entire article is permitted in any medium provided this notice remains at the end."

Use the Source

From the Oxford English Dictionary to hip-hop music, open-source development has always been with us to some degree. Since the mid-19th century, contributors to the OED have defined words and sent them to a centralized location to be gathered, criticized, sorted, and published. With music, as long as you stay within certain copyright laws, you can take chunks of other people's compositions and work them into your own. Most academic research is also built on open-source cooperation. Even the compact fluorescent light bulb above my head comes from data shared by researchers over centuries about electricity, properties of glass, and centralized power.

But still, most business isn't done with open source. Coca-Cola keeps its formula secret, Microsoft won't tell you how it builds its programs, and when a researcher for Ford suddenly stumbles upon the means to a more efficient fuel pump, she doesn't reflexively email her friend at Honda with a precise description of her discovery. A great deal of scientific and medical research is also done through closed source as individual laboratories race each other to determine who'll be the first to find the answer and earn the patent.

But two extraordinary changes are making open source substantially more plausible as a development and research model for 2000 than it was for 1990--and they'll make it even more so for 2010. First, the Internet. Today, I can open my Web browser and communicate instantly with Burmese refugees or writers working on projects similar to mine. Secondly, computer power has been increasing exponentially for generations and will probably continue to do so--in large part because every time you build a faster computer it allows you to build a faster one still. It's difficult to overestimate the change. The standard laptop to which I'm now dictating this article (with technology unavailable just two years ago) has more power than almost any computer owned by the government a decade ago. In four years, it could well be obsolete. As author Ray Kurzweil and others have pointed out, if cars had improved as much over the past 50 years as computers, they'd cost less than a nickel and go faster than the speed of light.

Intellectual and Physical Properties

This rate of progress is critical because the advantages of open-source development depend on the powers of technology and the balance between what can be done through thinking and what has to be done by building. Every product has certain intellectual components and certain physical components built into it. With a car, for example, the intellectual component is the thought about how to build it, how to set up the assembly lines, and how to process data you have on different kinds of tires. The physical components include the actual rubber in the tires, the machines that ran the tests, the operation and maintenance of factories.

Faster computers and increased connectivity are drastically changing the relationship between these components in two ways. First, some things that used to require physical components no longer do. You may not have to buy rubber to test the tires if you can just run a simulator online. (The 777 project at Boeing was designed so that nothing physical was produced before the plans hit the factory floor.) Second, connectivity makes the flow of information much faster and smoother and greatly facilitates the sharing of ideas and data. There is a saying known as "Linus' law" that "given enough eyes, all bugs are shallow." In other words, given enough people working on them, all problems are solvable. And the Internet has not only helped coordinate a lot more eyes: in some ways, it's given everyone glasses.

Open-source development benefits from this transition because its advantages are almost all in the realm of intellectual property. Open source improves communication and facilitates sharing ideas. But it doesn't mean that you can buy a ton of concrete for free or drive a nail into a wall without a hammer. This is why open source has come first and most prominently to computer programming: a profession where almost all of the development is intellectual, not physical, and where people have been connected over the Internet for more than 20 years. Programming also employs highly specific, common tools--computers--that a fairly large number of people have access to and that some people, such as university students, can even access for free. If a solution or improvement is found, nothing additional needs to be built; code just needs to be entered and then downloaded.

But there is still one great problem standing firmly in the way of even the most modern open-source software development project. As a 21-year-old Bill Gates asked in an angry letter to open-source programmers in 1976: "One thing you do is prevent good software from being written. Who can afford to do professional work for nothing?"

Microsoft's empire, and a great deal of the rest of our society, is built upon the assumption that there isn't an answer to this rhetorical question. To survive, organizations need to patent their information, protect their secrets, and earn as much money as they can. But the success of Linux shows that there's a way around that model. Something extraordinary has been built with a completely different set of rules. I asked David Soergel, a researcher in the department of genetics at Stanford, whether people could use the open-source model to develop medicines. "My first reaction: they'd rather get paid, and if they were any good at it, then they would be paid by somebody. My second reaction: wait a minute, that argument obviously fails for software."

Money for Nothing

Naive as it may be to think that people aren't motivated by money, it is just as naive to think that people are only motivated by money. People are motivated by a variety of factors: money, recognition, enjoyment, a belief that one is doing something good for the world, and so on. We each weigh these factors and make decisions based on our perceptions of their relative importance. At different points in our lives, we give different factors different weights. When we're poor, we tend to value simply high-paying work more than we do when we're well-off; it usually takes a high-paying job to get someone to do something really boring and it generally takes a very fulfilling job to get someone to work for less than what he should normally be able to earn.

Since people working on open-source projects generally earn nothing, or very little, there need to be other incentives. In Linux, there seem to be principally three. First, enjoyment. Computer programming can be addictive, exciting, and extraordinarily intense. Linux can be particularly enjoyable because almost every problem solved is a new one. If your Windows machine crashes, fixing the problem generally entails tediously working through scores of repair procedures which you may have used a thousand times. If A fails, try B. If B fails, try C. If a Linux computer crashes, anyone who repairs it is not only working on that one specific machine, he's finding a solution for the whole Linux community.

Eric Roberts, a computer science professor at Stanford, once explained to The New York Times that people in the profession must be "well trained to do work that is mind-numbingly boring." This is particularly true of work on most closed-source systems where programmers must continually reinvent the wheel. But with open-source projects, the things that need to be done haven't been done before. According to one only slightly hyperbolic programmer, Ali Abdin, writing to a Linux news group about how he felt after creating his first open-source project: "The feeling I got inside when I knew that I had some code out there that I can share with people is indescribable... I felt on top of the world, that I can program anything...I felt as mother would feel giving birth to a child, giving it life, for the first time."

Secondly, and similarly, Linux programmers are motivated by a feeling that they are changing the world and developing an operating system that really works. Torvalds laid out this philosophy well in a speech this summer: "I don't resent Microsoft for making lots of money. I resent them for making bad software."

Thirdly, but most significantly, Linux programmers seem motivated by prestige and, in particular, respect from their peers. Having "hacked the kernel" (contributed to the core of the operating system) gives programmers a certain stature--much as completing a four-minute mile does among runners--and, since the program is open source, everyone knows exactly who contributed what. I was once introduced to a programmer described as "the guy who wrote all the Ethernet device drivers!" as though I was meeting Jonas Salk "who came up with the cure for polio!" And, in fact, Linux programmers often discuss their work as falling in the tradition of eminent scientists. As three well-known programmers put it in the introduction to their book Open Sources: "It would be shortsighted of those in the computer industry to believe that monetary reward is the primary concern of open source's best programmers... These people are involved in a reputation game and history has shown that scientific success outlives financial success... When the history of this time is written a hundred years from now, people will perhaps remember the name of Bill Gates, but few other computer industrialists. They are much more likely to remember names like... Linus Torvalds."

Importantly, this philosophy may well be helping Linux develop creatively. There is a great deal of psychological research that shows that people actually do more creative work when they aren't motivated primarily by money. Tell a child that you'll pay her for reading a book and she'll read it with little imagination. Have one group of college poets think about getting rich and famous through their writing, according to research done by Harvard Professor Teresa Amabile, and they tend to turn out less creative work than a second group that's just asked to write poems. Is it possible that Linux programmers created such an extraordinary operating system in part because they were driven by other factors and weren't doing it for the money? I asked Professor Amabile if the implications of her research cross over to open-source programming and whether it could explain some of the remarkable innovations that have come from people working without pay. "Yes," she responded, "this [would be] entirely consistent."

Making free software affordable

Still, Linux programmers are not completely locking themselves out of the economy and there's a second response to Gates' rhetorical question: If your core open-source project is successful enough, it's possible to eventually make money off of it indirectly. No, it's not possible to make as much money as a proprietary company can--open source and copyleft will ensure this--and there's always going to be an astounding amount of work that has to be done without financial reward. But open-source programmers don't have to starve.

The trick to making money off Linux, or any open-source development, is to profit off of derivatives. No one actually makes money off the Linux code, but they can make money by offering technical support or helping customers install the program. Companies that do this follow a well-trodden path: give something away in order to sell something else. This is what cellular phone companies do when they give you the actual telephone handset for free if you agree to pay for using it a certain number of minutes a month. We do the same thing with the Monthly's Web page. We post some articles (giving them to readers for free) in the hope that visitors' interest will be piqued and that they'll subscribe.

Red Hat, the best-known Linux company, sells the operating system and other open-source programs in a box for $80, though you can download their product for free from Their revenue comes from the technical support they offer and from consumers' need for trust. It's much less unsettling to buy a program like Linux if you get it shrink-wrapped with a manual than if you have to download both. VA Linux, another well-known company, sells Linux hardware: You choose the memory and the motherboard; it builds you a computer.

Has the money these companies brought into the open-source movement corrupted it and made it more like the traditional model that Microsoft uses? Surely, there are a lot of people doing promotional and administrative work at Red Hat for the money and there are probably even some people working on Linux for Red Hat because they get paid a lot to do it (the company hires programmers to write code that anyone, including Red Hat's competitors, can use). But programmers mostly see the money as an added and surprising plus, and Linux is still driven by adrenaline, altruism, and peer recognition.

While it is possible that this could change, it hasn't so far. I asked Richard Stallman--the creator of copyleft, as well as many of the programs that run with the Linux through a system called GNU, and the man often considered to be the father of the open-source movement--whether he thought that money would change the attitudes of people who used to work on GNU/Linux without being paid. "In general, I don't think that wealth will make a hacker into a worse person. It would be more likely to enable the hacker to spend more time volunteering for free software instead of on work for pay."

This point is particularly germane because most open-source programmers have always had to work other jobs, and many have only been able to contribute to the project during the evenings or when their employers weren't looking. Linus Torvalds, for example, helps design microprocessors for a company called Transmeta in the daytime and does all his Linux coding after work. When I asked John Hall, vice president of VA Linux, what motivates programmers, he responded: "For some, it's altruism. For some, it's fame. For some, it's religion. For a very few, it's money."

So what's next?

To determine where open source is likely to move next, one has to imagine a scenario where these obstacles can be overcome. A project would need to be fun, or at least rewarding, to get going and it would have to primarily pose intellectual, not physical, challenges. Once it began to move, derivative financial incentives could help push it along. There also has to be a certain amount of luck, particularly with regard to organization. It's hard enough to get six people to agree to a restaurant for dinner; it's much harder to coordinate thousands of people known to each other only by email addresses. Linux has gotten around this latter problem in no small part because of Torvalds himself. He is a benevolent dictator who has earned the trust and respect of virtually everyone on the project. He's a relaxed, friendly, funny guy whose strength of character keeps his distended organization free from the internal battles that sink so many others. He also has learned how to delegate and has developed a core of equally well-respected close associates who coordinate different parts of the project.

One intriguing possibility for future open-source development comes from medicine, an area where people can become passionate and where intellectual components can far exceed physical components. Consider a smart doctor who loses a friend to rare disease X and decides to devote her life to finding a cure. Ten years ago, trying to develop anything more than a local network to collaboratively design a drug to cure the disease would have been extremely difficult. Communication would have had to be done over the phone or with photocopies slowly and expensively mailed. It made much more sense to have small groups of developers working together in laboratories, foundations, or universities.

Today the possibilities for open collaboration have improved. An ambitious doctor can network online with other researchers interested in disease X and, at the least, can quickly exchange data about the newest research techniques. In fact, there are already medical networks that pass around information about acute medical cases, using email and computers that can automatically send out patient files over a network and put X-rays into the overnight mail.

Now think another decade ahead when everyone will have high-speed Internet lines at least 500 times as fast as standard connections today (this will probably happen in three years), where it is fairly likely that we all will be able to simulate the movement of disease X online, and where it will surely be possible for medical students to run tests that approximate the human immune system on high-powered laboratory computers. Now the same doctor can farm out parts of the project to interested collaborators, many of whom have also lost friends to X and are passionate about finding a cure. If the coordinator is a good organizer and can hold people together the way that Torvalds has, the organization could grow, attracting even more people to join the collaborative effort with each success. Every breakthrough or improvement in the model could be posted online so that other participants could begin work on the next challenge. If a sample test is performed, data could be transferred to the Web simultaneously.

Eventually a prototype could be developed and adopted by an established drug company (or perhaps even a non-profit company, funded by foundations, that specializes in distributing open-source drugs and selling them at minimal costs) that licenses the product with the FDA, runs it through the necessary tests, and then manufactures, distributes and sells it--keeping prices relatively low both because no company would have exclusive copyrights and because research costs (drug companies' largest expense) would be drastically reduced.


A real-life example of another possible opportunity for open source comes from Harvard where law Professors Larry Lessig and Charles Nesson have started the Open Law Project, an attempt to try cases using the open-source model. Interested people sign into the Website, read what other contributors have written, and help to develop arguments and briefs. According to the site description, "what we lose in secrecy, we expect to regain in depth of sources and breadth of argument." The program is run under the same sort of benevolent dictatorship model as Linux, with Lessig serving as chief. People brainstorm, debate, and then Lessig synthesizes and writes the briefs. Currently, the group is constructing arguments challenging, appropriately enough, the United States Copyright Extension Act.

There are great advantages to this model for law: The problems faced by lawyers are mostly intellectual, not physical; there is an abundance of people (especially law students) who are potentially willing to work on the projects for free; and there is the powerful draw of doing something you believe is a public service. If you don't agree with current copyright laws, join the group and figure out how to change them.

Of course, open-law will never be able to handle certain kinds of cases. As Nesson said to me, "open-law is not conducive to ambush." If you need to rely on secret arguments or evidence that the other side doesn't know about, you're not going to post everything on the Net. But if you just need to develop the best argument and rely on increased information flows, the model could work quite well.


It is very difficult to determine exactly where open-source projects will take off next. So much depends on the personalities of the coordinators and the excitement they are able to generate; a great deal also depends on the different ways that technology develops and the way that different markets and research fields change. For some organizations, open-source development will make more sense in a couple of years than it does now. For others, it will make less. But the overall trends of technology are likely to push open source closer and closer to the mainstream.

Imagine a scale with all the advantages of a proprietary model on the left and all the advantages of an open-source model on the right. Pretend everybody who wants to solve a problem or build a project has a scale like this. If it tips to the left, the proprietary model is chosen; if it tips to the right, the open model is chosen. Now, as connectivity increases with the Internet, and computer power increases exponentially, more and more weight accumulates on the right. Every time computer power increases, another household gets wired, or a new simulator is built online, a little more weight is added to the right. Having the example of Linux to learn from adds some more weight to the right; the next successful open-source project will add even more.

Not enough is added to the right side to tip the scale for everybody and everything, but open source is presently growing and it should only continue that way. Netscape has made its Web browser open source. Sendmail, the program that routes most of our email, is open source. Most Web sites use an open-source program called Apache at their core. Even some microchip developers are starting to use open source.

Perhaps the next boom in open source will come from the law; perhaps from drug X; perhaps it will be something entirely different. Although it's difficult to tell, it is quite likely that the scale is going to tip for some projects and that there will be serious efforts at open-source development in the next decade. Moreover, it's quite likely some of these projects will work. Open source has created the fastest growing operating system in the world and it's done so by capitalizing on changes in technology that will almost certainly seem commonplace in a decade or two. Linux will continue to grow; but 10 years from now, it will probably no longer be the largest open-source project in the world.