Tuesday, November 28, 2006

Open Source Software and Documents: A Literature and Online Resource Review

Introduction

Open source software (OSS) and open source documents (OSD) are rising stars in technology today. The term "open source" was coined merely two years ago (1998)[1], and is now a media buzzword (Raymond, 205). With its rapidly growing market share, and corporate and public interest to match, open source will not stay a fringe phenomenon for long; in fact, it is rapidly entering the mainstream. This literature and online resource review is a starting point for anyone interested in the subject.

The Open Source Revolution

Since open source is relatively novel (as far as the mainstream, non-hacker[2] culture is concerned) and largely exists online, there are only two printed works on the subject, and most of the material in these two books is also freely available online. One of the things that make the open source movement so unusual is that, as it has developed over the last twenty years or so, it has done little self-documenting. This is one reason that Eric S. Raymond, self-appointed chief advocate for open source, wrote his now-famous essay "The Cathedral and the Bazaar."

ESR (as the hacker community refers to Mr. Raymond) wrote "The Cathedral and the Bazaar" largely to remedy this lack of documentation. Fascinated by the rapid development and growing sophistication of the Linux[3] operating system, ESR began studying the open source development model (Raymond, 198). Why was Linux so mature, when the Free Software Foundation had been trying to develop a similar operating system for years without success? He found that while the corporate, mainstream, closed-source method (the "cathedral" model) of coding large programs like operating systems is bound by Brooks's Law, the open source development process (the "bazaar" model) appears to escape it. As ESR presents it, Brooks's Law states that the programming work performed increases in direct proportion to the number of programmers (N), but the complexity of a project increases with the square of the number of programmers (N^2). It should follow that thousands of programmers working on a single project would become mired in a nightmare of human communication and version control. As "The Cathedral and the Bazaar" explains, the open source model overcomes this problem through customary central version control, mutual respect, and an army of developers and bug testers. This is summed up in a famous statement by ESR known as "Linus's Law" (so named for Linus Torvalds, original author and maintainer of the Linux kernel[4]): "Given enough eyeballs, all bugs are shallow." "The Cathedral and the Bazaar," first given in 1997 at the Linux Kongress in Bavaria, led directly to the release of the Netscape browser source (see http://www.mozilla.org) and the current open source boom (Raymond, 200).
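The arithmetic behind this argument can be made concrete with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, work is assumed to grow exactly in proportion to N, and "complexity" is modeled as the number of distinct pairs of programmers who might need to coordinate, N(N-1)/2, which grows on the order of N^2. The pairwise-channel model is a common way to illustrate the square-law growth; it is not taken from Raymond's essay.

```python
def work_done(n: int) -> int:
    """Work output, assumed to grow in direct proportion to team size."""
    return n


def communication_paths(n: int) -> int:
    """Distinct pairs of programmers who may need to coordinate: n(n-1)/2."""
    return n * (n - 1) // 2


# Compare linear growth in work with roughly quadratic growth in coordination.
for n in (2, 10, 100, 1000):
    print(f"{n:>5} programmers: work ~ {work_done(n):>5}, "
          f"paths = {communication_paths(n):>7}")
```

Under these assumptions, ten programmers share 45 possible communication paths, while a thousand share 499,500. That gap is why a bazaar of thousands of contributors would be expected to collapse under its own coordination costs.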

History

So how did all this come about? The open source concept is as old as the history of computing, and is closer to the original academic development of computing systems than the corporate model of today. These early days are illustrated in two excellent essays, "A Brief History of Hackerdom" by Eric S. Raymond, and "The GNU Operating System and the Free Software Movement" by Richard M. Stallman. Both of these essays trace the simultaneous beginnings of modern computing, the Internet, and open source software development. More historical information (along with the origins of many arcane computer terms) can be found at the Jargon File (at http://www.tuxedo.org/~esr/jargon/) or in its published counterpart, The New Hacker's Dictionary (which was, incidentally, one of the first books to be commercially published and simultaneously available online for free).

The first organized effort to produce open source software was the Free Software Foundation (FSF), founded by Richard M. Stallman (known as RMS) in 1985 (Stallman, 60). RMS formed the nonprofit foundation for two reasons: to further develop GNU[5] software, and to create a think tank to further the notion of "Copyleft." Copyleft is a pun: the idea is to turn copyright around upon itself. The FSF developed this concept into the GNU General Public License (GPL), a software distribution license that stipulates (in a nutshell):

* Software released under the GPL shall be freely distributable
* The software shall be distributed along with its source code
* Anyone is free to modify the source code and change the program, as long as the resulting program is also freely distributable and modifiable

This ensures that all of the GNU software (and any other software released under the GPL) is protected from those who would use the code to create proprietary, closed-source programs. Around half of the open source software available today is made available under the terms of the GPL. Today there exist several similar licenses of varying restrictions and attitudes toward commercial use and sale of covered software (see http://www.opensource.org/licenses/).

Open Source Documents

The first documents that truly followed the open source model (in the sense of having many contributors and reviewers coupled with online availability) were Frequently Asked Questions lists, known as FAQs. The first online FAQ to go by that title is attributed to Eugene Miya, a NASA employee (Hersch, 1). His SPACE-digest mailing list FAQ was written in 1982, when the Internet was still a little-known experimental network called the Advanced Research Projects Agency Network (ARPANET) (see http://www.faqs.org/faqs/faqs/about-faqs/). Unfortunately, little is known about the history of these now-ubiquitous informational documents. An attempt to write a book about FAQs was begun in 1996, but the web page for this project has not been updated since 1997 (see http://www.faqs.org/faqbook/).

Unfortunately, documentation is one of the weakest aspects of open source programs (Stallman, 68). This is, perhaps, a result of the fact that hackers enjoy coding so very much; updating the documentation is sometimes an afterthought. Even so, the idea that programmers make poor writers is an unfortunate stereotype. Eric Raymond insists that the very best hackers are also excellent writers, since good programming involves both logical analysis of a problem and a high level of creativity (Raymond, 246). This is evident in the fact that ESR (author of the popular Fetchmail program and numerous modules for the Emacs text editor), Richard Stallman (author of GNU Emacs, the GNU C compiler, and other keystone programs), Larry Wall (creator of the Perl programming language), and other open source luminaries have written numerous (and excellent) essays, manuals, and technical books.

The weak state of open source documentation is changing, however. Since open source software, particularly the Linux operating system, needs good documentation to reach new users, much work has been done to improve the situation. Open source programs are usually documented in three forms:

* README files that are distributed with each individual program
* Manual pages ("man pages", so named after the man command used to access them), technical references which are also distributed with each program (see ftp://ftp.win.tue.nl/pub/linux/docs/manpages/)
* HOWTO documents, which are instructional in nature, and usually task- (as opposed to program-) oriented (see http://www.linux.org/help/ldp/howto/howto.html). There is also a smaller, less step-by-step subset of the HOWTO documents known as Mini-HOWTOs (see http://www.linux.org/help/ldp/mini/minihowto.html)[6]

The maintenance of these documents is made difficult by the very nature of the open source development model. Since there are so many developers, working under a directive of "release early, release often," open source software can change at a rapid pace. To facilitate better documentation and document management, the Linux Documentation Project (LDP) was founded by Matt Welsh in 1992 (http://www.linuxdoc.org/). There is a recent interview with Deb Richardson, the current head of the LDP, at Slashdot (http://slashdot.org/article.pl?sid=00/03/27/0717244&mode=thread). A similar resource is the Open Source Writer's Group (http://www.oswg.org:8080/oswg), which serves as a database of open source volunteers willing to take on documentation and other open source writing projects, matched by skill and interest.

As an offshoot of the concept of freely available and modifiable documentation, David Wiley founded OpenContent to develop a license similar to the GPL that would apply to any information that is not a program (http://www.opencontent.org/). The idea is that if computer programs can be debugged (edited) and improved by making them modifiable by anyone with the desire to help, documents and other content should benefit from a similar process. A similar license, the GNU Free Documentation License (GNU FDL), was authored by Richard Stallman and released in March 2000. The idea of "freely modifiable and distributable" music, stories, instructions, and other documents and media is still new, and the nature of any applicable distribution license is (as of March 2000) widely debated (see http://www.linuxmall.com/news/features/000324fdl and http://opencontent.org/announce.shtml).

Online Forums and Other Resources

Open source is a community as well as a method of software and document development. There are several "watering holes" that open source advocates, developers, writers, and the curious frequent. The most famous of these forums is Slashdot (so named for the slash-and-dot (/.) notation used to denote directory structure in UNIX systems), an open source news and discussion forum (http://slashdot.org/). Journalists who are unfamiliar with the open source community usually go to Slashdot to get their first taste of the quirky, often irreverent world of open source adherents. The site consists of articles, which are usually submitted by Slashdot readers, and discussion of those articles. With the slogan "News for nerds, stuff that matters," Slashdot covers topics ranging from open source issues to science fiction, the role of "geeks" in society, science and technology, and the occasional essay. Other forums and news sites are:

* Segfault[7] (http://www.segfault.org/), a sort of anti-Slashdot that posts parodies and humorous stories
* Freshmeat ( http://www.freshmeat.net/), which announces new releases of open source software and other relevant news such as security issues
* Kernel Notes (http://kernelnotes.org/), which publishes announcements regarding the Linux kernel[4] (which is sometimes updated with a new release several times per day!)
* Linux Forum ( http://www.linuxforum.com/ ), a currently-defunct site for general Linux discussion

There are also other websites, mailing lists, and Usenet[8] newsgroups too numerous to mention; a Web search for the term "Linux" at www.google.com yields 1,560,000 results. Examining www.linux.org or the comp.os.linux hierarchy of newsgroups should point the curious in the right direction.

In addition, there are specialized development forums dedicated to open source. Since any open source business model depends on an abundance of quality software, several companies host free development sites. These offer a combination of development tools, shell accounts, FTP (File Transfer Protocol) and Web hosting, version control software like CVS (Concurrent Versions System), and "matchmaking" (introducing developers and users who are looking for each other). Some of the most significant are:

* Sourceforge ( http://www.sourceforge.net ), which offers news, Web and FTP site hosting, shell accounts, CVS, and discussion forums for open source projects
* The Free Software Bazaar ( http://visar.csustan.edu/bazaar/ ), which serves as a link between open source developers and open source users who need development work done
* SourceXchange (http://www.sourcexchange.com/), where developers and users can barter code, documents, and ideas

Closing

These resources are just the tip of the iceberg, though I have deliberately tried to highlight those that provide the most valuable information and point the reader toward other resources of more specialized interest. There are thousands upon thousands of Linux-related pages on the World Wide Web. There are also Usenet newsgroups, mailing lists, magazines (see http://www.linuxworld.com/), and Linux User Groups (LUGs), in addition to the many different Linux distributions (see http://www.linux.org/dist/index.html).

Open source is a phenomenon that is growing in momentum, membership, and market share. It has already touched the lives of everyone who uses the Internet (since many of the services and programs that make the Internet go are either open source or based on an open source program). It will continue to do so, and anyone required to keep up with technology in the world today needs at least some familiarity with its precepts and concepts. I hope this review will assist those who would like to learn more.

Notes

1. The term "open source" was coined by Eric Raymond and ratified in a meeting between himself, Richard M. Stallman, and other notable open source advocates. It is intended to replace the previous term, "free software," used by Richard Stallman. Despite the constant admonishment that the "free" in "free software" meant "Free as in speech, not as in beer," corporate-minded people were leery of the idea of software that could not be sold (Raymond, 212).

2. "Hacker" is how programmers describe someone who enjoys solving problems in ingenious ways. It is a term of praise that must be earned in the hacker community. Unfortunately, people who exploit software bugs to crash, interfere with, or gain unauthorized access to other people's computer systems also sometimes refer to themselves as "hackers." The popular media has seized upon this misnomer and popularized it, to the confusion of the general public. People in the open source community refer to such miscreants and criminals as "crackers."

3. Linux (named after Linus Torvalds, its creator) is the most popular of the open source operating systems. Linux is a "workalike" clone of the UNIX operating system, based on the Linux kernel (see note 4, below), the suite of GNU (see note 5, below) tools and applications, and other software packages depending on which Linux you are using. There are many flavors of Linux (called distributions), a few of which are RedHat (http://www.redhat.com/), Slackware (http://www.slackware.com/), and Debian (http://www.debian.org).

4. The kernel is the heart of any operating system. The kernel performs low-level tasks such as memory allocation, process management, and communication with hardware. It serves as the negotiator between programs and the hardware of a computer system. Kernels are some of the most difficult and complex of all types of computer programs.

5. GNU is a recursive acronym that stands for "GNU's Not UNIX" (which stands for "GNU's Not UNIX Not UNIX," which stands for...). It is the "brand" for software developed by the Free Software Foundation. Another well-known recursive acronym is the name of the PINE mail reader ("PINE Is Not Elm"; Elm is another mail reader, upon which Pine was based). A book could easily be written on the quirky names for UNIX commands and acronyms. For example, the biff command (used to check for new email) is not a recursive acronym; it was named after a dog.

6. There are HOWTO documents on all sorts of Linux concepts, including how to get Linux to make coffee (see http://www.linux.org/help/ldp/mini/Coffee.html)!

7. Segfault is named, appropriately, after an error known as a "Segmentation Fault." This is an error that occurs when a program tries to access an incorrect segment of memory (see http://www.tuxedo.org/~esr/jargon/html/entry/segmentation-fault.html). The program will then perform a core dump, which, as far as most users (and many programmers) are concerned, means that a meaningless gobbledygook of numbers and letters is dumped onto the screen or into a file called "core."

8. Usenet is the collective term for the Internet-wide system of newsgroups. Usenet was originally a handful of forums intended for ARPANET developers to use as a common bulletin board. It now contains many thousands of newsgroups, arranged in hierarchical dotted notation (e.g., under rec.pets, one may find rec.pets.dogs, rec.pets.cats, and rec.pets.cats.siamese).