Monday, December 4, 2006

DAISY: an open-source JIT compiler for large machines

DAISY features

The Dynamically Architected Instruction Set from Yorktown (DAISY) translates binary code for a source architecture (such as PowerPC) into binary code for a VLIW or EPIC machine, so that the source architecture is emulated on an underlying processor that is hidden from the user. As a just-in-time compiler, DAISY examines the code at run time and picks out which operations can run in parallel on the target architecture. Because this process is transparent, the user sees only the source architecture's instruction set. Fragments of source architecture code are translated, scheduled, and optimized just in time for execution on the target architecture.
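The basic idea of picking out operations that can run in parallel can be illustrated with a minimal sketch. It assumes a hypothetical three-address `Op` record instead of real PowerPC decoding, one-cycle latencies, unlimited issue width, and registers that have already been renamed, so only true (read-after-write) dependences constrain ordering. DAISY's own scheduler is far more elaborate (it also schedules speculatively across branches and allocates registers); this only shows dependence-driven grouping into VLIW instruction words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A simplified three-address operation (hypothetical IR, not DAISY's)."""
    text: str               # source-level mnemonic, for display only
    dest: str               # register written
    srcs: tuple[str, ...]   # registers read


def asap_levels(ops: list[Op]) -> list[int]:
    """Earliest cycle for each op, assuming unit latency, unlimited issue
    width, and renamed registers (only read-after-write dependences count)."""
    ready_at: dict[str, int] = {}  # register -> cycle its new value is ready
    levels = []
    for op in ops:
        level = max((ready_at.get(r, 0) for r in op.srcs), default=0)
        levels.append(level)
        ready_at[op.dest] = level + 1
    return levels


def group_into_bundles(ops: list[Op], levels: list[int]) -> list[list[str]]:
    """Collect ops that share a level into one VLIW instruction word."""
    bundles: dict[int, list[str]] = {}
    for op, level in zip(ops, levels):
        bundles.setdefault(level, []).append(op.text)
    return [bundles[k] for k in sorted(bundles)]


fragment = [
    Op("add r3,r1,r2", "r3", ("r1", "r2")),
    Op("lwz r4,0(r5)", "r4", ("r5",)),
    Op("add r6,r3,r4", "r6", ("r3", "r4")),
    Op("addi r7,r8,1", "r7", ("r8",)),
]
print(group_into_bundles(fragment, asap_levels(fragment)))
# prints [['add r3,r1,r2', 'lwz r4,0(r5)', 'addi r7,r8,1'], ['add r6,r3,r4']]
```

Four sequential PowerPC instructions fit into two instruction words here, because only `add r6` has to wait for results computed earlier in the fragment.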

Dynamic binary translation lets DAISY reorder code arbitrarily while preserving the memory semantics and precise exception semantics of the source architecture, such as PowerPC. The translator also optimizes code using scheduling, combining, copy propagation, load-store telescoping, loop unrolling, and other techniques to expose a large amount of parallelism to the underlying VLIW processor. Some versions of DAISY can fully emulate a 32-bit PowerPC, including system-level state such as privileged registers, exceptions, cache management, address translation, and page tables. Until these capabilities have a more user-friendly interface, however, the open source version of DAISY will support only user-mode translation.
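Load-store telescoping can be pictured, in simplified form, as letting a load that reads a location just written by a store take its value straight from the stored register, so the load no longer waits on memory. The sketch below is our interpretation, not DAISY's implementation. It assumes a hypothetical tuple format for operations, straight-line code, and a conservative rule that any store may alias any previously known location; a real translator would also have to respect access sizes and the precise-exception requirements.

```python
def telescope(ops):
    """Turn a load of a location stored earlier in the same fragment into a
    register copy ("mr"). Ops are tuples (hypothetical format):
      ("st", src, base, off)   store src to off(base)
      ("ld", dest, base, off)  load dest from off(base)
      (name, dest, *srcs)      any other op writing dest
    """
    known = {}  # (base, off) -> register still holding the value stored there
    out = []
    for op in ops:
        if op[0] == "st":
            _, src, base, off = op
            known = {(base, off): src}  # forget the rest: this store may alias
            out.append(op)
            continue
        dest = op[1]
        if op[0] == "ld" and (op[2], op[3]) in known:
            op = ("mr", dest, known[(op[2], op[3])])  # value comes from a register
        out.append(op)
        # writing dest invalidates facts whose base or value register is dest
        known = {k: v for k, v in known.items() if dest not in (k[0], v)}
    return out


fragment = [
    ("st", "r5", "r1", 8),      # stw r5,8(r1)
    ("add", "r6", "r6", "r7"),  # unrelated work
    ("ld", "r9", "r1", 8),      # lwz r9,8(r1) becomes mr r9,r5
    ("add", "r1", "r1", "r2"),  # r1 changes, so 8(r1) is now a new location
    ("ld", "r10", "r1", 8),     # must stay a real load
]
for op in telescope(fragment):
    print(op)
```

Once the third operation is a plain register copy, it depends only on the instruction that produced `r5`, so a scheduler is free to move it above the store.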

DAISY's first release uses the PowerPC architecture, as found in AIX systems, as the source architecture. It compiles AIX PowerPC binary applications just in time to VLIW code and then simulates the VLIW code. The unit of translation in this version is one 4 KB page of code at a time. This release of DAISY also includes some distinctive hardware and software techniques, such as:
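Translating one page at a time implies some form of translation cache keyed by source page. The following is a minimal sketch, assuming a hypothetical `translate_page` callback and using a string as a stand-in for generated VLIW code. It ignores how the real translator handles multiple entry points within a page, self-modifying code, and cache eviction.

```python
PAGE_SIZE = 4096  # one 4 KB page of source code is the unit of translation


class TranslationCache:
    """Hypothetical sketch: the first time execution reaches any address on a
    source page, translate the whole page; later visits reuse the result."""

    def __init__(self, translate_page):
        self._translate_page = translate_page  # callback: page base -> VLIW code
        self._pages = {}                       # page base address -> translation
        self.translations = 0                  # how many pages were translated

    def lookup(self, pc: int):
        base = pc & ~(PAGE_SIZE - 1)  # round the address down to its page
        if base not in self._pages:
            self._pages[base] = self._translate_page(base)
            self.translations += 1
        return self._pages[base]


cache = TranslationCache(lambda base: f"vliw-code-for-{base:#x}")
cache.lookup(0x10000010)   # first touch of page 0x10000000: translated
cache.lookup(0x10000FFC)   # same page: reused
cache.lookup(0x10001000)   # next page: translated
print(cache.translations)  # prints 2
```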

* Speculative instruction scheduling on multiple paths and through branch and loop iteration boundaries
* Simultaneous scheduling, register allocation and cluster assignment
* Aggressive re-ordering of memory and ALU operations (while preserving precise PowerPC exceptions)
* Optimizations for increasing instruction-level parallelism (e.g. combining, load-store telescoping)
* Fast-compiled simulation of pieces of VLIW code that provides detailed statistics, including multi-level cache and TLB effects
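Combining can be illustrated in a simplified form: an immediate add such as `addi r4,r3,4` is folded into a later operation that uses `r4`, so that operation reads `r3` directly and no longer has to wait for the add. The sketch assumes a hypothetical tuple format and handles only `addi` and loads with a base-plus-offset address; it is our illustration of the technique, not code from the DAISY sources.

```python
def combine(ops):
    """Fold 'addi d, s, k' into later ops that use d as a base, so they read s
    directly. Ops are tuples (hypothetical format):
      ("addi", dest, src, imm)   dest = src + imm
      ("ld", dest, base, imm)    dest = memory[base + imm]
      (name, dest, *srcs)        any other op writing dest
    """
    facts = {}  # reg -> (src, k): reg currently equals src + k
    out = []
    for op in ops:
        dest = op[1]
        if op[0] in ("addi", "ld") and op[2] in facts:
            src, k = facts[op[2]]
            op = (op[0], dest, src, op[3] + k)  # read src, with a larger constant
        out.append(op)
        # writing dest invalidates every fact that mentions dest
        facts = {r: f for r, f in facts.items() if dest not in (r, f[0])}
        if op[0] == "addi" and dest != op[2]:
            facts[dest] = (op[2], op[3])
    return out


fragment = [
    ("addi", "r4", "r3", 4),    # addi r4,r3,4
    ("ld", "r5", "r4", 8),      # lwz  r5,8(r4)  becomes lwz  r5,12(r3)
    ("addi", "r6", "r4", 16),   # addi r6,r4,16  becomes addi r6,r3,20
    ("add", "r3", "r3", "r7"),  # r3 changes, so facts based on r3 are dropped
    ("ld", "r8", "r4", 0),      # left unchanged
]
for op in combine(fragment):
    print(op)
```

After combining, the first three operations depend only on `r3`, so they could be issued in the same VLIW instruction word rather than in a chain.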

Why open source it?

Kemal Ebcioglu and his team aim to expand the project's scope and capabilities by releasing DAISY under the IBM Public License. "We hope to foster excitement in binary translation research. We anticipate that people will be drawn to DAISY because of its unique strengths as a dynamic just-in-time compiler for large machines. It offers a great experimental framework for trying out different architectural features and software techniques for instruction level parallelism and binary translation." The team encourages widespread contributions to the project and has extended an open invitation to join its efforts. "We invite volunteers worldwide to consider joining the DAISY open source team in implementing leading edge dynamic compiler techniques and architectural features through the development and expansion of DAISY capabilities."

Development and the team

In addition to Kemal Ebcioglu, DAISY's core team members are Erik Altman, Michael Gschwind, and Sumedh Sathaye at IBM's Watson Research Center. The project grew out of VLIW work at Watson in 1996, starting as an offshoot of the lab's main research on exploiting parallelism in that setting. "The problem with VLIW architecture," notes Erik Altman, "was primarily incompatibility with architectures like x86 or PowerPC. DAISY introduces a new architecture that is compatible with existing programs. It emulates a PowerPC program on an underlying architecture."

Kemal Ebcioglu heads the DAISY project. He has been conducting research on compilers and architectures for instruction-level parallelism (in particular VLIW) at the IBM Watson Research Center since 1986. He is the current ACM SIGMICRO chair, the steering committee chair for the Parallel Architectures and Compilation Techniques conference, and the vice president for North America for IFIP Working Group 10.3 (Concurrent Systems). His current research interests include Java and dynamic binary-to-binary compilation toward achieving high ILP and hardware commonality across architectures. Ebcioglu received a Ph.D. in computer science from the State University of New York at Buffalo in 1986.

Erik Altman is a research staff member at the Watson Research Center. One of the originators of the DAISY project, he is interested in binary translation and optimization, compilers, architecture, and microarchitecture. He received a Ph.D. in computer science from McGill University.

Michael Gschwind is a research staff member at the Center. Before joining IBM in 1997, he was an assistant professor at the Technische Universität Wien in Vienna, Austria. His research interests include compilers, computer architecture, hardware/software co-design, application-specific processors, and field-programmable gate arrays. He holds Ph.D. and M.S. degrees in computer science from the Technische Universität Wien.

Sumedh Sathaye is a research staff member at the Center. His research interests include computer architecture and microarchitecture, instruction-level parallelism, and binary translation. He received a Ph.D. in computer engineering from North Carolina State University in Raleigh, NC.

The DAISY team encourages open participation in the project and is eager for input from the open source community. The team can be reached any time, day or night, at the e-mail addresses listed in Resources below. They welcome comments from all corners of the peanut gallery on DAISY, parallel programming, and any related or unrelated subjects. For the time being, daisy is the project's only newsgroup.

Kemal, Erik, Michael, and Sumedh have published several papers on DAISY in addition to the DAISY manual. Because of copyright notice requirements for posting these papers on the Web, they are available as PostScript or PDF files on the DAISY Web page, along with the manual.

Current research goals for DAISY development

Architecture as a Layer of Software

With software dynamic binary translation projects like DAISY, new architectures (VLIW/EPIC) can be introduced under the covers without any disruption to software. All users of a DAISY-style system, including privileged system-level users, observe only the source (legacy) architecture, such as PowerPC. Old software runs without recompilation or any other changes. Letting users see only the source architecture has three advantages:

1. There is no need to recompile programs on the new machine.
2. If you decide to move from one VLIW machine to a wider one, you are still executing the same PowerPC code. In other words, you can change the underlying machine instead of the code, without disrupting the program.
3. You can take advantage of dynamic profiling: because translation happens at run time, the translator can observe how the program actually behaves and optimize accordingly.
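One common way to exploit dynamic profiling is to count how often each translated fragment runs and, once a fragment proves hot, retranslate it with more aggressive optimization. The sketch below illustrates that idea only; the class name, the threshold, and the two-tier baseline/optimized split are hypothetical choices, and the text does not say how DAISY itself uses profile data.

```python
class ProfilingDispatcher:
    """Hypothetical sketch of profile-driven retranslation."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # executions before a fragment counts as hot
        self.counts: dict[int, int] = {}
        self.optimized: set[int] = set()

    def execute(self, entry: int) -> str:
        """Record one run of the fragment at `entry`; report which version ran."""
        if entry in self.optimized:
            return "optimized"
        self.counts[entry] = self.counts.get(entry, 0) + 1
        if self.counts[entry] >= self.threshold:
            # stand-in for retranslating the fragment using the profile
            self.optimized.add(entry)
        return "baseline"


dispatcher = ProfilingDispatcher(threshold=3)
print([dispatcher.execute(0x10000100) for _ in range(5)])
# prints ['baseline', 'baseline', 'baseline', 'optimized', 'optimized']
```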

The proprietary Transmeta product uses a similar binary translation approach for x86, but its underlying machine is far narrower and its applications more limited.

Scalability

DAISY software is parameterized and can support machines that issue 16 or more operations per cycle. A low-end DAISY implementation might issue only 4 operations per cycle, while a high-end implementation might issue 16, with both using the same translation and virtual machine software.
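The point that one translator can serve machines of different widths can be illustrated by making the issue width a parameter of a greedy list scheduler. This is a hypothetical sketch, not DAISY's scheduler: it assumes one-cycle latencies and an acyclic dependence graph given as predecessor lists, and it ignores functional-unit types, branches, and register pressure.

```python
from typing import Optional


def schedule(preds: list[list[int]], width: int) -> list[list[int]]:
    """Greedy list scheduling: each cycle, issue up to `width` ready ops in
    program order. preds[i] lists the ops that must finish before op i; the
    graph must be acyclic. Latencies are assumed to be one cycle."""
    cycle_of: list[Optional[int]] = [None] * len(preds)
    bundles: list[list[int]] = []
    while None in cycle_of:
        now = len(bundles)
        bundle = []
        for i, ps in enumerate(preds):
            if len(bundle) == width:
                break
            if cycle_of[i] is None and all(
                cycle_of[p] is not None and cycle_of[p] < now for p in ps
            ):
                bundle.append(i)
        for i in bundle:  # mark after the scan so a bundle has no internal deps
            cycle_of[i] = now
        bundles.append(bundle)
    return bundles


# 8 independent ops, then 8 ops that each consume one of the first 8 results
preds = [[] for _ in range(8)] + [[i] for i in range(8)]
for width in (4, 16):
    print(f"width {width}: {len(schedule(preds, width))} cycles")
# prints "width 4: 4 cycles" then "width 16: 2 cycles"
```

The dependence graph is the same in both runs; only the machine parameter changes, which is the property the parameterized design relies on.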

Multiple Platforms

Although the current release of DAISY supports translation only from PowerPC, dynamic binary translation could in principle translate from multiple architectures (for example, from x86, System/390, and PowerPC) to a common underlying core. Eventually, this aspect of DAISY could allow critical design resources to be focused on a single core implementation.

Efficient Processor Design

DAISY uses a new, streamlined multiple-issue processor core as its target. The core is designed specifically for efficient emulation of source architectures, offering binary translation support and more registers: PowerPC, for example, has 32 general-purpose registers, while DAISY has 64. Because the DAISY core only needs to serve as an emulation target, it can use a much simpler and more powerful design than a general-purpose machine, which requires a far more complex set of features.
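One plausible use of the extra registers is speculation: the translator can move an operation above a branch and have it write a register that PowerPC code cannot see, copying the result into the architected register only on the path where the original operation would have run. The sketch models this with a plain dictionary; register `r40` is an arbitrary pick from an assumed non-architected range r32 to r63, and exceptions and real branch mechanics are not modeled.

```python
ARCH_REGS = 32  # PowerPC general-purpose registers r0-r31 are visible to code


def run(branch_taken: bool, regs: dict[str, int]) -> dict[str, int]:
    """Execute a translated fragment and return the architected register view.

    Source order:      if the branch is not taken: add r5,r3,r4
    Translated order:  add r40,r3,r4 is hoisted above the branch into a
                       register PowerPC cannot see, then copied on one path.
    """
    regs = dict(regs)                      # don't mutate the caller's state
    regs["r40"] = regs["r3"] + regs["r4"]  # speculative result, non-architected
    if not branch_taken:
        regs["r5"] = regs["r40"]           # commit copy on the original path
    return {r: v for r, v in regs.items() if int(r[1:]) < ARCH_REGS}


state = {"r3": 2, "r4": 5, "r5": 0}
print(run(True, state))   # prints {'r3': 2, 'r4': 5, 'r5': 0}
print(run(False, state))  # prints {'r3': 2, 'r4': 5, 'r5': 7}
```

Whichever way the branch goes, the PowerPC-visible state matches sequential execution, even though the add ran early.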

The DAISY source code is available on the DAISY Web page, as tar.gz and tar.Z archives and from a CVS repository. As DAISY becomes more robust and user-friendly, updated versions will be released that will eventually cover full emulation of the 32-bit PowerPC.