A Linux-based tool for hardware bring up, Linux development, and manufacturing.


by Venton, T.; Miller, M.; Kalla, R.; Blanchard, A.
IBM Systems Journal • June 2005

This paper describes Bare Metal Linux (BML), a tool that we implemented to accelerate the bring up of POWER5 * (1)-based systems. The POWER5 processor, released in 2004, is the latest version of the POWER architecture from IBM (POWER is a RISC [reduced instruction set computer] architecture). The POWER5 design implements two-way simultaneous multithreading (SMT) on each of the two processor cores on the chip. SMT combines multithreading, in which multiple threads share the same processor one at a time, with the simultaneous use of the multiple execution units present in a modern processor. In the two-thread SMT architecture of POWER5, the execution units not needed by the first thread are available to the second thread in the same clock cycle.

Non-Uniform Memory Access (NUMA) refers to a computer memory architecture where the memory access time depends on the memory location. Specifically, access to local memory is faster than access to nonlocal memory. For increased efficiency the operating system must incorporate in its algorithms knowledge about NUMA, such as the ratio of access times to local and remote memories. Although POWER5 systems, which contain multiple memory controllers distributed throughout the machine, are not NUMA in the classical sense (remote memory latency is very close to local memory latency), they still benefit from NUMA-aware scheduling.

When a new system is designed, it is necessary to put the hardware through a series of tests to verify that it functions as expected. Booting a general-purpose operating system is a complex exercise requiring hardware errors to be addressed, initializations to be set up correctly, and firmware to be functional before operating-system testing can commence. This bring-up process is usually done in stages, incrementally increasing the scope and coverage of the hardware tested.

Typically the bring up of a processor chip begins at wafer test, when test patterns are run on the wafer to detect any circuits that are not working correctly. After good test sites (on the wafer) have been identified, the chips are diced and mounted on substrates to form modules. The bring up then continues on these modules by mounting them in test fixtures, which provide the system environment. At this point the chips execute functional code sequences intended to verify proper instruction execution. These low-level tests consist of the following steps: (1) generate a stream of instructions, initial conditions, and expected results, (2) load and run the generated stream and save the results, and (3) compare these results to the expected results.
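The three steps of such a low-level test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy three-operation instruction set, the register count, and the names `generate_test`, `execute`, and `run_and_compare` are hypothetical and do not come from the POWER5 test tools. The reference model here stands in for whatever produces the expected results, and `device` stands in for the hardware under test.

```python
import random

MASK64 = (1 << 64) - 1

# Toy instruction set (hypothetical): each operation works on 64-bit registers.
OPS = {
    "add": lambda a, b: (a + b) & MASK64,
    "sub": lambda a, b: (a - b) & MASK64,
    "xor": lambda a, b: a ^ b,
}

def execute(stream, initial):
    """Reference model: apply each (op, dst, src_a, src_b) to a copy of the registers."""
    regs = list(initial)
    for op, dst, a, b in stream:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs

def generate_test(seed, length=16, nregs=4):
    """Step 1: generate an instruction stream, initial conditions, and expected results."""
    rng = random.Random(seed)
    initial = [rng.getrandbits(64) for _ in range(nregs)]
    names = sorted(OPS)
    stream = [
        (rng.choice(names), rng.randrange(nregs), rng.randrange(nregs), rng.randrange(nregs))
        for _ in range(length)
    ]
    return stream, initial, execute(stream, initial)

def run_and_compare(device, seed):
    """Steps 2 and 3: run the stream on the device, then list mismatching registers."""
    stream, initial, expected = generate_test(seed)
    actual = device(stream, initial)
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
```

A device that matches the reference model yields an empty mismatch list. Any entry in the list names the register index together with its expected and actual values, which is the information a debug engineer would start from.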

After the low-level tests have verified basic processor functions, more complex exercisers are then used to verify functions in the processor and memory subsystems. After this stage is completed, the verification process continues at the operating-system level. Support is provided to execute larger, more complex programs that require a file system for storing code, data, and supporting tools. At this point support for I/O devices is needed. Whereas it is fairly straightforward to develop and employ low-level exercisers for processor core and memory, when I/O is required, then the flexibility of a general-purpose operating system is typically needed.

The POWER5 system predecessor, using POWER4 * processors, (2) supported two methods of booting an operating system. In the first method the operating system is booted directly on the hardware by firmware. In the second method the firmware loads a hypervisor and, at the same time, the system resources are allocated to a number of hypervisor-controlled partitions. Each partition behaves as a separate virtual computer, on which an operating system may be loaded.

The POWER5 hypervisor provides additional virtualization capabilities compared to those for POWER4 systems, and in particular a high degree of resiliency to runtime errors. Supporting such advanced functions necessarily involves complexity. Although the architecture of the hypervisor has been designed to support additional virtual resources, these advanced functions were integrated throughout the hypervisor and the supporting firmware. As a result, POWER5 firmware no longer supports booting the operating system directly on the hardware.

This presented a problem during the bring-up phase of system development, when the hardware and the software were brought together. At this stage, the I/O had very limited testing. Without a general-purpose operating system running, the POWER5 bring-up team could not run operating system-based exercisers on the new hardware. Yet, the hypervisor had to be functional before an operating system could be booted. Complex error recovery during early bring up was not desirable because it had the potential to hide errors from the debug engineers. For these reasons relying on the hypervisor for the bring up was ruled out.

Our solution was to create Bare Metal Linux (BML) by modifying the Linux ** kernel to run directly on the hardware, leaving out both the hypervisor and the firmware layer. By not including any error recovery and by supporting only simple configurations, we kept the BML code simple. For example, the code that configured the I/O subsystem was only a few pages long. Handling complex configurations was deferred until after the hypervisor became operational on the system. By eliminating many code layers, along with the associated initialization delays, we achieved rapid boot times for BML, an important feature of the tool. Figure 1 illustrates the POWER5 bring-up process using BML.

[FIGURE 1 OMITTED]

Related work

There are many firmware solutions targeted at CPU and system bring up. IBM has the PIBS (3) (PowerPC * Initialization and Boot Software) firmware stack, and there are other products that offer similar functionality. (4) By presenting a common layer between the hardware and the software, they isolate many of the platform details from the operating system. A similar solution is provided by the Linux-based LinuxBIOS, (5) which serves as a firmware stack that initializes the hardware and boots a second-stage operating system.

In addition, many embedded boards are brought up without firmware, and instead a minimal boot loader is used to load a Linux kernel. (6) In this case the boot loader does some low-level initialization and loads the kernel, which is then responsible for initializing the rest of the system.

Other methods for accelerating system bring up are based on simulation and include the technique known as virtual power on. (7) This technique, used extensively by the IBM zSeries * development team, employs simulation to debug firmware and resolve operating-system bring-up issues before hardware is available.

Firmware solutions and LinuxBIOS still require a kernel to be loaded and control to be transferred to this kernel. This means there are two code bases to work with as well as interface constraints between them. BML on the other hand was developed as a single code base, which made it easy to develop and debug incrementally. The virtual power-on concept is complementary and focused on software.

The rest of the paper is organized as follows. In the next section, "BML design," we describe the key changes we made to the Linux kernel. In the following section, "Experience," we describe how BML performed as a tool for POWER5 bring up. We also describe several additional applications in which BML was found to be helpful, such as chip manufacturing and Linux development. The section "Conclusion" contains some final comments.

BML design

A number of changes were needed to run the Linux kernel without a firmware stack. We describe some of the key changes we made in order to adapt the Linux kernel to our environment in the following subsections.

Hardware discovery

IBM POWER5 systems use Open Firmware (8) to load the operating system, which then would normally interact with this firmware during early boot and initialization. Among other things, Open Firmware provides a method of device and system discovery in which various system parameters can be determined, such as the amount of system memory, the number of CPUs, and the location of PCI (Peripheral Component Interconnect) host bridges. PCI is laid out in a tree structure with the host bridge at the root. The host bridge is the point at which the CPU interfaces with the PCI subsystem.

In the BML environment there is no Open Firmware code to provide this information. This, however, presents less of a problem than it seems at first. Because Linux is highly portable, architecture-specific system interfaces such as Open Firmware and ACPI (9) (an open industry specification for configuration and power management) cannot be used throughout the operating system. Instead, the architecture-specific system interfaces are isolated in small, easy-to-modify sections of code. A simple interface was provided to accept the critical system information that is normally gathered by means of Open Firmware. This involved loading a number of parameter values into general-purpose registers before starting the kernel initialization thread. The parameters were as follows:

* Number of CPUs--POWER5 systems can have up to 128 threads. A 128-bit mask of the available CPU threads was passed in by means of a pair of registers.

* Memory size--The memory size in gigabytes was passed in by means of a register. In the interest of simplicity, memory was required to be contiguous starting at address zero.
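To make the two parameters above concrete, the following Python sketch decodes them the way an early-boot routine might. It is an illustration only: the source does not say which register holds which half of the mask or how the bits are numbered, so the ordering used here (low register covers threads 0 to 63, bit i means thread i) and the name `decode_boot_params` are assumptions.

```python
MASK64 = (1 << 64) - 1
GIB = 1 << 30

def decode_boot_params(mask_hi, mask_lo, mem_gb):
    """Decode the values handed over in registers before kernel initialization.

    Assumed layout (not specified in the source):
      mask_lo -- bits for CPU threads 0..63
      mask_hi -- bits for CPU threads 64..127
      mem_gb  -- memory size in gigabytes
    """
    # Join the register pair into one 128-bit mask of available CPU threads.
    mask = ((mask_hi & MASK64) << 64) | (mask_lo & MASK64)
    threads = [i for i in range(128) if (mask >> i) & 1]

    # Memory is required to be contiguous starting at address zero,
    # so a single (start, end) byte range describes it fully.
    memory = (0, mem_gb * GIB)
    return threads, memory
```

For example, `decode_boot_params(0, 0b1111, 8)` describes four available threads (0 through 3) and 8 GB of memory starting at address zero. Because memory is contiguous from zero, one number is enough and no memory map structure is needed.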


COPYRIGHT 2005 All Rights Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.

