A Linux-based tool for hardware bring up, Linux
development, and manufacturing.
by Venton, T.; Miller, M.; Kalla, R.; Blanchard, A.
This paper describes Bare Metal Linux (BML), a tool that we implemented
to accelerate the bring up of POWER5 * (1)-based systems.
The POWER5 processor, released in 2004, is the latest version of the
POWER architecture from IBM (POWER is a RISC [reduced instruction set
computer] architecture). The POWER5 design implements two-way
simultaneous multithreading (SMT) on each of the two processor cores on
the chip. SMT combines multithreading, which consists of multiple
threads utilizing the same processor in one-at-a-time fashion, with the
simultaneous use of the multiple execution units present in a modern
processor. In the two-thread SMT architecture of POWER5, the execution
units not needed by the first thread are available to the second thread
in the same clock cycle.
Non-Uniform Memory Access (NUMA) refers to a computer memory
architecture where the memory access time depends on the memory
location. Specifically, access to local memory is faster than access to
nonlocal memory. For increased efficiency, the operating system must incorporate
in its algorithms knowledge about NUMA, such as the ratio of access
times to local and remote memories. Although POWER5 systems, which
contain multiple memory controllers distributed throughout the machine,
are not NUMA in the classical sense (remote memory latency is very close
to local memory latency), they still benefit from NUMA-aware scheduling.
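
The role of the local-to-remote access ratio can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names and the latency figures are hypothetical and are not POWER5 measurements.

```python
def numa_ratio(local_ns: float, remote_ns: float) -> float:
    """Ratio of remote to local access time (1.0 means no NUMA penalty)."""
    return remote_ns / local_ns


def average_latency_ns(local_ns: float, remote_ns: float,
                       local_fraction: float) -> float:
    """Expected access time when `local_fraction` of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies chosen only to show the shape of the trade-off.
    local_ns, remote_ns = 100.0, 120.0
    print("remote/local ratio:", numa_ratio(local_ns, remote_ns))
    for fraction in (0.5, 0.75, 1.0):
        print(fraction, average_latency_ns(local_ns, remote_ns, fraction))
```

When the ratio is close to 1, as the text says it is for POWER5, placing a thread near its memory yields a smaller gain per access, but the gain is still nonzero.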
When a new system is designed, it is necessary to put the hardware
through a series of tests to verify that it functions as expected.
Booting a general-purpose operating system is a complex exercise
requiring hardware errors to be addressed, initializations to be set up
correctly, and firmware to be functional before operating-system testing
can commence. This bring-up process is usually done in stages,
incrementally increasing the scope and coverage of the hardware tested.
Typically the bring up of a processor chip begins at wafer test,
when test patterns are run on the wafer to detect any circuits that are
not working correctly. After good test sites (on the wafer) have been
identified, the chips are diced and mounted on substrates to form
modules. The bring up then continues on these modules by mounting them
in test fixtures, which provide the system environment. At this point
the chips execute functional code sequences intended to verify proper
instruction execution. These low-level tests consist of the following
steps: (1) generate a stream of instructions, initial conditions, and
expected results, (2) load and run the generated stream and save the
results, and (3) compare these results to the expected results.
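
The three steps above can be sketched in a few lines of Python. This is a toy model, not the exerciser used for POWER5: the three-operation instruction set, the four-register machine, and all function names are assumptions made for illustration, and the "device under test" is simply a table of operations standing in for real hardware.

```python
import random

MASK64 = (1 << 64) - 1

# Reference model: the operations as they are supposed to behave.
REFERENCE_OPS = {
    "add": lambda a, b: (a + b) & MASK64,
    "sub": lambda a, b: (a - b) & MASK64,
    "xor": lambda a, b: a ^ b,
}


def execute(stream, initial, ops):
    """Run a stream of (op, dst, src_a, src_b) tuples; return final registers."""
    regs = list(initial)
    for op, dst, src_a, src_b in stream:
        regs[dst] = ops[op](regs[src_a], regs[src_b])
    return regs


def generate_test(seed, length=16, num_regs=4):
    """Step 1: generate a stream, initial conditions, and expected results."""
    rng = random.Random(seed)
    initial = [rng.getrandbits(64) for _ in range(num_regs)]
    names = sorted(REFERENCE_OPS)
    stream = [
        (rng.choice(names), rng.randrange(num_regs),
         rng.randrange(num_regs), rng.randrange(num_regs))
        for _ in range(length)
    ]
    expected = execute(stream, initial, REFERENCE_OPS)
    return stream, initial, expected


def run_and_compare(stream, initial, expected, device_ops):
    """Steps 2 and 3: run on the device, then list (reg, expected, actual)."""
    actual = execute(stream, initial, device_ops)
    return [
        (reg, exp, act)
        for reg, (exp, act) in enumerate(zip(expected, actual))
        if exp != act
    ]


if __name__ == "__main__":
    stream, initial, expected = generate_test(seed=1)
    print("mismatches:", run_and_compare(stream, initial, expected, REFERENCE_OPS))
```

An empty mismatch list means the device agreed with the reference model for that stream. Seeding the generator makes a failing stream reproducible, which matters when a mismatch has to be handed to a debug engineer.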
After the low-level tests have verified basic processor functions,
more complex exercisers are then used to verify functions in the
processor and memory subsystems. After this stage is completed, the
verification process continues at the operating-system level. Support is
provided to execute larger, more complex programs that require a file
system for storing code, data, and supporting tools. At this point
support for I/O devices is needed. Although it is fairly straightforward
to develop and employ low-level exercisers for the processor core and
memory, testing I/O typically requires the flexibility of a
general-purpose operating system.
The predecessor of the POWER5 system, which used POWER4 * processors, (2)
supported two methods of booting an operating system. In the first
method the operating system is booted directly on the hardware by
firmware. In the second method the firmware loads a hypervisor and, at
the same time, the system resources are allocated to a number of
hypervisor-controlled partitions. Each partition behaves as a separate
virtual computer, on which an operating system may be loaded.
The POWER5 hypervisor provides additional virtualization
capabilities compared to those for POWER4 systems, and in particular a
high degree of resiliency to runtime errors. Supporting such advanced
functions necessarily involves complexity. Although the architecture of
the hypervisor has been designed to support additional virtual
resources, these advanced functions were integrated throughout the
hypervisor and the supporting firmware. As a result, POWER5 firmware no
longer supports booting the operating system directly on the hardware.
This presented a problem during the bring-up phase of system
development, when the hardware and the software were brought together.
At this stage, the I/O had undergone very limited testing. Without a
general-purpose operating system running, the POWER5 bring-up team could
not run operating-system-based exercisers on the new hardware. Yet, the
hypervisor had to be functional before an operating system could be
booted. Complex error recovery during early bring up was not desirable
because it had the potential to hide errors from the debug engineers.
For these reasons relying on the hypervisor for the bring up was ruled
out.
Our solution was to create Bare Metal Linux (BML) by modifying the
Linux ** kernel to run directly on the hardware, leaving out both the
hypervisor and the firmware layer. By not including any error recovery
and by supporting only simple configurations, we kept the BML code
simple. For example, the code that configured the I/O subsystem was only
a few pages long. Handling complex configurations was deferred until
after the hypervisor became operational on the system. By eliminating
many code layers, along with the associated initialization delays, we
achieved rapid boot times for BML, an important feature of the tool.
Figure 1 illustrates the POWER5 bring-up process using BML.
[FIGURE 1 OMITTED]
Related work
There are many firmware solutions targeted at CPU and system bring
up. IBM has the PIBS (3) (PowerPC * Initialization and Boot Software)
firmware stack, and there are other products that offer similar
functionality. (4) By presenting a common layer between the hardware and
the software, they isolate many of the platform details from the
operating system. A similar solution is provided by the Linux-based
LinuxBIOS, (5) which serves as a firmware stack that initializes the
hardware and boots a second-stage operating system.
In addition, many embedded boards are brought up without firmware,
and instead a minimal boot loader is used to load a Linux kernel. (6) In
this case the boot loader does some low-level initialization and loads
the kernel, which is then responsible for initializing the rest of the
system.
Other methods for accelerating system bring up are based on
simulation and include the technique known as virtual power on. (7) This
technique, used extensively by the IBM zSeries * development team,
employs simulation to debug firmware and resolve operating-system
bring-up issues before hardware is available.
Firmware solutions and LinuxBIOS still require a kernel to be
loaded and control to be transferred to this kernel. This means there
are two code bases to work with as well as interface constraints between
them. BML, on the other hand, was developed as a single code base, which
made it easy to develop and debug incrementally. The virtual power-on
concept is complementary and focused on software.
The rest of the paper is organized as follows. In the next section,
"BML design," we describe the key changes we made to the Linux
kernel. In the following section, "Experience," we describe
how BML performed as a tool for POWER5 bring up. We also describe
several additional applications for which BML was found to be helpful,
such as chip manufacturing and Linux development. The section
"Conclusion" contains some final comments.
BML DESIGN
A number of changes were needed to run the Linux kernel without a
firmware stack. The following subsections describe some of the key
changes we made to adapt the Linux kernel to our environment.
Hardware discovery
IBM POWER5 systems use Open Firmware (8) to load the operating
system, which then would normally interact with this firmware during
early boot and initialization. Among other things, Open Firmware
provides a method of device and system discovery in which various system
parameters can be determined, such as the amount of system memory, the
number of CPUs, and the location of PCI (Peripheral Component
Interconnect) host bridges. PCI is laid out in a tree structure with the
host bridge at the root. The host bridge is the point at which the CPU
interfaces with the PCI subsystem.
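
The tree layout described above can be pictured with a small data structure. The sketch is illustrative only: the class name, the device names, and the shape of the example tree are hypothetical and do not describe an actual POWER5 I/O configuration or the kernel's own PCI structures.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class PciNode:
    """One node of a PCI tree: a host bridge, a bridge, or an end device."""
    name: str
    children: List["PciNode"] = field(default_factory=list)


def walk(node: PciNode, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Depth-first traversal starting at the host bridge (the root)."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    # Hypothetical topology: everything hangs below a single host bridge.
    root = PciNode("host bridge", [
        PciNode("PCI-to-PCI bridge", [
            PciNode("ethernet adapter"),
            PciNode("SCSI adapter"),
        ]),
        PciNode("serial adapter"),
    ])
    for depth, name in walk(root):
        print("  " * depth + name)
```

Because every device is reached through the host bridge, knowing where the host bridges are is enough to begin discovering the rest of the tree.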
In the BML environment there is no Open Firmware code to provide
this information. This, however, presents less of a problem than it
seems at first. Because Linux is highly portable, architecture-specific
system interfaces such as Open Firmware and ACPI (9) (an open industry
specification for configuration and power management) cannot be used
throughout the operating system. Instead, the architecture-specific
system interfaces are isolated in small, easy-to-modify sections of
code. A simple interface was provided to accept the critical system
information that is normally gathered by means of Open Firmware. This
involved loading a number of parameter values into general-purpose
registers before starting the kernel initialization thread. The
parameters were as follows:
* Number of CPUs--POWER5 systems can have up to 128 threads. A
128-bit mask of the available CPU threads was passed in by means of a
pair of registers.
* Memory size--The memory size in gigabytes was passed in by means
of a register. In the interest of simplicity, memory was required to be
contiguous starting at address zero.
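
The two parameters listed above can be decoded as in the following sketch. The paper does not say which registers were used or how the mask bits were ordered, so the layout here is an assumption: `mask_lo` is taken to cover threads 0 to 63, `mask_hi` threads 64 to 127, with bit t (counting from the least significant bit) standing for thread t. All names are hypothetical.

```python
from typing import List, NamedTuple

MAX_THREADS = 128          # POWER5 systems can have up to 128 threads
REGISTER_BITS = 64
GIB = 1 << 30


class BootParams(NamedTuple):
    cpu_threads: List[int]  # IDs of the available CPU threads
    memory_bytes: int       # contiguous memory, starting at address zero


def decode_boot_params(mask_lo: int, mask_hi: int, memory_gb: int) -> BootParams:
    """Decode a 128-bit thread mask (two registers) and a memory size in GB.

    Assumed layout, not taken from the paper: bit t of the combined mask,
    least significant bit first, marks thread t as available.
    """
    for value in (mask_lo, mask_hi):
        if not 0 <= value < (1 << REGISTER_BITS):
            raise ValueError("mask halves must fit in a 64-bit register")
    if memory_gb <= 0:
        raise ValueError("memory size must be positive")

    mask = (mask_hi << REGISTER_BITS) | mask_lo
    threads = [t for t in range(MAX_THREADS) if (mask >> t) & 1]
    if not threads:
        raise ValueError("at least one CPU thread must be available")
    return BootParams(threads, memory_gb * GIB)


if __name__ == "__main__":
    # Hypothetical configuration: four threads and 8 GB of memory.
    params = decode_boot_params(mask_lo=0b1111, mask_hi=0, memory_gb=8)
    print(params.cpu_threads, hex(params.memory_bytes))
```

Passing a bit mask rather than a count allows a configuration in which some threads are absent or deliberately left out, which is convenient when only part of the hardware is known to be good.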
COPYRIGHT 2005 All Rights
Reserved. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
NOTE: All illustrations and photos have been removed from this article.