Computers get hot, really hot. Right now, individual locations on a chip
can, for a brief moment, produce as much heat as a hot plate. And while the prospect
of boiling water on your motherboard may be intriguing, thermal management has
become a pressing concern for computer designers. Consider these facts: the heat
produced by a computer chip is directly related to its power density, and the power
density of high-end microprocessors is doubling every three years.
Hot
spots can cause timing errors as transistors fail to switch properly and even
produce physical damage. To control temperatures, the creators of the Pentium
4 chip incorporated a simple thermal management technique in the chip's design.
When the temperature on the chip exceeds a certain threshold, the chip slows down
using a technique called clock-gating. This is only a stop-gap measure, however.
As chips increase in density and therefore in heat production, the trade-off between
performance and temperature regulation is going to become too costly.
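The threshold-triggered slowdown described above can be illustrated with a minimal sketch. It assumes a single hot spot modeled as one lumped thermal resistance and capacitance, and it assumes that gating the clock simply cuts power to a fixed fraction. Every name and number here (`simulate_clock_gating`, the 85 C threshold, the 60 W power figure, the 50% duty) is hypothetical and chosen only for illustration; none of it comes from the Pentium 4's actual design.

```python
def simulate_clock_gating(steps, power_w=60.0, threshold_c=85.0, gated_duty=0.5,
                          ambient_c=45.0, r_th=1.0, c_th=5.0, dt=0.1):
    """Simulate one hot spot as a lumped thermal RC node with threshold gating.

    All parameter values are illustrative assumptions, not measured chip data.
    Returns a list of (temperature_c, gated) tuples, one per time step.
    """
    temp = ambient_c
    history = []
    for _ in range(steps):
        # Gate the clock whenever the hot spot is above the threshold.
        gated = temp > threshold_c
        # Assumption: while gated, only a fraction of cycles do work,
        # so power drops by the same fraction.
        power = power_w * (gated_duty if gated else 1.0)
        # Forward-Euler step of dT/dt = (P - (T - T_ambient) / R) / C.
        temp += dt * (power - (temp - ambient_c) / r_th) / c_th
        history.append((temp, gated))
    return history


throttled = simulate_clock_gating(500)
unthrottled = simulate_clock_gating(500, gated_duty=1.0)

peak_throttled = max(t for t, _ in throttled)
peak_unthrottled = max(t for t, _ in unthrottled)
gated_fraction = sum(g for _, g in throttled) / len(throttled)

print(f"peak with gating: {peak_throttled:.1f} C, without: {peak_unthrottled:.1f} C")
print(f"fraction of time spent gated: {gated_fraction:.2f}")
```

In this toy model the gated fraction is a rough stand-in for lost performance: the hotter the chip would run unthrottled, the more time it spends slowed down, which is the trade-off the article describes.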
Computer
science professor Kevin Skadron is working to develop new, more flexible approaches
that attack thermal management from a broader perspective. His work takes into
account that there are other variables (such as the frequency at which hot spots
develop, the temperature gradient across a chip, and specific behaviors that vary
from program to program) that affect a chip's performance and cooling requirements.
He also argues that a more important measure for thermal management is the chip's
expected lifetime, rather than any specific temperature.
Along with electrical and computer
engineering professor Mircea Stan, he is working to develop a model that incorporates
these factors and that does so at a finer resolution than existing simulations.
Instead of reflecting average chip-wide temperatures, this model reflects temperatures
at individual functional units. With HotSpot, the modeling program that they have
developed, computer architects can identify the hottest functional units on the
chip, assess the effect of different thermal arrangements on performance, battery
life, and temperature, and understand the thermal implications of running a particular
software program.
Most importantly, HotSpot sets the stage for evaluating
different thermal management techniques. In other words, if you can identify potential
hot spots and dangerous temperature gradients and understand the trade-offs, you
can maximize performance over the expected lifespan of the chip.
Skadron
has found that a hybrid approach, combining a method called fetch-gating (reducing
the instruction activity moving through the chip) and dynamic voltage scaling
(which lowers the chip's supply voltage and, with it, its operating frequency), provides a solution that accommodates
many of the different kinds of thermal stress that may develop.
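To make the hybrid idea concrete, here is a small sketch of how the two techniques might be combined. The article does not say how they are coordinated, so this assumes one plausible policy: fetch-gating for mild overheating and dynamic voltage scaling for severe overheating. The function names, the temperature thresholds, and the scaling factors are all hypothetical. The power estimate uses the standard first-order relation that dynamic power is proportional to switching activity times voltage squared times frequency.

```python
def dynamic_power(p_nominal, activity=1.0, v_scale=1.0, f_scale=1.0):
    """First-order CMOS dynamic power: P is proportional to activity * V^2 * f."""
    return p_nominal * activity * v_scale ** 2 * f_scale


def hybrid_response(temp_c, trigger_c=82.0, severe_c=85.0):
    """Pick a thermal response for the current temperature.

    Hypothetical policy and thresholds, for illustration only:
    - below trigger_c: run at full speed
    - between trigger_c and severe_c: fetch-gating (fewer instructions in flight)
    - at or above severe_c: dynamic voltage scaling (lower voltage and frequency)
    """
    if temp_c < trigger_c:
        return {"activity": 1.0, "v_scale": 1.0, "f_scale": 1.0}
    if temp_c < severe_c:
        # Assumed: gating fetch on 1 cycle in 5 cuts switching activity by 20%.
        return {"activity": 0.8, "v_scale": 1.0, "f_scale": 1.0}
    # Assumed: voltage and frequency are both scaled to 85% of nominal.
    return {"activity": 1.0, "v_scale": 0.85, "f_scale": 0.85}


for temp in (70.0, 83.0, 90.0):
    settings = hybrid_response(temp)
    watts = dynamic_power(60.0, **settings)
    print(f"{temp:.0f} C -> {settings} -> {watts:.1f} W")
```

Because voltage enters the power relation squared, scaling voltage and frequency together reduces power roughly with the cube of the scaling factor, while fetch-gating reduces it only linearly. That is one reason a policy might reserve voltage scaling for the more severe cases.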
Given the
number of times that HotSpot has been downloaded, it is clear that researchers
find Skadron's approach valuable. With funding from the National Science Foundation,
Intel, and a U.Va. FEST grant, he and Stan are now refining HotSpot even further
so that it can provide a basis for understanding the relationship between thermal
performance and a chip's lifespan. Such knowledge would enhance the ability of
chip designers to develop fast, energy-efficient, reliable, and temperature-aware
computer systems.
This story first appeared in Explorations, a publication
produced by the Office of the Vice President for Research and Graduate Studies.