Cache system intro
Asia Pacific Institute of Information Technology
Copyright © 2000 All Rights Reserved. Computer System Research Group 7.
The robust multi-level, 512-entry, split TLB significantly improves performance
of systems configured with large physical memory or storage, typically found
in server environments, by caching all important translation information used
by operating systems and application software that access large physical memory
or storage. Thus, the cache architecture of the AMD Athlon processor enables
high instruction execution rates by minimizing effective memory latency and
system snoop responses, and it provides large spatial locality of data for transaction-based
applications and multiprocessing operating systems. The architecture also supports
high-bandwidth data transfers to and from the execution resources, and it contributes
to significant performance gains and extremely fast operation of data-intensive
software programs.
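To give a rough sense of what a 512-entry TLB covers, the sketch below computes "TLB reach", the amount of memory that can be mapped without a TLB miss. The page sizes (4 KB and 4 MB, the usual x86 sizes) and the simplification of treating all 512 entries as one pool are assumptions for illustration; the text above does not say how the entries are divided between levels or between the instruction and data sides.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


# Assumption: all 512 entries are treated as a single pool.
reach_small_pages = tlb_reach_bytes(512, 4 * KIB)  # assumed 4 KB pages
reach_large_pages = tlb_reach_bytes(512, 4 * MIB)  # assumed 4 MB pages

print(reach_small_pages // MIB, "MiB with 4 KB pages")
print(reach_large_pages // MIB, "MiB with 4 MB pages")
```

Under these assumptions, larger pages extend the reach by the ratio of the page sizes, which is one reason large-memory server workloads benefit from more translation entries.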
The AMD Athlon processor's cache architecture is the first to incorporate a system-based
MOESI (Modified, Owned, Exclusive, Shared, Invalid) cache control protocol for
x86 multiprocessing platforms. Since the system logic manages memory coherency
throughout the system by specifying all cache state transitions, either using
a MESI or MOESI cache coherency protocol, and by filtering out unnecessary processor
snoops, AMD Athlon processors are designed to deliver exceptional performance
in both uniprocessor and multiprocessor system configurations. The AMD Athlon
processor cache architecture also supports error correction code (ECC) protection,
which is a required feature for high reliability of business desktop systems,
workstations, and servers. Thus, the AMD Athlon processor's cache architecture
provides the features required for high-performance computing from desktop to
server configurations.
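The five MOESI states can be illustrated with a small state machine for a single cache line. This is a minimal sketch of the textbook protocol, not the transition table used by AMD's system logic, which the text says specifies the actual transitions. The event names and the `others_have_copy` flag are hypothetical names chosen for this example.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # dirty, only copy
    OWNED = "O"      # dirty, shared; this cache supplies the data
    EXCLUSIVE = "E"  # clean, only copy
    SHARED = "S"     # may be held by other caches
    INVALID = "I"    # not present


def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Return the next state of one cache line for a textbook MOESI protocol."""
    if event == "local_read":
        if state is State.INVALID:
            # A read miss fills the line as Shared or Exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    if event == "local_write":
        # Writing requires ownership; other copies are invalidated elsewhere.
        return State.MODIFIED
    if event == "snoop_read":
        if state is State.MODIFIED:
            return State.OWNED      # keep the dirty data and share it
        if state is State.EXCLUSIVE:
            return State.SHARED
        return state
    if event == "snoop_write":
        return State.INVALID        # another cache takes ownership
    raise ValueError(f"unknown event: {event}")


line = State.INVALID
for ev in ["local_read", "local_write", "snoop_read", "snoop_write"]:
    line = next_state(line, ev)
    print(ev, "->", line.value)
```

The Owned state is what distinguishes MOESI from MESI: a modified line can be shared with another cache without first being written back to memory.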
The newest Pentium III processors include support for 100 and 133 MHz system
buses and an Advanced Transfer Cache featuring a 256 KB on-die, full-speed L2 cache plus
Advanced System Buffering. The Pentium® III processor includes two separate
16 KB level 1 (L1) caches, one for instructions and one for data.
The AMD Athlon processor with performance-enhancing cache memory includes an integrated,
full-speed, 16-way set-associative, 256 KB L2 cache. Previous AMD Athlon processors
contain an L2 controller which operates at the maximum frequency compatible
with the latest industry-standard SRAMs. By integrating the L2 cache onto the
processor, the L2 cache always operates at the same frequency as the processor,
thereby minimizing any delays incurred waiting for external data from a slower
bus. The newer AMD Athlon processor's L2 cache is 16-way set-associative, twice
the associativity of the Intel Pentium III processor's L2 cache (16-way vs. 8-way).
Higher associativity reduces conflict misses, which improves application performance
because more local application data resides in the high-speed L2 cache memory instead of
system memory. Finally, the integrated L2 cache tags improve performance by
quickly indicating whether critical application data is located within the L2
cache. Integrated tags are especially important for processors that use
external SRAMs for the L2 cache. If the processor determines early enough that
application data does not reside in the L2 cache, it can immediately request the
data from the slower system memory, instead of first checking an external L2
cache and only then requesting the data from system memory.
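The relationship between cache size, associativity, and the tag comparison described above can be sketched as follows. The 256 KB size and the 16-way and 8-way figures come from the text; the 64-byte line size is an assumption used for both configurations to keep the comparison simple, and the function names are hypothetical.

```python
def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


def split_address(addr: int, sets: int, line_bytes: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


SIZE = 256 * 1024  # 256 KB L2, from the text
LINE = 64          # assumed line size in bytes

sets_16way = num_sets(SIZE, 16, LINE)
sets_8way = num_sets(SIZE, 8, LINE)
print("16-way:", sets_16way, "sets;", "8-way:", sets_8way, "sets")

# A lookup compares the address tag against every way in the selected set.
tag, index, offset = split_address(0x12345, sets_16way, LINE)
print("tag", tag, "index", index, "offset", offset)
```

With the same capacity, doubling the ways halves the number of sets, so each set can hold twice as many lines that map to the same index before one has to be evicted.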
The L1 cache provides fast access to recently used data, increasing the overall
performance of the system. Certain versions of the Pentium III processor include
a discrete, off-die level 2 (L2) cache. This L2 cache consists of a 512 KB unified,
non-blocking cache that improves performance over cache-on-motherboard solutions
by reducing the average memory access time and by providing fast access to recently
used instructions and data.
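The effect of an L2 cache on average memory access time can be estimated with the standard two-level formula: L1 hit time plus the L1 miss rate times the cost of going to L2, which in turn includes the L2 miss rate times the memory latency. The cycle counts and miss rates below are made-up illustrative values, not measurements of any Pentium III or Athlon part.

```python
def average_access_time(l1_hit: float, l1_miss_rate: float,
                        l2_hit: float, l2_miss_rate: float,
                        memory_latency: float) -> float:
    """Average memory access time for a two-level cache hierarchy."""
    l1_miss_penalty = l2_hit + l2_miss_rate * memory_latency
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Hypothetical values in processor cycles, chosen only to show the trend.
fast_l2 = average_access_time(3, 0.05, 10, 0.2, 100)
slow_l2 = average_access_time(3, 0.05, 30, 0.2, 100)

print("faster L2:", fast_l2, "cycles")
print("slower L2:", slow_l2, "cycles")
```

Under these assumed numbers, lowering the L2 hit time lowers the average access time directly, which is the benefit the text attributes to moving the L2 cache off the motherboard and closer to the processor.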