Cache

2 minute read

Published:

Memory Types

Static RAM

  • 6 Transistors per bit, of which make up two inverters & two other for off/on
  • Optimized of speed primarly and density secondly
  • Volatile

Dynamic RAM

  • 1 Transistor and 1 Capacitor per bit
  • Optimized for density
  • Volatile

Volatile Storage

  • Magnetic Disk
  • Flash RAM

Memory Hierachy

Registers

  • Accessed Directly via user level ISA

Memory

  • Accessed Indirectly via user level ISA

Disk

  • Not Accessed via user level USA

Memory Hierachy In Detail

Level:

0 - Registers.

1 - Primary Cache.

2 - Secondary Level Cache ~ Made of SRAM.

3 - Main Memory ~ Made of DRAM.

  • Typically 1GB to 16GB for computers.
  • In the hundreds of GB’s for servers.

4 - Disk (Swap & Files).

Locality

Temporal Locality

  • Recently referenced data/instructions likely to be referenced soon again.
  • Examples are for instance data in inner loops or hot global data.
  • Here the hierachy is reactive; Move things up when accessed.

Spatial Locality

  • Data/instructions nearby referenced data/instruction likely to be referenced soon.
  • Examples are elements in an array, fields in a struct.
  • Hierachy is proactive; Move things up speculatively.

Basic Memory Array Structure

Number of Entries

  • 2^n; n is number of address bits
  • Decoder changes n-bit address to 2^n bit signal

Size of Entries

  • Width of data accessed, 256 bits / 32 bytes in example

Finding Data via Indexing

Basic Cache ~ Array of Block Frames

  • 32 KB Cache ( 1024 Frames, 32B Blocks)

Finding Frame

  • For a 32-bit address
  • 32B Blocks ~ 5 Lowest Bits Locate Byte in Block ~ Offset Bits
  • 1024 Frames ~ 10 bits Find Frame ~ Index Bits

Tags

  • Each Frame Can Hold Multiple Blocks
  • All blocks with the same index bit pattern

Lookup Algorithm

1 Read Frame Indicated By Index Bits

2 if(tag matches && valid bit set)
then hit -> data is good
else miss -> data not good

Handling Cache Miss

Tag overhead of 32KB cache with 1024 32B

  • 32B Frames ~ 5 Bit Offset
  • 1024 Frames ~ 10 Bit Index
  • 32-bit Address ~ 32-bit Address - 5 Bit Offset - 10 Bit Index = 17 Bit Tag
  • (17 Bit + 1 Bit Valid) * 1024 Frames = 18Kb Tags = 2.2KB ~~~ 2.2/32 = 0.68 ~ 6.8%

Cache Performance

  • tavg = t_hit + (%miss * t_miss)
  • %miss will change depending on program because:
    • %miss = #miss/#accesses
    • Each program will have a different number of accesses

Cache Calculations

4 Bit Address ~ 8B Cache, 2B Blocks

  • Number of Frames = (Capacity/Block Size) = 8/2 = 4 Frames
  • Offset = log2(2) = 1 ~ 0000
  • Index = log2(#Frames) = log2(4) = 2 ~ 0000
  • Tag = 4 - 1 - 2 = 0000