ACM IEEE 37<sup>th</sup> International Symposium on Computer Architecture

## **Elastic Cooperative Caching:**

An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors

Enric Herrero<sup>1</sup>, José González<sup>2</sup>, Ramon Canal<sup>1</sup>

<sup>1</sup>Universitat Politècnica de Catalunya, <sup>2</sup>Intel Barcelona




## Outline

- Motivation
- Related Work
- Elastic Cooperative Caching
- Evaluation
- Conclusions



## **Motivation**

- Find optimal cache organization for tiled microarchitectures
- Desired behavior
  - Scalable → Avoid centralized structures.
  - Minimize access latency → Data placement based on proximity.
  - Minimize inter-thread interference → Private cache partitions.
  - Minimize off-chip misses → Dynamic cache allocation.



## **Motivation**



#### Application Taxonomy

- Saturating Utility
- Low Utility
- Shared High Utility
- Private High Utility

Extended classification from Qureshi et al. [MICRO'06]



## **Related Work**

- Reactive NUCA [ISCA'09]
- Adaptive Selective Replication [MICRO'06]
- Adaptive Shared/Private NUCA [HPCA'07]

- OS-page granularity.
- Software based.
- Common shared cache space.
- Adjusts replication but not amount of cache per node.
- Centralized structures.



More: Athena Award Lecture Mary Jane Irwin



#### **Elastic Cooperative Caching – Structure**



#### Elastic Cooperative Caching – Adaptive Spilling

ElasticCC opportunity: not only repartition the cache, but also decide which nodes can use the shared partitions.

| Type                 | Working Set Size | Sharing  | Local Reuse | Private Cache Size | Spilling |
|----------------------|------------------|----------|-------------|--------------------|----------|
| Saturating Utility   | Small/Medium     | High/Low | High/Low    | Small/Medium       | No       |
| Low Utility          | Big              | Low      | Low         | Small              | No       |
| Shared High Utility  | Big              | High     | High/Low    | Small              | Yes      |
| Private High Utility | Big              | Low      | High        | Big                | Yes      |
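The table can be read as a small decision procedure. The sketch below only encodes its rows; it is not a description of how the hardware classifies applications, which the slides do not specify. The names `AppType`, `classify`, and `POLICY` are hypothetical, and entries given as either high or low in the table are treated as "don't care".

```python
from enum import Enum


class AppType(Enum):
    SATURATING_UTILITY = "Saturating Utility"
    LOW_UTILITY = "Low Utility"
    SHARED_HIGH_UTILITY = "Shared High Utility"
    PRIVATE_HIGH_UTILITY = "Private High Utility"


def classify(big_working_set: bool, high_sharing: bool, high_local_reuse: bool) -> AppType:
    """Map the three table attributes to a taxonomy row (illustrative only)."""
    if not big_working_set:
        # Small/medium working set: sharing and reuse may be high or low.
        return AppType.SATURATING_UTILITY
    if high_sharing:
        # Big working set with high sharing: local reuse may be high or low.
        return AppType.SHARED_HIGH_UTILITY
    if high_local_reuse:
        # Big working set, low sharing, high local reuse.
        return AppType.PRIVATE_HIGH_UTILITY
    # Big working set, low sharing, low local reuse.
    return AppType.LOW_UTILITY


# (private cache size, spilling allowed) for each row of the table.
POLICY = {
    AppType.SATURATING_UTILITY: ("Small/Medium", False),
    AppType.LOW_UTILITY: ("Small", False),
    AppType.SHARED_HIGH_UTILITY: ("Small", True),
    AppType.PRIVATE_HIGH_UTILITY: ("Big", True),
}
```

For example, `POLICY[classify(True, False, True)]` looks up the Private High Utility row, which pairs a big private partition with spilling enabled.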

Spill shared blocks, or blocks from caches with 75% or more private cache space.
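The spilling rule above can be written as a single predicate. This is a minimal sketch: the slide gives only the 75% threshold, so measuring private space in cache ways and the names `may_spill`, `private_ways`, and `total_ways` are assumptions made for illustration.

```python
PRIVATE_SPILL_THRESHOLD = 0.75  # "75% or more private cache space", from the slide


def may_spill(block_is_shared: bool, private_ways: int, total_ways: int) -> bool:
    """Return True if an evicted block is a candidate for spilling.

    Assumption: the private partition size is expressed as a number of ways
    out of the total ways of the local cache.
    """
    if total_ways <= 0 or not 0 <= private_ways <= total_ways:
        raise ValueError("invalid partition sizes")
    private_fraction = private_ways / total_ways
    # Shared blocks are always candidates; other blocks only when the local
    # cache has devoted at least 75% of its space to the private partition.
    return block_is_shared or private_fraction >= PRIVATE_SPILL_THRESHOLD
```

With an 8-way cache, a non-shared block becomes a spill candidate once 6 or more ways are private, while a shared block is a candidate regardless of the partition.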



#### **Elastic Cooperative Caching – Structure**



#### **Desired behavior**

- Scalable
- Minimize access latency
- Minimize inter-thread interference
- Minimize off-chip misses

Cache Partitioning. Dynamic Cache Allocation.



#### **Evaluation** – Studied Configurations

- 16 processors
- Pairs of SPEC OMP'01 benchmarks drawn from each of the previous categories.
- Configurations
  - Shared Memory
  - Private Memory
  - Distributed Cooperative Caching (DCC)
  - Adaptive Selective Replication (ASR)
  - Elastic Cooperative Caching
  - ElasticCC + Adaptive Spilling
  - Ideal: Fixed Half Private/Half Shared 2xL2



## **Evaluation – Performance & Efficiency**





#### **Evaluation – Off-Chip Misses & Reuse**





- Gafort – Low Utility
- Apsi, Art, Equake – Saturating Utility
- Ammp – Shared High Utility
- Swim – Private High Utility





#### Gafort – Low Utility

No reuse, does not benefit from caches.









#### **Evaluation – Temporal Cache Behavior**



Gafort-Equake execution, Equake Thread 1



## Conclusions

#### Elastic Cooperative Caching

- Distributed organization
- Adaptive behavior to application requirements



eherrero@ac.upc.edu