

# Event-driven computing

**Andrew Brown** 

Southampton adb@ecs.soton.ac.uk

Simon Moore

Cambridge simon.moore@cl.cam.ac.uk

**David Thomas** 

Imperial College d.thomas1@imperial.ac.uk

**Andrey Mokhov** 

Newcastle andrey.mokhov@newcastle.ac.uk













### Imperial College London



















- POETS the concept
- The realisation
- A panorama of applications
- Down amongst the hard sums
- A brief meander into reliability

### **POETS**

# Cores are the new transistor

### -Silicon

- 10um .. 10 nm
- 2<sup>10</sup> reduction in feature size
- 2<sup>20</sup> increase in density
- ~20 process generations



The rise of the many-core exponent is just beginning

# Fabrication technology...



...has bought about massive changes:

- Hardware increasingly statically and dynamically unreliable
- Cores are free
- Communication costs ~ 1000 x compute costs

### **POETS**

# Simulation: The old way

- Build a model:
  - Probably some sort of graph
  - Local interactions
- Process the model:
  - Linearise and shove it through some (small) set of processors
- Big problems:
  - A lot of fetching
  - A lot of shoving
  - A lot of writing back

# The POETS way



- Build a model:
  - Probably some sort of graph
  - Local interactions
- Process the model:
  - Distribute processors over the model
- Big problems:
  - No fetching
    - Wherever there's data, there's a processor
  - Little shoving
    - Things interact with their neighbours in parallel
  - No writing back
    - Data is stored locally to the generation site

Interaction via <u>small</u>, <u>fast</u> messages

# Software design...



...needs to change in response to this plethora of cores:

- Software must cope with
  - Unknown and changing physical core topologies
  - Non-deterministic message passing
  - Localised data
- Dramatic performance improvements
  - For certain classes of problems
- Underlying mathematics still valid
  - Algorithmic embodiment different





### Inspired by SpiNNaker

**Partially** Ordered **Event** Triggered **Systems** 



We learnt so much from Steve

# It's all about speed





# The reward...



- Dramatic performance improvements
  - For certain classes of problems

 $O(n^?) \longrightarrow O(1)$ 



# Imagine a system...



- Arbitrarily large number of cores
  - Cores are free
- Core-core communications cheap
  - We designed it that way

- What might we do with it?
- How might we do anything with it?





- POETS the concept
- The realisation
- A panorama of applications
- Down amongst the hard sums
- A brief meander into reliability

# A POETS board





# A box of boards





# The penteract







# The reality





# Prototype: Tinsel



- Tinsel = multithreaded POETS processor
- 32-bit multithreaded RISC-V
- 64 cores x 16 threads/Tinsel = 1024 P-cores
- 16 caches, 16 mailboxes, 2 DDR3 controllers
- 40% of DE5-NET FPGA board
- Clocks at over ~300MHz (fast for a softcore)
- Passes RISC-V regression test suite

# Hardware is nothing without software...

### **POETS**







- POETS the concept
- The realisation
- A panorama of applications
- Down amongst the hard sums
- A brief meander into reliability

# What sort of problem?









- POETS the concept
- The realisation
- A panorama of applications
- Down amongst the hard sums
- A brief meander into reliability

# A real problem....



### Heat equation

$$\frac{\partial u}{\partial t} = D \frac{\partial^2 u}{\partial x^2}$$

#### becomes

n<sup>th</sup> timestep

$$u_{i}^{(n+1)} = u_{i}^{(n)} + D \frac{\partial t}{\partial x_{i}^{2}} \left( u_{i-1}^{(n)} - 2u_{i}^{(n)} + u_{i+1}^{(n)} \right)$$

dx<sub>i</sub> are different because the mesh is irregular simulated time t = ndt

### Numerical methods



Gauss-Seidl:



Convergence fast: *newest* values used as soon as they become available

Predicate tree thin - not parallelisable

Jacobi:



Convergence slow: *entire* old value set used to generate *entire* new value set

Predicate tree wide - easily parallelisable

# Event-driven method





Solution trajectory (non-deterministic) - massively parallel **Event-driven** (m) (m+1) $X_1$  $X_2$ **≯**X<sub>2</sub> **X**3  $X_3$  $X_4$ **≯**X<sub>Δ</sub> **⋄**X<sub>5</sub> X<sub>5</sub> **X**6  $X_6$  $X_7$  $X_7$ 

Values updated randomly

IN PARALLEL

# Solution times





### POETS /

# Synfire rings







- POETS the concept
- The realisation
- A panorama of applications
- Down amongst the hard sums
- A brief meander into reliability

# Reliable computing on unreliable substrates



Finite difference heat diffusion canonical 2D square grid:



Diagonal temperature profile vs iteration

# When things break:



- System with 1000000 cores
- Unrealistic to expect
  100% uptime
- What happens when cores die?
- Algorithm 'self-heals' around unresponsive core



# 5% device death



#### Results degrade gracefully







Diagonal temperature profile vs iteration

# On the nature and consequence of failures



- Generic electronic system:
  - Stuck-at, bridging, crosstalk....
  - Arbitrary effects propagate usually disabling
- Event-based architectures:
  - Interconnect complex
  - Core/node fails silently
  - Affected mesh sites disappear from model
  - Solution algorithm unaffected





A desktop machine ...

- a *personal* computer -

... will have 25000 cores

Tree 'control\_5a\_run\_5\_stuff\P\_0000\_F\_Blob.frm' u 3:4:5

