### ARC: Computer Architecture

tanguy.risset@insa-lyon.fr Lab CITI, INSA de Lyon Version du March 27, 2025

Tanguy Risset

March 27, 2025

### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- (8) Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle

### ARC course presentation

- Schedule:
  - Course 4h
  - labs (TP) 20h
  - Evaluation (In french): un seul devoir papier en fin de cours
- skills and knowledge learned in ARC cours:
  - Bolean logic, arithmetics
  - combinatorial and sequential logic circuits, automata.
  - Processor architecture, datapath, compilation process, RISC architecture
  - Assembly code, link with high level programming languages
  - Simple processor design, simple assembly program analysis.
  - Link with compilation, operating systems and programming
- Moddle (open): frames, labs, various document
- Course based on the two IF architecture course: AC and AO (open courses on Moodle).





# Computer architecture usefulness

- How to solve a problem with electrons:
- ARC is useful
  - For general knowledge of a computer scientist
  - To understand pro/cons of modern complex architectures
  - For embedded system programming

Problem
Algorithm
Program
Run-Time system
Architecture/ISA
Micro-Architecture
Logic
Circuit
Electrons



### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle

### History of computing

from Yale Babylonian Collection,  $\simeq$  1600 BC

- Ancient time: various arithmetics systems
- 17th century (Pascal and Leibniz): notion of mechanical calculator
- 1822 Charles Babbage Difference engine (tabulate polynomial htt functions)
- 1854 Georges Boole proposes the so-called Boolean logic.
- (More details on the poly or on Internet)



http://www.math.ubc.ca/~cass/Euclid/ybc/ybc.html



By By Carsten Ullrich - Own work, CC BY-SA 2.5

# History of computers

- 1936: Alan Turing's PhD on a universal abstract machine
- 1941: Konrad Suze builds the Z3 first programmable computer (electro-mechanic)
- 1946: ENIAC is the first electronic calculator
- 1949: Turing and Von Neumann build the first universal electronic computer: the Manchester Mark 1
- (More details on the poly or on Internet)





Z3 computer at Deutches Museum, Munich



By Venusianer, CC BY-SA 3.0



#### Table of Contents

- introduction
- History
- **Electrons and Logic**
- Processor Architecture
- Automate
- The Russian train example
- Mealy and Moore Automata
- Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle



- Discovered in 1947 at Bell Labs: (transfer resistor)
- Could replace the thermionic triode (vacuum tube) that allow radio and telephone technologies.
- Principle: flow or Interrupt current between Source and Drain, depending on Gate status
  - Can be seen as a switch
  - Wildly used after Integrated Circuit invention (1958)



Mosfet technology

# Popular Transistor technology: CMOS

- CMOS: Complementary Metal Oxide Semiconductor
- Two logical levels : 0 = 0V and 1 = 3V
- Two types of transistors
  - nMOS : current flows if gate is 1
  - pMOS : current flows if gate is 0
- Mainly used to realize basic logical gates (NOT, NAND, NOR, etc.)



### Moore's low

- Gordon Moore, co-founder of Fairchild Semiconductor and Intel, predicted in "a doubling every two year in the number of components per integrated circuit"
- Contributed to world economic growth
- Slow down in 2015 and is ended now.



### Boolean functions

#### Boole Algebra is equipped with three operations

- a unary operation, **negation**, noted NOT;
- two binary commutative, associative operations:
  - conjunction AND, with 1 as neutral element;
  - **disjunction** OR, with 0 as neutral element;
- AND is distributive over OR.

If a and b are 2 boolean variables, we write:

$$NOT(a) = \overline{a}$$
,  $AND(a, b) = ab = a.b$ ,  $OR(a, b) = a + b$ 

### Boolean Cheat Sheet

- neutral elements: a + 0 = a,  $a \cdot 1 = a$
- absorbing elements: a + 1 = 1,  $a \cdot 0 = 0$
- idempotence: a + a = a,  $a \cdot a = a$
- tautology/antilogy:  $a + \overline{a} = 1$ ,  $a \cdot \overline{a} = 0$
- commutativity: a + b = b + a, ab = ba
- distributivity:  $a + (bc) = (a + b)(a + c), \quad a(b + c) = ab + ac$
- associativity: a + (b + c) = (a + b) + c = a + b + c,

$$a(bc) = (ab)c = abc$$

• De Morgan's law:  $\overline{ab} = \overline{a} + \overline{b}$ ,

$$\overline{a+b} = \overline{a} \cdot \overline{b}$$

• others: a + (ab) = a,  $a + (\overline{a}b) = a + b$ ,

$$a(a+b)=a, \quad (a+b)(a+\overline{b})=a$$

# Elementary logical gates



Amplifier:

$$F = x$$

| X | F |
|---|---|
| 0 | 0 |
| 1 | 1 |

AND: 
$$F = x y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

$$x \longrightarrow F$$

NOT: 
$$F = \overline{x}$$

|   | X | F |  |
|---|---|---|--|
|   | 0 | 1 |  |
| - | 1 | 0 |  |



$$F = \overline{(x \ y)}$$

| X | У | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |



Tanguy Risset

ARC: Computer Architecture

15

introduction 0000

History 000

Mea

### Elementary logical gates



OR:

$$F = x + y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |



XOR:

$$F = x \oplus y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |



NOR:

$$F = \overline{(x+y)}$$

| Χ | y | , |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |



XNOR:

$$F = x \odot y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

# Combinatorical circuit Design

- Boolean description of the problem:
  - Compute y and z from a, b and c
  - y is 1 if a is 1 or b and c are 1.
  - z is 1 if b or c is 1 (but not both) or if a, b et c are 1.
- Truth table
- Open Logic equation

• 
$$y = \overline{a}bc + a\overline{b}\overline{c} + a\overline{b}c + ab\overline{c} + abc$$

• 
$$z = \overline{a}\overline{b}c + \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + abc$$

Optimized logic equations

• 
$$y = a + bc$$

• 
$$z = ab + \overline{b}c + b\overline{c}$$

logic gates





Tanguy Risset

ARC: Computer Architecture

History

Processor Architecture Automate

The Russian train example

# Disjunctive Normal Form (DNF)

- In Boolean logic, a logical formula in Disjunctive Normal Form (Forme normale disjonctive in French) if:
  - It is a disjunction of one or more clauses
  - where the clauses are conjunction of literals
  - a literal is a variable, a constant or 'not' a variable
- Otherwise put, it is an OR of ANDs.
- Example of DNF:
  - $x.\bar{y}.\bar{z} + \bar{t}.u.v$
  - $(a \wedge b) \vee \neg c$
- Example not in DNF:
  - $\bullet$  (x+y)
  - $a \lor (b \land (c \lor d))$

# Conjunctive Normal Form (CNF)

- In Boolean logic, a formula is in conjunctive normal form (forme normale conjunctive in French) if:
  - it is a conjunction of one or more clauses,
  - where a clause is a disjunction of literals;
  - a literal is a variable, a constant or 'not' a variable
- Otherwise put, it is an AND of ORs.
- Example of CNF:
  - $(x+y+\bar{z})(\bar{x}+z)$
  - $(a+\bar{b}+\bar{c})(\bar{d}+\bar{a})$
  - $\bullet$  x + y
- Example not in CNF
  - $\overline{(x+y)}$
  - x(y + (z.t))

|                                            |                   | <b>◀□▶◀</b> ₫ | P → 4 를 → 4 를 | ▶ \(\beta\) | 999         |
|--------------------------------------------|-------------------|---------------|---------------|-------------|-------------|
| Tanguy Risset                              | ARC: Computer Arc | hitecture     |               |             | 19          |
| introduction History Electrons and Logic F |                   |               |               | in example  | <b>Meal</b> |
|                                            |                   |               |               |             |             |

### From Truth table to DNF

- Back to previous example (z is 1 if b or c is 1 (but not both) or if a, b et c are 1.)
- Truth table on the right, z is 1 if and only if one of the five condition identified occurs.
- It is easy to find a conjunction that is valid in a unique case: example:  $\bar{a}.\bar{b}.c$  is 1 if and only if:  $a=0,\ b=0$  and c=1 (double arrow on the right)
- by adding all the conjunction valid only on each of the five cases identified on the right, we get a DNF formulae that has exactly that truth table.

|   |     |   |   | _            |
|---|-----|---|---|--------------|
| i | npu | t |   |              |
| а | b   | С | Z |              |
| 0 | 0   | 0 | 0 |              |
| 0 | 0   | 1 | 1 | $\Leftarrow$ |
| 0 | 1   | 0 | 1 | $\leftarrow$ |
| 0 | 1   | 1 | 0 |              |
| 1 | 0   | 0 | 0 |              |
| 1 | 0   | 1 | 1 | $\leftarrow$ |
| 1 | 1   | 0 | 1 | $\leftarrow$ |
| 1 | 1   | 1 | 1 | $\leftarrow$ |

Hence the possible formulae for z:  $z = \overline{abc} + \overline{abc} + \overline{abc} + a\overline{bc} + ab\overline{c} + ab\overline{c}$ How can it be simplified?

# Simple Boolean optimization: Karnaugh Table (1)

• Karnaugh map (tables de Karnaugh) use a "visual" reprentation of a simple property:

 $(a.\bar{b}) + (a.b) = a.(\bar{b} + b) = a$ 

- The first step in the method is to transform the truth table (3 or 4 input variables) of the function in a two-dimensional array (split into two parts of the set of variables)
- Rows and columns are indexed by the valuations of the corresponding variables in such a way that between two rows (or columns) only one boolean value changes.
- In our example (3 variables):

| a b | 0 0 | 0 1 | 1 1 | 1 0 |
|-----|-----|-----|-----|-----|
| С   |     |     |     |     |
| 0   | 0   | 1   | 1   | 0   |
| 1   | 1   | 0   | 1   | 1   |

Tanguy Risset ARC: Computer Architecture Electrons and Logic Processor The Russian train example

# Simple Boolean optimization: Karnaugh Table (2)

- Then, we try to cover all '1' of the table by forming groups.
  - each group contains only adjacent '1'
  - must form a rectangle
  - the number of elements of a group must be a power of two.
- each group correspond to a possible optimization of the DNF
- In our example:

| ſ | a b | 0 0 | 0 1 | 1 1 | 1 0 |
|---|-----|-----|-----|-----|-----|
|   | С   |     |     |     |     |
| ſ | 0   | 0   | 1   | 1   | 0   |
| Ī | 1   | 1   | 0   | 1   | 1   |

- example : Three groups:
  - $\bar{a}.b.\bar{c} + a.b.\bar{c}$  simplifies to  $b.\bar{c}$
  - $a.b.\bar{c} + a.b.c$  simplifies to a.b
  - $a.\bar{b}.c + \bar{a}.\bar{b}.c$  simplifies to  $\bar{b}.c$
- hence  $z = \overline{abc} + \overline{abc} + a\overline{bc} + ab\overline{c} + abc$  simplifies to  $z = a.b + \bar{b}.c + b.\bar{c}$

#### Well formed cicruits

As far as combinatorial circuits are concerned, a "Well formed" circuit is:

- A logic gate
- A wire
- Two well formed circuits next to each other
- Two well formed circuits, the outputs of one being the inputs of the other
- Two well formed circuits sharing a common input It can be shown that it correspond to an acyclic graph of logic gates.
- No cycles, no ouptuts conected together

|                      |                | Tanguy Risset       | ARC: Computer Are | chitecture |         | 23         |
|----------------------|----------------|---------------------|-------------------|------------|---------|------------|
| introduction<br>0000 | History<br>000 | Electrons and Logic |                   |            | example | <b>Mea</b> |
| 11 (11               |                | hinatorics logic    |                   |            |         |            |

- *n* input multiplexer
- decoder  $log(n) \rightarrow n$
- n bits adder
- n bits comparator
- n bits ALU
- etc.

# Memorizing: latches and Flip-Flops

• Set-Reset Latch (SR latch, *Bascule RS*): When R and S are reset, Q and  $\overline{Q}$  keep their previous value.



| S | R | Q         | $\overline{Q}$       |
|---|---|-----------|----------------------|
| 0 | 1 | 0         | 1                    |
| 1 | 1 | forbidden | forbidden            |
| 1 | 0 | 1         | 0                    |
| 0 | 0 | $Q_{n-1}$ | $\overline{Q_{n-1}}$ |

• Gated D latch (Flip-flop, register, *Bascule D*): sample input data on clock rising edge and keeps the value when clock is 0.



ARC: Computer Architecture 25

Processor Architecture Automate

The Russian train example

Meal

latches and Flip-Flops: other common representation

• Latch (verrou)

History



Flip-Flop (register)

# Sequential logic

Sequential logic combines logic function and memorizing, it opens the way to synchronous circuits, automata, programs, algorithms....

- *n* bits register
- n bits counter
- state machine
- CPU
- Computer



- Extremely complex in general.
- Many computation models:
  - Sequential
    - State machine
    - control + data-path
  - task parallelism (communicating tasks)
  - Data parallelism (data-flow)
  - Asynchronous circuits
- Important notion use every where: finite state machine (automate)

# Logic in ARC: Digital software

In ARC: use of Digital software
(https://github.com/hneemann/Digital)

- Design basic logic components (TD1)
- Design of a memory (sequential component, TD2)
- Design of dedicated circuit: integer division (TD3).
- Study of a Von Neumann 8-bit processor (TD4)





### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- 4 Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 8 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle

incr D

### What is a Von Neumann machine?



- Computer architecture Model (also called *Princeton* architecture) proposed after J. Von Neumann report: "First Draft of a Report on the EDVAC".
- Usually abstracted as a processor connected to a memory
- The memory is accessed (randomly) with an address (i.e. unlike a Turing machine)
- The memory contains both data and program (unlike a Harvard machine).



#### Compilation, Assembly code and binary code

| High Level Language $\Rightarrow$ | Assembly code $\Rightarrow$ | Binary code $\Rightarrow$ |
|-----------------------------------|-----------------------------|---------------------------|
| int a,b,c;                        | load RO, @b                 | 0100101110101             |
| a = b + c;                        | load R1, @c                 | 0100101010001             |
|                                   | add R3,R0,R1                | • • •                     |
|                                   | store R3, @a                | 1001001100011             |

# Fast compilation thanks to Donald Knuth (and others..)

- The programmer:
  - Write a program (say a C program: ex.c)
  - Compiles it to an object program ex.o
  - links it to obtain an executable ex

#### content of ex.c

```
#include <stdio.h>
int main()
{
   printf("hello World\n");
   return(0);
}
```



Tanguy Risset

ARC: Computer Architecture

introduction History 5000 Sociologo Sociolo

Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



Tanguy Risset ARC: Computer Architecture 34

introduction on a Processor (8 general purpose

Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)





# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



introduction History Electrons and Logic Processor Architecture Automate The Russian train example Monday of the Russian train example Mo

# Program execution on a Processor (8 general purpose registers)



Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Program execution on a Processor (8 general purpose registers)



# Computer Architecture in ARC

- Design of a simple dedicated circuit in logisim
- Study of a simple processor in logisim
- Overview of assembly code principles
- Compilation basics
- embedded system case study

|                      | Та | nguy Risset | ARC: Compute        | r Architectur | re                        | 35      |
|----------------------|----|-------------|---------------------|---------------|---------------------------|---------|
| introduction<br>0000 |    |             | Processor Architect |               | te The Russian train exam | ple Mea |
| A 1 1                | ,  |             |                     |               |                           |         |

### Add on: two's complement representation

- Two's complement (complément à deux) is the most common representation for negative integers
- For a number on N bits:
  - Positive integers from 0 to  $2^{N-1}-1$  are represented with usual binary encoding
  - Negative integer x from  $-2^{N-1}$  to -1 are represented by coding in binary the positive number  $2^N |x|$
  - Hence Negative integers always have the last (i.e. most significant) bit at 1, and positive always have the last bit at 0
- Example with N=3
  - Integers between  $-4_{10}$  and  $3_{10}$  can be represented
  - $-1_{10}$  is represented as  $111_2$  ( $2^3 1 = 7$ )
  - $-2_{10}$  is represented as  $110_2$  ( $2^3 2 = 6$ )
  - $-4_{10}$  is represented as  $100_2 (2^3 4 = 4)$

# Add on: two's complement representation (2)

- Two's complement have an important property: Addition "classical" algorithm works (except that the overflow should be ignored).
- Example:
  - $-1_{10} + (-2_{10}) = 111_2 + 110_2 = 1101_2 = (ignoring the)$  $carry/overflow)101_2 = -3$
  - $-1_{10} + 2_{10} = 111_2 + 010_2 = 1001_2 = \text{(ignoring the)}$  $carry/overflow)001_2 = 1$
- For x > 0,  $x \le 2^{N-1}$ , The representation of -x on N bit two's complement can be obtained by:
  - Complementing each bits of x
  - adding 1 to the resulting integer
- Example:
  - with N=3 and  $x=3_{10}=011_2$ , complement of x is  $100_2$  adding 1 gives  $101_2 = -3_{10}$
  - With N=8 and  $x = 96_{10} = 01100000_2$  complement of x is 10011111, adding one is  $-96_{10} = 10100000_2$ , indeed  $256 - 96 = 160 = 10100000_2$

Tanguy Risset

ARC: Computer Architecture

Processor Architecture Automate The Russian train example

### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 5 Automate
- The Russian train example
- Mealy and Moore Automata
- Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- D Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle

#### Automata

• Definition (Wikipedia): An automaton is a self-operating machine, or a machine or control mechanism designed to automatically follow a predetermined sequence of operations, or respond to predetermined instructions.

- In computer science:
  - Used in language theory to build compilers
  - Used in any technical domain: to describe predetermined behaviour.
  - Used in computer architecture: to design dedicated circuit.
  - A computer is a specific automaton.



- Un automate est une collection de K états numérotés de 0 à K-1. ainsi qu'une collection de transitions
- Un état particulier est l'état initial.
- Tous les états sont soit des états d'acceptation et soit des états de refus
- Les transitions, sont étiquetées
  - o soit par des actions (par exemple, je lis la lettre x)
  - 2 soit par des condition (par exemple, la lettre x est présente)
- le triplets (état 1, lettre x, état 2) signifie: si je suis dans l'état 1 et que je lis la lettre x, alors je vais dans l'état 2.



### Notion d'automate

- Fonctionnement d'un automate
  - Initialisation de l'automate dans l'état
  - il lit les lettres du mot une par une
    - s'il trouve une transition possible, il l'exécute,
    - sinon il répond «le mot n'appartient pas au langage»;
  - si l'automate arrive à effectuer des transitions jusqu'à la dernière lettre du mot, il regarde alors dans quel état il termine:
    - si l'état appartient à la classe d'acceptation, l'automate répond «le mot appartient au » (on dit que le mot est reconnu),
    - sinon, il répond «le mot n'appartiennent pas au langage».



#### Notion de mot reconnu



- fee  $\rightarrow$  reconnu
- feu  $\rightarrow$  reconnu
- fei  $\rightarrow$  non reconnu (impossible de lire 'i')
- ullet fe o non reconnu (arrêt dans un état non final)

### Link with architecture: Computers are automata

- Every computing machine is an automata
- Computer are *universal* in the sense that the program gives much flexibility in the action performed.
- In fact the basic action of a computer is very repetitive:
  - Read the instruction at \$PC in memory
  - decode the instruction
  - send the decoding to the ALU (or to memory if it is a load)
  - increment \$PC
- Dedicated circuits (ASICs) are automata designed for specific tasks.



### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 5 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 8 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- 12 Pipelining RISC instructions: the "Von Neumann" cycle

# Example from the poly



- A piece of unique train track for both train directions between the cities T. et K.
- Sensors triggered by train weight on rallways will command red lights when the track is used by a train.
- Modeling:
  - A booleen A (for 'Ampoule') indicating the state of the red light
  - Two booleans (LS for Left Sensor and RS for Rigth sensor) indicating the states of the sensors
  - An automaton to command the red lights



#### The Russian train automaton



### The Russian train automaton



- Circles are *states* of the automaton (e.g. NoTrain state models the cases where no train stand on the track).
- States specifies output Values (here only one: A)
- Arrows are transitions, labeled by event that triggered them.

# Back to the Russian train example



- The Inputs are RS and LS sensors Boolean values
- The Output is the value of Boolean
- ullet The functions (Transition and Output) can be defined by tables  $\Rightarrow$
- X means 'don't care'

| S        | x=(LS, R | S) | s'=T(s,x) |  |
|----------|----------|----|-----------|--|
| NoTrain  | 00       |    | NoTrain   |  |
| NoTrain  | 01       |    | TrRight   |  |
| NoTrain  | 10       |    | TrLeft    |  |
| NoTrain  | 11       |    | XXX       |  |
| TrRight  | 0X       |    | TrRight   |  |
| TrRight  | 1X       |    | TrRight2  |  |
| TrRight2 | 1X       |    | TrRight2  |  |
| TrRight2 | 0X       |    | NoTrain   |  |
| S        | y=F(s)   |    |           |  |
| NoTrain  | 0        |    |           |  |
| TrRight  | 1        |    |           |  |
| TrRight2 | 1        |    |           |  |

# Implementation of a synchronous automaton as a circuit



- s is current state, s' is next state, x are input bits, y are output bits.
- Ck and reset are not considered as inputs
- State change will occur on each rising edge of the Clock.



We need to store 5 States, hence we need at least 3 bits:

| value (binary) | state    |
|----------------|----------|
| 100            | NoTrain  |
| 000            | TrRight1 |
| 001            | TrRight2 |
| 010            | TrLeft   |
| 011            | TrLeft2  |



# Russian train output function

• The output function is easy: A is on iff state is "NoTrain"

| S        | y=F(s) |
|----------|--------|
| NoTrain  | 0      |
| TrRight  | 1      |
| TrRight2 | 1      |



### Russian train Transition function: more complicater

| S              | x=(LS, RS) | s'=T(s,x) |
|----------------|------------|-----------|
| 100 (NoTrain)  | 00         | NoTrain   |
| 100 (NoTrain)  | 01         | TrRight   |
| 100 (NoTrain)  | 10         | TrLeft    |
| 100 (NoTrain)  | 11         | XXX       |
| 000 (TrRight)  | 0X         | TrRight   |
| 000 (TrRight)  | 1X         | TrRight2  |
| 001 (TrRight2) | 1X         | TrRight2  |
| 001 (TrRight2) | 0X         | NoTrain   |
| 010 (TrLeft)   | X0         | TrLeft    |
| 010 (TrLeft)   | X1         | TrLeft2   |
| 011 (TrLeft2)  | X1         | TrLeft2   |
| 011 (TrLeft2)  | X0         | NoTrain   |

### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 5 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 1 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- 12 Pipelining RISC instructions: the "Von Neumann" cycle



- Automata are very widely used in computer science in different domains.
- In ARC we use them to control the execution of dedicated synchronous circuits
- As soon as a dedicated circuit is designed, there is an automaton designed.

### Mealy and Moore automata

- We have seen a *Moore automaton*: output only depend on the state (not on the input), usually simpler to handle.
- The most general form of an automaton has a moore output and a mealy output inputs



|                      | Tar | nguy Risset | ARC: Computer Arc      | chitecture |        | 55   |
|----------------------|-----|-------------|------------------------|------------|--------|------|
| introduction<br>0000 |     |             | Processor Architecture |            | kample | Meal |

Summery: from Algorithm to Circuit

- From algorithm to automata (states and input/output)
- From automata to synchronous automata
- From synchronous automata to digital design

introduction History Electrons and Logic Processor A Processor Architecture Automate Meal: The Russian train example

# Lab topic: circuit for integer division

```
n := entrée N
p := entrée P
x := 0
q := 0
tant que x+p \leq n
    x := x+p
    q := q+1
fin tant que
sortie Q := q
```



### Lab topic: proposed circuit to realize it



#### Table of Contents

- 1 introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 8 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- 12 Pipelining RISC instructions: the "Von Neumann" cycle



- The instruction set (Set Architecture statement: ISA) is of
  - It determines the basic instructions executed by the CPU.
  - It's a balance between the hardware complexity of the CPU and the ability to express the required actions
  - It is represented in a symbolic way: the assembly code/language (ex: ADD R1,R2)
  - The tool that translates symbolic assembly code in binary code (i.e. machine code) is also called the assembler
  - Two types of ISA:
    - CISC: Complex Instruction Set Computer
    - RISC: Reduce Instruction Set Computer

### CISC: Complex Instruction Set Computer

- An instruction can code several elementary operations
  - Ex: a load, an add and a store (in memory operations)
  - Ex: computer a linear interpolation of several values in memory
- Need a mode complex hardware (specifically hardware accelerators)
- High variability in size and execution time for different instructions
- Produce a more compact code but more complex to generate
- Vax, Motorola 68000, Intel x86/Pentium



## Example: instructions ISA of Pentium



## RISC: Reduced Instruction Set Computer

- Small simple instructions, all having the same size, and (almost) the same execution time.
- no complex instruction
- Clock speed increase with pipelining (between 3 and 7 pipeline stages)
- Code simpler to generate but less compact
- Every modern processor use this paradigm: SPARC, MIPS, ARM, PowerPC, etc.

Tanguy Risset ARC: Computer Architecture Electrons and Logic Processor Architecture Automate The Russian train example

## Example: instructions of MSP430 ISA

1 operand instruction

|    |    |    |    |    | F  |     |     |   |     |    |   |     |         |   |   |
|----|----|----|----|----|----|-----|-----|---|-----|----|---|-----|---------|---|---|
| 15 | 14 | 13 | 12 | 11 | 10 | 9   | 8   | 7 | 6   | 5  | 4 | 3   | 2       | 1 | 0 |
| 0  | 0  | 0  | 1  | 0  | 0  | opc | ode |   | B/W | Ad | d | Des | st reg. |   |   |

|    |    |    |    |          | relative | e Jumps |   |   |            |          |   |   |   |   |   |
|----|----|----|----|----------|----------|---------|---|---|------------|----------|---|---|---|---|---|
| 15 | 14 | 13 | 12 | 11       | 10       | 9       | 8 | 7 | 6          | 5        | 4 | 3 | 2 | 1 | 0 |
| 0  | 0  | 1  | co | ondition |          |         |   | P | C offset ( | 10 bits) |   |   |   |   |   |

|    |       |    |    |          | 2 opera | ands inst | ruction |     |   |    |   |   |          |   |   |
|----|-------|----|----|----------|---------|-----------|---------|-----|---|----|---|---|----------|---|---|
| 15 | 14    | 13 | 12 | 11       | 10      | 9         | 8       | 7   | 6 | 5  | 4 | 3 | 2        | 1 | 0 |
| o  | pcode |    |    | Dest reg | 5.      |           | Ad      | B/W | A | As |   |   | Dest reg |   |   |

Examples:

- PUSB.B R4
- JNE -56
- ADD.W R4,R4

### Exemple of Pentium ISA

- Write a simple C program toto.c
- Type gcc -S toto.c and get the toto.s file
- you can also use the compiler explorer: https://gcc.godbolt.org/

```
main() {
  int i=17;
  i=i+42;
  printf("%d\n", i);
}
(... instructions ...)
movl $17, -4(\%rbp)
addl $42, -4(%rbp)
(... printf params...)
call printf
(... instrucitons ...)
```

```
Tanguy Risset
                  ARC: Computer Architecture
              Processor Architecture Automate The Russian train example
```

History Electrons and Logic

### Disassemby

- compile the assembly code: gcc toto.s -o toto
- disassemble with objdump: objdump -d toto

```
Adresses
           Instructions binaires
                                      Assembleur
(\ldots)
 40052c: 55
                                             %rbp
                                      push
                                             %rsp,%rbp
 40052d: 48 89 e5
                                      mov
 400530: 48 83 ec 10
                                             $0x10, %rsp
                                      sub
 400534: c7 45 fc 11 00 00 00
                                             $0x11,-0x4(%rbp)
                                      movl
                                             $0x2a,-0x4(%rbp)
 40053b: 83 45 fc 2a
                                      addl
                                             -0x4(\%rbp), %eax
 40053f: 8b 45 fc
                                      mov
 400542: 89 c6
                                             %eax,%esi
                                      mov
% (...)
```

#### common properties of ISA

- An ISA first defines the types of data on which the processors can compute (32 bit memory addresses, integer of various sizes, etc.)
- Then it contains various types of instructions:
  - Computation instructions (add, sub, or, and, ...), with various number of operands
  - Memory addessing instructions (load, store)
  - stack management instructions (push, pop)
  - Flow control instructions (jumps)
  - subroutine calls



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- (8) Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle

#### RISC-V history

- The RISC paradigm was invented 1980 David Patterson (UC Berkeley) and John Hennessy (Stanford U.).
- They described the MIPS architecture in their books "Computer Organization and Design" and "Computer Architecture: A Quantitative Approach."
- The MIPS was built by a commercial compagny (MIPS was in Nintendo 64, Sony PlayStation, PlayStation 2) and use in many architecture courses (including 3TC-ARC!).
- Hennessy and Patterson received the ACM A.M. Turing Award in 2017 (https://amturing.acm.org/byyear.cfm).

From 1980 to 2010, the development of the fifth generation of the RISC research project started and,led to the RISC-V (pronounced "risk-five").



- RISC-V is an open instruction set architecture (ISA), this is fairly new!
- RISC-V International is a global nonprofit organization that owns and maintains the RISC-V ISA intellectual property.
- Its members range from individuals to organizations like Google, Intel, and Nvidia.

### Industry innovation on RISC-V



(from https://www.allaboutcircuits.com/) → ◄ ۗ > > < > < > <

#### RISC-V Processor

- RISC-V is specified by its ISA (RV32I, for integer 32 bits for instance).
- Many extension of the ISA are specified (32 bits, 64 bits, 128 bits, Atomic Instructions, Compressed Instructions, etc.)
- The architecture can be pipelined or not, it can target small embedded systems or large powerfull machines.
- Each processor compagny can build it own RISC-V implementation as long as it respect the ISA specification.
- We will study the RV32I base integer ISA that implements the necessary operations to achieve basic functionality with 32-bit integers.

- a register-to-register (or load/store) architecture
- RISC-V use 3-adress instructions (destination is the first operand)
- 32 32-bits registers (x0-x31) plus a A program counter (pc) of
- register can be name x0, x1, etc. or with their more explicit ABI names: x0 is "zero", x1 is ra (return adress), etc (https://en.wikichip.org/wiki/risc-v/registers)
- x0 is hardwired to value 0
- x1 is the return adress (ra)
- x2 is the stack pointer (sp)
- 32-bit address space: Addressable memory of  $2^{32}$  bytes
  - $\Leftrightarrow$  2<sup>30</sup> words of 4 bytes

#### understanding RISC assembly

From C to assembly:

riscv64-linux-gnu-gcc -S NtimesN.c -o NtimesN.S

#### NtimesN.c

#### NtimesN.s

```
[...]
scanf("%d",&N);
i = N*N + 3*N;
printf("i=%d\n",i);
[...]
```

```
[\ldots]
call
      __isoc99_scanf@plt #call to scanf
      a5,4(sp)
lw
                           #N is now in a5
      a2, a5, a5
                           #a2 <- N*N
mulw
                           #a4 <- 2*N (shift
slliw a4,a5,1
      a5,a4,a5
                           \#a4 < - 2*N+N
addw
                           \#a5 < - N*N + 3*N
addw
      a2,a2,a5
lla
      a1,.LC1
                           #printf args (To b)
li
      a0,1
                           #.. explained late:
call
      __printf_chk@plt
                           #call to printf
[...]
```

### RISC-V register

- 32 registers in the *register file*
- Named
  - by their number: x0 x1 ...x31
  - or by their name zero ra sp fp a0 a1 ...a7 s1 s2 ...
- x0 (zero) contains value 0
- a0 ...a7 are used to pass arguments of a function call
- a0 a1 are used to transmit functions result
- t0 ...t6 and s0 ...s11 are working registers, used for CPU computations
- sp is the stack pointer
- fp is the frame pointer (explained later)
- ra contains the return address (after the end of current function)
- gp is a pointer to global area
- tp is the thread pointer

### Risc-V assembly addressing mode

- The addressing mode defines how the operands of each instruction are interpreted.
- RISC-V has four addressing modes:
  - Immediate addressing: the operand is a constant within the instruction
  - Register addressing: where the operand represents a register.
  - Base addressing: the operand is an address which is the sum of a register and a constant (sometimes called indirect addressing)
  - PC-relative addressing: the operand is an address which is the sum of PC and a constant

Tanguy Risset

ARC: Computer Architecture

75

introduction OOO Sociologic Ooo So

## Example of RISC-V adressing mode

- Register addressing
  - add x1, x2, x3
  - puts in x1 the value of x2 plus the value of x3.
  - x1=x2+x3
- Immediate addressing
  - addi x1, x2, 0x0f
  - addi x1, x2, 15
  - puts in x1 the value of x2 plus 15.
  - x1=x2+15
- Base addressing
  - lw x1, 10(x3)
  - puts in x1 the value situated in memory at the address obtained by adding 10 to the content of x3.
  - x1=Memory[x3+10]
- bne a1, a2, label
  - branch to address of label if values in a1 and a2 are different.
  - if (a1 != a2) then \$PC=label

#### Format of Risc-V instructions

- 3 types of format: R-Type, I-Types and B-Types
- R-types:

| C | ) 6    | /  | 11 12 14 | 15 19 | 20 24 | 25    | 31 |
|---|--------|----|----------|-------|-------|-------|----|
|   | opcode | rd | func3    | rs1   | rs2   | func7 |    |

- Used for 3-registers instructions
- opcode is the operation code that specifies the operation
- rs1 and rs2 are the first and second source register
- rd is the destination register
- func3 and func7 are used with op to select arithmetic operation (additionnal opcode fields)

Tanguy Risset ARC: Computer Architecture 77

introduction Occupation Selectrons and Logic Occupation Occupatio

• I-Types instruction are used for load, store, branch and immediate instruction.

| 0 | 6      | 7  | 1112 14 | 15 19 | 20        | 31 |
|---|--------|----|---------|-------|-----------|----|
|   | opcode | rd | func3   | rs1   | imm[0:11] |    |

- rs1 is a source register
- rd is a destination register
- func3 is additionnal opcode field
- The imm field is a 16 bit's integer in two's-complement code, ranging from -32 768 to 32 767 (remind that this is a problem in many cases)

#### **B-Types** instruction

B-Types instruction are used for Branch instructions

| 0 | 6      | 7 7         | 7 11     | 12 14 | 15 19 | 20 24 | 25 | 29       | 3031        |
|---|--------|-------------|----------|-------|-------|-------|----|----------|-------------|
|   | opcode | imm<br>[11] | imm[1:4] | func3 | rs1   | rs2   | i  | mm[5:10] | imm<br>[12] |

- The imm split-field is a 13 bit's integer containing an address (always even hence bit 0 is implicitely 0).
- can jump from address 0 to  $2^{13}$ =1MB from \$PC.
- For longer jump, on can use others instrction (PC absolute), barely used.

Basic arithmetic and logic instruction

- R-Types instructions: add, sub and, or, xor
  - add rd, rs1, rs2 // rd = rs1 + rs2
  - xor rd, rs1, rs2 // rd = rs1 r̂s2
- I-types for immediate operand operation:
  - addi rd, rs1, 4 // rd = rs1 + 4
  - li rd, 4 // rd = 4, pseudo (addi rd, zero, 4)

#### Load and store

- load and store operation use indexed addressing
  - the address operand specifies a signed constant and a register
  - These values are added to generate effective address
- byte instruction: 1b and sb transfer one byte

```
    lb rd, 20(rs1) // rd=Memory[rs1+20][0:7]
    lw rd, 20(rs1) // rd=Memory[rs1+20] (i.e.[0:31]
    sb rd, 20(rs1) // Memory[rd+20][0:7]=rs1
```

- sb stores only the lowest byte of operand register
- Word instruction: 1w and sw operates on word (i.e. 32 bits)
- Remind that address have to be aligned to 32 bit world, hence must be multiple of 4.



#### **Branches**

- Conditional branch
  - bne rs1, rs2, Label
  - if rs1 and rs2 have different values, the next instruction to execute is at address Label (i.e. pc = Label
  - beq rs1, rs2, Label // same thing if rs1=rs2
- Unconditionnal branch
  - j offset // next instruction executed is at address
     PC+offset
  - jr rs1 // next instruction executed is at address contained in rs1
- These are the only way of implementing loops in assembly:

```
[...]
    li s2, 1
while: beq s1, zero, done
    sub s1, s1, s2
    j while
done:
```

```
t2=1
while (t1 != 0) {
   t1 = t1 - t2
}
```

#### Function control flow in RISC-V

- RISC-V uses the jump-and-link (jal) instruction to call functions
  - Example:

- saves the return address (i.e. the address of the following instruction) in the ra register and jumpt to the label label (code of fact function)
- At the end of the execution of fact, the instruction ret jumps back to the address stored in ra (pseudo: jalr x0, ra, 0)
- Arguments transmited to Fact are stored in registers a0 ...a7
- Return values of Fact are stored in registers a0, a1

|                      |     |          |        |      |            | <b>◆□→◆</b> | ₽ ► ◀ ≣ ► ◀ | <b>∄</b> | 90 | 10   |
|----------------------|-----|----------|--------|------|------------|-------------|-------------|----------|----|------|
|                      |     | Tanguy R | Risset | ARC: | Computer A | rchitecture |             |          | 8  | 3    |
| introduction<br>0000 |     |          |        |      |            |             | The Russian |          |    | Meal |
| ) A /I               | . 1 |          |        |      |            | II a        |             |          |    |      |

#### Who save the register during Function call?

- When a function call occurs: jal ra, fact, who save the register?
  - The Caller (who knows which register he will use after the call)?
  - Or the callee (who knows which register he will use during its execution)?
- This convention is part of the *calling convetion* or ABI *application* binary interface.
- For MIPS:
  - ra, t0 ...t6, a0 ...a7, are caller saved
  - fp, s1 ...s11 are callee saved

#### Function call example with MIPS

- Let says: function B calls function C
- Function B wants to save t0, t1 and a0 because it will need them
  after the return of C.
- this is done using the stack via the stack pointer sp



- The stack is use to store all local information (in the sense local to the current function)
- That includes (say for function C):
  - local variable
  - Callee saved register if needed
  - Return address (i.e. the instruction following the jal C instruction).
  - (sometimes) the parameters passed to C
  - (sometimes) the result of C
  - In many ISA, the parameters and the results are passed through dedicated registers
- All these data constitute the frame of the fonction instance.
- the frame pointeur points to the frame of the current function
- For MIPS, the frame pointer is fp

#### Function B calls C

```
В
                       beguinning of
    sw t0,0(sp)
                     saving
                             t0 in stack
    sw t1,-4(sp)
                     saving
                             t1 in stack
    sw a0, -8(sp)
                     saving
                             a0 in stack
    sub sp,sp,12
                     correct stack pointer
                     call to C function
    jal ra, C
    lw a0,4(sp)
                     restoring return addresse of B from stack
    lw t1,8(sp)
                     restoring s1 from stack
    sw t0,12(sp)
                     restoring
                                 s0
    add sp, sp, 12
                             stack pointeur value
                     adjusst
    . . .
                    end of B
    ret
    . . .
```

### Sketching code of C function

```
C:
                           # C need 40 Bytes for its frame
    addi
            sp, sp, -40
            ra,32(sp)
                          # store return address (inst. in B)
    SW
            fp,28(sp)
                          # store frame pointer
    SW
            s0,24(sp)
                          # store s0 (because C uses it)
    SW
                          # fp <- sp: frame pointer of C set
            fp,sp
    move
```

lw ra,32(sp) # ra <- return address (in B)
lw fp,28(sp) # fp <- frame pointeur of B
lw s0,24(sp) # restore s0
addi sp,sp,40 # sp <- sp+40, restore B stack pointer
ret # return to ra (B function)

#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 5 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 8 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- Pipelining RISC instructions: the "Von Neumann" cycle



#### Procedure abstraction

- Let's pause a while to come back to high level langage
- What is a function (or a procedure)?
- How its isolation mecanisme (local variable) is implemented?
- This is implemented with a very fundamental mecanism: the Stack and the Activation Record (or Frame) of each procedure.

#### Notion of procedure

- Procedures (or functions) are the basic units for compilers
- Three important abstraction:
  - Control abstraction: parameter passing and result transmission
  - Memory abstraction: variable lifetime (local variables)
  - Interface: procedure's signature



#### Procedure Control Transfer

- Transfer mechanism of control between procedures:
  - when calling a procedure, the control is given to the procedure called;
  - when this called procedure ends, the control is returned to the calling procedure.
  - Two calls to the same procedure create two em independent instances (or invocations).
- two useful graphic representations:
  - The call graph: represents the information written in the program.
  - The call tree: represents a particular execution.

## Call Graphre calc;

```
begin { calc}
                                         Call Graph:
end;
procedure call<sub>1</sub>;
                                            main
     var y...
     procedure call<sub>2</sub>
           var z: ...
           procedure call<sub>3</sub>;
                                                             Call1
                 var y....
                 begin { call<sub>3</sub>}
                       x:=...
                       calc;
                                                             Call2
                 end:
           begin { call<sub>2</sub>}
                 z:=1;
                 calc:
                                                             Call3
                  call_3;
           end:
     begin { call_1 }
           call_2;
                                                             Calc
     end:
```

## Call Tree procedure calc;

begin { calc}

end;
procedure call<sub>1</sub>;
var y...
procedure call<sub>2</sub>
var z: ...
procedure call<sub>3</sub>;
var y....
begin { call<sub>3</sub>}
x:=...
calc;
end;
begin { call<sub>2</sub>}
z:=1;

calc:

call<sub>3</sub>;

end:

begin  $\{ call_1 \}$  $call_2;$ 

end:

Call tree for one particular execution:



main calls call<sub>1</sub>
call<sub>1</sub> calls call<sub>2</sub>
call<sub>2</sub> calls calc
calc returns to call<sub>2</sub>
call<sub>2</sub> calls call<sub>3</sub>
call<sub>3</sub> callscalc
calc returns to call<sub>3</sub>
call<sub>3</sub> returns to call<sub>2</sub>
call<sub>2</sub> returns to call<sub>1</sub>
call<sub>1</sub> returns to main

#### **Execution Stack**

- The transfer of control mechanism between procedures is implemented thanks to the *execution stack*.
- The programmer has this vision of virtual memory:



- The heap is used for dynamic allocation.
- The *stack* is used for the management of contexts of procedures (local variable, etc.)



#### Activation record

- Calling a procedure: Stacking the activation record (or frame).
- Need of a dedicated pointer for that: the activation record pointer (ARP) or frame pointeur (fp))
- The frame allows to set up the *context* of the procedure.
- This frame contains
  - The space for local variables declared in the procedure
  - Information for restoring the context of the calling procedure:
    - Pointer to the frame of the calling procedure (ARP or FP for em frame pointer).
    - Address of the return instruction (statement following the call of the appellant proceedings).
    - Eventually save the state of the registers at the time of the call.





### Return to calling function



| introduction History Electrons and Logic Processor Architecture Automate The Russian train example Mea |  | Tanguy Risset | ARC: Computer Arc | _ | rraerer e | 99           |
|--------------------------------------------------------------------------------------------------------|--|---------------|-------------------|---|-----------|--------------|
|                                                                                                        |  |               |                   |   |           | Meal<br>0000 |

#### Table of Contents

- 1 introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 1 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- 12 Pipelining RISC instructions: the "Von Neumann" cycle

### Coming back to previous call example with B and C

- Let says: function B calls function C
- Function B wants to save t0, t1 and a0 because it will need them after the return of C.
- this is done using the stack via the stack pointer sp



- The stack is use to store all *local* information (in the sense local to the current function)
- That includes (say for function C):
  - local variable
  - Callee saved register if needed
  - (sometimes) Return address (i.e. the instruction following the jal ra, C instruction).
  - (sometimes) the parameters passed to C
  - (sometimes) the result of C
  - In many ISA, the parameters and the results are passed through dedicated registers
- All these data constitute the frame of the fonction instance.
- the frame pointeur points to the frame of the current function
- For RISC-V, the frame pointer is fp

#### Function B calls C

```
В
                       beguinning of
    sw t0,0(sp)
                     saving
                             t0 in stack
    sw t1,-4(sp)
                     saving
                             t1 in stack
    sw a0, -8(sp)
                     saving
                             a0 in stack
    sub sp,sp,12
                     correct stack pointer
                     call to C function
    jal ra, C
    lw a0,4(sp)
                     restoring return addresse of B from stack
    lw t1,8(sp)
                     restoring s1 from stack
    sw t0,12(sp)
                     restoring
                                 s0
    add sp, sp, 12
                             stack pointeur value
                     adjusst
    . . .
                    end of B
    ret
    . . .
```

## Sketching code of C function

```
C:
                           # C need 40 Bytes for its frame
    addi
            sp, sp, -40
            ra,32(sp)
                           # store return address (inst. in B)
    SW
            fp,28(sp)
                          # store frame pointer
    SW
            s0,24(sp)
                           # store s0 (because C uses it)
    SW
                          # fp <- sp: frame pointer of C set
            fp,sp
    move
            ra,32(sp)
                           # ra <- return address (in B)</pre>
    lw
            fp,28(sp)
                          # fp <- frame pointeur of B
    lw
            s0,24(sp)
    lw
                           # restore s0
            sp, sp, 40
                          # sp <- sp+40, restore B stack pointe
    addi
                           # return to ra (B function)
    ret
```

### RISC-V Assembly for programme fib

#### Fibbonacci suite program:

```
int fib (int i)
{
   if (i<=1) return(1);
   else return(fib(i-1)+fib(i-2));
}
int main (int argc, char *argv[])
{
   fib(2);
}</pre>
```

## Assembleur RISC-V pour programme fib

```
fib:
           sp,sp,-48 # SP <- SP-48 :AR de 48 octet (12 mots)
   addi
         ra,40(sp)
                      # stocke adresse retour (64 bits) a SP+40
    sd
         s0,32(sp)
                      # sauvegarde registre s0
         s1,24(sp)
                      # sauvegarde registre s1
    sd
           s0,sp,48 # s0=ARP/FP <- SP
   addi
   mv
         a5.a0
                       # a5 <- arg1 (N)
   sw
         a5,-36(s0)
                       # stock arg1 (N) dans la pile (SP-12)
         a5,-36(s0)
                      # instruction inutile (supprimée si optimisation)
   lw
    sext.w
           a4,a5
                      # a4 <- sign extension a5(32)
   li
         a5,1
                       # a5 <- 1
                     # if (a4 > 1) sauter a .L2)
          a4,a5,.L2
   bgt
         a5,1
                      # ici on a arg1=N<=1 donc a5 <- res=1
   li
                       # sauter à .L3
    j
.L2:
             a5,-36(s0)
                          # ici N>1, a5 <- N
   lw
   addiw
             a5,a5,-1
                          # a5 <- a5 - 1
   sext.w
             a5,a5
                          # sign extension
             a0,a5
   mv
                          # a0 <- a5 (set arg in a0 for recursive call)
             fib
   call
                          # recursive call
   mv
             a5,a0
                          # a5 <- result from recursive call
                          # s1 <- a5
   mv
             s1,a5
   lw
             a5,-36(s0)
                          # a5 <- N
                          # a5 <- a5 -2
   addiw
             a5,a5,-2
             a5,a5
                          # sign extension
   sext.w
             a0,a5
                          \# a0 <- a5 (set arg in a0 for recursive call)
   mν
             fib
   call
                          # recursive call
             a5,a0
                          # a5 <- result from recursive call
   mv
    addw
             a5,s1,a5
                          # a5 <- fib(N-1)+fib(N-1)
    sext.w
             a5,a5
                          # sign extension
```

## Assembleur RISC-V pour programme fib

```
.L3:
                             #a0 <- a5 (set result in a0)
    mv
               a0,a5
    14
               ra,40(sp)
                             # restaure ra
    ld
               s0,32(sp)
                            # restaure s0
    ld
               s1,24(sp)
                            # resaure s1
    addi
               sp,sp,48
                            # restaure sp
    jr
               ra
                             # return
    .size
               fib, .-fib
    .section .rodata
    .align
               3
.LCO:
                "le resultat est %d "
    .string
               2
    .align
    .globl
               main
               main, @function
    .type
main:
    addi
               sp, sp, -32
                           # set AR for main
    sd
               ra,24(sp)
    sd
               s0,16(sp)
    addi
               s0,sp,32
               a5,a0
                            #store arg og main
    mv
               a1,-32(s0)
    sd
               a5,-20(s0)
    sw
               a0,2
                            # we call fig(2)
    call
               fib
    \boldsymbol{m}\boldsymbol{v}
               a5,a0
                            # get fib result
    mν
               a1,a5
                            #set args for printf
               a0,.LC0
    lla
    call
               printf@plt
               a5,0
                                                                        ◆□▶ ◆圖▶ ◆圖▶ ◆圖▶
                                                                                                             200
    li
                                                   ARC: Computer Architecture
                                                                                                              107
                                Tanguy Risset
```

introduction History Electrons and Logic Processor Architecture Automate The Russian train example Mea

example 1, if-then

bne s0, s1, Test add s2, s0, s1

Test:

### example 2, if-then-else

```
beq s4, s5, Lab1
add s6, s4, s5
j Lab2
```

Lab1: sub s6, s4, s5

Lab2:

introduction 0000

ARC: Computer Architecture 109

Tanguy Risset

History Electrons and Logic Processor Architecture Automate Occosion Occosi

example 3, looping

li t2, 0 li t3, 1 while: beq t1, zero, done add t2, t1, t2 sub t1, t1, t3 j while

done:

#### example 4, static variable

```
.globl main
 .type main, @function
          .data
 var1:
          .word
                 23
                            # declare storage for var1; initial
                            # value is 23
          .text
 main:
          lw t0, var1
                              # load contents of RAM location int
                              # register $t0: t0 = [var1] ( = 23
          li t1, 5
                                  $t1 = 5
                                            ("load immediate")
          la t2, var1
                              # load address of var1
          sw t1, (t2)
                              # store contents of register t1
                              #into RAM:
                                           [var1] = t1 = 5
 done:
                                           ir
               ra
                    Tanguy Risset
                               ARC: Computer Architecture
introduction
                             Processor Architecture Automate The Russian train example
example 5, array accesses
          .globl main
          .type main, @function
          .data
                               declare 12 bytes of storage to
 array1: .space 12
                            # hold array of 3 integers
          .text
          la t0, array1
                               load base address of array into
 main:
                            #register $t0
                            #t1 = 5 ("load immediate")
          li t1, 5
          sw t1, (t0)
                            #first array element set to 5;
                            #indirect addressing
          li t1, 13
                            #t1 = 13
          sw t1, 4(t0)
                            #second array element set to 13
```

#third array element set to -7

#t1 = -7

li t1, -7

ra

ir

sw t1, 8(t0)

### Documentation on RISV-V assembly

- The RISC-V Instruction Set Manual Volume I: User-Level ISA
  - https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
- Risc-V assembly manual on github
  - https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
- CheatSheet on ARC Moodle site.

|                      |                     |                  | 4 □ ▶ 4 ₫  | ₹ .     | ) Q (4      |
|----------------------|---------------------|------------------|------------|---------|-------------|
|                      | Tanguy Risset       | ARC: Computer Ar | chitecture |         | 113         |
| introduction<br>0000 | Electrons and Logic |                  |            | example | <b>Meal</b> |
|                      |                     |                  |            |         |             |

#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 The Russian train example
- Mealy and Moore Automata
- 8 Instruction Set Architecture (ISA, assembleur in French)
- The RISCV ISA example
- 10 Function, procédure et Pile d'exécution
- Coming back to RISC-V
- 12 Pipelining RISC instructions: the "Von Neumann" cycle





Program execution on a Processor (8 general purpose registers)



introduction History Electrons and Logic Processor Architecture Automate The Russian train example M

## Program execution on a Processor (8 general purpose registers)



Tanguy Risset ARC: Computer Architecture 115

introduction History Computer Architecture Automate Computer Architecture Automate Computer Architecture Computer Computer Architecture Computer Architecture Computer Comp

Program execution on a Processor (8 general purpose registers)



9 Q Q



Tanguy Risset ARC: Computer Architecture 115

introduction History Electrons and Logic Processor Architecture Automate The Russian train example Meal

## Program execution on a Processor (8 general purpose registers)





# Program execution on a Processor (8 general purpose registers)





Program execution on a Processor (8 general purpose registers)





# Program execution on a Processor (8 general purpose registers)



## Program execution on a Processor (8 general purpose registers)



Program execution on a Processor (8 general purpose registers)





## The "Von Neumann cycle"

- The so-called Von Neumann cycle is simply the decomposition of the execution of an instruction in several independent stages.
- The number of stages depend on the processor, usually 5 stages are commonly used as example:
  - Instruction Fetch (IF)
    - Reads the instruction from memory (at address \$PC) and write it in \$IR.
  - Instruction Decode (ID)
    - computes what needs to be computed before execution: jump address destination, access to register, etc.
  - Execute (EX)
    - executes the instruction: ALU computation if needed
  - Memory Access (MEM)
    - Loads (or stores) data from memory if needed
  - Write Back (WB)
    - Writes the result into the register file if needed

## The MIPS example

- The RISC paradigm was invented by Berkeley and popularized by Henessy and Patterson in the book on MIPS
- MIPS stands for Microprocessor without Interlocked Pipeline Stages and propose and architecture to execute each stage independently



from MIPS website https://www.mips.com/



- Use Christian Wolf slides for explaining MIPS instruction pipeline
- Here

### example of MIPS pipeline CPU architecture

Taken from Henessy/patterson book





#### Illustration of bubble on MIPS

- When next instruction cannot be fetched directly (because it need the result of previous instruction for instance) it creates a "bubble"
- For instance: an addition using a register that was just loaded
- The value of the register will be available after the MEM stage of first instruction, hence we can delay on only on cycle, provided there is a shortcut.



#### Another illustration of instruction pipeline

- Go back to our previous representation of the processor and memory:
  - Von Neumann computer= Memory + CPU
  - CPU= = control Unit + Datapath
  - Datapath= ALU + Register file



# A pipeline example from MIPS

- Execute the sequence of assemby instruction:
  - load value at address 500 in register R0
  - Add 1 to R0 and put result in R1
  - store value of Register R1 at address 500
- (Think of i=i+1)
- Code:

```
la R0,500
add R1, R0, 1
sw R1,500
```

#### First possible execution: without pipeline

• Before execution starts, \$PC contains the address of the first instruction: 100





Instruction Fetch



#### Instruction Decode





#### • Execute (nothing for load)



#### Memory access





#### Write Back



- increment \$PC
- Fetch next instruction
- etc. etc.



Counting CPI for non-pipelined architecture

- CPI= Cycle per instruction
- 5 cycles for executing on instruction
- ullet  $\Rightarrow$  15 cycles for 3 instructions.

### Example of pipelined execution

Instruction Fetch (for 'load' instruction)





- Instruction Decode (for load)
- Instruction Fetch (for 'nothing' because of a bubble: instruction 'add' delayed)



- Execute (for load: nothing to do)
- Instruction Decode (for 'nothing')
- Instruction fetch (for 'add')



Tanguy Risset

ARC: Computer Architecture

999 133

introduction History

The Russian train example

Mea

#### cycle 4

- Memory access (for load)
- Execute (for 'nothing')
- Instruction Decode (for add)
- Instruction fetch (for store)



- Write Back (instruction load)
- Memory access (for 'nothing')
- Execute (instruction add: bypass)
- Instruction Decode store



Tanguy Risset

ARC: Computer Architecture

√) Q (¬)

135

introduction Histo

History Electrons and Logic Processor A

Processor Architecture

Automate

The Russian train example

ple Meal

#### cycle 6

- Write Back (for 'nothing')
- Memory access (instruction add, nothing to do)
- Execute (instruction store: nothing to do)



- Write Back (instruction add)
- Memory access (instruction store: bypass)





# Counting CPI for both architectures

- Non-pipelined architecture:
  - 5 cycles for one instruction
  - ullet  $\Rightarrow$  15 cycles for 3 instructions.
- Pipelined architecture:
  - 5 cycles for one instruction
  - 8 cycles for 3 instructions.
  - ⇒ without bubbles, one instruction per cycle
  - A 'jump' instruction interrupt the pipeline (need to wait for the address decoding to fetch next instruction) ⇒ pipeline stall
  - Some ISA allow to use these delay slots: one or two instruction after the jump are executed before the jump occurs.

# Du langage à l'exécution

990 ARC: Computer Architecture 139 Tanguy Risset introduction 0000 History 000 Electrons and Logic Processor Architecture The Russian train example 000000000 Automate Mea

# Rappels d'architecture



#### Architecture view from the programmer

- Modern systems allow
  - To run multiple independent programs in parallel (process)
  - To access memory space larger than physical memory available (virtual memory)
- For the programmer: all this is transparent
  - Only one program runs with very large memory available
- The processor view memory contains:
  - The code to execute
  - Static data (size known at compile time)
  - Dynamic data (size known at runtime: the heap, and the space needed for the execution itself: the battery)
- The programmer sees only the data (static and dynamic)



• the complete process will translate a C program into code executable (loading and execution will take place later).



- We often call compilation the set compiler + assembler
- The gcc compiler also includes an assembler and linking process (accessible by options)

#### Your compilation process

- The programmer:
  - Write a program (say a C program: ex.c)
  - Compiles it to an object program ex.o
  - links it to obtain an executable ex

#### content of ex.c

```
#include <stdio.h>
int main()
{
   printf("hello World\n");
   return(0);
}
```



• The compilation process is divided in 3 phases:



#### Compilation: the front-end

- The front end of an embedded code compiler uses the same techniques as traditional compilers (we can want to include assembler parts directly)
- Parsing LR(1): the parser is usually generated with dedicated metacompilation tools such as Flex et bison for GNU

- - Some phases of optimizations are added, they can be very calculative
  - Some example of optimisation independent of the target machine architecturre
    - Elimination of redundant expressions
    - dead code elimination
    - constant propagation
  - Warning: optimization can hinder the understanding of the assembler (use the -O0 options with tt gcc)





Compilation: The back-end

- The code generation phase is dedicated to architecture target.
   Retargetable compilation techniques are used for architectural families.
- The most important steps important are:
  - Code selection
  - Register allocation
  - instruction scheduling

introduction

- The gcc command runs several program depending on the options
  - The pre-processer cpp

History Electrons and Logic

- The compiler cc1
- The assembleur gas
- The Linker 1d
- gcc -v allow to visualize the different programs called by gcc

Tanguy Risset ARC: Computer Architecture 149

introduction History 5000 Processor Architecture 000000 Processor Architecture Automate 000000 Processor Architecture 0000000 Processor Processor Architecture 00000000 Processor Processor Processor Architecture 0000000 Processor Pro

- the task of the pre-processor are :
  - elimination of comments,
  - inclusion of source files
  - macro substitution (#define)
  - conditionnal compilation.
- Example:

```
#define MAX(a, b) ((a) > (b) ? (a) : (b))
...
f=MAX(3,b);
```

```
#define MAX(a, b) ((a) > (b) ? (a) : (b))

ex1.i

f=((3) > (b) ? (3) : (b));
```

## The compiler cc1 or gcc -S

- generate assembly code
- gcc -S main.c -o main.S
- Exemple :

```
void main()
{ int i;
 i=0;

while (1)
 {
    i++;
    nop();
 }
}
```

Assembly code generated (for MSP430)

```
#2558,
                            stack initialization de la pile
mov
                SP
                           ; r4 <- SP
       r1,
            r4
mov
       #0, 0(r4)
mov
                             i initialization
       0(r4)
inc
                            i++
                           ; nop();
nop
                             unnconditionnal jump (PC-6):
       $-6
jmp
        SP
incd
      #0x1158
br
```

# Assembly code produce by mspgcc -S

```
.global
                main
                       main, @function
         .type
main:
/* prologue: frame size = 2 */
.L__FrameSize_main=0x2
.L__FrameOffset_main=0x6
        mov
                  #(__stack-2), r1
                  r1, r4
        mov
/* prologue end (size=3) */
                  #11o(0), @r4
        mov
.L2:
                  #llo(1), @r4
        add
        nop
                  .L2
        jmp
/* epilogue: frame size=2 */
        add
                  #2, r1
                  #__stop_progExec__
/* epilogue end (size=3) */
/* function main size 14 (8)
                                ARC: Computer Architecture
                   Tanguy Risset
```

Electrons and Logic Processor Architecture Automate The Russian train example Me

Assembler as ou gas

- transform an assembly code into object code (binaire representation of symbolic assembly code)
- Option -c of gcc allow to conbine compilation et assembly gcc -c main.c -o main.o

#### Linking: 1d

- Produce the executable (a.out by default) from object code of programs and library used
- There are two ways to use libraries in a program
  - Dynamic or shared libraries (default option): the code of the library is not included in the executable, the system dynamically loads the code of the library in memory when calling the program. We need than only one version of the library in memory even if several programs use the same library. The library must be em installed on the machine, before running the code.
  - Static libraries: the code of the library is included in the executable. The
    executable file is bigger but you can run it on a machine on which the
    library is not installed.

|                      |                |               |                        | <b>◆ □ → ◆ ₫</b> | ► =        | 200          |
|----------------------|----------------|---------------|------------------------|------------------|------------|--------------|
|                      |                | Tanguy Risset | ARC: Computer Ar       | chitecture       |            | 155          |
| introduction<br>0000 | History<br>000 |               | Processor Architecture |                  | in example | Meal<br>0000 |
| Binary               | file           | manipulation  |                        |                  |            |              |

# Some usefull command:

• nm

Allow to know symboles (i.e. label: function names) used in an object file or executable

trisset@hom\\$ nm fib.elf | grep main
000040c8 T main

• objdump allow to anlayze a binary file. For instance it can get correspondance between binary representation and assembly code trisset@hom\$ objdump -f fib

```
fib: file format elf32-msp430
architecture: msp:43, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00001100
```