### ARC: Computer Architecture

tanguy.risset@insa-lyon.fr Lab CITI, INSA de Lyon Version du March 20, 2023

Tanguy Risset

March 20, 2023



### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 5 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procédure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



### ARC course presentation

- Schedule:
  - Course 6h
  - labs (TP) 20h
  - Evaluation (In french): un QCM et un devoir papier en fin de cours
- skills and knowledge learned in ARC cours:
  - Bolean logic, arithmetics
  - combinatorial and sequential logic circuits, automata.
  - Processor architecture, datapath, compilation process, RISC architecture
  - Assembly code, link with high level programming languages
  - Simple processor design, simple assembly program analysis.
  - Link with compilation, operating systems and programming
- Moddle (open): frames, labs, various document
- Course based on the two IF architecture course: AC and AO (open courses on Moodle).





### Computer architecture usefulness

- How to solve a problem with electrons:
- ARC is useful
  - For general knowledge of a computer scientist
  - To understand pro/cons of modern complex architectures
  - For embedded system programming

Problem

Algorithm

Program

Run-Time system

Architecture/ISA

Micro-Architecture

Logic

Circuit

Electrons



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- 4 Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- Function, procÃ(C)dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



### History of computing

from Yale Babylonian Collection,  $\simeq$  1600 BC

- Ancient time: various arithmetics systems
- 17th century (Pascal and Leibniz): notion of mechanical calculator
- 1822 Charles Babbage Difference engine (tabulate polynomial http://www.math.ubc.ca/~cass/Euclid/ybc/ybc.html functions)

  Difference Machine close-up
- 1854 Georges Boole proposes the so-called Boolean logic.
- (More details on the poly or on Internet)





By By Carsten Ullrich - Own work, CC BY-SA 2.5

### History of computers

- 1936: Alan Turing's PhD on a universal abstract machine
- 1941: Konrad Suze builds the Z3 first programmable computer (electro-mechanic)
- 1946: ENIAC is the first electronic calculator
- 1949: Turing and Von Neumann build the first universal electronic computer: the Manchester Mark 1
- (More details on the poly or on Internet)



Z3 computer at Deutches Museum, Munich



By Venusianer, CC BY-SA 3.0



### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- **MIPS ISA**
- 13 Function, procÃ(C)dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



- - Discovered in 1947 at Bell Labs: (transfer resistor)
  - Could replace the thermionic triode (vacuum tube) that allow radio and telephone technologies.
  - Principle: flow or Interrupt current between Source and Drain, depending on Gate status
    - Can be seen as a switch
    - Wildly used after Integrated Circuit invention (1958)



### Popular Transistor technology: CMOS

- CMOS: Complementary Metal Oxide Semiconductor
- Two logical levels : 0 = 0V and 1= 3V
- Two types of transistors
  - nMOS : current flows if gate is 1
  - pMOS : current flows if gate is 0
- Mainly used to realize basic logical gates (NOT, NAND, NOR, etc.)



### Moore's low

- Gordon Moore, co-founder of Fairchild Semiconductor and Intel, predicted in "a doubling every two year in the number of components per integrated circuit"
- Contributed to world economic growth
- Slow down in 2015 and is ended now.



### Boolean functions

Boole Algebra is equipped with three operations

- a unary operation, negation, noted NOT;
- two binary commutative, associative operations:
  - conjunction AND, with 1 as neutral element;
  - disjunction OR, with 0 as neutral element;
- AND is distributive over OR

If a and b are 2 boolean variables, we write:

$$NOT(a) = \overline{a}$$
,  $AND(a, b) = ab = a.b$ ,  $OR(a, b) = a + b$ 

### Boolean Cheat Sheet

- neutral elements: a + 0 = a,  $a \cdot 1 = a$
- absorbing elements: a+1=1,  $a\cdot 0=0$
- idempotence: a + a = a,  $a \cdot a = a$
- tautology/antilogy:  $a + \overline{a} = 1$ ,  $a \cdot \overline{a} = 0$
- commutativity: a + b = b + a, ab = ba
- distributivity:  $a + (bc) = (a + b)(a + c), \quad a(b + c) = ab + ac$
- associativity: a + (b + c) = (a + b) + c = a + b + c,

$$a(bc) = (ab)c = abc$$

- De Morgan's law:  $\overline{ab} = \overline{a} + \overline{b},$   $\overline{a+b} = \overline{a} \cdot \overline{b}$
- others: a + (ab) = a,  $a + (\overline{a}b) = a + b$ ,

$$a(a+b)=a, \quad (a+b)(a+\overline{b})=a$$

### Elementary logical gates



Amplifier:

$$F = x$$

| X | F |
|---|---|
| 0 | 0 |
| 1 | 1 |





AND: 
$$F = x y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

NAND: 
$$F = \overline{(x \ y)}$$

| X | У | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Tanguy Risset

ARC: Computer Architecture

990

History 000 Electrons and Logic Processor Architecture Automate O00000 Automata in langage theory

◆□▶ ◆圖▶ ◆臺▶ ◆臺▶ ■

### Elementary logical gates



OR:

$$F = x + y$$

|   | X | У | F |
|---|---|---|---|
| - | 0 | 0 | 0 |
|   | 0 | 1 | 1 |
|   | 1 | 0 | 1 |
|   | 1 | 1 | 1 |



XOR:

$$F = x \oplus y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

NOR:

$$F = \overline{(x+y)}$$

| X | У | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |



XNOR:

$$F = x \odot y$$

| X | У | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

### Combinatorical circuit Design

- Boolean description of the problem:
  - Compute y and z from a, b and c
  - y is 1 if a is 1 or b and c are 1.
  - z is 1 if b or c is 1 (but not both) or if a, b et c are 1.
- 2 Truth table
- Section Logic Equation

• 
$$y = \overline{a}bc + a\overline{b}\overline{c} + a\overline{b}c + ab\overline{c} + abc$$

• 
$$z = \overline{a}\overline{b}c + \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + abc$$

- Optimized logic equations
  - y = a + bc
  - $z = ab + \overline{b}c + b\overline{c}$
- logic gates





# Disjunctive Normal Form (DNF)

- In Boolean logic, a logical formula in Disjunctive Normal Form (Forme normale disjonctive in French) if:
  - It is a disjunction of one or more clauses
  - where the clauses are conjunction of literals
  - a literal is a variable, a constant or 'not' a variable
- Otherwise put, it is an OR of ANDs.
- Example of DNF:
  - $x.\bar{y}.\bar{z} + \bar{t}.u.v$
  - $(a \wedge b) \vee \neg c$
- Example not in DNF:
  - $\bullet$   $\overline{(x+y)}$
  - $a \lor (b \land (c \lor d))$

### Conjunctive Normal Form (CNF)

- In Boolean logic, a formula is in conjunctive normal form (forme normale conjunctive in French) if:
  - it is a conjunction of one or more clauses,
  - where a clause is a disjunction of literals;
  - a literal is a variable, a constant or 'not' a variable
- Otherwise put, it is an AND of ORs.
- Example of CNF:
  - $(x+y+\bar{z})(\bar{x}+z)$
  - $(a+\bar{b}+\bar{c})(\bar{d}+\bar{a})$
  - $\bullet$  x + y
- Example not in CNF
  - $\bullet$  (x+y)
  - x(y + (z.t))

|                                                               |                  | <b>◆ □ → ◆ </b> |                 |            | ) Q (~ |
|---------------------------------------------------------------|------------------|-----------------|-----------------|------------|--------|
| Tanguy Risset                                                 | ARC: Computer Ar | chitecture      |                 |            | 19     |
| introduction History 6000 COO COO COO COO COO COO COO COO COO |                  |                 | Automata in lan | gage theor | y The  |
|                                                               |                  |                 |                 |            |        |

### From Truth table to DNF

- Back to previous example (z is 1 if b or c is 1 (but not both) or if a, b et c are 1.)
- Truth table on the right, z is 1 if and only if one of the five condition identified occurs.
- It is easy to find a conjunction that is valid in a unique case: example:  $\bar{a}.\bar{b}.c$  is 1 if and only if:  $a=0,\ b=0$  and c=1 (double arrow on the right)
- by adding all the conjunction valid only on each of the five cases identified on the right, we get a DNF formulae that has exactly that truth table.

| i | npu |   |   |              |
|---|-----|---|---|--------------|
| а | b   | С | Z |              |
| 0 | 0   | 0 | 0 |              |
| 0 | 0   | 1 | 1 | $\Leftarrow$ |
| 0 | 1   | 0 | 1 | $\leftarrow$ |
| 0 | 1   | 1 | 0 |              |
| 1 | 0   | 0 | 0 |              |
| 1 | 0   | 1 | 1 | $\leftarrow$ |
| 1 | 1   | 0 | 1 | $\leftarrow$ |
| 1 | 1   | 1 | 1 | $\leftarrow$ |

Hence the possible formulae for z:  $z = \overline{a}\overline{b}c + \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + ab\overline{c}$  How can it be simplified?

### Simple Boolean optimization: Karnaugh Table (1)

• Karnaugh map (tables de Karnaugh) use a "visual" reprentation of a simple property:

$$(a.\bar{b}) + (a.b) = a.(\bar{b} + b) = a$$

- The first step in the method is to transform the truth table (3 or 4 input variables) of the function in a two-dimensional array (split into two parts of the set of variables)
- Rows and columns are indexed by the valuations of the corresponding variables in such a way that between two rows (or columns) only one boolean value changes.
- In our example (3 variables):

| a b | 0 0 | 0 1 | 1 1 | 1 0 |
|-----|-----|-----|-----|-----|
| С   |     |     |     |     |
| 0   | 0   | 1   | 1   | 0   |
| 1   | 1   | 0   | 1   | 1   |

# Simple Boolean optimization: Karnaugh Table (2)

- Then, we try to cover all '1' of the table by forming groups.
  - each group contains only adjacent '1'
  - must form a rectangle
  - the number of elements of a group must be a power of two.
- each group correspond to a possible optimization of the DNF

| • | In | our | example: |
|---|----|-----|----------|
|   |    |     | •        |

| a b | 0 0 | 0.1 | 11 | 1 0 |
|-----|-----|-----|----|-----|
| С   |     |     |    |     |
| 0   | 0   | 1   | 1  | 0   |
| 1   | 1   | 0   | 1  | 1   |

- example : Three groups:
  - $\bar{a}.b.\bar{c} + a.b.\bar{c}$  simplifies to  $b.\bar{c}$
  - $a.b.\bar{c} + a.b.c$  simplifies to a.b
  - $a.\bar{b}.c + \bar{a}.\bar{b}.c$  simplifies to  $\bar{b}.c$
- hence  $z = \overline{a}\overline{b}c + \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + abc$  simplifies to  $z = a.b + \overline{b}.c + b.\overline{c}$

### Well formed cicruits

As far as combinatorial circuits are concerned, a "Well formed" circuit is:

- A logic gate
- A wire
- Two well formed circuits next to each other
- Two well formed circuits, the outputs of one being the inputs of the other
- Two well formed circuits sharing a common input
   It can be shown that it correspond to an acyclic graph of logic gates.
- No cycles, no ouptuts conected together



# Usefull combinatorics logic components

- n input multiplexer
- decoder  $log(n) \rightarrow n$
- n bits adder
- n bits comparator
- n bits ALU
- etc.

### Memorizing: latches and Flip-Flops

• Set-Reset Latch (SR latch, *Bascule RS*): When R and S are reset, Q and  $\overline{Q}$  keep their previous value.



| S | R | Q         | $\overline{Q}$       |
|---|---|-----------|----------------------|
| 0 | 1 | 0         | 1                    |
| 1 | 1 | forbidden | forbidden            |
| 1 | 0 | 1         | 0                    |
| 0 | 0 | $Q_{n-1}$ | $\overline{Q_{n-1}}$ |

• Gated D latch (Flip-flop, register, *Bascule D*): sample input data on clock rising edge and keeps the value when clock is 0.



### latches and Flip-Flops: other common representation

• Latch (verrou)



• Flip-Flop (register)

### Sequential logic

Sequential logic combines logic function and memorizing, it opens the way to synchronous circuits, automata, programs, algorithms....

- n bits register
- n bits counter
- state machine
- CPU
- Computer



- Extremely complex in general.
- Many computation models:
  - Sequential
    - State machine
    - control + data-path
  - task parallelism (communicating tasks)
  - Data parallelism (data-flow)
  - Asynchronous circuits
- Important notion use every where: finite state machine (automate)

### Logic in ARC: Digital software

In ARC: use of Digital software
(https://github.com/hneemann/Digital)

- Design basic logic components (TD1)
- Design of a memory (sequential component, TD2)
- Design of dedicated circuit: integer division (TD3).





| introduction History Electrons and Logic Processor Architecture Automate Automata in langage theory T |  | Tanguy Risset | ARC: Computer Ar | , , , , , , , , , , , , , , , , , , , | 29    |
|-------------------------------------------------------------------------------------------------------|--|---------------|------------------|---------------------------------------|-------|
| 0000 000 00000000000000000000000000000                                                                |  |               |                  | 0                                     | y The |

### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, proc©dure et Pile d'execution
- 14 Coming back to MIPS
- Some additionnal useful information
  - Example of MIPS code



### What is a Von Neumann machine?



- Computer architecture Model (also called *Princeton* architecture)
  proposed after J. Von Neumann report: "First Draft of a Report on
  the EDVAC".
- Usually abstracted as a processor connected to a memory
- The memory is accessed (randomly) with an address (i.e. unlike a Turing machine)
- The memory contains both data and program (unlike a Harvard machine).



#### Compilation, Assembly code and binary code

| High Level Language $\Rightarrow$ | Assembly code $\Rightarrow$ | Binary code $\Rightarrow$ |
|-----------------------------------|-----------------------------|---------------------------|
| int a,b,c;                        | load RO, @b                 | 0100101110101             |
| a = b + c;                        | load R1, @c                 | 0100101010001             |
|                                   | add R3,R0,R1                |                           |
|                                   | store R3, @a                | 1001001100011             |

### Fast compilation thanks to Donald Knuth (and others..)

- The programmer:
  - Write a program (say a C program: ex.c)
  - Compiles it to an object program ex.o
  - links it to obtain an executable ex

#### content of ex.c

```
#include <stdio.h>
int main()
{
   printf("hello World\n");
   return(0);
}
```





# Program execution on a Processor (8 general purpose registers)















Program execution on a Processor (8 general purpose registers)



990







Tanguy Risset

ARC: Computer Architecture

34

introduction History Cooccide Coocci





Tanguy Risset

ARC: Computer Architecture

Automata in langage theory

The computer Architecture on th



### Computer Architecture in ARC

- Design of a simple dedicated circuit in logisim
- Study of a simple processor in logisim
- Overview of assembly code principles
- Compilation basics
- embedded system case study

|                                          |                  | 4 🗆 🕨      |                          | 4)4(4   |
|------------------------------------------|------------------|------------|--------------------------|---------|
| Tanguy Risset                            | ARC: Computer Ar | chitecture |                          | 35      |
| introduction History Electrons and Logic |                  | Automate   | Automata in langage theo | ory The |

### Add on: two's complement representation

- Two's complement (complément à deux) is the most common representation for negative integers
- For a number on N bits:
  - Positive integers from 0 to  $2^{N-1}-1$  are represented with usual binary encoding
  - Negative integer x from  $-2^{N-1}$  to -1 are represented by coding in binary the positive number  $2^N |x|$
  - Hence Negative integers always have the last (i.e. most significant) bit at 1, and positive always have the last bit at 0
- Example with N=3
  - Integers between  $-4_{10}$  and  $3_{10}$  can be represented
  - $-1_{10}$  is represented as  $111_2$  ( $2^3 1 = 7$ )
  - $-2_{10}$  is represented as  $110_2$  ( $2^3 2 = 6$ )
  - $-4_{10}$  is represented as  $100_2 (2^3 4 = 4)$

# Add on: two's complement representation (2)

- Two's complement have an important property: Addition "classical" algorithm works (except that the overflow should be ignored).
- Example:
  - $-1_{10} + (-2_{10}) = 111_2 + 110_2 = 1101_2 =$  (ignoring the carry/overflow) $101_2 = -3$
  - $-1_{10} + 2_{10} = 111_2 + 010_2 = 1001_2 =$  (ignoring the carry/overflow) $001_2 = 1$
- For x > 0,  $x \le 2^{N-1}$ , The representation of -x on N bit two's complement can be obtained by:
  - Complementing each bits of x
  - adding 1 to the resulting integer
- Example:
  - with N=3 and  $x=3_{10}=011_2$ , complement of x is  $100_2$  adding 1 gives  $101_2=-3_{10}$
  - With N=8 and  $x = 96_{10} = 01100000_2$  complement of x is 10011111, adding one is  $-96_{10} = 10100000_2$ , indeed  $256 96 = 160 = 10100000_2$

Tanguy Risset ARC: Computer Architecture 37

introduction 0000 History 000 Computer Architecture Computer Architecture Automata in langage theory 000 Computer Architecture 0000 Computer Architecture 0000 Computer Architecture 0000 Computer Architecture 17 Computer Architecture 17 Computer Architecture 18 Computer Architecture 19 Computer Archi

### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procÃ(C)dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code

#### Automata

- Definition (Wikipedia): An automaton is a self-operating machine, or a machine or control mechanism designed to automatically follow a predetermined sequence of operations, or respond to predetermined instructions.
- In computer science:
  - Used in language theory to build compilers
  - Used in any technical domain: to describe predetermined behaviour.
  - Used in computer architecture: to design dedicated circuit.
  - A computer is a specific automaton.



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procAc dure et Pile d'execution
- 14 Coming back to MIPS
- Some additionnal useful information
  - Example of MIPS code



### Langage régulier et automate

- Formellement, un langage (au sens syntaxe) est simplement un ensemble de mots formés à partir d'un alphabet fini.
- Exemples de langage
  - l'ensemble des mots du dictionnaire (alphabet de a à z)
  - l'ensemble des mots constitués de deux 'a' suivis d'un nombre quelconque de 'b':  $\mathcal{L}=\{aa, aab, aabb, aabbb, \ldots\}$
- Un langage est dit régulier s'il est reconnu par un automate.
- Automate reconnaissant le mot fee:



• Automate reconnaissant le langage {fee, feu}



Tanguy Risset

ARC: Computer Architecture

Electrons and Logic

Processor Architecture Automate

Automata in langage theory

### Notion d'automate

- Un automate est une collection de K états numérotés de 0 à K-1. ainsi qu'une collection de transitions
- Un état particulier est l'état initial.
- Tous les états sont soit des états d'acceptation et soit des états de refus
- Les transitions, sont étiquetées
  - 1 soit par des actions (par exemple, je lis la lettre x)
  - 2 soit par des condition (par exemple, la lettre x est présente)
- le triplets (état 1, lettre x, état 2) signifie: si je suis dans l'état 1 et que je lis la lettre x, alors je vais dans l'état 2.



### Notion de mot reconnu



- fee  $\rightarrow$  reconnu
- feu  $\rightarrow$  reconnu
- fei  $\rightarrow$  non reconnu (impossible de lire 'i')
- ullet fe o non reconnu (arrêt dans un état non final)

|                      |                |                     |                  | <b> </b>       |                             | <b>■</b> • 9 • 0 • 0 |
|----------------------|----------------|---------------------|------------------|----------------|-----------------------------|----------------------|
|                      |                | Tanguy Risset       | ARC: Computer Ar | chitecture     |                             | 43                   |
| introduction<br>0000 | History<br>000 | Electrons and Logic |                  | Automate<br>00 | Automata in langa<br>○○○○●○ | ge theory The        |
|                      |                |                     |                  |                |                             |                      |

### Exemple d'application: Expressions régulières

- Les expressions régulières sont couramment employées en ligne de commande, par exemple: 1s \*.c
- Une expression régulière X basée sur un alphabet décrit un langage, c'est à dire un ensemble de mots sur cet alphabet, on note ce langage E(X).
- Exemples:

| Expression régulière    | langage reconnu                                             |
|-------------------------|-------------------------------------------------------------|
| a*.b                    | mots constitués d'un nombre<br>quelconque de a suivi d'un b |
| a.(a+b)*.a + b.(a+b)*.b | mots constitués des lettre a et                             |
|                         | b, commençant et finissant par la même lettre               |

 Les mots décrits par des expressions régulières peuvent être reconnus par des automates

### Link with architecture: Computers are automata

- Every computing machine is an automata
- Computer are *universal* in the sense that the program gives much flexibility in the action performed.
- In fact the basic action of a computer is very repetitive:
  - Read the instruction at \$PC in memory
  - decode the instruction
  - send the decoding to the ALU (or to memory if it is a load)
  - increment \$PC
- Dedicated circuits (ASICs) are automata designed for specific tasks.



- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procAc dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



introduction History Electrons and Logic



- A piece of unique train track for both train directions between the cities T. et K.
- Sensors triggered by train weight on rallways will command red lights when the track is used by a train.
- Modeling:
  - A booleen A (for 'Ampoule') indicating the state of the red light
  - Two booleans (LS for Left Sensor and RS for Rigth sensor) indicating the states of the sensors
  - An automaton to command the red lights



#### The Russian train automaton



#### The Russian train automaton



- Circles are *states* of the automaton (e.g. NoTrain state models the cases where no train stand on the track).
- States specifies output Values (here only one: A)
- Arrows are transitions, labeled by event that triggered them.

|                                          | 4 □ ▶ ∢ (                  |                                   |
|------------------------------------------|----------------------------|-----------------------------------|
| Tanguy Risset                            | ARC: Computer Architecture | 49                                |
| introduction History Electrons and Logic |                            | Automata in langage theory O00000 |
|                                          |                            |                                   |

### Formal Definition of Synchronous Automata

Formaly, a synchronous automaton (as use in ARC course) is a quintuplet  $(I, O, S, T, F, s_0)$  where:

- I is a set of Inputs (binary values usually),
- O is a set of Outputs (binary values usually)
- S is a finite set of states.
- T is the transition function from  $S \times I \rightarrow S$
- ullet F is the output function from S o O
- $s_0 \in S$  is a special state called *initial state*, where we start from.

The *synchronous* specifier indicates that there is a kind of *clock* that triggeres the evaluation of the transition and output fonction.

### Back to the Russian train example



- The Inputs are RS and LS sensors Boolean values
- The Output is the value of Boolean
- The functions (Transition and Output) can be defined by tables ⇒
- X means 'don't care'

| S        | x=(LS, RS) | s'=T(s,x) |
|----------|------------|-----------|
| NoTrain  | 00         | NoTrain   |
| NoTrain  | 01         | TrRight   |
| NoTrain  | 10         | TrLeft    |
| NoTrain  | 11         | XXX       |
| TrRight  | 0X         | TrRight   |
| TrRight  | 1X         | TrRight2  |
| TrRight2 | 1X         | TrRight2  |
| TrRight2 | 0X         | NoTrain   |
| S        | y=F(s)     |           |
| NoTrain  | 0          |           |
| TrRight  | 1          |           |
| TrRight2 | 1          |           |

Tanguy Risset ARC: Computer Architecture 51

introduction 000 Electrons and Logic Processor Architecture Automata in langage theory 000

The occurrence of the control of t

Implementation of a synchronous automaton as a circuit



- s is current state, s' is next state, x are input bits, y are output bits.
- Ck and reset are not considered as inputs
- State change will occur on each rising edge of the Clock.

### Implementation in Logisim

• We need to store 5 States, hence we need at least 3 bits:







• The output function is easy: A is on iff state is "NoTrain"



### Russian train Transition function

| S              | x=(LS, RS) | s'=T(s,x) |
|----------------|------------|-----------|
| 100 (NoTrain)  | 00         | NoTrain   |
| 100 (NoTrain)  | 01         | TrRight   |
| 100 (NoTrain)  | 10         | TrLeft    |
| 100 (NoTrain)  | 11         | XXX       |
| 000 (TrRight)  | 0X         | TrRight   |
| 000 (TrRight)  | 1X         | TrRight2  |
| 001 (TrRight2) | 1X         | TrRight2  |
| 001 (TrRight2) | 0X         | NoTrain   |
| 010 (TrLeft)   | X0         | TrLeft    |
| 010 (TrLeft)   | X1         | TrLeft2   |
| 011 (TrLeft2)  | X1         | TrLeft2   |
| 011 (TrLeft2)  | X0         | NoTrain   |

|        |       |                     |                   | <b>◆□→◆</b> |               |          | 9     | 90  |
|--------|-------|---------------------|-------------------|-------------|---------------|----------|-------|-----|
|        |       | Tanguy Risset       | ARC: Computer Arc | chitecture  |               |          | Ę     | 55  |
|        |       | Electrons and Logic |                   |             | Automata in I | angage t | heory | The |
| Russia | n tra | in Transition fu    | ınction           |             |               |          |       |     |

# How do we build a transition function

- The general method is to start from the truth tables (previous slide) and to apply a systematic method to build a logic function that corresponds to this table (see course 1)
- Or we can use a property of the transition table.
- For instance here, we remark that there is only one event that produces a change of state, in any state



### A possible Transition function

- Select a "next state" function for each state
- compute each of them
- Select the output depending on the value of the state

| Tanguy Risset                            | ARC: Computer Ar |          |                        | 57       |
|------------------------------------------|------------------|----------|------------------------|----------|
| introduction History Electrons and Logic |                  | Automate | Automata in langage th | eory The |
|                                          |                  |          |                        |          |

### Example of the transition from the NoTrain state

| S             | x=(LS, RS) | s'=T(s,x) |
|---------------|------------|-----------|
| 100 (NoTrain) | 00         | NoTrain   |
| 100 (NoTrain) | 01         | TrRight   |
| 100 (NoTrain) | 10         | TrLeft    |
| 100 (NoTrain) | 11         | XXX       |



### The complete Transition Automaton



|                                  | Tanguy Risset       | ARC: Computer Arc |                | , , , , , , , , , , , , , , , , , , , , | 59  |
|----------------------------------|---------------------|-------------------|----------------|-----------------------------------------|-----|
| introduction History<br>0000 000 | Electrons and Logic |                   | Automate<br>00 | Automata in langage theory              | The |

#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- 4 Processor Architecture
- 5 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procà © dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



### Comming back to automata

- Automata are very widely used in computer science in different domains.
- In ARC we use them to control the execution of dedicated synchronous circuits
- As soon as a dedicated circuit is designed, there is an automaton designed.

|                      |                |                     |                        | <b>◆□→◆</b> | ↑                   | ₹ か      | 90  |
|----------------------|----------------|---------------------|------------------------|-------------|---------------------|----------|-----|
|                      |                | Tanguy Risset       | ARC: Computer Arc      | chitecture  |                     |          | 61  |
| introduction<br>0000 | History<br>000 | Electrons and Logic | Processor Architecture |             | Automata in langage | e theory | The |
| Maak                 | and            | Moore automat       | ta                     |             |                     |          |     |

## ivieary and ivioore automata

- We have seen a *Moore automaton*: output only depend on the state (not on the input), usually simpler to handle.
- $\bullet$  The most general form of an automaton has a moore output and a mealy output  $$_{\rm inputs}$$



### Summery: from Algorithm to Circuit

- From algorithm to automata (states and input/output)
- From automata to synchronous automata
- From synchronous automata to digital design

## Lab topic: circuit for integer division

```
n := entrée N
p := entrée P
x := 0
q := 0
tant que x+p < n
    x := x+p
    q := q+1
fin tant que
sortie Q := q</pre>
```

#### Lab topic: proposed circuit to realize it



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- 4 Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procÃ(C)dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



### Toward a Von Neumann computer

- What you have seen:
  - How to build an ALU with transistors (combinatorial circuit)
  - How to build a controller with transistor (automate)
  - Basic Von Newmann behavior:
    - Registers
    - ALU (or datapath)
    - control unit (automaton)
    - Memory (and bus)
- Lets recall the basic Von Neumann behavior

Tanguy Risset ARC: Computer Architecture occasion on a Processor (8 general purpose registers)

































- Let's study in more details the instruction execution:
  - instruction execution cycle (Von Neumann cycle)
  - Instruction pipeline
  - ISA definition
  - RISC instruction set



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- 4 Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, procédure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



#### The "Von Neumann cycle"

- The so-called Von Neumann cycle is simply the decomposition of the execution of an instruction in several independent stages.
- The number of stages depend on the processor, usually 5 stages are commonly used as example:
  - Instruction Fetch (IF)
    - Reads the instruction from memory (at address \$PC) and write it in \$IR.
  - Instruction Decode (ID)
    - computes what needs to be computed before execution: jump address destination, access to register, etc.
  - Execute (EX)
    - executes the instruction: ALU computation if needed
  - Memory Access (MEM)
    - Loads (or stores) data from memory if needed
  - Write Back (WB)
    - Writes the result into the register file if needed



- The RISC paradigm was invented by Berkeley and popularized by Henessy and Patterson in the book on MIPS
- MIPS stands for *Microprocessor without Interlocked Pipeline Stages* and propose and architecture to execute each stage independently



from MIPS website https://www.mips.com/

#### Christian Wolf's slides

- Use Christian Wolf slides for explaining MIPS instruction pipeline
- Here



• Taken from Henessy/patterson book



#### Illustration of bubble on MIPS

- When next instruction cannot be fetched directly (because it need the result of previous instruction for instance) it creates a "bubble"
- For instance: an addition using a register that was just loaded
- The value of the register will be available after the MEM stage of first instruction, hence we can delay on only on cycle, provided there is a shortcut.



- Go back to our previous representation of the processor and memory:
  - Von Neumann computer= Memory + CPU
  - CPU= = control Unit + Datapath
  - Datapath= ALU + Register file



### A pipeline example from MIPS

- Execute the sequence of assemby instruction:
  - load value at address 500 in register R0
  - Add 1 to R0 and put result in R1
  - store value of Register R1 at address 500
- (Think of i=i+1)
- Code:

```
la R0,500
add R1, R0, 1
sw R1,500
```



 Before execution starts, \$PC contains the address of the first instruction: 100



#### Instruction Fetch



Tanguy Risset ARC: Computer Architecture Automate Automata in langage theory occasional control contro

#### Instruction Decode



#### • Execute (nothing for load)





#### Memory access



#### Write Back





- increment \$PC
- Fetch next instruction
- etc. etc.



introduction History Electrons and Logic Processor Architecture Automate Automata in langage theory Th

## Counting CPI for non-pipelined architecture

- CPI= Cycle per instruction
- 5 cycles for executing on instruction
- ullet  $\Rightarrow$  15 cycles for 3 instructions.



Instruction Fetch (for 'load' instruction)



- Instruction Decode (for load)
- Instruction Fetch (for 'nothing' because of a bubble: instruction 'add' delayed)



Tanguy Risset

ARC: Computer Architecture

87

introduction 000

Blectrons and Logic Processor Architecture Automata in langage theory Theorem 000

Occupant Automata in langage theory Occupant Automata in langage in lan

#### cycle 3

- Execute (for load: nothing to do)
- Instruction Decode (for 'nothing')
- Instruction fetch (for 'add')



- Memory access (for load)
- Execute (for 'nothing')
- Instruction Decode (for add)
- Instruction fetch (for store)



Tanguy Risset

ARC: Computer Architecture

990 89

History Electrons and Logic

Processor Architecture Automate Automata in langage theory

#### cycle 5

- Write Back (instruction load)
- Memory access (for 'nothing')
- Execute (instruction add: bypass)
- Instruction Decode store



- Write Back (for 'nothing')
- Memory access (instruction add, nothing to do)
- Execute (instruction store: nothing to do)



#### cycle 7

- Write Back (instruction add)
- Memory access (instruction store: bypass)



### Counting CPI for both architectures

- Non-pipelined architecture:
  - 5 cycles for one instruction
  - $\Rightarrow$  15 cycles for 3 instructions.
- Pipelined architecture:
  - 5 cycles for one instruction
  - 8 cycles for 3 instructions.
  - ⇒ without bubbles, one instruction per cycle
  - A 'jump' instruction interrupt the pipeline (need to wait for the address decoding to fetch next instruction) 

     pipeline stall
  - Some ISA allow to use these delay slots: one or two instruction after the jump are executed before the jump occurs.



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, proc©dure et Pile d'execution
- 14 Coming back to MIPS
- Some additionnal useful information
  - Example of MIPS code



## Set Architecture Statement (ISA)

- The instruction set (Set Architecture statement: ISA) is of paramount importance
  - It determines the basic instructions executed by the CPU.
  - It's a balance between the hardware complexity of the CPU and the ability to express the required actions
  - It is represented in a symbolic way: the assembly code/language (ex: ADD R1,R2)
  - The tool that translates symbolic assembly code in binary code (i.e. machine code) is also called the **assembler**
- Two types of ISA:
  - CISC: Complex Instruction Set Computer
  - RISC: Reduce Instruction Set Computer



## CISC: Complex Instruction Set Computer

- An instruction can code several elementary operations
  - Ex: a load, an add and a store (in memory operations)
  - Ex: computer a linear interpolation of several values in memory
- Need a mode complex hardware (specifically hardware accelerators)
- High variability in size and execution time for different instructions
- Produce a more compact code but more complex to generate
- Vax, Motorola 68000, Intel x86/Pentium

## Example: instructions ISA of Pentium





## RISC: Reduced Instruction Set Computer

- Small simple instructions, all having the same size, and (almost) the same execution time.
- no complex instruction
- Clock speed increase with pipelining (between 3 and 7 pipeline stages)
- Code simpler to generate but less compact
- Every modern processor use this paradigm: SPARC, MIPS, ARM, PowerPC, etc.

## Example: instructions of MSP430 ISA

| 1 operand instruction |    |    |    |    |    |     |     |   |     |   |   |     |         |   |   |
|-----------------------|----|----|----|----|----|-----|-----|---|-----|---|---|-----|---------|---|---|
| 15                    | 14 | 13 | 12 | 11 | 10 | 9   | 8   | 7 | 6   | 5 | 4 | 3   | 2       | 1 | 0 |
| 0                     | 0  | 0  | 1  | 0  | 0  | opc | ode |   | B/W | A | i | Des | st reg. |   |   |

| relative Jumps |    |    |    |          |    |   |   |   |            |          |   |   |   |   |   |
|----------------|----|----|----|----------|----|---|---|---|------------|----------|---|---|---|---|---|
| 15             | 14 | 13 | 12 | 11       | 10 | 9 | 8 | 7 | 6          | 5        | 4 | 3 | 2 | 1 | 0 |
| 0              | 0  | 1  | co | ondition |    |   |   | P | C offset ( | 10 bits) |   |   |   |   |   |

|                                                                                                                                                     | 2 operands instruction |       |  |  |          |    |  |    |     |   |    |   |  |          |  |  |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-------|--|--|----------|----|--|----|-----|---|----|---|--|----------|--|--|
| 15         14         13         12         11         10         9         8         7         6         5         4         3         2         1 |                        |       |  |  |          |    |  |    |     |   |    | 0 |  |          |  |  |
|                                                                                                                                                     | o                      | pcode |  |  | Dest reg | ţ. |  | Ad | B/W | Α | As |   |  | Dest reg |  |  |

Examples:

- PUSB.B R4
- JNE -56
- ADD.W R4,R4

Tanguy Risset ARC: Computer Architecture 99

introduction 000 Blectrons and Logic Processor Architecture Automata in langage theory 000

Occording to the control of the co

#### Exemple of Pentium ISA

- Write a simple C program toto.c
- Type gcc -S toto.c and get the toto.s file
- you can also use the *compiler* explorer:

```
https://gcc.godbolt.org/
```

.

```
main() {
    int i=17;
    i=i+42;
    printf("%d\n", i);
}

⇒

(... instructions ...)
movl $17, -4(%rbp)
addl $42, -4(%rbp)
(... printf params...)
call printf
(... instrucitons ...)
```

#### Disassemby

- compile the assembly code: gcc toto.s -o toto
- disassemble with objdump:

```
objdump -d toto
```

```
Adresses
          Instructions binaires
                                     Assembleur
(...)
 40052c: 55
                                            %rbp
                                     push
                                            %rsp,%rbp
 40052d: 48 89 e5
                                     mov
                                            $0x10,%rsp
 400530: 48 83 ec 10
                                     sub
 400534: c7 45 fc 11 00 00 00
                                            $0x11,-0x4(%rbp)
                                     movl
                                            0x2a,-0x4(%rbp)
 40053b: 83 45 fc 2a
                                     addl
                                            -0x4(\%rbp), \%eax
 40053f: 8b 45 fc
                                     mov
 400542: 89 c6
                                            %eax,%esi
                                     mov
% (...)
```

|                                                    | <ul> <li>□ ▶ &lt; □ ▶ </li> </ul> |
|----------------------------------------------------|-----------------------------------------------------------------------------------------|
| Tanguy Risset ARC                                  | : Computer Architecture 101                                                             |
| introduction History Electrons and Logic Processor |                                                                                         |

## common properties of ISA

- An ISA first defines the types of data on which the processors can compute (32 bit memory addresses, integer of various sizes, etc.)
- Then it contains various types of instructions:
  - Computation instructions (add, sub, or, and, ...), with various number of operands
  - Memory addessing instructions (load, store)
  - stack management instructions (push, pop)
  - Flow control instructions (jumps)
  - subroutine calls

### ISA computation instruction

- Instruction dedicated to computations: arithmetics, logical, shift etc.
- Computation instruction operate on registers (if 16 registers are available, 4 bits are sufficient to identify them).
- let's take the example of addition
  - Possible mnemonic for addition: add R0 R1 ightarrow R3
  - in general op Rx, Ry  $\rightarrow$  Rd, where op can be add, mul, sub, or, and etc.
  - ullet Operands can be values instead of registers. In general the instruction name changes: addi RO, #4 o R3.
  - How to code the instruction in binary:
    - 3 registers to name  $\Rightarrow$  3  $\times$  4 = 12 bits
    - A given number of bits to code the operation (8bits: 256 operations)
    - Some bits left for coding constants if needed



### Number of operand

We have defined a so-called three-operands (code trois addresses) addition since the three operands used. There are other solutions :

- 2-operands instructions: op Rd,  $Rx \rightarrow Rd$  (assembleur à deux addresses). One of the operands is overwritten.
- 1-operand instructions: same idea, but Rd is fixed and implicit. It's usually called the accumulator.
- Stack based instructions (0 operand): the processor has a stack (like calculators HP), and an add statement takes its two operands to the top of the stack and stores the result.

### Memory addressing operation

- Basic read/write:
  - $\bullet$  Read [R2]  $\to$  R5 reads the content of address contained in R2, place the result in R5
  - ullet conversely: Write R3 ightarrow [R6]
  - ullet With a constant: Read [#46] ightarrow R5
- indirect addressing:
  - Read [R2+R3]  $\rightarrow$  R5  $((R5\leftarrow[R2+R3])$
- with auto-increment
  - Read [R2+]  $\rightarrow$  R5 (R5 $\leftarrow$  [R2]; R2=R2+1)

|                                          |                  | 4 □ ▶ 4 ₫   |                   | ₹ %       | 00  |
|------------------------------------------|------------------|-------------|-------------------|-----------|-----|
| Tanguy Risser                            | ARC: Computer Ar | rchitecture |                   | 1         | .05 |
| introduction History Electrons and Logic |                  | Automate    | Automata in langa | ge theory | The |
|                                          |                  |             |                   |           |     |

## Stack management instruction

- High-level languages use a lot of Stacks (last in, first out, used for managing the calls of procedures).
- The stack is stored in memory
- A specific register is used to access the top of stack: \$SP for Stack
   Pointer
- instruction Push R1 pushes R1 on stack i.e.:

```
Write R1 
ightarrow [SP] Add SP, 1 
ightarrow SP
```

- SP+1 means "add to SP the size of the word" (32 or 64 bits)
- Instruction Pop R1 will do the opposite: Read [SP] ightarrow R1 Sub SP, 1 ightarrow SP

#### Flow control instructions

- The main flow control instruction (to implement loops and ifs) are the jumps:
  - Relative Jumps
    - GoForward #123 (jump to current address+123)
    - GoBackward #123
    - Warning: less than 32 bits for the constant!
  - Absolute jumps
    - Jump #1234 (jump to address 1234)
  - Conditional jumps
    - GoForward #123 IfPositive (i.e. if last operation's result was positive)
    - necessitates the presence of flags
    - Usually at four flags: C (carry), N (negative), Z (Zero), V (overflow)
    - In some ISA, flags are replaced by predicates (any condition can be used as flag).

#### Subroutine Calls

- The call (i.e. jump at the first instruction) of a procedure is done with the instruction call
  - call Label
- The label is a symbolic way to indicate where is stored the code of the procedure
- The call instruction performs a jump and keep tracks of the current address in order to be able to come back here after the execution of the instruction Return (usually use the stack for that)
- The Return instruction does not returns values (such as the result of functions),
- Results and parameters of functions are transmitted either on the stack or in specific register. This is defined by the ABI (Application Binary Interface).
- The ABI allow functions produced from different compiler to call themselves (i.e. library)

### Interrupts (ISR for interrupt service routine)

- The instruction flow can be interrupted at any time by an interrupt:
  - Packet arrived on network card
  - Sound card required more samples to play
  - a character has been stroked on the keyboard
  - etc.
- Interrupt are almost equivalent of function calls:
  - They interrupt the current instruction flow to execute the interrupt handler
  - Then they continue the execution where is has been stopped
  - Ideally they have no impact on the function being executed, hence all registers are saved on the stack before jumping to interrupt handler
- Interrupts and Interrupts handler will be studied in more details in CRO course



- Change mode instructions
  - modes interrupt, user, supervisor etc.
- Synchronization instructions
  - for multi-core systems

introduction History Electrons and Logic Processor Architecture Automate Automata in langage theory T

#### ISA Example: MSP430

- MSP430 ISA is an instruction set for micro-controller very low consumption (smart cards, RFID tags).
- 16-bit RISC instruction set with 16-bit registers .
- There are instructions of one and two operands.
- If these operands are registers, the instruction holds in one word of 16 bits.
- The operands can also be a memory box whose address (on 16 bits) is coded in one of the words following the instruction. In this case the instructions are 32 or 48 bits.
- Some registers are special: the \$PC is R0,
- the flags are in the register R1, and the register R2 can take different values useful constants (0, 1, -1 ...).
- the number of cycles that each instruction takes is exactly equal to the number of memory accesses that it makes.

## Example: instructions of MSP430 ISA

|    | 1 operand instruction |    |    |    |    |     |     |   |     |    |   |     |         |   |   |
|----|-----------------------|----|----|----|----|-----|-----|---|-----|----|---|-----|---------|---|---|
| 15 | 14                    | 13 | 12 | 11 | 10 | 9   | 8   | 7 | 6   | 5  | 4 | 3   | 2       | 1 | 0 |
| 0  | 0                     | 0  | 1  | 0  | 0  | opc | ode |   | B/W | Ac | d | Des | st reg. |   |   |

| relative Jumps |    |    |    |          |    |   |   |    |            |          |   |   |   |   |   |
|----------------|----|----|----|----------|----|---|---|----|------------|----------|---|---|---|---|---|
| 15             | 14 | 13 | 12 | 11       | 10 | 9 | 8 | 7  | 6          | 5        | 4 | 3 | 2 | 1 | 0 |
| 0              | 0  | 1  | co | ondition |    |   |   | Pe | C offset ( | 10 bits) |   |   |   |   |   |

| 2 operands instruction |       |    |    |          |    |   |    |     |   |    |   |   |          |   |   |
|------------------------|-------|----|----|----------|----|---|----|-----|---|----|---|---|----------|---|---|
| 15                     | 14    | 13 | 12 | 11       | 10 | 9 | 8  | 7   | 6 | 5  | 4 | 3 | 2        | 1 | 0 |
| c                      | pcode |    |    | Dest reg | ţ. |   | Ad | B/W | А | AS |   |   | Dest reg | • |   |

Exemples:

- PUSB.B R4
- JNE -56
- ADD.W R4,R4

### ISA example: ARM

- ARM ISA is a 32-bit RISC instruction set with 16 registers found among other things in all mobile phones.
- All the instructions are coded in exactly one memory word (32 bits).
- The instructions are 4 operands: the fourth is an offset that can be applied to the second operand.
- The second operand, and the offset can be constants. For example, add R1, R0, R0, LSL #4 calculates in R1 the multiplication of R0 by 17, without using the multiplier (slow, and sometimes otherwise absent).
- For embedded systems, ARM produced the THUMB ISA (instructions with 2 operands on 16 bits)
- In recent ARM system, instruction of 16 and 32 bits can be mixed



#### Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- 13 Function, proc©dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



### Study a real ISA: MIPS

- We study in more detail a particular assembly code
- Course inspired from
  - Architecture course of Peter Niebert and Séverine Fratani (U. Marseille)
    - http://pageperso.lif.univ-mrs.fr/~peter.niebert/archi2014.php
  - MIPS web site https://www.mips.com/
  - And of Course F. de Dinechin IF Architecture course (with bits of Christian Wolf)

|                      |                |                     |                        | <b>◆□ → ◆</b> ₫ |                     | ₹ %      | 90  |
|----------------------|----------------|---------------------|------------------------|-----------------|---------------------|----------|-----|
|                      |                | Tanguy Risset       | ARC: Computer Arc      | chitecture      |                     | 1        | 15  |
| introduction<br>0000 | History<br>000 | Electrons and Logic | Processor Architecture |                 | Automata in langage | e theory | The |
| MIDC                 | Droc           | 0000K               |                        |                 |                     |          |     |

#### MIPS Processor

- MIPS stands for Microprocessor without Interlocked Pipeline Stages
- MIPS designed by MIPS Computer Systems in 1985.
- Many version up to today (MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V and MIPS32, MIPS64 as well)
- Used in PC, and servers (DEC, NEC, Silicon Graphics) and for video games (Nintendo 64, Sony PlayStation, PlayStation 2)
- Gave birth to RISC-V, an open-source architecture.



## MIPS Processor organisation

- a register-to-register (or load/store) architecture
- MIPS use 3-adress instructions (destination is the first operand)
- 32 registers
- A program counter (\$PC) of 32 bits
- an Instruction register (\$IR) of 32 bits
- Addressable memory of 2<sup>32</sup> bytes
  - $\Leftrightarrow$  2<sup>30</sup> words of 4 bytes

|                      |      |                     |                   | <b>→</b> □ <b>→ → </b> | <b>₽▶ ∢ ≣ ▶ ∢ ≣ ▶</b> | 1 9      | 90  |
|----------------------|------|---------------------|-------------------|------------------------|-----------------------|----------|-----|
|                      |      | Tanguy Risset       | ARC: Computer Arc | chitecture             |                       | 11       | 17  |
| introduction<br>0000 |      | Electrons and Logic |                   | Automate<br>00         | Automata in langage   | e theory | The |
| unders               | tand | ing MIPS assen      | nbly              |                        |                       |          |     |

• From C to assembly:

prog.c

prog.s

```
$t0, 4($gp)
                           # fetch N
lw
       $t0, $t0, $t0
                           # N*N
mult
       $t1, 4($gp)
lw
                           # fetch N
       $t2, $zero, 3
                           # 3
ori
       $t1, $t1, $t2
mult
                           # 3*N
       $t2, $t0, $t1
add
                           # N*N + 3*N
       $t2, 0($gp)
                           # i = ...
SW
```

# MIPS assembly: compiler optimization (academic)

From C to optimized assembly:

```
mipsel-linux-gcc prog.c -S -O3 -o prog.s
```

prog.c

```
prog.s
```

```
lw $t0, 4($gp) # fetch N
add $t1, $t0, $zero # cp N to $t1
addi $t1, $t1, 3 # N+3
mult $t1, $t1, $t0 # N*(N+3)
sw $t1, 0($gp) # i = ...
...
```

## MIPS register

- 32 registers in the *register file*
- Named
  - by their number: \$0 \$1 ...\$31
  - or by their name \$zero \$at \$v0 \$v1 \$a0 ...\$a3 ...
- \$0 (\$zero) contains value 0
- \$a0 ...\$a3 are used to pass (first four) arguments of a function call
- \$v0 \$v1 are used to transmit functions result
- \$s0 ...\$s7 and \$t0 ...\$t9 are working registers, used for CPU computations
- \$sp is the stack pointer
- \$fp is the frame pointer (explained later)
- \$ra contains the return address (after the end of current function)
- \$gp is a pointer to global area
- \$k0, \$k1 and \$at are reserved register (for kernel and assembler)

# MIPS Memory map

- The Memory Map is a convention to organize memory that must respect each code to be compatible with others.
- The MIPS memory map (very similar to all memory map) is simple
- Here we have only one physical memory chip: the RAM.



# MIPS assembly addressing mode

Addressing mode means: how the address is computed in an assembly instruction

| format                        | address computation                     |
|-------------------------------|-----------------------------------------|
| \$register                    | content of register                     |
| imm                           | immediate value                         |
| imm (\$register)              | immediate + content of register         |
| label                         | addresse of label                       |
| $label \pm imm$               | addresse of label $\pm$ immediate value |
| $label \pm imm \; (register)$ | addresse of label $\pm$                 |
|                               | (immediate value + content of register) |

# Example of MIPS adressing mode

- add \$s0, \$s2, \$s1
  - puts in \$s0 the value of \$s1 plus the value of \$s2.
  - \$s0=\$s1+\$s2
- addi \$s0, \$s1, 1
  - puts in \$s0 the value of \$s1 plus 1.
  - \$s0=\$s1+1
- lw \$s0, 10(\$s3)
  - puts in \$s0 the value situated in memory at the address obtained by adding 10 to the content of \$s3.
  - \$s0=Memory[\$s3+10]
- bne \$s0, \$s3, label
  - branch to address of label if values in \$s0 and \$S3 are different.
  - if (\$s0 != \$s3) then \$PC=label

|                      |          |                     |                   | <b>→</b> □ <b>→ → </b> | ₽ ► ◀ ≣ ► ◀ ≣  | <b>→ =</b> | 990     |
|----------------------|----------|---------------------|-------------------|------------------------|----------------|------------|---------|
|                      |          | Tanguy Risset       | ARC: Computer Arc | chitecture             |                |            | 123     |
| introduction<br>0000 |          | Electrons and Logic |                   |                        | Automata in la | ngage the  | ory The |
| _                    | <u> </u> | MDC                 |                   |                        |                |            |         |

## Format of MIPS instructions

- 3 types of format: R-Type, I-Types and J-Types
- R-types:

| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
|--------|--------|--------|--------|--------|--------|
| ор     | rs     | rt     | rd     | shamt  | func   |

- Used for 3-register instructions
- op is the operation code or opcode that specifies the operation
- rs and rt are the first and second source register
- rd is the destination register
- shamt is used for shift instruction
- func is used with op to select arithmetic operation

## I-Types instruction

• I-Types instruction are used for load, store, branch and immediate instruction.

| 6 bits | 5 bits | 5 bits | 16 bits |
|--------|--------|--------|---------|
| ор     | rs     | rt     | Address |

- rs is a source register (an address) for loads, store
- rs is an operand for conditionnal branch
- rt is a source register for branch
- rt is a destination register for other I-Types instruction
- The address field is a 16 bit's integer in two's-complement code, ranging from -32 768 to 32 767 (remind that this is a problem in many cases)

|                      |       |                     |                        | 4 □ → 4 ₫      | □ ▶ ◀ ■ ▶          | ₹ 900°        |
|----------------------|-------|---------------------|------------------------|----------------|--------------------|---------------|
|                      |       | Tanguy Risset       | ARC: Computer Ar       | chitecture     |                    | 125           |
| introduction<br>0000 |       | Electrons and Logic | Processor Architecture | Automate<br>00 | Automata in langag | ge theory The |
| J-Type               | s ins | struction           |                        |                |                    |               |

J-Types instruction are used for Jump to absolute address
 6 bits
 26 bits

| ор | Address |
|----|---------|

- The address field is a 26 bit's integer containing the address of the word, hence the real address is obtain by multiplying by four (shifting two bits).
- can jump from address 0 to  $2^{28}$ =256MB from \$PC.
- For longer jump, on can use the instruction jr: jr \$ra
   jump to 32 bit address contained in register \$ra

## Basic arithmetic and logic instruction

- R-Types instructions: add, sub, mul, div, and, or, xor
  - add \$t0, \$t1, \$t2 // \$t0 = \$t1 + \$t2
  - mul \$s0, \$s1, \$a0 // \$s0 = \$s1 \* \$a0, pseudo
- I-types for immediate operand operation:
  - addi \$t0, \$t1, 4 // \$t0 = \$t1 + 4
  - addi \$t0, \$0, 4 // \$t0 = 4
  - li \$t0, 4 // \$t0 = 4, pseudo

|                      |                     |                  | 4 □ ▶ 4 ⊑      | J           | = =                    | *) C   | 7 (4 |
|----------------------|---------------------|------------------|----------------|-------------|------------------------|--------|------|
|                      | Tanguy Risset       | ARC: Computer Ar | chitecture     |             |                        | 12     | 27   |
| introduction<br>0000 | Electrons and Logic |                  | Automate<br>00 | Automata ir | ı langage <sup>.</sup> | theory | Th   |
|                      |                     |                  |                |             |                        |        |      |

## Load and store

- MIPS load and store operation use indexed addressing
  - the address operand specifies a signed constant and a register
  - These values are added to generate effective address
- byte instruction: 1b and sb transfer one byte
  - 1b \$t0, 20(\$a0) // \$t0=Memory[\$a0+20]
  - sb \$t0, 20(\$a0) // Memory[\$a0+20]=\$t0
  - sb stores only the lowest byte of operand register
- Word instruction: 1w and sw operates on word (i.e. 32 bits)
- Remind that address have to be aligned to 32 bit world, hence must be multiple of 4.

#### **Branches**

- Conditional branch
  - bne \$t0, \$t1, Label
  - if \$t0 and \$t1 have different values, the next instruction to execute is at address Label
  - beg \$t0, \$t1, Label // same thing if \$t0=\$t1
- Unconditionnal branch
  - i toto // next instruction executed is at address toto
  - jr \$s2 // next instruction executed is at address contained in \$s2
- These are the only way of implementing loops in assembly:

```
li $t2, 0
       li $t3, 1
while: beg $t1, $0, done
       add $t2, $t1, $t2
       sub $t1, $t1, $t3
       j while
done:
```

```
t2 = 0
while (t1 != 0) {
   t2 = t2 + t1
   t1=t1-1
```

Tanguy Risset

ARC: Computer Architecture

naa

Electrons and Logic

Processor Architecture Automate Automata in langage theory

## Function control flow in MIPS

- MIPS uses the jump-and-link (jal) instruction to call functions
  - Example:

- saves the return address (i.e. the address of the following instruction) in the \$ra register and jumpt to the address of Fact
- At the end of the execution of Fact, the instruction jr \$rajumps back to the address stored in \$ra
- Arguments transmited to Fact are stored in registers \$a0 ...\$a3
- Return values of Fact are stored in registers \$v0 \$v3

## Who save the register during Function call?

- When a function call occurs: jal Fact, who save the register?
  - The Caller (who knows which register he will use after the call)?
  - Or the callee (who knows which register he will use during its execution)?
- This convention is part of the *calling convetion* or ABI *application* binary interface.
- For MIPS:
  - \$t0 \$t9 \$a0 \$a3 \$v0 \$v1 are caller saved (if needed)
  - \$s0 \$s7 \$ra are callee saved (if needed)



# Function call example with MIPS

- Let says: function B calls function C
- Function B wants to save \$t0, \$t1 and \$a0 because it will need them after the return of C.
- this is done using the stack via the stack pointer \$sp

#### The Stack

- The stack is use to store all local information (in the sense local to the current function)
- That includes (say for function C):
  - local variable

jr \$ra

- Callee saved register if needed
- Return address (i.e. the instruction following the jal C instruction).
- (sometimes) the parameters passed to C
- (sometimes) the result of C
- In many ISA, the parameters and the results are passed through dedicated registers
- All these data constitute the frame of the fonction instance.
- the frame pointeur points to the frame of the current function
- For MIPS, the frame pointer is \$fp

|                |       |                |               |                        | <b>◆ □ → ◆ </b> | →                  |           | 20  |
|----------------|-------|----------------|---------------|------------------------|-----------------|--------------------|-----------|-----|
|                |       |                | Tanguy Risset | ARC: Computer Ar       | chitecture      |                    | 13        | 33  |
| introc<br>0000 |       | History<br>000 |               | Processor Architecture | Automate<br>00  | Automata in langag | ge theory | The |
| Fui            | nctio | on B           | calls C       |                        |                 |                    |           |     |

```
В
                      beguinning of
    sw $t0,0($sp)
                       saving $t0 in stack
    sw $t1,-4($sp)
                       saving $t1 in stack
    sw $a0,-8($sp)
                       saving
                               $a0 in stack
    sub $sp,$sp,12
                       correct stack pointer
                       call to C function
    jal C
    lw $a0,4($sp)
                      restoring return addresse of B from stace
    lw $t1,8($sp)
                       restoring $s1 from stack
    sw $t0,12($sp)
                      restoring
                                  $s0
    add $sp,$sp,12
                       adjusst stack pointeur value
```

end of B

introduction History Electrons and Logic Processor Architecture Automate Automata in langage theory The

# Sketching code of C function

```
$sp,$sp,40
                        # C need 40 Bytes for its frame
subu
        $ra,32($sp)
                        # store return address (inst. in B)
SW
        $fp,28($sp)
                        # store frame pointer
SW
        $s0,24($sp)
                        # store $s0 (because C uses it)
SW
        $fp,$sp
                        # $fp <- $sp: frame pointer of C se
move
        $ra,32($sp)
                        # $ra <- return address (in B)
lw
        $fp,28($sp)
                        # $fp <- frame pointeur of B
lw
        $s0,24($sp)
lw
                        # restore $s0
        $sp,$sp,40
                        # $sp <- $sp+40, restore B stack po
addu
        $ra
                        # return to $ra (B function)
j
```

|                     |                   | <b> </b>       |                    | <b>₹</b> *) Q (* |
|---------------------|-------------------|----------------|--------------------|------------------|
| Tanguy Risset       | ARC: Computer Arc | chitecture     |                    | 135              |
| Electrons and Logic |                   | Automate<br>00 | Automata in langag | e theory The     |

## Table of Contents

- introduction
- 2 History

C:

- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- 12 MIPS ISA
- Function, procédure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code

## Procedure abstraction

- Let's pause a while to come back to high level langage
- What is a function (or a procedure)?
- How its isolation mecanisme (local variable) is implemented?
- This is implemented with a very fundamental mecanism: the Stack and the Activation Record (or Frame) of each procedure.

|                      |      |                     |                        | <b>◆ □ → ◆ ₫</b> | ₹ P → E P → E P  | ₹       | 2     | 0   |
|----------------------|------|---------------------|------------------------|------------------|------------------|---------|-------|-----|
|                      |      | Tanguy Risset       | ARC: Computer Arc      | chitecture       |                  |         | 137   |     |
| introduction<br>0000 |      | Electrons and Logic | Processor Architecture |                  | Automata in lang | gage th | neory | The |
| Notion               | of r | procedure           |                        |                  |                  |         |       |     |

- Procedures (or functions) are the basic units for compilers
- Three important abstraction:
  - Control abstraction: parameter passing and result transmission
  - Memory abstraction: variable lifetime (local variables)
  - Interface: procedure's signature

## Procedure Control Transfer

- Transfer mechanism of control between procedures:
  - when calling a procedure, the control is given to the procedure called;
  - when this called procedure ends, the control is returned to the calling procedure.
  - Two calls to the same procedure create two em independent instances (or invocations).
- two useful graphic representations:
  - The call graph: represents the information written in the program.
  - The call tree: represents a particular execution.



# Call Tree procedure calc; begin { calc} ... end; procedure call1; var y... procedure call2 var z: ... procedure call3; var y.... begin { call3} x:=... calc; end;

begin { call<sub>2</sub>}

z := 1;

calc:

call<sub>3</sub>;

end:

begin  $\{ call_1 \}$   $call_2;$ 

end;

Call tree for one particular execution:



main calls call<sub>1</sub>
call<sub>1</sub> calls call<sub>2</sub>
call<sub>2</sub> calls calc
calc returns to call<sub>2</sub>
call<sub>2</sub> calls call<sub>3</sub>
call<sub>3</sub> callscalc
calc returns to call<sub>3</sub>
call<sub>3</sub> returns to call<sub>2</sub>
call<sub>2</sub> returns to call<sub>1</sub>
call<sub>1</sub> returns to main



### **Execution Stack**

- The transfer of control mechanism between procedures is implemented thanks to the *execution stack*.
- The programmer has this vision of virtual memory:



- The *heap* is used for dynamic allocation.
- The *stack* is used for the management of contexts of procedures (local variable, etc.)

#### Function call: status of the stack



#### Activation record

- Calling a procedure: Stacking the activation record (or frame).
- Need of a dedicated pointer for that: the activation record pointer (ARP) or frame pointeur (\$fp))
- The frame allows to set up the *context* of the procedure.
- This frame contains
  - The space for local variables declared in the procedure
  - Information for restoring the context of the calling procedure:
    - Pointer to the frame of the calling procedure (ARP or FP for em frame pointer).
    - Address of the return instruction (statement following the call of the appellant proceedings).
    - Eventually save the state of the registers at the time of the call.

## Content of the Frame







#### Table of Contents

- introduction
- 2 History
- Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- **12** MIPS ISA
- 13 Function, procédure et Pile d'execution
- Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code



## Coming back to previous call example with B and C

- Let says: function B calls function C
- Function B wants to save \$t0, \$t1 and \$a0 because it will need them after the return of C.
- this is done using the stack via the stack pointer \$sp

#### The Stack

- The stack is use to store all local information (in the sense local to the current function)
- That includes (say for function C):
  - local variable
  - Callee saved register if needed
  - Return address (i.e. the instruction following the jal C instruction).
  - (sometimes) the parameters passed to C
  - (sometimes) the result of C
  - In many ISA, the parameters and the results are passed through dedicated registers
- All these data constitute the frame of the fonction instance.
- the frame pointeur points to the frame of the current function
- For MIPS, the frame pointer is \$fp

|                      |                |               |                        | 4 □ → 4 ₫  | □                  | ₹ 4      | ) 9 (3 |
|----------------------|----------------|---------------|------------------------|------------|--------------------|----------|--------|
|                      |                | Tanguy Risset | ARC: Computer Arc      | chitecture |                    |          | 149    |
| introduction<br>0000 | History<br>000 |               | Processor Architecture |            | Automata in langag | ge theor | y The  |
| Eupoti               | on D           | calle C       |                        |            |                    |          |        |

#### Function B calls C

```
В
                      beguinning of
    sw $t0,0($sp)
                       saving $t0 in stack
    sw $t1,-4($sp)
                       saving $t1 in stack
    sw $a0,-8($sp)
                       saving
                               $a0 in stack
    sub $sp,$sp,12
                       correct stack pointer
                       call to C function
    jal C
    lw $a0,4($sp)
                       restoring return addresse of B from stace
    lw $t1,8($sp)
                       restoring $s1 from stack
    sw $t0,12($sp)
                       restoring
                                  $s0
    add $sp,$sp,12
                       adjusst stack pointeur value
    jr $ra
                       end of B
```

## Sketching code of C function

```
C:
            $sp,$sp,40
                           # C need 40 Bytes for its frame
    subu
            $ra,32($sp) # store return address (inst. in B)
    SW
            $fp,28($sp)
                           # store frame pointer
    SW
                           # store $s0 (because C uses it)
            $s0,24($sp)
    SW
            $fp,$sp
                           # $fp <- $sp: frame pointer of C se
    move
            $ra,32($sp)
                           # $ra <- return address (in B)
    lw
            $fp,28($sp)
                           # $fp <- frame pointeur of B
    lw
            $s0,24($sp)
                           # restore $s0
    lw
            $sp,$sp,40
                           # $sp <- $sp+40, restore B stack po
    addu
                           # return to $ra (B function)
            $ra
    j
```

## MIPS Assembly for programme fib

#### Fibbonacci suite program:

```
int fib (int i)
{
   if (i<=1) return(1);
   else return(fib(i-1)+fib(i-2));
}
int main (int argc, char *argv[])
{
   fib(2);
}</pre>
```

# Assembleur MIPS pour programme fib

```
fib:
       .frame
                    $fp,40,$ra
                                      # vars= 8, regs= 3/0, args= 16, extra= 0
       .mask
                   0xc0010000,-8
       .fmask
                    0x00000000,0
       subu
                  $sp,$sp,40
                                     # SP <- SP-40 :AR de 40 octet (10 mots)
       SW
                 $ra,32($sp)
                                     # stocke adresse retour SP+32
                $fp,28($sp)
                                     # stocke ARP appelant SP+28
       SW
                $s0,24($sp)
                                     # sauvegarde registre $s0
       SW
                                     # ARP <- SP
       move
                  $fp,$sp
       SW
                 $a0,40($fp)
                                     # stocke Arg1 dans la pile (ARP+40)
       lw
                $v0,40($fp)
                                     # charge Arg1 dans $v0
                                     # v0 < 1 si v0 < 0 sinon
       slt
                 $v0,$v0,2
                                     # branch L2 si $v0==0
       beq
                 $v0,$0,$L2
                                     # $v0 <- 0x1 ($v0 sera le registre contenant le res)
       li
                $v0,1
       SW
                $v0,16($fp)
                                     # stocke le resultat dans la pile
                                     # saute à L1
$L2:
       lw
                $v0,40($fp)
                                     # charge Arg1 dans $v0
                  $v0,$v0,-1
       addu
                                     # retranche 1
                  $a0,$v0
                                     # $a0 <- $v0 ($a0 contient Arg1 pour l'appel recursif)
       move
                 fib
                                      # jump and link fib ($ra<-next instr)</pre>
       ial
       move
                  $s0,$v0
                                      # $s0 <- $v0 ($v0: res appel fib)
       lw
                 $v0,40($fp)
                                      # charge Arg1 dans $v0
       addu
                  $v0,$v0,-2
                                      # retranche 2
                  $a0,$v0
                                      # $a0 <- $v0 ($a0: contient Arg1 pour l'appel recursif)
       move
       jal
                 fib
                                      # jump and link fib ($ra<-next instr)</pre>
                  $s0,$s0,$v0
                                      # $s0 <- $s0+$v0 ($v0: res appel fib)
       addu
                                      # stocke le resultat dans la pile
                $s0,16($fp)
       SW
```

Tanguy Risset ARC: Computer Architecture 153 History Electrons and Logic Processor Architecture Automate Automata in langage theory introduction 

< □

## Assembleur MIPS pour programme fib

```
$L1:
                 $v0,16($fp)
                                      # $v0 <- resultat
       lw
                                      # SP <- ARP
                   $sp,$fp
       lw
                 $ra,32($sp)
                                      # $ra <- adresse retour
       ٦w
                 $fp,28($sp)
                                      # ARP <- ARP appelant
                 $s0,24($sp)
       lw
                                      # restaure $s0
       addu
                   $sp,$sp,40
                                      # SP->SP+40
                $ra
                                      # jump adresse retour
       i
        .end
                   fib
        .align
                     2
        .globl
                     main
        .ent
                   main
main:
        .frame
                     $fp,24,$ra
                                      # vars= 0, regs= 2/0, args= 16, extra= 0
                    0xc0000000,-4
                     0x00000000,0
        .fmask
# partie ajoutA@e pour afficher le resultat
.data
str: .asciiz "Le resultat est "
.text
       subu
                   $sp,$sp,24
                                   # SP <- SP-24 :AR de 24 octet (6 mots)
                 $ra,20($sp)
                                   # stocke adresse retour SP+20
       SW
                 $fp,16($sp)
                                   # stocke ARP appelant SP+16
       SW
                                   # ARP <- SP
       move
                   $fp,$sp
                                   # stocke Arg1 dans la pile (ARP+24)
       SW
                 $a0,24($fp)
                                   # stocke Arg2 dans la pile (ARP+48)
                 $5,28($fp)
       SW
                 $a0,2
                                   # $a0 <- 2 ($a0: Arg1)
       li
                  fib
                                   # jump and link fib ($ra<-next instr)</pre>
       jal
# partie ajoutAce pour afficher le resultat
       move $16,$2
                              # $16 <- resultat de l'appel a fib
                           # v0 <- code pour afficher une chaine (4) \hfill\Box
       li $v0, 4
```

90 Q Q

## Table of Contents

- introduction
- 2 History
- 3 Electrons and Logic
- Processor Architecture
- 6 Automate
- 6 Automata in langage theory
- The Russian train example
- 8 Coming back to generic automata
- 9 Introduction
- 10 The "Von Neumann" cycle (Instruction stages)
- Instruction Set Architecture (ISA)
- **112** MIPS ISA
- 13 Function, procÃ(C)dure et Pile d'execution
- 14 Coming back to MIPS
- 15 Some additionnal useful information
  - Example of MIPS code

|  | Tanguy Risset       | ARC: Computer Are | chitecture |                            | 155          |
|--|---------------------|-------------------|------------|----------------------------|--------------|
|  | Electrons and Logic |                   | Automate   | Automata in langage theory | y <b>The</b> |

## Assember directives

| 1.                  |                                                             |  |  |
|---------------------|-------------------------------------------------------------|--|--|
| .align n            | Align the next datum on specified byte boundary $(0=byte,$  |  |  |
|                     | 2=word, etc.).                                              |  |  |
| .ascii str          | store the string in memory, but do not null-terminate it.   |  |  |
| .asciiz str         | Store the string in memory and null-terminate it.           |  |  |
| .byte b1,, bn       | Store the n values in successive bytes of memory.           |  |  |
| .data <addr></addr> | The following data items should be stored in the data seg-  |  |  |
|                     | ment                                                        |  |  |
| .double d1,, dn     | Store the n floating point double precision numbers in suc- |  |  |
|                     | cessive memory locations.                                   |  |  |
| .extern sym size    | Declare that the datum stored at sym is size bytes large    |  |  |
|                     | and is a global symbol.                                     |  |  |
| .globl sym          | Declare that symbol sym is global and can be referenced     |  |  |
|                     | from other files.                                           |  |  |
| .space n            | Allocate n bytes of space in the current segment.           |  |  |
| .text <addr></addr> | The next items are put in the user text segment.            |  |  |
| .word w1,, wn       | Store the n 32-bit quantities in successive memory words.   |  |  |

# example 1 (Fratini/Niebert)

```
bne $s0, $s1, Test
add $s2, $s0, $s1
Test:
```

```
Tanguy Risset ARC: Computer Architecture 157

introduction History 000 Electrons and Logic Processor Architecture Automate 000000

example 2 (Fratini/Niebert)
```

```
beq $s4, $s5, Lab1
add $s6, $s4, $s5
j Lab2
Lab1:sub $s6, $s4, $s5
Lab2:
```

# example 3 (Fratini/Niebert)

```
li $t2, 0
li $t3, 1
while:beq $t1, $0, done
  add $t2, $t1, $t2
  sub $t1, $t1, $t3
  j while
done:
```

## example 4 (U. Illinois)

```
.data
                         # declare storage for var1; initial
var1:
        .word
               23
                         # value is 23
        .text
__start:
        lw $t0, var1
                            # load contents of RAM location in
                            # register $t0: $t0 = var1
        li $t1, 5
                            # $t1 = 5 ("load immediate")
        sw $t1, var1
                            # store contents of register $t1
                            #into RAM: var1 = $t1
done
```

## example 5 (U. Illinois)

```
.data
array1: .space 12
                            declare 12 bytes of storage to
                         # hold array of 3 integers
        .text
                                    load base address of array
__start: la $t0, array1
                                 #register $t0
        li $t1, 5
                                 #$t1 = 5 ("load immediate")
        sw $t1, ($t0)
                                 #first array element set to 5;
                                 #indirect addressing
        li $t1, 13
                                 #$t1 = 13
        sw $t1, 4($t0)
                                 #second array element set to 1
        li $t1, -7
                                 #$t1 = -7
        sw $t1, 8($t0)
                                 #third array element set to -7
done
```

|                                |                     |                       |                            | 10  |
|--------------------------------|---------------------|-----------------------|----------------------------|-----|
|                                | Tanguy Risset ARC:  | Computer Architecture | 1                          | .61 |
| introduction History Electrons | and Logic Processor |                       | Automata in langage theory | The |

# Documentation on MIPS assembly

More precise documentation on MIPS assembly code can be obtained at:

- http://igm.univ-mlv.fr/ens/IR/IR1/2007-2008/Archi/ManuelSPIM.php (brief documentation from U. Marne la vallée)
- http://logos.cs.uic.edu/366/notes/mips%20quick%20tutorial.htm (brief documentation from U. of illinois at Chicago).
- https://en.wikibooks.org/wiki/MIPS\_Assembly, wikibook
- https://www.cs.unibo.it/~solmi/teaching/arch\_2002-2003/AssemblyLanguageProgDoc.pdf, MIPS Assembly language programmer's Guide.

introduction History Electrons and Logic Processor Architecture Automate Automata in langage theory The

# Du langage à l'exécution



## Architecture view from the programmer

- Modern systems allow
  - To run multiple independent programs in parallel (process)
  - To access memory space larger than physical memory available (virtual memory)
- For the programmer: all this is transparent
  - Only one program runs with very large memory available
- The processor view memory contains:
  - The code to execute
  - Static data (size known at compile time)
  - Dynamic data (size known at runtime: the heap, and the space needed for the execution itself: the battery)
- The programmer sees only the data (static and dynamic)



• the complete process will translate a C program into code executable (loading and execution will take place later).



- We often call *compilation* the set compiler + assembler
- The gcc compiler also includes an assembler and linking process (accessible by options)

## Your compilation process

- The programmer:
  - Write a program (say a C program: ex.c)
  - Compiles it to an object program ex.o
  - links it to obtain an executable ex

#### content of ex.c

```
#include <stdio.h>
int main()
{
   printf("hello World\n");
   return(0);
}
```



Tanguy Risset

ARC: Computer Architecture

introduction History occomposition

Automate Automata in langage theory occomposition

Automate occompositi

• The compilation process is divided in 3 phases:



# Compilation: the front-end

- The front end of an embedded code compiler uses the same techniques as traditional compilers (we can want to include assembler parts directly)
- Parsing LR(1): the parser is usually generated with dedicated metacompilation tools such as Flex et bison for GNU

|                      |                | Tanguy Risset  | ARC: Computer Ar       | □ > 4 \(\begin{align*} \text{□} | <b>1</b> 69 |           |
|----------------------|----------------|----------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------|
| introduction<br>0000 | History<br>000 |                | Processor Architecture | Automata in langage                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | theory (    | <b>Γh</b> |
| Compi                | latio          | n: The middle- | end                    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |             |           |

- Some phases of optimizations are added, they can be very calculative
- Some example of optimisation independent of the target machine architecturre
  - Elimination of redundant expressions
  - dead code elimination
  - constant propagation
- Warning: optimization can hinder the understanding of the assembler (use the -O0 options with tt gcc)



Compilation: The back-end

- The code generation phase is dedicated to architecture target.
   Retargetable compilation techniques are used for architectural families.
- The most important steps important are:
  - Code selection
  - Register allocation
  - instruction scheduling

- The gcc command runs several program depending on the options
  - The pre-processer cpp
  - The compiler cc1
  - The assembleur gas
  - The Linker 1d
- gcc -v allow to visualize the different programs called by gcc



## The pre-processer cpp or gcc -E

- the task of the pre-processor are :
  - elimination of comments.
  - inclusion of source files
  - macro substitution (#define)
  - conditionnal compilation.
- Example:

```
#define MAX(a, b) ((a) > (b) ? (a) : (b))
ex1.c
       f=MAX(3,b);
```

# The compiler cc1 or gcc -S

- generate assembly code
- gcc -S main.c -o main.S
- Exemple :

```
void main()
{ int i;
 i=0;

while (1)
 {
    i++;
    nop();
 }
}
```

```
Tanguy Risset

ARC: Computer Architecture

introduction ooo | History ooo | Computer Architecture | Automata in langage theory ooo | Computer Architecture | Automata in langage theory ooo | Computer Architecture | Automata in langage theory ooo | Computer Architecture | Automata in langage theory | Computer Architecture | Computer Architecture | Automata in langage theory | Computer Architecture | Computer Architecture
```

Assembly code generated (for MSP430)

```
#2558,
                SP
                           ; stack initialization de la pile
mov
                           : r4 <- SP
       r1,
            r4
mov
       #0, 0(r4)
                             i initialization
mov
       0(r4)
                            i++
inc
                           ; nop();
nop
                             unnconditionnal jump (PC-6):
jmp
       $-6
        SP
incd
      #0x1158
br
```

# code produce by mspgcc -S

```
.global
               main
        .type
                      main, @function
main:
/* prologue: frame size = 2 */
.L__FrameSize_main=0x2
.L__FrameOffset_main=0x6
                 #(__stack-2), r1
        mov
                 r1,r4
        mov
/* prologue end (size=3) */
                 #11o(0), @r4
        mov
.L2:
                 #llo(1), @r4
        add
        nop
                  .L2
        jmp
/* epilogue: frame size=2 */
        add
                 #2, r1
                 #__stop_progExec__
/* epilogue end (size=3) */
/* function main size 14 (8)
```

Tanguy Risset

ARC: Computer Architecture

Electrons and Logic

Processor Architecture Automate Automata in langage theory

Assembler as ou gas

- transform an assembly code into object code (binaire representation of symbolic assembly code)
- Option -c of gcc allow to conbine compilation et assembly gcc -c main.c -o main.o

## Linking: 1d

- Produce the executable (a.out by default) from object code of programs and library used
- There are two ways to use libraries in a program
  - Dynamic or shared libraries (default option): the code of the library is not included in the executable, the system dynamically loads the code of the library in memory when calling the program. We need than only *one* version of the library in memory even if several programs use the same library. The library must be em installed on the machine, before running the code.
  - Static libraries: the code of the library is included in the executable. The
    executable file is bigger but you can run it on a machine on which the
    library is not installed.

| Tanguy Ris                               | sset ARC: Computer Arc | chitecture        | 179                   |
|------------------------------------------|------------------------|-------------------|-----------------------|
| introduction History Electrons and Logic |                        | Automate Automata | in langage theory The |

# Binary file manipulation

#### Some usefull command:

nm

Allow to know symboles (i.e. label: function names) used in an object file or executable

```
trisset@hom\$ nm fib.elf | grep main
000040c8 T main
```

 objdump allow to anlayze a binary file. For instance it can get correspondance between binary representation and assembly code trisset@hom\$ objdump -f fib

```
fib: file format elf32-msp430
architecture: msp:43, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00001100
```