## Endterm Exam(ple)

Computer Organisation



## Please read the following information carefully!

- On a real exam, this is gonna be full of info you should read.
- $\bullet$  It should take you 90 minutes to complete this practice exam.
- The answers provided are meant for checking your results, not as example answers. We expect you to show your work and provide full explanations and calculations for your answers.

1. (400 points) The Coati are playing an online game called "Among them" with Taico. The purpose of this game is to find the imposter IO device on a bus with many devices. All normal devices respond to a 16-bit address ignoring two address lines; which 2 lines differs per device. The imposter tries to steal secret data and thus listens to more addresses (ignores more address lines).

The first address to appear on the bus is: OxBABF. The following devices respond

- A
- B

The next address that appears on the bus is: Oxbabe. The following devices respond:

- B
- C

The next address that appears on the bus is: OxBACE. The following devices respond:

- C
- D

The next address that appears on the bus is: OxBACF. The following devices respond:

• D

Taico now calls for an emergency meeting. Which device do you think is the imposter and why?

- 2. (400 points) Consider a byte-addressable, 4-way set-associative cache with 64 blocks of 64 bytes each. This cache serves a 2 MiB byte-addressable main memory. Show your work for each subquestion.
  - (a) How many bits are needed to address each byte within a block?

```
Solution: 64 = 2^6 bytes \rightarrow 6 bits needed (byte-addressable)
```

(b) How many bits are required to address each set?

(b) <u>4 bits</u>

**Solution:** 4-way =  $2^2$  blocks/set and 64 blocks =  $2^6$  blocks  $\rightarrow \frac{2^6}{2^2} = 2^4$  sets  $\rightarrow 4$  bits needed

(c) How many bits are required for the tag?

(c) \_\_\_\_**11 bits** 

**Solution:** 2 MiB =  $2^{21}$  bytes  $\rightarrow$  21 bits needed to address byte-addressable memory  $\rightarrow$  tag = 21 - 6 - 4 = 11 bits needed for tag

3. (450 points) Mark Owl, together with Hugo Owl and translator Irma Coati announced in the press conference at 19:30 total lockdown of the forest due to COVID-42.

Padmé Owl is trying to protect the forest from COVID-42. To do this she is going to buy 1000 facial masks (one for each animal in the forest) at the Cora hypermarket which is 7 mins of flying from her home. Doing the actual shopping takes 7 minutes. Her husband Anakin Owl in the meanwhile goes to the CBA supermarket which is 3 minutes of flying from home to buy 1000 disinfecting lotions. The shopping takes 14 minutes. These masks and the lotions now need to be distributed before the total lockdown of the forest at 23:00 sharp when no one is allowed to walk or fly around the forest anymore. Distribution is done by the Coati; each coati can bring only one item to one animal in the forest and one roundtrip takes 5 minutes.

How many Coati are needed to do the distribution? Show your work.

3. \_\_\_\_**55** 

Solution: From 19:30 till lockdown at 23:00 is 3.5 hours (210 minutes)

Padmé takes 7 + 7 + 7 = 21 minutes Anakin takes 3 + 3 + 14 = 20 minutes.

Distibution can already start when Anakin returns! That leaves 189 minutes for distributing masks, which is 37.8 deliveries per coati  $\rightarrow$  37, so we need 27.02  $\rightarrow$  28 coati to distribute masks.

For lotion: 190 minutes left, so 38 deliveries per coati, hence 27 (26.3) coati needed.

In total: 55 coati are needed

4. (425 points) Consider a CPU with a 15-stage pipeline. Assume that each stage takes one clock cycle to execute, and that conditional jumps are the only cause of pipeline stalls. The target address of a conditional jump is calculated in the fourth stage of the pipeline. Assume that 15% of the executed instructions are conditional jumps, and that the CPU does use branch prediction, which is correct 65% of the time.

What is the average throughput of this CPU, in number of instructions per clock cycle? Show your work.

4. \_\_\_\_**0.86** 

**Solution:** delay is 3 cycles

 $(0.85 + 0.65 * 0.15) * 1 + 0.15 * 0.35 * 4 = 0.9475 + 0.21 = 1.1575 \rightarrow 1/1.1575 = 0.86$ 

- 5. (350 points) Something went horribly wrong in Simon's computer and the log for his cache replacement algorithm is now scrambled. He knows that the following operations have occured but he does not know in which order:
  - 1. Block 0 was hit.
  - 2. Block 3 was hit.
  - 3. Block 5 was hit.
  - 4. A cache miss occured.

He also knows that his cache uses an LRU (least recently used) cache replacement algorithm as well as what the age of each of the blocks was before and after the operations. The following table displays them:

| block | before | after |
|-------|--------|-------|
| 0     | 001    | 010   |
| 1     | 010    | 101   |
| 2     | 101    | 111   |
| 3     | 110    | 001   |
| 4     | 100    | 110   |
| 5     | 011    | 011   |
| 6     | 000    | 100   |
| 7     | 111    | 000   |

In what order did the operations occur?

 $\Box$  (4), (3), (2), (1)

X (3), (1), (2), (4)

 $\Box$  (4), (1), (3), (2)

 $\Box$  (4), (2), (1), (3)

| Solution: |       |       |       |       |       |      |
|-----------|-------|-------|-------|-------|-------|------|
|           | block | start | HIT 5 | HIT 0 | HIT 3 | MISS |
|           | 0     | 001   | 010   | 000   | 001   | 010  |
|           | 1     | 010   | 011   | 011   | 100   | 101  |
|           | 2     | 101   | 101   | 101   | 110   | 111  |
|           | 3     | 110   | 110   | 110   | 000   | 001  |
|           | 4     | 100   | 100   | 100   | 101   | 110  |
|           | 5     | 011   | 000   | 001   | 010   | 011  |
|           | 6     | 000   | 001   | 010   | 011   | 100  |
|           | 6     | 111   | 111   | 111   | 111   | 000  |
|           |       | •     |       |       |       |      |

6. (400 points) You are the new computer hardware journalist for the highly esteemed Vulpes TV channel. You have been watching the latest presentations by Pear, DMA, MVideo, and Antel regarding their newest, shiniest hardware and your job is to make sure your TV channel does not accidentally broadcast misinformation. Identify what you think are the bits of misinformation in their hardware presentations (there could of course also be none) and explain why in a maximum of 2 sentences per piece of misinformation.

Pear said that their new APhone 13 can also charge the headset if provided enough power via USB-C and that their phone is now 4x faster because it is now using SDRAM instead of SRAM. It comes with power adapter, wireless headset, replacable battery, and a new open-source operating system for just under 600 euro.

Solution: SDRAM is not faster; SRAM is actually faster

DMA announced their new flagship processor will have 64 cores, consume 225 Watts of power, and is 21% faster. It supports up to 8 TiB of RAM.

Solution: No misinformation detected

MVideo's newest videocard now has 16 GiB of memory, 64 data lines and superfast GDDR7 memory chips with a 128Kx256 organisation. It can run Among Them with more than 60 frames per second!

Solution: 64 bit memory lane with x256 memory organisation does not make sense

Antel's newest processor now has a new SSE5 instruction set, 8 MiB of caches, and to further improve video playback they switched to non-interleaved memory.

Solution: non-interleaved memory is not faster for consecutive reads like video playback

- 7. (350 points) For the following questions, assume that both the bus and register X are ALU inputs, and that accessing memory takes more than one clock cycle. Note: R2 uses register addressing and (R1) uses register indirect addressing.
  - (a) Give the sequence of microinstructions that executes the instruction  $R1 \leftarrow R2 + (R1)$  in the least amount of time.

## Solution:

- 1.  $MAR_{in}, R1_{out}, READ$
- 2.  $X_{in}$ ,  $R2_{out}$ , WMFC
- 3.  $MDR_{out}, X_{out}, ALU_{add}, Z_{in}$
- 4.  $Z_{out}$ ,  $R1_{in}$

 $something_{in}$  and  $something_{out}$  can be swapped.

(b) Explain why your solution takes the least amount of time.

**Solution:** WMFC must happen as the last step in reading input to be most efficient.

8. (425 points) Consider the design of an OS that supports 64-bit processors equipped with up to 256 GiB of memory, and address spaces of up to 16 TiB. This architecture uses pages of 8 KiB. Assume that page table entries are byte-aligned and contain, next to the mapping info, 4 control bits.

What is the size of a page table with this OS? Show your work.

8. \_\_\_\_**8 GiB** 

```
Solution: nr of entries = \frac{16*2^{40}}{8*2^{10}} = 2*2^{30} entries sizeof(entry) = ceil((log^2(\frac{256*2^{30}}{8*2^{10}}) + 4)/8) = 4 B Size of page table = #entries * sizeof(entry) = 8 GiB
```

9. (400 points) Otto wants to start investing and has therefore decided to create a trading bot in assembly. In order to be successful, his bot needs to be fast. He has therefore decided to investigate which pipeline configuration would be most optimal for the piece of code defined below.

He can choose between:

- 1. A four stage pipeline with two decode units where each stage takes 1 cycle to complete but reading from memory causes a 2 cycle stall. The CPU has a clock rate of 1.1Hz.
- 2. A four stage pipeline with one decode unit where each stage takes 1 cycle and the CPU has a clock rate of 1Hz.

Both CPUs have operand forwarding.

The code he wants to run on these pipelines is as follows:

```
1 movq (% rsi), %r9
2 movq (% rbx), %r8
3 subq %r9, %r8
```

Calculate the execution time for both and advise Otto on which system to go for.

Motivate your answer by filling in the pipeline tables and showing your work for any relevant calculations.

|    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|----|---|---|---|---|---|---|---|---|---|
| I1 | F | D | D | D | Е | W |   |   |   |
| 12 |   | F | D | D | D | Е | W |   |   |
| 13 |   |   | F |   | D |   | Е | W |   |

System 1

|    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|----|---|---|---|---|---|---|---|---|---|
| I1 | F | D | Е | W |   |   |   |   |   |
| 12 |   | F | D | Е | W |   |   |   |   |
| 13 |   |   | F | D | Е | W |   |   |   |

System 2

```
Solution: T = 8/1.1 = 7.27 \ s
T = 6/1 = 6 \ s
\rightarrow \text{System 2 is more optimal.}
```

10. (400 points) Consider a memory which uses a ternary system instead of the standard binary. This means that a 'trit' (similar to a binary 'bit') can hold the values 0, 1, or 2. In this organisation, a 'tryte' (similar to a binary 'byte') is made up of 9 trits (or 3<sup>2</sup> trits).



Our memory has a 2187xA, tryte-addressable organisation. There are 2 address lines at B.

(a) How many address lines are there at C? Show your work.

(a) \_\_\_\_**5**\_\_\_

**Solution:**  $2187 = 3^7 \rightarrow 7 - 2 = 5 \rightarrow \text{there are 5 address lines at C}$ 

(b) How many data lines (A) are there? Explain your answer.

(b) \_\_\_\_\_**9**\_\_

**Solution:** The memory is tryte-addressable and 1 tryte = 9 trits  $\rightarrow$  there are 9 data lines

End of exam.