Yogi rides the new 6 wheeled robotic base platform. Designed in Solidworks and milled on my CNC machine.

Monday January 24 , 2022
Font Size

Your First CPU - Chapter 1 - Basic CPU - Principles of CPU Design


Article Index
Your First CPU - Chapter 1 - Basic CPU
Principles of CPU Design
YFCPU Source Code
All Pages

Computer Timing & Clock Speeds

Given that the same integrated process technology is used, so as to compare apples to apples, a CPU's maximum clock speed is determined by how much logic delay exists between the start of a CPU's single instruction cycle and when all the bits settle on the correct state or the "static state". This is obviously taken as worst case scenario as some simple instructions will have less delay than a complicated instruction, For example, a boolean logic operation versus a multiply or even worse a divide.

For illustrative purposes, imagine a complex domino effect with many branching dominos. As the cycle starts, each domino is flipped in sequence, though some spawn new branches which run in parallel, but in the end the longest domino path determines how long the domino cycle takes to reach the static state.

If you wish to increase the speed of your CPU you could remove the instructions with the longest execution delays. You could then increase the cycles per second until you are encroaching on the delay of the next longest instruction. However, instructions with long execution delays could be performing a useful function that would take a handful of simpler instructions to replace. The question is, will your software use that instruction enough to justify slowing down the clock speed all instructions? This is the argument between the pro CISC and the pro RISC groups.


Reduced Instruction Set Computer or Complex Instruction Set Computer? Believe it or not, computer processor technology began as CISC type CPUs. Processors became more advanced by implementing more complex instructions that could do the same work as many simpler instructions. The trade-off of RISC versus CISC is more in favor of CISC when cpu's had a clock speed of only a few megahertz and compilers were very basic or assembly was used. Now that processing speeds are in the gigahertz RISC is winning the battle as a small savings in cycle time means significantly more instructions can be executed per second and intelligent compilers take more advantage of RISC type architecture. Modern x86 PCs though originally CISC are typically now RISC at the core with a CISC front-end and back-end conversion. This trend was started by AMD years ago to create competitive Intel compatible processors with less investment but has similarly now been adopted by Intel with Core 2 CPUs. Even RISC instruction sets are becoming more complex so the line between RISC and CISC is drawing closer.

See Other Resources below for more information on RISC and CISC architectures.

Designing our first basic CPU

Our first attempt at creating a cpu will be very simplistic. We will keep the entire cpu module in a single file, including the program assembler code.

Our basic CPU will execute a single instruction in 4 clock cycles. These four cycles are named Fetch, Decode, Execute and Store (aka. Write Back). We define tokens for these states starting at line 11. These four states define a Finite State Machine (FSM) implemented using a case statement starting at line 80 (same as switch/case in the C language.) The Fetch, Decode and Store states are very simple, but the Execute state, which comprises the cpu's ALU ( Arithmetic and Logic Unit) is more complex and is implemented using another case statement (line 102). The decoded opcode is used by the ALU to determine the operation performed on the input data registers RA & RB. Each ALU operation, or instruction opcode, is declared starting at line 20.

The instruction memory and register file is declared at line 29. Global declared module parameters determine the size of these memories, namely rf_size, im_size and opcode_size. With rf_size equal to 4, this means we will use 4 bits to address the register file. Thus, our register file can and will contain 2^4 (=16) registers. The bit width of the instruction word is affected by the bit width of the register file fields. We have 4 bits for the opcode, three register file fields (RA, RB & RD) so our instruction word bit length is 16 bits (4+3x4).

We have only a single instruction word format so our instruction decoder is very simple, split the IR register into OPCODE, RA, RB & RD (line 93). An instruction word format defines which bits of the instruction word define fields such as registers, constants or other types. The format used by the decoder is defined typically by the opcode. A typical cpu may have 3 or 4 instruction word formats and in later chapters we will add more.

Here is a breakdown of the four cycles of our basic cpu.

FETCH: The next instruction pointed to by the PC register is loaded into the IR (Instruction Register) register.

DECODE: The PC (Program Counter) is incremented (could also have been done in the fetch cycle), and the IR register is decoded into the OPCODE register, and the input register addresses RA & RB, and the write back register RD. In later chapters, this decoder may use different instruction word formats.

EXECUTE: Switch/case on the defined instruction opcodes using the OPCODE register and perform the defined operation using the addressed data referenced by the input register fields RA & RB. In some cases, RA and RB may be taken as constants and not access the register file at all such as the opcode LRI (Load Register Immediate). The return value from the ALU is stored in the W register.

STORE: Write the W register (the return value from the ALU), into register file memory.

Remember, in our basic cpu, we have not included any branch instructions yet. So our program performs a few basic calculations and then quits. You can view the assembly code for the sample program implemented as a constant array starting at line 59. (Note. Verilog syntax defines the "4'd" prefix to constants to mean a decimal number of 4 bits wide, so don't let all those 4'dnn prefixed numbers scare you, they are simply RA,RB,RD register addresses in decimal.)

We could probably improve the efficiency by doing more in the simple cycles and perhaps get down to 3 cycles per instruction, however this 4 cycle RISC cpu is a typical design and in later chapters we will expand on each cycle's duties and pipeline the cpu for single instruction per cycle execution.

Other Resources

Reduced Instruction Set Computers - Wiki

A CPU Datapath Simulator


User Rating: / 64

RSS Feeds

Visitor Poll

What sort of peripherals do you desire in a robotics main board? (you may vote more than once.)

Who's Online

We have 5 guests online