A Simple CPU in under 100 Lines of Verilog
Have you ever sat around wondering if ChatGPT can whip you up a CPU design? Well, since the team at FPGAjobs is doing Dry January, we needed a way to entertain ourselves. So! We figured we'd try our hand at:
- Kicking ChatGPT's tires by asking it to write a basic CPU,
- Using that to understand the pieces that go into a CPU,
- Starting to pull back the curtain on instruction sets, and
- Figuring out what other kinds of tools you need to actually verify a CPU properly
As a disclaimer: this is all in fun. We're a little more interested in learning something about CPUs, and, tangentially, about ChatGPT, than we are in writing a production-ready MCU block for your next FPGA or RTL design. The joy is in the process, not the result!
A Simple CPU
Let's get right down to it, in the interest of showing that we mean business. Here's what ChatGPT produced for us - comments added are mostly our own:
module simple_cpu(
    clock,
    reset,
    alu_out
);
    input clock, reset;
    output reg [7:0] alu_out;

    // Registers
    reg [7:0] register_file [0:1]; // Two 8-bit registers
    reg [2:0] state; // State for FSM control
    reg [7:0] program_counter; // Program Counter

    // Program Memory
    reg [7:0] program_memory [255:0];

    // Internal buses
    wire [7:0] instruction = program_memory[program_counter];
    wire [2:0] opcode = instruction[7:5];
    wire reg_dest = instruction[5];
    wire reg_src = instruction[6];
    wire [7:0] immediate = {4'b0000, instruction[3:0]};

    always @(posedge clock or posedge reset) begin
        if (reset) begin
            program_counter <= 0;
            register_file[0] <= 0;
            register_file[1] <= 0;
            alu_out <= 0;
            state <= 0;
        end else begin
            case (state)
                0: begin // Fetch
                    state <= 1;
                end
                1: begin // Decode and Execute
                    case (opcode)
                        3'b000: alu_out <= register_file[reg_dest] - register_file[reg_src]; // SUB
                        3'b001: alu_out <= register_file[reg_dest] + register_file[reg_src]; // ADD
                        3'b010: alu_out <= register_file[reg_dest] & register_file[reg_src]; // AND
                        3'b011: alu_out <= register_file[reg_dest] | register_file[reg_src]; // OR
                        3'b100: alu_out <= register_file[reg_dest] ^ register_file[reg_src]; // XOR
                        3'b101: alu_out <= opcode; // NOP
                        3'b110: alu_out <= immediate; // LOAD0 (Imm)
                        3'b111: alu_out <= immediate; // LOAD1 (Imm)
                        default: alu_out <= alu_out;
                    endcase
                    state <= 2;
                end
                2: begin // Write Back
                    if (opcode >= 3'b110) begin // LOAD
                        register_file[reg_dest] <= alu_out;
                    end
                    if (opcode < 3'b101) begin // Arithmetic + Logic
                        register_file[1'b0] <= alu_out;
                    end
                    state <= 3;
                end
                3: begin // Increment PC
                    program_counter <= program_counter + 1;
                    state <= 0;
                end
            endcase
        end
    end
endmodule
OK, cool - we can spit out some Verilog code, and claim it's a CPU. But is it, really? Let's dig down into the details.
What does a CPU do?
When you peel back all of the abstractions, peripherals, virtualization, and other bells and whistles in modern devices, a CPU really only does four basic things:
- fetch an instruction from memory
- decode what the instruction means
- execute the instruction
- increment the program counter
At its most fundamental level, this is all any processor does, ad infinitum. It's a small infinite loop, usually called the instruction cycle (or fetch-decode-execute cycle), and the chain of stages each instruction passes through is frequently called a pipeline.
You can think of the pipeline as a channel through which information flows.
A Really Basic CPU
In practice, you can create a processing pipeline with a pretty simple set of logical components and elements. You need:
- Memory, for storing the pipeline's instructions and data
- A counter, which allows you to iterate through your program memory,
- Control logic, which orchestrates data flow and manages state based on instructions,
- An arithmetic unit, which executes logical and mathematical operations
Given that, let's make a little diagram of this system:
Fundamentally, all a processor does is math and boolean logic at high speed. This is a big statement, but not an inaccurate one. Everything inside a computer is a number represented as a set of binary bits. All the processor is doing is reading, writing, and modifying those binary numbers. It's a bit incredible to think of all the things humanity has managed to fit inside this convenient numeric model. Text, colors, wireless communications, network packets, blinking lights, vibrating motors - to a processor, these are all just numbers!
Academic digressions aside: we've described all the pieces of a computer. Does our generated Verilog match this description? Let's take a look at how a few pieces of this Verilog file meet those criteria:
Memory
There's nothing at all special about the memory in this design. It's a basic Verilog array of 256 locations, each 8 bits wide.
// Program Memory
reg [7:0] program_memory [255:0];
Right now, this one line is all our program memory is. There isn't even a means to get a computer program into it!
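One common way to get a program into a memory like this, at least in simulation, is an initial block. The sketch below is ours, not part of ChatGPT's output: the module name program_memory_demo and the three-instruction program are hypothetical, and the bit patterns assume the encoding implied by the listing (opcode in bits 7:5, immediate in bits 3:0).

```verilog
// Stand-alone simulation sketch (hypothetical module, not part of the generated CPU).
module program_memory_demo;
    // Same shape as the CPU's program memory: 256 words of 8 bits each.
    reg [7:0] program_memory [255:0];
    integer i;

    initial begin
        // Fill every location with NOP (opcode 3'b101) so unused addresses are harmless.
        for (i = 0; i < 256; i = i + 1)
            program_memory[i] = 8'b101_0_0000;

        // Hand-assembled bytes: opcode in bits [7:5], bit 4 unused, immediate in bits [3:0].
        program_memory[0] = 8'b110_0_0011; // LOAD0 with immediate 3
        program_memory[1] = 8'b111_0_0101; // LOAD1 with immediate 5
        program_memory[2] = 8'b001_0_0000; // ADD

        // Print the first few locations to confirm the preload.
        for (i = 0; i < 4; i = i + 1)
            $display("program_memory[%0d] = %b", i, program_memory[i]);
    end
endmodule
```

For longer programs, the standard $readmemh and $readmemb system tasks can fill the array from a text file instead. Whether an initial block like this carries over to hardware depends on the synthesis toolchain, so treat it as a simulation convenience unless your tools say otherwise.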
Control Logic
The control structure is also extremely simple. It's basically a counter loop!
case (state)
    0: begin // Fetch
        state <= 1;
    end
    1: begin // Decode and Execute
        state <= 2;
    end
    2: begin // Write Back
        state <= 3;
    end
    3: begin // Increment PC
        program_counter <= program_counter + 1;
        state <= 0;
    end
endcase
The control logic just iterates through four states, and takes a specific action in each state:
- First, it fetches an instruction from the program memory (in this design the fetch state is just a one-cycle wait, since the instruction wire always reflects program_memory[program_counter]),
- Next, it decodes what the instruction means, and executes it by storing the result in the alu_out register,
- Then, it writes the result back to a register (if the instruction requires it),
- Finally, the controller increments the program counter, moving on to the next location in program memory.
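To make the cost of that loop concrete, here is a minimal stand-alone sketch of the same four-state sequence with named states. The module name fsm_demo and the localparam names are our own inventions, and the default branch is an addition: the generated code declares state as 3 bits wide but only handles values 0 through 3.

```verilog
// Stand-alone simulation sketch of the four-state control loop (hypothetical names).
module fsm_demo;
    localparam [2:0] FETCH     = 3'd0,
                     EXECUTE   = 3'd1,
                     WRITEBACK = 3'd2,
                     INCREMENT = 3'd3;

    reg clock = 0;
    reg [2:0] state = FETCH;
    reg [7:0] program_counter = 0;

    // Free-running clock: one rising edge every 10 time units.
    always #5 clock = ~clock;

    always @(posedge clock) begin
        case (state)
            FETCH:     state <= EXECUTE;
            EXECUTE:   state <= WRITEBACK;
            WRITEBACK: state <= INCREMENT;
            INCREMENT: begin
                program_counter <= program_counter + 1;
                state <= FETCH;
            end
            default:   state <= FETCH; // values 4-7 are unused; recover to FETCH
        endcase
    end

    initial begin
        // Eight rising edges are two full trips around the loop.
        repeat (8) @(posedge clock);
        #1 $display("after 8 clocks: program_counter = %0d", program_counter); // prints 2
        $finish;
    end
endmodule
```

The takeaway is that every instruction costs four clock cycles in this design, no matter how simple it is.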
Execution
This processor does all decode and execution in a single state: the second control stage. The opcode determines what operation is performed on the contents of the registers. All results are stored, at least temporarily, in the alu_out register.
case (opcode)
    3'b000: alu_out <= register_file[reg_dest] - register_file[reg_src]; // SUB
    3'b001: alu_out <= register_file[reg_dest] + register_file[reg_src]; // ADD
    3'b010: alu_out <= register_file[reg_dest] & register_file[reg_src]; // AND
    3'b011: alu_out <= register_file[reg_dest] | register_file[reg_src]; // OR
    3'b100: alu_out <= register_file[reg_dest] ^ register_file[reg_src]; // XOR
    3'b101: alu_out <= opcode; // NOP
    3'b110: alu_out <= immediate; // LOAD0 (Imm)
    3'b111: alu_out <= immediate; // LOAD1 (Imm)
    default: alu_out <= alu_out;
endcase
These opcode case values constitute the processor's instruction set. They represent the whole range of actions that the processor can take. Looking at the list above, it can't do much! The processor is only capable of:
- Three simple boolean functions: AND, OR, and XOR,
- Two math operations: ADD, and SUBTRACT,
- Two LOAD operations: one for each register, and
- One NOP - which does nothing.
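Since there's no written ISA spec, the closest thing we have is the bit slicing in the listing: the opcode lives in bits 7:5 and the immediate in bits 3:0. Under that assumption, a small helper can turn an opcode and an immediate into an instruction byte. The module name encode_demo, the OP_ names, and the encode function are all hypothetical.

```verilog
// Stand-alone simulation sketch: building instruction bytes (hypothetical names).
module encode_demo;
    // Opcode values copied from the case statement; the names are our own.
    localparam [2:0] OP_SUB   = 3'b000,
                     OP_ADD   = 3'b001,
                     OP_AND   = 3'b010,
                     OP_OR    = 3'b011,
                     OP_XOR   = 3'b100,
                     OP_NOP   = 3'b101,
                     OP_LOAD0 = 3'b110,
                     OP_LOAD1 = 3'b111;

    // Pack an instruction: opcode in bits [7:5], bit 4 left at zero, immediate in bits [3:0].
    function [7:0] encode;
        input [2:0] opcode;
        input [3:0] imm;
        begin
            encode = {opcode, 1'b0, imm};
        end
    endfunction

    initial begin
        $display("LOAD0 3 = %b", encode(OP_LOAD0, 4'd3)); // 11000011
        $display("LOAD1 5 = %b", encode(OP_LOAD1, 4'd5)); // 11100101
        $display("ADD     = %b", encode(OP_ADD,   4'd0)); // 00100000
        $display("NOP     = %b", encode(OP_NOP,   4'd0)); // 10100000
    end
endmodule
```

Note that the immediate is only four bits wide, so a LOAD can only put values 0 through 15 into a register.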
Writeback
Writeback refers to the processor storing a result into registers or memory. In this design, that means copying alu_out into one of the two registers.
2: begin // Write Back
    if (opcode >= 3'b110) begin // LOAD
        register_file[reg_dest] <= alu_out;
    end
    if (opcode < 3'b101) begin // Arithmetic + Logic
        register_file[1'b0] <= alu_out;
    end
    state <= 3;
end
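It's worth noting that the instruction set uses ranges of opcode values to determine writeback destinations. Any opcode greater than or equal to 3'b110 (the two LOADs) writes the ALU value to the register selected by reg_dest. Because reg_dest is instruction bit 5, which is also the lowest opcode bit, that's how LOAD0 and LOAD1 end up targeting different registers. Any opcode less than 3'b101 writes to register 0. Instruction 3'b101 doesn't write anything at all - it's the NOP instruction.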
Increment
The final stage in this process is to increment the program counter:
3: begin // Increment PC
    program_counter <= program_counter + 1;
    state <= 0;
end
...which starts the whole data process over again.
What's wrong with this CPU?
Lots! Here are just a few things this CPU is missing:
- ISA specifications: This instruction set is one that ChatGPT fabricated out of thin air. We don't know anything about it aside from a few comments we inserted!
- Jump instructions: It's pretty hard to implement table-stakes programming concepts, like functions, without a jump instruction. This "CPU" is barely more than an arithmetic state machine at this point.
- A testbench: We have done absolutely nothing to confirm the behavior of this CPU! There's no testbench or any kind of stimulus file. Bad ChatGPT! No donut!
- A compiler: Writing bytecode by hand is awful. To be clear, you can edit raw patterns of ones and zeroes to make a computer program - and, sometimes, get paid handsomely to do so! It's just not a very fast or intuitive way to write a computer program. A compiler - a program that translates human readable code into machine bytecode - would go a long way towards helping us verify that this is actually a processor.
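To give a flavor of what the missing testbench might look like, here's a minimal sketch. It is not self-sufficient: it assumes the simple_cpu module from the listing is compiled alongside it, and it pokes at the CPU's internals through hierarchical names, which is a simulation-only trick. The testbench name simple_cpu_tb and the three-instruction program are hypothetical.

```verilog
// Testbench sketch; assumes simple_cpu from the listing is compiled with this file.
module simple_cpu_tb;
    reg clock = 0;
    reg reset = 1;
    wire [7:0] alu_out;
    integer i;

    simple_cpu dut (.clock(clock), .reset(reset), .alu_out(alu_out));

    // Free-running clock: one rising edge every 10 time units.
    always #5 clock = ~clock;

    initial begin
        // Preload program memory through a hierarchical reference (simulation only).
        for (i = 0; i < 256; i = i + 1)
            dut.program_memory[i] = 8'b101_0_0000; // NOP everywhere by default
        dut.program_memory[0] = 8'b110_0_0011; // LOAD0 3
        dut.program_memory[1] = 8'b111_0_0101; // LOAD1 5
        // ADD: per the listing's bit slices, reg_dest = bit 5 = 1 and reg_src = bit 6 = 0,
        // so this computes register 1 + register 0 and writes the sum to register 0.
        dut.program_memory[2] = 8'b001_0_0000;

        // Hold reset across the first rising clock edge, then release it.
        #12 reset = 0;

        // Three instructions at four clocks each.
        repeat (12) @(posedge clock);
        #1;
        if (dut.register_file[0] === 8'd8)
            $display("PASS: register 0 holds 3 + 5");
        else
            $display("FAIL: register 0 = %0d", dut.register_file[0]);
        $finish;
    end
endmodule
```

Writing even this much forces you to read the decode logic closely: reg_dest and reg_src are sliced out of the same bits as the opcode, so the arithmetic instructions don't get to choose their operands freely. That's exactly the kind of thing a testbench is supposed to surface.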
So, here's the million-dollar follow-on question: can we make this a proper CPU?
Let's see what else we can get ChatGPT to produce for us. Stay tuned!