Getting Solid on Metastability
Metastability is a common design concern in FPGAs. Unfortunately, it's a concept that most folks have to learn about the hard way: in a design that isn't quite working right. What's the deal with metastability? Why does it matter for FPGA designers? And how can we modify our designs to prevent or deal with metastable states?
What is Metastability?
Metastability is a phenomenon where a digital flip-flop (like a D flip-flop) enters an undefined state, somewhere between a logical '0' and a logical '1'. This happens when the input to the flip-flop changes too close to the clock edge. More accurately: it's when the input signal of the flip-flop violates the flip-flop's setup or hold time requirements. In this scenario, the output can become unpredictable and may take a longer time than a clock period to resolve to a stable state. Even worse: it might oscillate before settling.
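To put a rough number on "longer than a clock period", here is the first-order model commonly used for synchronizer reliability. It is not specific to any one device, and the symbols are the conventional ones rather than anything defined in this article: $t_r$ is the time the flip-flop is given to resolve, $\tau$ is its metastability time constant, $T_0$ is a device-dependent window constant, $f_{clk}$ is the sampling clock frequency, and $f_{data}$ is the rate at which the asynchronous input toggles.

```latex
\mathrm{MTBF} = \frac{e^{\,t_r / \tau}}{T_0 \, f_{clk} \, f_{data}}
```

The takeaway is the exponential: every bit of extra settling time buys a disproportionate improvement in the mean time between failures, while faster clocks and busier inputs make failures more frequent.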
Why is it a Concern in FPGA Design?
FPGAs often deal with multiple clock domains, especially in complex designs. When signals pass from one clock domain to another, they can potentially violate setup or hold times due to the phase difference or frequency difference between the two clocks. This can introduce metastability at the flip-flops in the clock crossing. If metastable events propagate to other parts of the design, they can cause functional errors or even system crashes. For example, if a metastable signal is used as a memory address or a control signal, it might lead to unpredictable behavior. Errors of this kind are extremely frustrating to debug: they are random in their occurrence, and produce random bit values!
Note that this is just as true for GPIO and external asynchronous signals as it is for signals within the FPGA. The transition from IO to slice fabric is the ultimate clock domain crossing!
How is Metastability Managed?
- Synchronization Circuits: The most common method to handle metastability is to use synchronization flip-flops. A signal crossing from one clock domain to another is passed through a series of flip-flops (usually two or more) clocked by the destination clock. This chain of flip-flops reduces the probability of metastability propagating further into the design.
- Timing Constraints: Properly setting up timing constraints ensures that the tools are aware of the paths that cross clock domains. Constraints do not remove metastability by themselves, but they let the synthesis and place-and-route tools analyze crossing paths correctly instead of trying (or silently failing) to meet timing between unrelated clocks.
- Avoiding Clock Domain Crossing (CDC): Wherever possible, designers try to keep signals within the same clock domain. If data must be transferred between clock domains, designers might use techniques like FIFOs or handshake protocols to ensure safe transfer.
- CDC Analysis Tools: These are specialized tools that help in identifying and verifying clock domain crossings in the design. They can highlight potential areas of concern, allowing designers to address them proactively.
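As an illustration of the multi-bit case that FIFOs have to deal with, here is a minimal sketch of a Gray-coded pointer crossing clock domains. The module name gray_ptr_sync, the WIDTH parameter, and the inc port are hypothetical names chosen for this example; a real asynchronous FIFO would add full/empty logic and resets on top of this.

```verilog
// Hypothetical example: pass a counter from clk_src to clk_dest as Gray code.
module gray_ptr_sync #(
    parameter WIDTH = 4                   // Pointer width (assumed for illustration)
) (
    input  wire             clk_src,      // Source clock
    input  wire             clk_dest,     // Destination clock
    input  wire             inc,          // Advance the pointer by one in the source domain
    output wire [WIDTH-1:0] gray_dest     // Gray-coded pointer in the destination domain
);

    reg  [WIDTH-1:0] bin_src  = 0;        // Binary pointer, source domain
    reg  [WIDTH-1:0] gray_src = 0;        // Registered Gray-coded copy of the pointer
    wire [WIDTH-1:0] bin_next = bin_src + inc;

    always @(posedge clk_src) begin
        bin_src  <= bin_next;
        // Binary-to-Gray conversion; registered so no combinational glitches cross
        gray_src <= bin_next ^ (bin_next >> 1);
    end

    // Two-stage synchronizer applied to every bit of the Gray-coded pointer
    reg [WIDTH-1:0] sync1 = 0;
    reg [WIDTH-1:0] sync2 = 0;

    always @(posedge clk_dest) begin
        sync1 <= gray_src;
        sync2 <= sync1;
    end

    assign gray_dest = sync2;

endmodule
```

Consecutive Gray codes differ in exactly one bit, so a sample taken in the middle of a transition resolves to either the old value or the new one, never a scrambled mix of both. This assumes the skew between the pointer bits stays under one source clock period, which is one of the things the timing constraints are there to check.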
A Simple Dual-Flop Synchronizer
The following shows a very simple synchronizer written in Verilog:
module sync_ff (
    input  wire clk_src,      // Source Clock
    input  wire clk_dest,     // Destination Clock
    input  wire reset,        // Asynchronous reset, active low
    input  wire async_signal, // Asynchronous signal from source domain
    output wire sync_signal   // Synchronized signal in destination domain
);

    // ff1 registers the signal in the source domain;
    // ff2 and ff3 form the two-stage synchronizer in the destination domain
    reg ff1 = 0;
    reg ff2 = 0;
    reg ff3 = 0;

    always @(posedge clk_src or negedge reset) begin
        if (!reset)
            ff1 <= 0;
        else
            ff1 <= async_signal; // Capture the asynchronous signal on the rising edge of source clock
    end // always @(posedge clk_src or negedge reset)

    always @(posedge clk_dest or negedge reset) begin
        if (!reset) begin
            ff2 <= 0;
            ff3 <= 0;
        end else begin
            ff2 <= ff1; // Capture the output of the first flip-flop on the rising edge of destination clock
            ff3 <= ff2; // Allow ff2's signal to stabilize for one clock cycle
        end
    end // always @(posedge clk_dest or negedge reset)

    assign sync_signal = ff3; // Output the synchronized signal

endmodule
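This synchronizer first registers the input signal, async_signal, in its original clock domain using flop ff1, so that the signal crossing the boundary comes straight from a register rather than from glitch-prone combinational logic. The signal is then captured in the destination clock domain by ff2 and ff3. The first register in the destination domain, ff2, is the one that may go metastable. It gets one full destination clock cycle to settle before ff3 samples it, which irons out most potential metastability issues.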
This ultimately synthesizes out to a circuit like the one below:
Further Reading
Colin O'Flynn went the extra mile in demonstrating the exact effects of metastability by using a Xilinx FPGA's internal delay lines in tandem with a high-speed scope. We highly recommend you check out his writeup if you want to learn more about FPGA synchronization!