A synthesizable VHDL implementation of a 64-entry Content Addressable Memory (CAM) designed for high-speed lookup operations in FPGA applications.
- Overview
- Features
- Architecture
- Interface Specification
- Memory Configuration
- File Structure
- Usage
- Synthesis
- Testing
- Performance
- Applications
- Contributing
## Overview

This project implements a fully synthesizable Content Addressable Memory (CAM) in VHDL that stores 64 key-data pairs and searches all of them in parallel. A conventional memory returns the data stored at a given address. A CAM works the other way around: it takes the content (here, a key) as input and reports whether, and where, that content is stored. This makes it a natural fit for applications that need fast search, such as routing tables, cache tag lookup, and pattern matching.
## Features

- 64-Entry Capacity: Stores up to 64 key-data pairs
- 22-bit Key Width: Supports 2²² = 4,194,304 distinct key values
- 32-bit Data Width: Each entry stores 32 bits of associated data
- Single-Cycle Lookup: Combinational matching for high-speed operation
- Priority Encoding: Returns the lowest-index match when multiple entries match
- Round-Robin Replacement: Automatic replacement policy once the CAM is full
- Synchronous Write Operations: Clock-synchronized data storage
- Hit/Miss Detection: Explicit feedback for search operations
- Fully Synthesizable: Designed for FPGA implementation
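The behaviour listed above can be captured in a small Python reference model, which is handy for generating expected values in a testbench or for reasoning about corner cases. This is a sketch under stated assumptions, not a translation of the RTL: the `CamModel` class is hypothetical, it assumes every write lands in the slot under the round-robin pointer (even when the key is already stored), it tracks a per-entry valid bit that the README does not mention, and it returns 0 as data on a miss.

```python
class CamModel:
    """Behavioural reference model of the CAM (assumed semantics, not the RTL)."""

    def __init__(self, entries=64, key_bits=22, data_bits=32):
        self.entries = entries
        self.key_mask = (1 << key_bits) - 1
        self.data_mask = (1 << data_bits) - 1
        self.reset()

    def reset(self):
        # Assumption: reset invalidates every entry and points the
        # round-robin replacement pointer at slot 0.
        self.valid = [False] * self.entries  # assumed valid bits
        self.keys = [0] * self.entries
        self.data = [0] * self.entries
        self.ptr = 0

    def write(self, key, data):
        """Store a key-data pair in the slot under the replacement pointer."""
        slot = self.ptr
        self.keys[slot] = key & self.key_mask
        self.data[slot] = data & self.data_mask
        self.valid[slot] = True
        # Round-robin: advance and wrap, so write number 65 overwrites slot 0.
        self.ptr = (self.ptr + 1) % self.entries
        return slot

    def search(self, key):
        """Return (hit, data); the lowest-index valid match wins."""
        key &= self.key_mask
        for index in range(self.entries):
            if self.valid[index] and self.keys[index] == key:
                return True, self.data[index]
        return False, 0  # assumption: data reads as 0 on a miss


if __name__ == "__main__":
    cam = CamModel()
    cam.write(1, 0xAAAA0001)
    hit, data = cam.search(1)
    print(hit, hex(data))  # True 0xaaaa0001
```

Under these assumptions, writing the same key twice leaves two copies, and the priority rule keeps returning the older (lower-index) copy until the round-robin pointer overwrites it.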
## Architecture

The CAM implementation follows a modular architecture with a clear separation of concerns:

```
CAM Top Level
├── KEY_FILE (Tag Storage & Matching)
│   ├── D_FF Array (64 × 22-bit registers)
│   ├── MATCHING_CIRCUIT
│   │   ├── COMPARATOR Array (64 parallel comparators)
│   │   └── PRIORITY_ENCODER (64→6 bit encoder)
│   ├── REPLACEMENT_POINTER (Round-robin counter)
│   ├── MULTIPLEXER (Address selection)
│   └── DECODER (Write enable generation)
└── DATA_FILE (Associated Data Storage)
    └── Register Array (64 × 32-bit entries)
```
- MATCHING_CIRCUIT: Parallel comparison of input key against all stored tags
- PRIORITY_ENCODER: Identifies the lowest-index matching entry
- REPLACEMENT_POINTER: Maintains next available location for new entries
- DECODER: Generates write enable signals for tag storage
- DATA_FILE: Stores associated data for each CAM entry
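The matching path (parallel comparators feeding the priority encoder) and the write-side decoder can be modelled at the bit level in Python. In this sketch, `match_vector`, `priority_encode`, and `decode` are hypothetical helper names. Two details are assumptions about the RTL rather than facts from it: the encoder reports index 0 with `hit=False` when nothing matches, and the decoder output is gated by a write-enable input.

```python
def match_vector(tags, search_key):
    """Parallel comparators: bit i is set when stored tag i equals the key."""
    vector = 0
    for i, tag in enumerate(tags):
        if tag == search_key:
            vector |= 1 << i
    return vector


def priority_encode(vector, width=64):
    """64-to-6 priority encoder: (hit, index of the lowest set bit)."""
    vector &= (1 << width) - 1
    if vector == 0:
        return False, 0  # assumption: index output is 0 on a miss
    lowest = vector & -vector  # isolate the lowest set bit
    return True, lowest.bit_length() - 1


def decode(address, enable=True, width=64):
    """6-to-64 decoder: one-hot write enable, gated by `enable` (assumed)."""
    if not 0 <= address < width:
        raise ValueError(f"address {address} out of range for {width} entries")
    return (1 << address) if enable else 0


if __name__ == "__main__":
    tags = [5, 9, 5, 2]
    vec = match_vector(tags, 5)
    print(bin(vec), priority_encode(vec))  # 0b101 (True, 0)
```

Because `priority_encode(decode(n))` returns `(True, n)` for any valid `n`, the encoder and decoder are inverses on one-hot inputs, which makes a convenient self-check.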
## Interface Specification

### Input Ports

| Port | Width | Description |
|---|---|---|
| `UNI_CLK` | 1 bit | System clock input |
| `UNI_RST` | 1 bit | Synchronous reset (active high) |
| `UNI_KEY_IN` | 22 bits | Search/write key input |
| `UNI_WR_EN` | 1 bit | Write enable signal |
| `INPUT_DATA` | 32 bits | Data to be written with the key |

### Output Ports

| Port | Width | Description |
|---|---|---|
| `HIT` | 1 bit | Match found indicator (1 = hit, 0 = miss) |
| `OUTPUT_DATA` | 32 bits | Data associated with the matched key |
## Memory Configuration

The CAM is configured through the `CAM_PKG.vhd` package:

```vhdl
-- Configuration constants (summary of CAM_PKG.vhd)
CONSTANT KEY_IN_SIZE           : INTEGER := 22;    -- Input key width (bits)
CONSTANT KEY_FILE_OUTPUT       : INTEGER := 6;     -- Address width, log₂(64)
CONSTANT DATA_SIZE             : INTEGER := 32;    -- Associated data width (bits)
CONSTANT DECODER_OUTPUT_VECTOR : INTEGER := 64;    -- Total CAM entries
CONSTANT TAGS_SIZE             : INTEGER := 1408;  -- Total tag storage, 64 × 22 bits
```

- Key Storage: 64 entries × 22 bits = 1,408 bits
- Data Storage: 64 entries × 32 bits = 2,048 bits
- Total Memory: 3,456 bits = 432 bytes per CAM instance
- Key Space: 2²² = 4,194,304 distinct key values
## File Structure

```
├── images
├── src
│   ├── packages
│   │   └── CAM_PKG.vhd                 # Configuration package
│   └── modules
│       ├── CAM
│       │   ├── CAM.vhd                 # Top-level CAM entity
│       │   └── CAM_TB.vhd              # Comprehensive testbench
│       ├── DATA_FILE
│       │   └── DATA_FILE.vhd           # Associated data storage
│       └── KEY_FILE
│           ├── KEY_FILE.vhd            # Key storage and matching logic
│           ├── MATCHING_CIRCUIT.vhd    # Parallel key comparison
│           ├── COMPARATOR.vhd          # Single key comparator
│           ├── PRIORITY_ENCODER.vhd    # Match priority resolution
│           ├── REPLACEMENT_POINTER.vhd # Round-robin replacement
│           ├── DECODER.vhd             # Write enable decoder
│           ├── MULTIPLEXER.vhd         # Address multiplexer
│           └── D_FF.vhd                # D flip-flop with enable
│
└── compile.tcl                         # Tcl script that compiles all files in the proper order
```
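`compile.tcl` already handles compilation for this repository. If you script builds for another simulator, a dependency-ordered file list can be derived from the hierarchy in the Architecture section. The `DEPENDENCIES` map below is an assumption inferred from that tree (in particular, that every module uses `CAM_PKG.vhd` and that `DATA_FILE.vhd` instantiates no sub-modules); check it against the actual `use` clauses and instantiations. It relies on `graphlib`, available from Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each file maps to the files it needs
# analysed first. Inferred from the architecture tree, not from the sources.
DEPENDENCIES = {
    "CAM_PKG.vhd": set(),
    "D_FF.vhd": {"CAM_PKG.vhd"},
    "COMPARATOR.vhd": {"CAM_PKG.vhd"},
    "PRIORITY_ENCODER.vhd": {"CAM_PKG.vhd"},
    "REPLACEMENT_POINTER.vhd": {"CAM_PKG.vhd"},
    "MULTIPLEXER.vhd": {"CAM_PKG.vhd"},
    "DECODER.vhd": {"CAM_PKG.vhd"},
    "MATCHING_CIRCUIT.vhd": {"CAM_PKG.vhd", "COMPARATOR.vhd", "PRIORITY_ENCODER.vhd"},
    "KEY_FILE.vhd": {
        "CAM_PKG.vhd", "D_FF.vhd", "MATCHING_CIRCUIT.vhd",
        "REPLACEMENT_POINTER.vhd", "MULTIPLEXER.vhd", "DECODER.vhd",
    },
    "DATA_FILE.vhd": {"CAM_PKG.vhd"},
    "CAM.vhd": {"CAM_PKG.vhd", "KEY_FILE.vhd", "DATA_FILE.vhd"},
    "CAM_TB.vhd": {"CAM.vhd"},
}


def compile_order(deps):
    """Return the files so that every file appears after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    # Space-separated list suitable for a vcom or ghdl -a command line.
    print(" ".join(compile_order(DEPENDENCIES)))
```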
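## Usage

Write a key-data pair, then search for it (statements from a testbench stimulus process):

```vhdl
-- Store key-data pair
UNI_KEY_IN <= "0000000000000000000001";  -- Key = 1
INPUT_DATA <= x"AAAA0001";               -- Data = 0xAAAA0001
UNI_WR_EN  <= '1';                       -- Enable write
wait until rising_edge(UNI_CLK);         -- Pair is stored on this edge

-- Search for key
UNI_KEY_IN <= "0000000000000000000001";  -- Search key = 1
UNI_WR_EN  <= '0';                       -- Disable write
wait until rising_edge(UNI_CLK);
-- Check HIT and OUTPUT_DATA on the next clock cycle
```

Instantiating the CAM in a top-level design (the `TOP_DESIGN` ports other than `CLK` and `RST` are illustrative names):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity TOP_DESIGN is
    port (
        CLK      : in  std_logic;
        RST      : in  std_logic;
        -- CAM interface signals (illustrative port names)
        KEY_IN   : in  std_logic_vector(21 downto 0);
        WR_EN    : in  std_logic;
        DATA_IN  : in  std_logic_vector(31 downto 0);
        HIT_OUT  : out std_logic;
        DATA_OUT : out std_logic_vector(31 downto 0)
    );
end TOP_DESIGN;

architecture STRUCT of TOP_DESIGN is
    component CAM is
        port (
            UNI_CLK     : in  std_logic;
            UNI_RST     : in  std_logic;
            UNI_KEY_IN  : in  std_logic_vector(21 downto 0);
            UNI_WR_EN   : in  std_logic;
            INPUT_DATA  : in  std_logic_vector(31 downto 0);
            HIT         : out std_logic;
            OUTPUT_DATA : out std_logic_vector(31 downto 0)
        );
    end component;
begin
    CAM_INST : CAM
        port map (
            UNI_CLK     => CLK,
            UNI_RST     => RST,
            UNI_KEY_IN  => KEY_IN,
            UNI_WR_EN   => WR_EN,
            INPUT_DATA  => DATA_IN,
            HIT         => HIT_OUT,
            OUTPUT_DATA => DATA_OUT
        );
end STRUCT;
```

## Synthesis

- Logic Elements: ~2,000-3,000 LEs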
- Memory Bits: 3,456 bits. The 1,408 key bits must stay in registers because all 64 tags are compared in parallel; only the 2,048-bit data array is a candidate for BRAM or distributed RAM
- DSP Blocks: 0
- Maximum Frequency: 200+ MHz (device dependent)
- Target Devices: Modern FPGA families (e.g., Intel Arria/Cyclone, Xilinx Zynq/Kintex)
- Timing Constraints: Set clock constraints that match your target frequency
- Resource Optimization: Consider mapping the data storage to BRAM in resource-constrained designs (BRAM reads are synchronous, so check the effect on output timing)
- Pipeline Considerations: Single-cycle operation for maximum throughput
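Example Intel Quartus project assignments (`.qsf` syntax):

```tcl
# Set top-level entity
set_global_assignment -name TOP_LEVEL_ENTITY CAM
# Add source files (package first)
set_global_assignment -name VHDL_FILE CAM_PKG.vhd
set_global_assignment -name VHDL_FILE CAM.vhd
# ... (add all source files)
# Timing constraints live in a separate SDC file (file name is illustrative)
set_global_assignment -name SDC_FILE CAM.sdc
```

Timing constraints belong in the SDC file read by the Timing Analyzer, not in the `.qsf`:

```tcl
# CAM.sdc: a 10 ns period is 100 MHz; use -period 5.000 to target 200 MHz
create_clock -name "UNI_CLK" -period 10.000 [get_ports {UNI_CLK}]
```

## Testing

The comprehensive testbench (`CAM_TB.vhd`) validates: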
- ✅ Reset Functionality: Proper initialization
- ✅ Write Operations: Key-data pair storage
- ✅ Search Operations: Hit/miss detection
- ✅ Data Retrieval: Correct associated data output
- ✅ Priority Handling: Lowest-index match priority
- ✅ Replacement Policy: Round-robin replacement behavior
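For directed tests like these, generating stimulus can be less error-prone than hand-typing 22-bit literals. This Python sketch formats keys and data as VHDL literals in the same style as the write example in the Usage section; the helper names (`key_literal`, `data_literal`, `write_stimulus`, `search_stimulus`) are hypothetical, and the 22-bit and 32-bit widths come from the interface table.

```python
KEY_BITS = 22   # UNI_KEY_IN width
DATA_BITS = 32  # INPUT_DATA width


def key_literal(key):
    """Format a key as a VHDL bit-string literal, e.g. "000...01"."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError(f"key {key} does not fit in {KEY_BITS} bits")
    return '"' + format(key, f"0{KEY_BITS}b") + '"'


def data_literal(data):
    """Format data as a VHDL hex literal, e.g. x"AAAA0001"."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError(f"data {data:#x} does not fit in {DATA_BITS} bits")
    return 'x"' + format(data, f"0{DATA_BITS // 4}X") + '"'


def write_stimulus(key, data):
    """VHDL statements for one synchronous write."""
    return [
        f"UNI_KEY_IN <= {key_literal(key)};",
        f"INPUT_DATA <= {data_literal(data)};",
        "UNI_WR_EN  <= '1';",
        "wait until rising_edge(UNI_CLK);",
    ]


def search_stimulus(key):
    """VHDL statements that present a search key with writes disabled."""
    return [
        f"UNI_KEY_IN <= {key_literal(key)};",
        "UNI_WR_EN  <= '0';",
        "wait until rising_edge(UNI_CLK);",
    ]


if __name__ == "__main__":
    # Priority check: the same key written twice (data values are arbitrary).
    # If the pointer has not wrapped, the lowest-index rule expects the
    # first value back.
    lines = write_stimulus(1, 0xAAAA0001) + write_stimulus(1, 0xBBBB0002) + search_stimulus(1)
    print("\n".join(lines))
```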
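Analyze the package first, then the sub-modules, the top level, and finally the testbench:

```tcl
# ModelSim/QuestaSim (transcript window or .do script)
vlib work
vcom CAM_PKG.vhd [all_source_files] CAM.vhd CAM_TB.vhd
vsim CAM_TB
run -all
```

```bash
# GHDL
ghdl -a CAM_PKG.vhd [all_source_files] CAM.vhd CAM_TB.vhd
ghdl -e CAM_TB
ghdl -r CAM_TB --stop-time=500ns
```

## Performance

- Lookup Latency: 1 clock cycle (combinational matching)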
- Write Latency: 1 clock cycle (synchronous storage)
- Throughput: 1 operation per clock cycle
- Maximum Frequency: >200 MHz (FPGA dependent)
- Entries: Configurable via `DECODER_OUTPUT_VECTOR` (update `KEY_FILE_OUTPUT` and `TAGS_SIZE` to match)
- Key Width: Adjustable via the `KEY_IN_SIZE` constant (update `TAGS_SIZE` to match)
- Data Width: Configurable via the `DATA_SIZE` constant
## Applications

This CAM implementation is well suited to:
- Routing Tables: IP address lookup in network processors
- Cache Controllers: Tag comparison in processor caches
- Pattern Matching: Real-time pattern detection systems
- Security Systems: Access control and firewall applications
- Database Acceleration: Hardware-accelerated database indexing
- AI/ML Inference: Feature matching in neural networks
- ✅ High-speed parallel search capability
- ✅ Single clock cycle operation
- ✅ Modular, maintainable design
- ✅ Fully synthesizable for FPGA implementation
- ✅ Configurable entry count and widths
- ⚠️ Resource usage scales linearly with entry count
- ⚠️ Power consumption increases with the number of parallel comparators
- ⚠️ Limited to exact-match searches (no partial or masked matching)
To modify the CAM configuration:
- Edit `CAM_PKG.vhd` to change the memory dimensions
- Recompile all dependent units, and update any hard-coded widths (such as component declarations) if interface widths change
- Update the testbench to match the new configuration
- Resynthesize for the target FPGA platform
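The constants that must change together can be derived from the entry count. This sketch computes them and renders `CONSTANT` lines in the same form as the 128-entry example that follows. The function names are hypothetical, and the sketch assumes a power-of-two entry count so that the round-robin pointer and decoder map cleanly onto `KEY_FILE_OUTPUT` bits; the README does not say whether other counts are supported.

```python
ORDER = ["KEY_IN_SIZE", "KEY_FILE_OUTPUT", "DATA_SIZE", "DECODER_OUTPUT_VECTOR", "TAGS_SIZE"]


def cam_constants(entries, key_bits=22, data_bits=32):
    """Derive CAM_PKG constants for a power-of-two entry count (assumed)."""
    if entries < 2 or entries & (entries - 1):
        raise ValueError("entries must be a power of two >= 2")
    return {
        "KEY_IN_SIZE": key_bits,
        "KEY_FILE_OUTPUT": (entries - 1).bit_length(),  # log2(entries)
        "DATA_SIZE": data_bits,
        "DECODER_OUTPUT_VECTOR": entries,
        "TAGS_SIZE": entries * key_bits,  # flattened tag storage
    }


def memory_bits(c):
    """Total storage: key bits plus data bits for every entry."""
    return c["DECODER_OUTPUT_VECTOR"] * (c["KEY_IN_SIZE"] + c["DATA_SIZE"])


def render_vhdl(c):
    """Render the constants as VHDL declarations, one per line."""
    return "\n".join(f"CONSTANT {name} : INTEGER := {c[name]};" for name in ORDER)


if __name__ == "__main__":
    c = cam_constants(128)
    print(render_vhdl(c))
    print(memory_bits(c), "bits")
```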
Example configuration for a 128-entry CAM:

```vhdl
CONSTANT DECODER_OUTPUT_VECTOR : INTEGER := 128;   -- 128 entries
CONSTANT KEY_FILE_OUTPUT       : INTEGER := 7;     -- log₂(128) = 7 bits
CONSTANT TAGS_SIZE             : INTEGER := 2816;  -- 128 × 22 bits
```

## Contributing

Contributions are welcome! Areas for enhancement:
- Ternary CAM (TCAM) support for masked searches
- Multiple match handling beyond priority encoding
- Different replacement policies (LRU, LFU, random)
- Power optimization features
- Built-in aging mechanisms for dynamic entries
This project is released under the GPL-3.0 license, so feel free to use it or make it better ;)
For questions, suggestions, or collaboration opportunities, please open an issue on GitHub.
Note: This implementation prioritizes clarity and educational value while maintaining synthesizable, production-ready code quality. Performance characteristics may vary depending on target FPGA family and synthesis tool optimization settings.

