GShare / TAGE overriding branch predictor, with a 256-entry Branch Target Buffer and a 16-entry Next Line Predictor
Pipeline flushing on branch misprediction and precise exception handling
2 integer units, 1 pipelined multiplication and division unit, and 1 pipelined load/store unit, with a grouped bypass network
4KiB 4-way instruction cache with Pseudo-LRU replacement policy and 16KiB non-blocking data cache to avoid stalling the load/store pipeline on cache misses
Uses the reorder buffer (ROB) to ensure MMIO accesses are performed in program order
Explicit register renaming to eliminate WAW and WAR hazards
Achieved more than 40x the IPC of the Loongson GS132 (single-issue, 3-stage pipeline) on crc32 and select sort, and more than 30x on sha and stream copy
RTL design in Verilog; synthesis and implementation in Vivado, with a target frequency of 88 MHz on an Artix-7 FPGA
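
For readers unfamiliar with the GShare component named in the branch predictor item, the following is a minimal behavioral sketch in Python, not the project's Verilog. The class name, the 10-bit index width, the 2-bit saturating counters, and the 4-byte instruction alignment are all assumptions made for illustration; the list above does not state the actual table sizes, and TAGE, the BTB, and the Next Line Predictor are not modeled.

```python
class GShare:
    """Behavioral model of a gshare predictor (all sizes are assumed)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                          # global history register
        self.pht = [1] * (1 << index_bits)    # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # XOR the branch address with the global history.
        # Dropping the low 2 bits assumes 4-byte aligned instructions.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the actual outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Because the history is folded into the index, a branch that strictly alternates between taken and not taken ends up using two different counters and becomes fully predictable once the history register has warmed up.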
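
The Pseudo-LRU replacement policy in the instruction cache item can be illustrated for one 4-way set as follows. This sketch assumes the common tree-based variant with 3 bits per set; the list above says only "Pseudo-LRU", so the actual scheme may differ, and the class and method names are hypothetical.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for a single 4-way set (assumed variant).

    Each bit points toward the side that holds the next victim:
      b0: 0 -> victim is in ways {0, 1}, 1 -> victim is in ways {2, 3}
      b1: chooses between way 0 (0) and way 1 (1)
      b2: chooses between way 2 (0) and way 3 (1)
    """

    def __init__(self):
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way):
        # On a hit or fill, flip the bits on the path so they point away
        # from the way that was just used.
        if way < 2:
            self.b0 = 1
            self.b1 = 1 - way
        else:
            self.b0 = 0
            self.b2 = 3 - way

    def victim(self):
        # Follow the bits from the root to pick the way to replace.
        if self.b0 == 0:
            return self.b1
        return 2 + self.b2
```

After touching ways 0, 1, 2, 3 and then 0 again, this model selects way 2 as the victim, whereas true LRU would select way 1. That gap is the price of storing 3 bits per set instead of a full access ordering.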
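
As a rough illustration of how explicit register renaming removes WAW and WAR hazards, the sketch below gives every destination a fresh physical register taken from a free list. The register counts, the class name, and the method names are assumptions, not details of the actual design, and checkpointing or recovery after a misprediction is left out.

```python
from collections import deque


class Renamer:
    """Rename map plus free list (register counts are assumed)."""

    def __init__(self, num_arch=32, num_phys=64):
        # Initially, architectural register i maps to physical register i.
        self.rename_map = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, dst, srcs):
        # Read the source mappings first, so an instruction that reads and
        # writes the same register still sees the older value.
        phys_srcs = [self.rename_map[s] for s in srcs]
        # Raises IndexError when no register is free; hardware would stall.
        new_phys = self.free_list.popleft()
        old_phys = self.rename_map[dst]
        self.rename_map[dst] = new_phys
        return new_phys, phys_srcs, old_phys

    def commit(self, old_phys):
        # When the instruction retires, the previous mapping of its
        # destination can no longer be read and is recycled.
        self.free_list.append(old_phys)
```

Two instructions that write the same architectural register receive different physical destinations, so neither a later write (WAW) nor a write after an earlier read (WAR) forces them to execute in order; only true read-after-write dependencies remain.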