Projects

Tools

  • PCC: MLIR-Based Compilation Framework for Photonic-Electronic Accelerators

    • Created an end-to-end compilation framework for neural networks using MLIR and LLVM for photonic-electronic accelerators

    • Utilized MLIR to automate the mapping of high-level DNN specifications to photonic hardware architectures

    • Demonstrated up to 4x speedup on DNN workloads compared to traditional hand-crafted implementations using custom optimizations

    • Positioned the framework as a foundational tool for developing novel photonic circuit designs for real-world AI applications.

  • FIONA: Full-Stack Photonic-Electronic Architectural Simulation Framework

    • Designed an ISA for photonic-electronic collaborative processor

    • Implemented a baseline vector processing unit (FIONA-V) compatible with photonic computation circuits

    • Capable of simulating photonic circuits with the electronic counterparts.

CPU Cores

  • A RISC-V Processor Core running Linux · [Code]

    • Designed a 3-stage pipeline including (1) instruction fetch, (2) decode & execute and (3) commit & exception handling

    • Implemented Translation Lookaside Buffer and hardware Page Table Walker to support hardware virtual memory management

    • Implemented AXI interface to integrate the core into SoC that work with peripherals (DDR controllers, Ethernet, etc.)

  • MIPS Out-of-Order Superscalar Processor on FPGA · [Doc] · [Code] · [Slides]

    • GShare / TAGE overriding branch predictor, with a 256-entry Branch Target Buffer and a 16-entry

    • Next Line Predictor

    • Pipeline flushing on branch misprediction and precise exception handling

    • 2 integer units, 1 pipelined multiplication and division unit and 1 pipelined load/store unit, with grouped bypass network

    • 4KiB 4-way instruction cache with Pseudo-LRU replacement policy and 16KiB non-blocking data cache to avoid stalling the load/store pipeline on cache misses

    • Use ROB to force the MMIO access seen in order

    • Explicit register renaming to handle WAW, WAR hazards

    • Exceeded the 40x IPC than Loongson GS132 (single-issue, 3-stage pipeline) on crc32, select sort, 30x on sha, stream copy

    • RTL design using Verilog, synthesis and implementation using Vivado, with target frequency 88 MHz on Artix-7 FPGA

Presentations

  • RISC-V虚拟内存扩展的概念验证 · Jun. 25, 2021 · Shanghai, China (Virtual talk) · [Video]

    • Discussed the recent progress in the POC for several RISC-V virtual memory extension.