Software Release

EBMF: Exact Binary Matrix Factorization

https://github.com/UCLA-VAST/EBMF

This project provides an SMT-based solving method and a heuristic, row packing, for the exact binary matrix factorization (EBMF) problem. Additionally, we provide an SMT method to find the fooling set size of a binary matrix.
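
To make the fooling-set notion concrete, here is a minimal Python sketch. A fooling set is a set of 1-entries of a 0/1 matrix M such that, for any two of them (i1, j1) and (i2, j2), at least one of M[i1][j2] and M[i2][j1] is 0. No all-ones rectangle (the support of a rank-1 binary factor) can then contain two of these entries, so the size of any fooling set is a lower bound on the number of rank-1 factors an exact factorization needs. The sketch uses exhaustive search, not the project's SMT formulation; the function names are hypothetical, and it is only practical for very small matrices.

```python
from itertools import combinations

def one_cells(M):
    """List the (row, col) positions of all 1-entries of a 0/1 matrix."""
    return [(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v]

def is_fooling_set(M, cells):
    """True if no two of the given 1-cells can lie in the same all-ones rectangle."""
    for (i1, j1), (i2, j2) in combinations(cells, 2):
        # Two 1-cells fit in one rectangle only if both "cross" entries are 1.
        if M[i1][j2] and M[i2][j1]:
            return False
    return True

def max_fooling_set(M):
    """Exhaustive search for a largest fooling set (exponential time; tiny inputs only)."""
    cells = one_cells(M)
    for size in range(len(cells), 0, -1):
        for subset in combinations(cells, size):
            if is_fooling_set(M, subset):
                return list(subset)
    return []
```

For the 3×3 identity matrix, `max_fooling_set` returns the three diagonal cells, so any exact factorization needs at least three rank-1 factors; for the all-ones 2×2 matrix it returns a single cell, consistent with that matrix being one rectangle.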

2024
OLSQ-DPQA

Optimal Layout Synthesizer of Quantum Circuits for Dynamically Field-Programmable Qubits Array.

https://github.com/UCLA-VAST/DPQA

2023
Callipepla

Callipepla and SerpensCG are two conjugate gradient solvers for HBM-equipped FPGAs.

https://github.com/UCLA-VAST/Callipepla
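
For readers unfamiliar with the method, the textbook conjugate gradient (CG) iteration is sketched below in plain Python. It uses a dense list-of-lists matrix for brevity; in HBM solvers such as Callipepla the matrix-vector product is a sparse SpMV. The function names, tolerance, and iteration cap are arbitrary choices for illustration, not the project's settings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A given as dense lists."""
    n = len(b)
    max_iter = max_iter if max_iter is not None else 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(A, p)  # matrix-vector product (an SpMV when A is sparse)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # next A-conjugate direction
        rs_old = rs_new
    return x
```

For A = [[4, 1], [1, 3]] and b = [1, 2], the result is approximately [1/11, 7/11]; in exact arithmetic CG converges in at most n iterations for an n×n SPD system.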

2023
Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication

Serpens is a high-bandwidth-memory-based accelerator for general-purpose sparse matrix-vector multiplication (SpMV). We built the Serpens accelerator on the Xilinx Alveo U280 card. Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS).

https://github.com/UCLA-VAST/Serpens
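
As a reference point for what such an accelerator computes, the sketch below performs SpMV on a matrix stored in compressed sparse row (CSR) format. The BLAS-style form y = αAx + βy, the CSR layout, and the function name are assumptions for illustration; Serpens' on-chip data format may differ. In the graph-analytics convention behind MTEPS, each nonzero processed is typically counted as one traversed edge, which is why SpMV throughput can be quoted both in GFLOP/s and in MTEPS.

```python
def csr_spmv(row_ptr, col_idx, vals, x, y, alpha=1.0, beta=0.0):
    """Return alpha * A @ x + beta * y, with A stored in CSR form.

    The nonzeros of row i are vals[row_ptr[i]:row_ptr[i + 1]], located in the
    columns col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    result = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # one multiply-add per nonzero ("edge")
        result.append(alpha * acc + beta * y[i])
    return result
```

For A = [[1, 0, 2], [0, 3, 0]] (row_ptr = [0, 2, 3], col_idx = [0, 2, 1], vals = [1, 2, 3]), x = [1, 1, 1], y = [1, 1], α = 2 and β = 1, the call returns [7.0, 7.0].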

2022
Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

Sextans is an FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM).

https://github.com/UCLA-VAST/Sextans
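
SpMM generalizes SpMV: the dense operand is a matrix rather than a vector, so each nonzero of the sparse matrix is reused across every column of the dense matrix. The Python sketch below shows the row-wise formulation, in which each nonzero A[i][j] scales row j of B and accumulates it into row i of the result. The C = αAB + βC form, the CSR storage, and the function name are assumptions for illustration, not Sextans' internal design.

```python
def csr_spmm(row_ptr, col_idx, vals, B, C, alpha=1.0, beta=0.0):
    """Return alpha * A @ B + beta * C, with sparse A in CSR and dense B, C as row lists."""
    n_cols = len(B[0])
    result = []
    for i in range(len(row_ptr) - 1):
        acc = [0.0] * n_cols
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, b_row = vals[k], B[col_idx[k]]
            # The nonzero A[i][col_idx[k]] is reused for every column of B.
            for n in range(n_cols):
                acc[n] += a * b_row[n]
        result.append([alpha * s + beta * c for s, c in zip(acc, C[i])])
    return result
```

With A = [[1, 0, 2], [0, 3, 0]] in CSR form (row_ptr = [0, 2, 3], col_idx = [0, 2, 1], vals = [1, 2, 3]), B = [[1, 2], [3, 4], [5, 6]] and a zero C, the result is [[11.0, 14.0], [9.0, 12.0]].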

2022
Pyxis: An Open-Source Performance Dataset of Sparse Accelerators

Pyxis collects open-source accelerator designs and their performance data.

https://github.com/UCLA-VAST/Pyxis

2021
AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators

https://github.com/UCLA-VAST/AutoDSE

 

Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient...

2021
Merlin Compiler

https://github.com/Xilinx/merlin-compiler

We are excited that Xilinx has made the decision to open-source the Merlin compiler under the Apache license. The Merlin compiler was originally developed by Falcon Computing Solutions, a spin-off from the VAST Lab that was acquired by Xilinx in 2020. Multiple research projects in the VAST Lab, such as [S2FA], [HeteroCL], and...

2021
AutoSA: Polyhedral-Based Systolic Array Auto-Compilation

https://github.com/UCLA-VAST/AutoSA

2021
AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs

https://github.com/Licheng-Guo/AutoBridge

2021
Extending High-Level Synthesis for Task-Parallel Programs

Codebase: https://github.com/UCLA-VAST/tapa
Documentation: https://tapa.rtfd.io

C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and shorter development cycles compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly...

2021
OLSQ: Optimal Layout Synthesis for Quantum Computing

Many quantum computers have constraints on the connections between qubits. However, a quantum program may not conform to these constraints. Thus, it is necessary to perform layout synthesis for quantum computing (LSQC), which transforms quantum programs prior to execution so that the connectivity issues are resolved. OLSQ can solve LSQC optimally with respect to depth, number of SWAP gates, or fidelity. There is also a transition-based (TB) mode that speeds it up with little loss of optimality.
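
The connectivity constraint that LSQC must satisfy can be stated as a simple check: every two-qubit gate must act on physical qubits that are adjacent in the device's coupling graph under the current mapping, and each inserted SWAP exchanges the logical qubits sitting on two adjacent physical qubits. The sketch below only verifies a given layout-synthesis result; it does not search for an optimal one as OLSQ does. The operation encoding ("gate" on logical qubits, "swap" on physical qubits) and the function name are assumptions for illustration.

```python
def check_layout(ops, edges, mapping):
    """Return True if every operation in `ops` respects the coupling graph.

    ops:     list of ("gate", logical_a, logical_b) or ("swap", phys_u, phys_v)
    edges:   iterable of (phys_u, phys_v) pairs that are physically connected
    mapping: initial placement, a dict from logical qubit to physical qubit
    """
    coupled = {frozenset(e) for e in edges}
    where = dict(mapping)  # current logical -> physical placement (input is not mutated)
    for kind, a, b in ops:
        if kind == "gate":
            if frozenset((where[a], where[b])) not in coupled:
                return False
        elif kind == "swap":
            if frozenset((a, b)) not in coupled:
                return False
            # Exchange whichever logical qubits currently sit on physical qubits a and b.
            for q, p in list(where.items()):
                if p == a:
                    where[q] = b
                elif p == b:
                    where[q] = a
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return True
```

On a three-qubit line 0-1-2 with logical qubit qi initially on physical qubit i, a gate between q0 and q2 is rejected, but it becomes valid after one SWAP on physical qubits 1 and 2.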

2020
HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration

https://github.com/UCLA-VAST/heterohalide

2020
SODA: Stencil with Optimized Dataflow Architecture

https://github.com/UCLA-VAST/soda

Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many stencil kernels are complex: they usually consist of multiple stages or iterations and often contain redundant computation. Such kernels are often offloaded to FPGAs to take advantage of the efficiency of dedicated hardware accelerators. However, implementing such complex kernels efficiently is not trivial, due to...
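
A common way for streaming stencil hardware to avoid redundant memory accesses is a reuse buffer, which lets each input element be read only once. The Python generator below sketches that idea for a single 5-point stencil (the average of a cell and its four neighbours, an arbitrary example kernel): it consumes a row-major stream and keeps only the last 2·width + 1 elements, which is exactly the window spanning a cell's north and south neighbours. This is a software illustration of the reuse-buffer concept under those assumptions, not SODA's generated hardware or its DSL.

```python
from collections import deque

def stream_5pt(stream, width):
    """Yield ((row, col), value) for every interior cell of a row-major 2-D stream.

    value is the average of the cell and its four neighbours. Only the most recent
    2 * width + 1 elements are buffered, so each input element is read exactly once.
    """
    window = deque(maxlen=2 * width + 1)
    for idx, value in enumerate(stream):
        window.append(value)
        if len(window) < window.maxlen:
            continue  # buffer still filling: no complete neighbourhood yet
        centre = idx - width  # flat index of the cell whose south neighbour just arrived
        row, col = divmod(centre, width)
        if 0 < col < width - 1:  # skip left and right boundary columns
            north, south = window[0], window[-1]
            west, mid, east = window[width - 1], window[width], window[width + 1]
            yield (row, col), (north + west + mid + east + south) / 5
```

Streaming a 3×3 grid that is 5 in the centre and 0 elsewhere yields a single interior result, ((1, 1), 1.0).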

2020
QUEKO benchmarks

https://github.com/UCLA-VAST/QUEKO-benchmark

QUEKO (QUantum mapping Examples with Known Optimal) comprises several families of quantum programs, i.e., quantum circuits, whose optimal depths and gate counts after layout synthesis on the corresponding quantum devices are known.
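
Because QUEKO circuits come with a known optimal depth, a layout-synthesis result can be judged by comparing its depth with that optimum. As a minimal sketch, the function below computes circuit depth from a gate list, assuming each gate takes one time step and starts as soon as all of its qubits are free; the gate encoding and function name are hypothetical and unrelated to the benchmark files' actual format.

```python
def circuit_depth(gates, num_qubits):
    """Depth of a circuit under as-soon-as-possible scheduling.

    gates: list of tuples of qubit indices, one tuple per gate, in program order.
    Each gate is assumed to take exactly one time step.
    """
    ready = [0] * num_qubits  # first free time step of each qubit
    for qubits in gates:
        start = max(ready[q] for q in qubits)  # wait until all operands are free
        for q in qubits:
            ready[q] = start + 1
    return max(ready, default=0)
```

For the gates [(0, 1), (2, 3), (1, 2), (0,)] on four qubits, the depth is 2; the ratio of a tool's output depth to the known optimum then measures its suboptimality.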

2020
FlexCNN: End-to-End Optimization of Deep Learning Applications

https://github.com/UCLA-VAST/FlexCNN

 

Tutorial slides:
Click [here]

2020
HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA

https://github.com/heterorefactor/heterorefactor

2020
Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing

https://github.com/UCLA-VAST/minimap2-acceleration

2019
INSIDER

INSIDER is an FPGA-based full-stack in-storage computing system: 

https://github.com/zainryan/INSIDER-System

https://github.com/zainryan/EISC

Please follow the links above for further details.

2019
HeteroCL

HeteroCL is a programming infrastructure composed of a Python-based domain-specific language (DSL) and a compilation flow, developed through close collaboration between research groups led by Prof. Zhiru Zhang at Cornell and Prof. Jason Cong at UCLA. The HeteroCL DSL provides a clean abstraction that decouples algorithm specification from three important types of hardware customization: compute, data types, and memory architectures. HeteroCL further captures the interdependence among these techniques, allowing programmers to explore various trade-offs in a systematic and productive manner. In addition,...
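
The idea of decoupling the algorithm from data-type customization can be illustrated without HeteroCL itself. In the plain-Python sketch below, the algorithm (a dot product) is written once, and the number format is supplied separately as a quantization hook, so different fixed-point precisions can be explored without touching the algorithm. The names are hypothetical and do not correspond to HeteroCL's API, and the quantizer models only fractional precision, not integer-bit overflow.

```python
def to_fixed(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fractional precision only)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def dot_product(a, b, quantize=lambda v: v):
    """Algorithm specification: written once, independent of the number format."""
    acc = 0.0
    for x, y in zip(a, b):
        # The data-type customization enters only through the quantize hook.
        acc = quantize(acc + quantize(x) * quantize(y))
    return acc

if __name__ == "__main__":
    a, b = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]
    reference = dot_product(a, b)  # floating-point baseline
    for bits in (2, 4, 8, 12):
        approx = dot_product(a, b, quantize=lambda v, n=bits: to_fixed(v, n))
        print(f"{bits:2d} fractional bits: error = {abs(approx - reference):.6f}")
```

Changing precision is a one-argument change at the call site; the loop above trades bit width against accuracy in the spirit of the systematic exploration described above.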

2019
Caffeine

2019
Cloud-Scale BWAMEM

Cloud-scale BWAMEM (CS-BWAMEM) is an ultrafast and highly scalable aligner built on top of cloud infrastructures, including Spark and the Hadoop Distributed File System (HDFS). It leverages the abundant computing resources in a public or private cloud to fully exploit the parallelism obtained from the enormous number of reads. With CS-BWAMEM, paired-end whole-genome reads (30x coverage) can be aligned within 80 minutes on a 25-node cluster with 300 cores. The features include: 1) support for both paired-end and single-end alignment; 2) alignment quality similar to BWA-MEM; 3) input: FASTQ files; and 4) output...

2019
Microbenchmarks to Characterize Modern CPU-FPGA Platforms

With the rapid evolution of CPU-FPGA heterogeneous acceleration platforms, it is critical for both platform developers and users to quantify the fundamental microarchitectural features of the platforms. We developed a set of microbenchmarks to evaluate mainstream CPU-FPGA platforms.

The first benchmark (https://github.com/peterpengwei/Microbench_AlphaData) is dedicated to the Alpha Data card, which connects a CPU with an FPGA via the PCIe interface. The benchmark follows the Xilinx SDAccel programming model, and...

2017
PARADE: Full-System Accelerator-Rich Architecture Simulator

PARADE is a cycle-accurate full-system simulation platform that enables the design and exploration of emerging accelerator-rich architectures (ARA). It extends the widely used gem5 simulator with high-level synthesis (HLS) support.

2017
CMOST - System-Level FPGA Synthesis

CMOST is a system-level design automation framework for FPGA. The main features are:

  • Analyze and extract system-level information and generate a task-level data model
  • System-level optimizations for parallelism, task mapping and scheduling, pipelined streaming and data organization
    • Module evaluation using high-level synthesis
    • System-level module selection and duplication
    • ...

2015
PolyOpt/HLS: Polyhedral-Based Data Reuse Optimization for FPGA

PolyOpt/HLS is a polyhedral loop optimization framework dedicated to data reuse optimization for High-Level Synthesis, integrated in the ROSE compiler. The main features are:

  • Automatic extraction of regions that can be optimized in the polyhedral model
  • Full support of PoCC (the Polyhedral Compiler Collection) analysis and optimizations
    • Dependence analysis with Candl
    • Program transformations for tiling and parallelism with Pluto
    • Code generation with CLooG
    • Parametric tiling with PTile
    • Data reuse optimization with LMP
    • ...
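
Loop tiling, one of the transformations listed above, improves data reuse by working on small blocks that fit in on-chip memory. The hand-written Python sketch below contrasts a naive matrix multiplication with a tiled one; it illustrates the transformation only and is not PolyOpt/HLS output, and the tile size of 2 and the function names are arbitrary choices. Both versions compute the same result; only the iteration order changes.

```python
def matmul_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, tile=2):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile of A, B and C,
    # so each loaded block is reused many times before moving on.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

The `min(...)` bounds handle matrix sizes that are not multiples of the tile size, which is the same boundary case that parametric tiling tools must generate code for.
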
2013
LEKO/LEKU

LEKO and LEKU Suites [GitHub]

(Logic synthesis Examples with Known Optimal/Upper-bounds)

Director: Prof. Jason Cong

Author: Kirill Minkovich

Copyright 2005-2008 the Regents of the University of California


...

2006
PEKO-MS (placement suboptimality benchmarks with parametrized white space)

Open-source repository: https://github.com/jshinnerl/pekoMS_2006_book

The generating algorithm is described in https://doi.org/10.1007/978-0-387-68739-1_2

2006
xPilot: Platform-based Behavior Synthesis System

The xPilot Team:

  • Professor Jason Cong
  • Researchers: Deming Chen, Yiping Fan, Guoling Han, Wei Jiang, Bin Liu, Junjuan Xu, Zhiru Zhang

2006
TPEKO Suite (Timing-driven Placement Example with Known Optimal delay)

https://cadlab.cs.ucla.edu/~pubbench/tpeko.htm

2004
PEKO Suite (Placement Example with Known Optimal wirelength)

https://cadlab.cs.ucla.edu/~pubbench/peko.htm
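
PEKO circuits are constructed so that their optimal wirelength is known, so a placer's quality can be expressed as the ratio of its achieved wirelength to that optimum. The sketch below assumes half-perimeter wirelength (HPWL) as the metric and treats each cell's position as its pin location; both are simplifying assumptions, and the function names are hypothetical.

```python
def hpwl(nets, positions):
    """Total half-perimeter wirelength: per net, width + height of its pins' bounding box."""
    total = 0.0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def suboptimality(achieved, optimal):
    """Ratio of a placer's wirelength to the known optimum (1.0 means optimal)."""
    return achieved / optimal
```

For cells a at (0, 0), b at (2, 1) and c at (1, 3) with nets {a, b} and {a, b, c}, the HPWL is 3 + 5 = 8; against a known optimum of 4, the placement would be 2.0 times the optimum.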

2003
fpgaEva: A Heterogeneous FPGA Evaluation Tool

fpgaEva is a heterogeneous FPGA evaluation tool that incorporates a set of architecture-evaluation features into a user-friendly Java interface. Modern field-programmable gate arrays (FPGAs) provide, in a single device, both a logic array for general logic functions and embedded memory blocks (EMBs) for efficient implementation of on-chip memory and specialized logic functions. In addition, recent generations of FPGAs take advantage of the speed and density benefits resulting from heterogeneous FPGAs, which provide either an array of homogeneous programmable logic blocks (PLBs), each configured...

2003
MCAS: Multi-Cycle Architectural Synthesis System

The MCAS system accepts behavioral C and VHDL, performs aggressive high-level synthesis and optimization coupled with physical planning to optimize design performance, and generates RTL implementations together with physical constraints and timing constraints (e.g., multi-cycle path constraints) which serve as guidelines for the downstream tools. The underlying theme of this research is to raise the design abstraction from RTL to higher-level description without losing the physical reality.

The Team

2003
3-D IC Physical Design and 3-D Architecture Exploration

3-D ICs have recently attracted great interest from researchers and IC designers. Studies demonstrate a potential performance improvement of up to 65% by transferring a placement from 2-D to 3-D and eliminating long interconnects. Furthermore, the multiple device layer structure of 3-D ICs provides a platform to integrate different components, such as digital ICs, analog ICs, memory, RF modules, and different technologies such as SOI, SiGe HBTs, GaAs, etc., into one single circuit stack. Thus, it is a more flexible vehicle for system-on-chip (SoC) and system-in-package (SiP) designs...

2002
CPMO: Constrained Placement by Multilevel Optimization

Placement is one of the most important steps in the post-RTL synthesis process, as it directly defines the interconnects, which are now the bottleneck in circuit and system performance in deep submicron technologies. The placement problem has been studied extensively in the past 30 years. However, a study from UCLA shows that existing placement solutions are surprisingly far from optimal. Using a set of cleverly constructed circuit placement examples with known optima (PEKO) that match many industrial circuit characteristics, the study shows that the results of leading placement tools from...

2002
RASP: FPGA/CPLD Technology Mapping and Synthesis Package

RASP, an FPGA/CPLD technology mapping and synthesis package, is the synthesis core of the UCLA RASP System developed at the UCLA VLSI CAD Lab. This site is actively updated.

 

Rasp team:

  • Jason Cong
  • Deming Chen
  • Eugene Ding
  • Zhijun Huang
  • Yean-Yow Hwang
  • John Peck
  • Chang Wu
  • Songjie Xu

Copyright (C) 1991-2004 the Regents of the University of California



...

2000
Performance Estimation Models for Optimized Interconnects (IPEM)

IPEM provides a set of procedures that estimate performance under interconnect optimization for deep submicron technology. Using models derived from several interconnect optimization algorithms in TRIO, IPEM is fast, accurate, and easy to link into users' application programs. The results of IPEM match well with those of the UCLA TRIO package.
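
IPEM's own models are not reproduced here. As a textbook reference point for why interconnect optimization matters, the sketch below computes the Elmore delay of an unbuffered wire modelled as a chain of identical RC segments; the delay grows quadratically with wire length, which is what makes repeater insertion (as in TRIO) worthwhile. The segment model and function name are assumptions for illustration.

```python
def elmore_delay_wire(n_segments, r_seg, c_seg, r_driver=0.0, c_load=0.0):
    """Elmore delay of a wire modelled as n identical RC segments (an RC ladder).

    Each segment adds a series resistance r_seg followed by a capacitance c_seg
    to ground; the driver resistance precedes the wire, and the load sits at its end.
    """
    delay = 0.0
    upstream_r = r_driver
    for _ in range(n_segments):
        upstream_r += r_seg
        delay += upstream_r * c_seg  # each capacitor is charged through all upstream resistance
    delay += upstream_r * c_load     # the load sees the full driver-plus-wire resistance
    return delay
```

With unit segment resistance and capacitance and no driver or load, the delay is n(n + 1)/2: 2 segments give 3 and 4 segments give 10 in these units.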

IPEM team...

2000
mGP - A Multilevel Global Placement Tool

2000
V4R - Multilayer MCM Router

1998
TRIO - Tree, Repeater and Interconnect Optimization Package

1998