Software Release

EBMF: Exact Binary Matrix Factorization	https://github.com/UCLA-VAST/EBMF This project provides an SMT-based solving method and a heuristic, row packing, for the exact binary matrix factorization (EBMF) problem. Additionally, we provide an SMT method to find the fooling set size of a binary matrix.	2024
OLSQ-DPQA	Optimal Layout Synthesizer of Quantum Circuits for Dynamically Field-Programmable Qubits Array. https://github.com/UCLA-VAST/DPQA	2023
Callipepla	Callipepla & SerpensCG are two conjugate gradient solvers on HBM FPGA. https://github.com/UCLA-VAST/Callipepla	2023
Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication	Serpens is a high bandwidth memory based accelerator for general-purpose sparse matrix-vector multiplication. We build the Serpens accelerator on a Xilinx Alveo U280 card. Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS). https://github.com/UCLA-VAST/Serpens	2022
Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication	Sextans is an FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM). https://github.com/UCLA-VAST/Sextans	2022
Pyxis: An Open-Source Performance Dataset of Sparse Accelerators	Pyxis collects open-source accelerator designs and their performance data. https://github.com/UCLA-VAST/Pyxis	2021
AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators	https://github.com/UCLA-VAST/AutoDSE Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve the optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient...	2021
Merlin Compiler	https://github.com/Xilinx/merlin-compiler We are excited that Xilinx has made the decision to open-source the Merlin compiler under the Apache license. The Merlin compiler was originally developed by Falcon Computing Solutions, a spin-off from the VAST Lab, which was acquired by Xilinx in 2020. Multiple research projects in the VAST Lab, such as [S2FA], [HeteroCL], and...	2021
AutoSA: Polyhedral-Based Systolic Array Auto-Compilation	https://github.com/UCLA-VAST/AutoSA	2021
AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs	https://github.com/Licheng-Guo/AutoBridge	2021
Extending High-Level Synthesis for Task-Parallel Programs	Codebase: https://github.com/UCLA-VAST/tapa Documentation: https://tapa.rtfd.io C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly...	2021
OLSQ: Optimal Layout Synthesis for Quantum Computing	Many quantum computers have constraints on the connections between qubits. However, a quantum program may not conform to these constraints. Thus, it is necessary to perform 'layout synthesis for quantum computing', LSQC, which transforms quantum programs prior to execution so that the connectivity issues are resolved. OLSQ can solve LSQC optimally with respect to depth, number of SWAP gates, or fidelity. There is also a transition-based mode (TB) to speed it up with little loss of optimality. [link]	2020
HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration	https://github.com/UCLA-VAST/heterohalide	2020
SODA: Stencil with Optimized Dataflow Architecture	https://github.com/UCLA-VAST/soda Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many of the stencil kernels are complex, usually consist of multiple stages or iterations, and often contain redundant computation. Such kernels are often offloaded to FPGAs to take advantage of the efficiency of dedicated hardware accelerators. However, implementing such complex kernels efficiently is not trivial, due to...	2020
QUEKO benchmarks	https://github.com/UCLA-VAST/QUEKO-benchmark QUEKO (QUantum Mapping Examples with Known Optimal) comprises a few families of quantum programs, i.e., quantum circuits, that have known optimal depths and gate counts for corresponding quantum devices in layout synthesis for quantum computing.	2020
FlexCNN: End-to-End Optimization of Deep Learning Applications	https://github.com/UCLA-VAST/FlexCNN Tutorial slides: Click [here]	2020
HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA	https://github.com/heterorefactor/heterorefactor	2020
Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing	https://github.com/UCLA-VAST/minimap2-acceleration	2019
INSIDER	INSIDER is an FPGA-based full-stack in-storage computing system: https://github.com/zainryan/INSIDER-System https://github.com/zainryan/EISC Please click the above link for further details.	2019
HeteroCL	HeteroCL is a programming infrastructure composed of a Python-based domain-specific language (DSL) and a compilation flow through close collaboration by research groups led by Prof. Zhiru Zhang at Cornell and Prof. Jason Cong at UCLA. The HeteroCL DSL provides a clean abstraction that decouples algorithm specification from three important types of hardware customization in compute, data types, and memory architectures. HeteroCL further captures the interdependence among these techniques, allowing programmers to explore various trade-offs in a systematic and productive manner. In addition,...	2019
Caffeine		2019
Cloud-Scale BWAMEM	Cloud-scale BWAMEM (CS-BWAMEM) is an ultrafast and highly scalable aligner built on top of cloud infrastructures, including Spark and Hadoop distributed file system (HDFS). It leverages the abundant computing resources in a public or private cloud to fully exploit the parallelism obtained from the enormous number of reads. With CS-BWAMEM, the paired-end whole-genome reads (30x) can be aligned within 80 minutes in a 25-node cluster with 300 cores. The features include: 1) support for both paired-end and single-end alignment; 2) quality similar to BWA-MEM; 3) input: FASTQ files; and 4) output...	2019
Microbenchmarks to Characterize Modern CPU-FPGA Platforms	With the rapid evolution of CPU-FPGA heterogeneous acceleration platforms, it is critical for both platform developers and users to quantify the fundamental microarchitectural features of the platforms. We developed a set of microbenchmarks to evaluate mainstream CPU-FPGA platforms. The first benchmark (https://github.com/peterpengwei/Microbench_AlphaData) is dedicated to the Alpha Data card which connects a CPU with an FPGA via the PCIe interface. The benchmark follows the Xilinx SDAccel programming model, and...	2017
PARADE: Full-System Accelerator-Rich Architecture Simulator	PARADE is a cycle-accurate full-system simulation platform that enables the design and exploration of the emerging accelerator-rich architectures (ARA). It extends the widely used gem5 simulator with high-level synthesis (HLS) support. ...	2017
CMOST - System-Level FPGA Synthesis	CMOST is a system-level design automation framework for FPGA. The main features are: analyze and extract system-level information and generate a task-level data model; system-level optimizations for parallelism, task mapping and scheduling, pipelined streaming, and data organization; module evaluation using high-level synthesis; system-level module selection and duplication ...	2015
PolyOpt/HLS: Polyhedral-Based Data Reuse Optimization for FPGA	PolyOpt/HLS is a polyhedral loop optimization framework dedicated to data reuse optimization for High-Level Synthesis, integrated in the ROSE compiler. The main features are: automatic extraction of regions that can be optimized in the polyhedral model; full support of PoCC (the Polyhedral Compiler Collection) analysis and optimizations; dependence analysis with Candl; program transformations for tiling and parallelism with Pluto; code generation with CLooG; parametric tiling with PTile; data reuse optimization with LMP ...	2013
LEKO/LEKU	LEKO and LEKU Suites [GitHub] (Logic synthesis Examples with Known Optimal/Upper-bounds). Director: Prof. Jason Cong. Author: Kirill Minkovich. Copyright 2005-2008 the Regents of University of California ...	2006
PEKO-MS (placement suboptimality benchmarks with parametrized white space)	Open-source repository: https://github.com/jshinnerl/pekoMS_2006_book The generating algorithm is described in https://doi.org/10.1007/978-0-387-68739-1_2	2006
xPilot: Platform-based Behavior Synthesis System	The xPilot Team: Professor Jason Cong. Researchers: Deming Chen, Yiping Fan, Guoling Han, Wei Jiang, Bin Liu, Junjuan Xu, Zhiru Zhang ...	2006
TPEKO Suite (Timing-driven Placement Example with Known Optimal delay)	https://cadlab.cs.ucla.edu/~pubbench/tpeko.htm	2004
PEKO Suite (Placement Example with Known Optimal wirelength)	https://cadlab.cs.ucla.edu/~pubbench/peko.htm	2003
fpgaEva: A Heterogeneous FPGA Evaluation Tool	fpgaEva is a heterogeneous FPGA evaluation tool that incorporates a set of architecture evaluation related features into a user-friendly Java interface. Modern field programmable gate arrays (FPGAs) provide, in a single device, both a logic array for general logic functions and embedded memory blocks (EMBs) for efficient implementation of on-chip memory and specialized logic functions. In addition, recent generations of FPGAs take advantage of the speed and density benefits resulting from heterogeneous FPGAs, which provide either an array of homogeneous programmable logic blocks (PLBs), each configured...	2003
MCAS: Multi-Cycle Architectural Synthesis System	The MCAS system accepts behavioral C and VHDL, performs aggressive high-level synthesis and optimization coupled with physical planning to optimize design performance, and generates RTL implementations together with physical constraints and timing constraints (e.g., multi-cycle path constraints) which serve as guidelines for the downstream tools. The underlying theme of this research is to raise the design abstraction from RTL to higher-level description without losing the physical reality. The Team Professor: Jason Cong...	2003
3-D IC Physical Design and 3-D Architecture Exploration	3-D ICs have recently attracted great interest from researchers and IC designers. Studies demonstrate a potential performance improvement of up to 65% by transferring a placement from 2-D to 3-D and eliminating long interconnects. Furthermore, the multiple device layer structure of 3-D ICs provides a platform to integrate different components, such as digital ICs, analog ICs, memory, RF modules, and different technologies such as SOI, SiGe HBTs, GaAs, etc., into one single circuit stack. Thus, it is a more flexible vehicle for system-on-chip (SoC) and system-in-package (SiP) designs...	2002
CPMO --- Constrained Placement by Multilevel Optimization	Placement is one of the most important steps in the post-RTL synthesis process, as it directly defines the interconnects, which are now the bottleneck in circuit and system performance in deep submicron technologies. The placement problem has been studied extensively in the past 30 years. However, a study from UCLA shows that existing placement solutions are surprisingly far from optimal. Using a set of cleverly constructed circuit placement examples with known optima (PEKO) that match many industrial circuit characteristics, the study shows that the results of leading placement tools from...	2002
RASP: FPGA/CPLD Technology Mapping and Synthesis Package	RASP, an FPGA/CPLD technology mapping and synthesis package, is the synthesis core of the UCLA RASP System developed at the UCLA VLSI CAD Lab. This site is actively updated. RASP team: Jason Cong, Deming Chen, Eugene Ding, Zhijun Huang, Yean-Yow Hwang, John Peck, Chang Wu, Songjie Xu. Copyright (C) 1991-2004 the Regents of University of California ...	2000
Performance Estimation Models for Optimized Interconnects (IPEM)	IPEM provides a set of procedures that estimate performance under interconnect optimization for deep submicron technology. Adopting models derived from several interconnection optimization algorithms of Trio, IPEM is fast, accurate, and easy to link with users' application programs. The results of IPEM match well with the UCLA Trio package. IPEM team...	2000
mGP - A Multilevel Global Placement Tool		2000
V4R - Multilayer MCM Router		1998
TRIO - Tree, Repeater and Interconnect Optimization Package		1998
