Efficient AI System Design with Cross-Layer Approximate Computing. Swagath Venkataramani, Xiao Sun, et al. 2020. Proceedings of the IEEE.
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference. Jinwook Oh, Sae Kyu Lee, et al. 2020. VLSI Circuits 2020.
DyVEDeep: Dynamic Variable Effort Deep Neural Networks. Sanjay Ganapathy, Swagath Venkataramani, et al. 2020. ACM TECS.
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Xiao Sun, Jungwook Choi, et al. 2019. NeurIPS 2019.
Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. 2019. HiPC 2019.
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators. Swagath Venkataramani, Jungwook Choi, et al. 2019. IISWC 2019.
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator. Swagath Venkataramani, Jungwook Choi, et al. 2019. IEEE Micro.
Dynamic Spike Bundling for Energy-Efficient Spiking Neural Networks. Sarada Krithivasan, Sanchari Sen, et al. 2019. ISLPED 2019.
BiScaled-DNN: Quantizing long-tailed datastructures with two scale factors for deep neural networks. Shubham Jain, Swagath Venkataramani, et al. 2019. DAC 2019.
SparCE: Sparsity Aware General-Purpose Core Extensions to Accelerate Deep Neural Networks. Sanchari Sen, Shubham Jain, et al. 2019. IEEE TC.
24 Feb 2025, US12236338: Single Function To Perform Combined Matrix Multiplication And Bias Add Operations
11 Nov 2024, US12141513: Method To Map Convolutional Layers Of Deep Neural Network On A Plurality Of Processing Elements With SIMD Execution Units, Private Memories, And Connected As A 2D Systolic Processor Array
30 Apr 2024, TWI840790: Single Function To Perform Combined Matrix Multiplication And Bias Add Operations
21 Apr 2024, JP7477249: System-aware Selective Quantization For Performance Optimized Distributed Deep Learning