4-bit quantization of LSTM-based speech recognition modelsAndrea FasoliChia-Yu Chenet al.2021INTERSPEECH 2021
Efficacy of Pruning in Ultra-Low Precision DNNsSanchari SenSwagath Venkataramaniet al.2021ISLPED 2021
RaPiD: AI Accelerator for Ultra-Low Precision Training and InferenceSwagath VenkataramaniVijayalakshmi Srinivasanet al.2021ISCA 2021
Efficient Management of Scratch-Pad Memories in Deep Learning AcceleratorsSubhankar PalSwagath Venkataramaniet al.2021ISPASS 2021
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware ThrottlingAnkur AgrawalSaekyu Leeet al.2021ISSCC 2021
Value Similarity Extensions for Approximate Computing in General-Purpose ProcessorsYounghoon KimSwagath Venkataramaniet al.2021DATE 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Yu ChenJiamin Niet al.2020NeurIPS 2020
Efficient AI System Design with Cross-Layer Approximate ComputingSwagath VenkataramaniXiao Sunet al.2020Proceedings of the IEEE
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and InferenceJinwook OhSae Kyu Leeet al.2020VLSI Circuits 2020
30 Apr 2024TWI840790Single Function To Perform Combined Matrix Multiplication And Bias Add Operations
21 Apr 2024JP7477249System-aware Selective Quantization For Performance Optimized Distributed Deep Learning
27 Nov 2023US11831467Programmable Multicast Protocol For Ring-topology Based Artificial Intelligence Systems
06 Nov 2023US11810340System And Method For Consensus-based Representation And Error Checking For Neural Networks
11 May 2023CNZL202010150294.1Programmable Data Delivery To A System Of Shared Processing Elements With Shared Memory
KEKaoutar El MaghraouiPrincipal Research Scientist and Manager, AIU Spyre Model Enablement, AI Hardware Center