Nov 19, 2025 - MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Oct 31, 2025 - Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
Aug 13, 2025 - LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Jun 4, 2025 - Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
May 16, 2025 - QVGen: Pushing the Limit of Quantized Video Generative Models
Jul 30, 2024 - OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
Mar 11, 2024 - 2023 Low-Power Computer Vision Challenge (LPCVC) Summary
Nov 27, 2023 - TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Oct 20, 2023 - Exploring the Potential of Flexible 8-bit Format: Design and Algorithm
Aug 8, 2023 - Lossy and Lossless (L$^2$) Post-training Model Size Compression
Jul 1, 2023 - SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Apr 18, 2023 - Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Sep 28, 2022 - Exploring the Relationship between Architecture and Adversarially Robust Generalization
Nov 5, 2021 - MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
Sep 2, 2021 - Real World Robustness from Systematic Noise
Jun 13, 2021 - A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration
Feb 10, 2021 - BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
Oct 9, 2020 - Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
Dec 29, 2019 - Towards Unified INT8 Training for Convolutional Neural Network
Aug 14, 2019 - Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks