Date / Name
Apr 8, 2026 / Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Jan 20, 2026 / Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Jun 19, 2025 / SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
May 28, 2025 / Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Dec 5, 2024 / NVILA: Efficient Frontier Visual Language Models
Oct 25, 2024 / COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Oct 14, 2024 / SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Sep 6, 2024 / VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Jul 26, 2024 / Wolf: Dense Video Captioning with a World Summarization Framework
Jul 24, 2024 / VILA$^2$: VILA Augmented VILA
Mar 28, 2024 / Tiny Machine Learning: Progress and Futures
Oct 26, 2023 / PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Jun 30, 2022 / On-Device Training Under 256KB Memory
Apr 25, 2022 / Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Nov 2, 2020 / IOS: Inter-Operator Scheduler for CNN Acceleration
May 28, 2020 / HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Jun 21, 2019 / Deep Leakage from Gradients
Dec 2, 2018 / ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Jan 18, 2018 / Sparsely Aggregated Convolutional Networks
Dec 5, 2017 / Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering