"au:"Yongji Wu"" — arXiv2 Search

Showing 1–20 of 27 results

Jul 5, 2024 - Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
May 10, 2022 - Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
May 28, 2021 - Rethinking Lifelong Sequential Recommendation with Incremental Multi-Interest Attention
May 28, 2021 - Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation
Apr 4, 2025 - HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Jan 17, 2024 - Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Nov 21, 2023 - AR Visualization System for Ship Detection and Recognition Based on AI
Mar 12, 2025 - Prompt Inversion Attack against Collaborative Inference of Large Language Models
Sep 19, 2025 - RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Apr 14, 2023 - Remote Procedure Call as a Managed System Service
Oct 2, 2024 - ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving
Apr 12, 2025 - DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving
Jan 3, 2026 - Curator: Efficient Vector Search with Low-Selectivity Filters
Oct 22, 2025 - RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Aug 17, 2021 - How Powerful is Graph Convolution for Recommendation?
Nov 22, 2021 - Poisoning Attacks to Local Differential Privacy Protocols for Key-Value Data
Jun 29, 2024 - VcLLM: Video Codecs are Secretly Tensor Codecs
Jun 9, 2025 - LEANN: A Low-Storage Vector Index
Oct 16, 2025 - Cross-Scenario Unified Modeling of User Interests at Billion Scale
Apr 24, 2025 - An Extensible Software Transport Layer for GPU Networking