ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding

/ Authors

Lu Ren, Junda She, Xinchen Luo, Tao Wang, Xin Ye, Xu Zhang, Mu-Chun Wang, Xiao Yang, Chenguang Wang, Fei Xie, Yiwei Zhou, Dan Wu, Guodong Zhang, Yifei Hu, Guoying Zheng, Shu-Jun Yang, Xing-Yao Wang, Shiyao Wang, Yukun Zhou, Fangkai Yang, Size Li, Kuo Cai, Qiang Luo, Ruiming Tang, Hangxiu Li, Kun Gai

/ Abstract

Recent advances in large language models (LLMs) have highlighted their potential for personalized recommendation, where accurately capturing user preferences remains a key challenge. Leveraging their strong reasoning and generalization capabilities, LLMs offer new opportunities for modeling long-term user behavior. To systematically evaluate this, we introduce ALPBench, a Benchmark for Attribution-level Long-term Personal Behavior Understanding. Unlike item-focused benchmarks, ALPBench predicts user-interested attribute combinations, enabling ground-truth evaluation even for newly introduced items. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests, better reflecting enduring interests. User histories are represented as natural language sequences, allowing interpretable, reasoning-based personalization. ALPBench enables fine-grained evaluation of personalization by focusing on the prediction of attribute combinations, a task that remains highly challenging for current LLMs because it requires capturing complex interactions among multiple attributes and reasoning over long-term user behavior sequences.

Journal: ArXiv

DOI: 10.48550/arXiv.2602.03056