ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions — arXiv2