On Optimal Caching and Model Multiplexing for Large Model Inference — arXiv