Assessing External Validity Over Worst-case Subpopulations — arXiv2