HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling — arXiv2