Date / Name
Jun 13, 2024 / An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Feb 15, 2024 / Revisiting Feature Prediction for Learning Visual Representations from Video
Jan 25, 2024 / Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Jun 8, 2023 / R-MAE: Regions Meet Masked Autoencoders
Jan 2, 2023 / ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Nov 23, 2022 / EurNet: Efficient Multi-Range Relational Modeling of Spatial Multi-Relational Data
Apr 1, 2022 / On the Importance of Asymmetry for Siamese Representation Learning
Mar 10, 2022 / LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
Nov 22, 2021 / Benchmarking Detection Transfer Learning with Vision Transformers
Oct 11, 2021 / Towards Demystifying Representation Learning with Non-contrastive Self-supervision
Apr 5, 2021 / An Empirical Study of Training Self-Supervised Vision Transformers
Nov 20, 2020 / Exploring Simple Siamese Representation Learning
Mar 9, 2020 / Improved Baselines with Momentum Contrastive Learning
Apr 9, 2019 / Multi-Target Embodied Question Answering
Mar 28, 2019 / TensorMask: A Foundation for Dense Object Segmentation
Mar 29, 2018 / Iterative Visual Reasoning Beyond Convolutions
Apr 13, 2017 / Spatial Memory for Context Reasoning in Object Detection
Feb 7, 2017 / An Implementation of Faster RCNN with Study for Region Sampling
May 7, 2015 / Webly Supervised Learning of Convolutional Networks
Nov 20, 2014 / Learning a Recurrent Visual Representation for Image Caption Generation