Date          Name
Oct 9, 2019   Greedy Convex Ensemble
Apr 2, 2015   A Probabilistic Theory of Deep Learning
Dec 6, 2016   A Probabilistic Framework for Deep Learning
Jul 2, 2025   Completion of the DrugMatrix Toxicogenomics Database using 3-Dimensional Tensors
Jun 1, 2022   Transformer with Fourier Integral Attentions
Jun 4, 2020   Sample Efficient Graph-Based Optimization with Noisy Observations
Dec 6, 2016   Semi-Supervised Learning with the Deep Rendering Mixture Model
Nov 1, 2018   A Bayesian Perspective of Convolutional Neural Networks through a Deconvolutional Generative Model
Jun 19, 2024  Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
Sep 18, 2024  Monomial Matrix Group Equivariant Neural Functional Networks
Oct 5, 2024   Equivariant Neural Functional Networks for Transformers
Mar 14, 2025  Spherical Tree-Sliced Wasserstein Distance
Jun 19, 2024  A Primal-Dual Framework for Transformers and Neural Networks
Oct 11, 2022  Designing Robust Transformers using Robust Kernel Density Estimation
Oct 16, 2021  Improving Transformers with Probabilistic Attention Keys
Oct 4, 2024   Demystifying the Token Dynamics of Deep Selective State Space Models
Feb 26, 2025  CAMEx: Curvature-aware Merging of Experts
Feb 21, 2025  Tight Clusters Make Specialized Experts
Dec 9, 2019   InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
Mar 14, 2025  MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling