Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta
Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data sets quickly via a general framework for building weak models, also known as labeling functions, and denoising them through ensemble learning techniques. We present a fast, simple data programming method for augmenting text data sets by generating neighborhood-based weak models with minimal supervision. Furthermore, our method employs an iterative procedure to identify sparsely distributed examples from large volumes of unlabeled data. These iterative data programming techniques improve successive weak models as more labeled data is confirmed with a human in the loop. We show empirical results on sentence classification tasks, including tasks drawn from improving intent recognition in conversational agents.
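To make the neighborhood-based weak-model idea concrete, here is a minimal sketch, not the authors' implementation: each labeled seed example induces a labeling function that votes the seed's label for any unlabeled example whose embedding lies within a cosine-similarity radius and abstains otherwise, and votes are denoised here by simple majority (the paper uses richer ensemble denoising). The function names, the 2-D embeddings, and the radius value are all illustrative assumptions.

```python
import math

ABSTAIN = -1

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def make_neighborhood_lf(seed_vec, seed_label, radius=0.8):
    """Weak model: vote `seed_label` for any example whose embedding falls
    within cosine similarity `radius` of the labeled seed; else abstain."""
    def lf(x_vec):
        return seed_label if cosine(x_vec, seed_vec) >= radius else ABSTAIN
    return lf

def majority_vote(lfs, x_vec):
    """Denoise labeling-function votes by majority; abstain if no LF fires."""
    votes = [lf(x_vec) for lf in lfs if lf(x_vec) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

lfs = [
    make_neighborhood_lf([1.0, 0.0], seed_label=0),
    make_neighborhood_lf([0.0, 1.0], seed_label=1),
]
print(majority_vote(lfs, [0.9, 0.1]))  # near the class-0 seed -> 0
```

In the iterative loop described above, newly human-confirmed labels would become fresh seeds, tightening coverage of sparsely distributed regions on each pass.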
Jennifer Richards, Andrew Elby, Ayush Gupta
Responsive teaching, in which teachers adapt instruction based on close attention to the substance of students' ideas, is typically characterized along two dimensions: the level of detail at which teachers attend and respond to students' ideas, and the stance teachers take toward what they hear - evaluating for correctness vs. interpreting meaning. We propose that characterizations of progress in responsive teaching should also consider the disciplinary centrality of the practices teachers notice and respond to within student thinking. To illustrate what this kind of progress can look like, we present a case study of a middle school science teacher who implemented the "same" lesson on the motion of freely falling objects in two subsequent years. We argue that his primary shift in responsiveness stemmed from a shift in which disciplinary practices he preferentially noticed and foregrounded. He moved from a focus on causal factors or variables to a more scientifically productive focus on causal stories or explanations. We explore how participation in a professional development community, institutional constraints, and a shift in personal epistemology may have contributed to the nature and stability of this shift in responsiveness.
Luke D. Conlin, Jennifer Richards, Ayush Gupta, Andrew Elby
Research has documented a sharp decline in students' interest and persistence in science, starting in middle school, particularly among students from underrepresented populations. In working to address this problem, we can learn a great deal from positive examples of students getting excited about science, especially students who were previously disengaged. In this paper, we present a case study of Estevan, an 8th grade student who came into Ms. K's science class with a reputation as a potential "problem student," but left as a leader of the class, even making plans to pursue a career in science. Through analysis of interviews and classroom interactions, we show how Estevan's love of science can be partially explained by an alignment between his identity as a lover of challenges and his epistemology of science as involving the challenge of figuring things out for yourself. This alignment was possible in part because it was supported by his caring teacher, who attended to his ideas and constantly challenged him and the rest of her students to figure things out for themselves instead of just "giving them the answers."
Ayush Gupta, David Hammer, Edward F. Redish
In a series of well-known papers, Chi and Slotta (Chi, 1992; Chi & Slotta, 1993; Chi, Slotta & de Leeuw, 1994; Slotta, Chi & Joram, 1995; Chi, 2005; Slotta & Chi, 2006) have contended that a reason for students' difficulties in learning physics is that they think about concepts as things rather than as processes, and that there is a significant barrier between these two ontological categories. We contest this view, arguing that expert and novice reasoning often and productively traverses ontological categories. We cite examples from everyday, classroom, and professional contexts to illustrate this. We agree with Chi and Slotta that instruction should attend to learners' ontologies; but we find these ontologies are better understood as dynamic and context-dependent, rather than as static constraints. To promote one ontological description in physics instruction, as suggested by Slotta and Chi, could undermine novices' access to productive cognitive resources they bring to their studies and inhibit their transition to the dynamic ontological flexibility required of experts.
Aayush Gupta
Existing benchmarks for tool-using LLM agents primarily report single-run success rates and miss reliability properties required in production. We introduce \textbf{ReliabilityBench}, a benchmark for evaluating agent reliability across three dimensions: (i) consistency under repeated execution using $\mathrm{pass}^k$, (ii) robustness to semantically equivalent task perturbations at intensity $ε$, and (iii) fault tolerance under controlled tool/API failures at intensity $λ$. ReliabilityBench contributes a unified reliability surface $R(k,ε,λ)$, \textit{action metamorphic relations} that define correctness via end-state equivalence rather than text similarity, and a chaos-engineering-style fault injection framework (timeouts, rate limits, partial responses, schema drift). We evaluate two models (Gemini 2.0 Flash, GPT-4o) and two agent architectures (ReAct, Reflexion) across four domains (scheduling, travel, customer support, e-commerce) over 1,280 episodes. Perturbations alone reduce success from 96.9% at $ε=0$ to 88.1% at $ε=0.2$. Rate limiting is the most damaging fault in ablations. ReAct is more robust than Reflexion under combined stress, and Gemini 2.0 Flash achieves comparable reliability to GPT-4o at much lower cost. ReliabilityBench provides a systematic framework for assessing production readiness of LLM agents.
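The consistency dimension above relies on $\mathrm{pass}^k$, the probability that all $k$ repeated executions of a task succeed. One standard way to estimate this without bias from $n$ repeated episodes with $c$ successes is the hypergeometric estimator $\binom{c}{k}/\binom{n}{k}$; whether the paper uses exactly this estimator is an assumption, so treat the sketch as illustrative:

```python
from math import comb

def pass_k(n, c, k):
    """Unbiased estimate of pass^k (all k independent runs succeed)
    from n repeated episodes of a task with c successes: C(c,k)/C(n,k)."""
    if k > n:
        raise ValueError("need at least k episodes")
    return comb(c, k) / comb(n, k)

# 10 repeated runs of the same task, 9 succeeded:
print(round(pass_k(10, 9, 1), 3))  # 0.9 -- ordinary single-run success rate
print(round(pass_k(10, 9, 5), 3))  # 0.5 -- consistency over 5 repeats
```

Note how sharply the metric penalizes flakiness: a 90% single-run agent retains only even odds of surviving five consecutive runs, which is the gap between benchmark scores and production reliability that motivates the surface $R(k,ε,λ)$.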
Aayush Gupta
Large language models (LLMs) remain acutely vulnerable to prompt injection and related jailbreak attacks; heuristic guardrails (rules, filters, LLM judges) are routinely bypassed. We present Contextual Integrity Verification (CIV), an inference-time security architecture that attaches cryptographically signed provenance labels to every token and enforces a source-trust lattice inside the transformer via a pre-softmax hard attention mask (with optional FFN/residual gating). CIV provides deterministic, per-token non-interference guarantees on frozen models: lower-trust tokens cannot influence higher-trust representations. On benchmarks derived from recent taxonomies of prompt-injection vectors (Elite-Attack + SoK-246), CIV attains 0% attack success rate under the stated threat model while preserving 93.1% token-level similarity and showing no degradation in model perplexity on benign tasks; we note a latency overhead attributable to a non-optimized data path. Because CIV is a lightweight patch -- no fine-tuning required -- we demonstrate drop-in protection for Llama-3-8B and Mistral-7B. We release a reference implementation, an automated certification harness, and the Elite-Attack corpus to support reproducible research.
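A toy sketch of the pre-softmax hard mask is shown below; it is illustrative only, omitting CIV's cryptographic signing of provenance labels and the optional FFN/residual gating, and the trust levels and shapes are assumptions. Each token carries a trust level, and a query position may attend only to causal keys of equal or higher trust, so lower-trust tokens receive exactly zero attention weight from higher-trust positions:

```python
import math

def trust_mask(trust):
    """Pre-softmax hard mask: query i may attend to key j only if j is
    causal (j <= i) and at least as trusted (trust[j] >= trust[i]), so
    lower-trust tokens cannot influence higher-trust representations."""
    n = len(trust)
    return [[j <= i and trust[j] >= trust[i] for j in range(n)]
            for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over each row with disallowed keys forced to zero weight."""
    out = []
    for row, mask in zip(scores, allowed):
        exps = [math.exp(s) if ok else 0.0 for s, ok in zip(row, mask)]
        z = sum(exps) or 1.0
        out.append([e / z for e in exps])
    return out

# Assumed lattice: SYSTEM(2) > USER(1) > TOOL(0).
trust = [2, 1, 0, 1]                 # system, user, tool output, user
scores = [[0.0] * 4 for _ in range(4)]
attn = masked_softmax(scores, trust_mask(trust))
print(attn[3])  # the later user token gives the tool token zero weight
```

Because the exclusion happens before the softmax, the guarantee is deterministic rather than statistical: no gradient of persuasion in the injected text can raise its weight above zero.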
Ayush Gupta, Siyuan Huang, Rama Chellappa
Gait is becoming popular as a method of person re-identification because of its ability to identify people at a distance. However, most current works in gait recognition do not address the practical problem of occlusions. Among those which do, some require paired tuples of occluded and holistic sequences, which are impractical to collect in the real world. Further, these approaches work on occlusions but fail to retain performance on holistic inputs. To address these challenges, we propose RG-Gait, a method for residual correction for occluded gait recognition with holistic retention. We model the problem as a residual learning task, conceptualizing the occluded gait signature as a residual deviation from the holistic gait representation. Our proposed network adaptively integrates the learned residual, significantly improving performance on occluded gait sequences without compromising the holistic recognition accuracy. We evaluate our approach on the challenging Gait3D, GREW and BRIAR datasets and show that learning the residual can be an effective technique to tackle occluded gait recognition with holistic retention. We release our code publicly at https://github.com/Ayush-00/rg-gait.
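The residual-learning formulation can be sketched conceptually as follows; the gating variable, feature dimensionality, and function names are assumptions for illustration, not RG-Gait's actual network. The occluded signature is modeled as the holistic representation plus an adaptively gated residual, so an unoccluded input (gate near zero) passes through unchanged and holistic accuracy is retained:

```python
def rg_gait_signature(holistic_feat, residual_feat, occlusion_score):
    """Conceptual sketch: final gait signature = holistic representation
    plus a learned residual, gated by an occlusion estimate in [0, 1].
    A holistic (unoccluded) sequence keeps gate ~0, so the original
    representation -- and its recognition accuracy -- is preserved."""
    gate = occlusion_score
    return [h + gate * r for h, r in zip(holistic_feat, residual_feat)]

print(rg_gait_signature([1.0, 2.0], [0.5, -0.5], 0.0))  # [1.0, 2.0]
print(rg_gait_signature([1.0, 2.0], [0.5, -0.5], 1.0))  # [1.5, 1.5]
```

The design choice this illustrates is that the residual branch only needs to learn the *deviation* caused by occlusion, rather than a full second embedding space, which is why it can avoid paired occluded/holistic tuples at training time.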
Nathan Ng, Walid A. Hanafy, Prashanthi Kadambi, Balachandra Sunil, Ayush Gupta, David Irwin, Yogesh Simmhan, Prashant Shenoy
IoT applications are increasingly relying on on-device AI accelerators to ensure high performance, especially in limited connectivity and safety-critical scenarios. However, the limited on-chip memory of these accelerators forces inference runtimes to swap model segments between host and accelerator memory, substantially inflating latency. While collaborative processing by partitioning the model processing between CPU and accelerator resources can reduce accelerator memory pressure and latency, naive partitioning may worsen end-to-end latency by either shifting excessive computation to the CPU or failing to sufficiently curb swapping, a problem that is further amplified in multi-tenant and dynamic environments. To address these issues, we present SwapLess, a system for adaptive, multi-tenant TPU-CPU collaborative inference for memory-constrained Edge TPUs. SwapLess utilizes an analytic queueing model that captures partition-dependent CPU/TPU service times as well as inter- and intra-model swapping overheads across different workload mixes and request rates. Using this model, SwapLess continuously adjusts both the partition point and CPU core allocation online to minimize end-to-end response time with low decision overhead. An implementation on Edge TPU-equipped platforms demonstrates that SwapLess reduces mean latency by up to 63.8% for single-tenant workloads and up to 77.4% for multi-tenant workloads relative to the default Edge TPU compiler.
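The online partition-point search can be sketched as below; the cost functions here are toy stand-ins for SwapLess's queueing-model estimates (which capture service times and swapping overheads), and all names are illustrative. The point is the structure of the decision: enumerate split points, score each with the modeled end-to-end latency, and pick the minimum:

```python
def best_partition(num_layers, cpu_cost, tpu_cost, swap_penalty):
    """Sketch: choose split point p so layers [0, p) run on the CPU and
    layers [p, n) on the TPU, minimizing modeled end-to-end latency.
    cpu_cost/tpu_cost/swap_penalty stand in for the queueing model's
    partition-dependent service-time and swapping-overhead estimates."""
    def latency(p):
        on_tpu = num_layers - p
        return cpu_cost(p) + tpu_cost(on_tpu) + swap_penalty(on_tpu)
    return min(range(num_layers + 1), key=latency)

p = best_partition(
    4,
    cpu_cost=lambda n: 2.0 * n,            # CPU is slower per layer
    tpu_cost=lambda n: 1.0 * n,
    swap_penalty=lambda n: 5.0 if n > 2 else 0.0,  # big TPU share swaps
)
print(p)  # 2 -- enough layers moved to the CPU to stop swapping
```

In the real system this search would be re-run online as request rates and tenant mixes shift, alongside the CPU core-allocation decision.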
Noah D. Brenowitz, Tao Ge, Akshay Subramaniam, Peter Manshausen, Aayush Gupta, David M. Hall, Morteza Mardani, Arash Vahdat, Karthik Kashinath, Michael S. Pritchard
Climate modeling is reaching unprecedented resolution, producing petabytes of data. AI climate model emulators offer a path to computationally cheap analysis, enabling new scientific insight and scenario planning. Recent advances show promise in faithfully emulating climate data. However, prevailing auto-regressive paradigms are difficult to train on climate time horizons due to drifts, instabilities, and component-coupling challenges. They are hard to scale to high resolution and require sifting through troves of output to identify rare extremes of interest. We present Climate in a Bottle (cBottle), a generative diffusion-based framework emulating global 5 km climate simulations and reanalysis on the HEALPix grid. cBottle samples directly from the full distribution of atmospheric states, avoiding auto-regressive rollout, and is the first to reach this 12.5M-pixel global resolution. It consists of two stages: a coarse-resolution generator conditioned on sea surface temperatures and solar position, followed by a patch-based 16x super-resolution stage. cBottle passes a battery of tests, including diurnal-to-seasonal variability, large-scale modes of variability, tropical cyclone statistics, and trends of climate change and weather extremes. It is a step toward a foundation model: bridging data modalities (reanalysis and simulation), enabling zero-shot bias correction, downscaling, and data infilling. It also enables new interactivity via guided diffusion. For example, we train a tropical cyclone (TC) classifier alongside the generator, guide towards TC states, and obtain physically credible samples. This opens the door to guidance methods for a wide array of user queries and new ways of interacting with climate data.
Ayush Gupta, Andrew Elby
Researchers have argued against deficit-based explanations of students' troubles with mathematical sense-making, pointing instead to factors such as epistemology: students' beliefs about the nature of knowledge and learning can hinder them from activating and integrating productive knowledge they have. But such explanations run the risk of substituting an epistemological deficit for a concepts/skills deficit. Our analysis of an undergraduate engineering major avoids this "deficit trap" by incorporating multiple, context-dependent epistemological stances into his cognitive dynamics.
Brian A. Danielak, Ayush Gupta, Andrew Elby
Research has linked a student's affect to her epistemology (Boaler & Greeno, 2000), but those constructs often apply broadly to a discipline and/or classroom culture. Independently, an emerging line of research shows that a student in a given classroom and discipline can shift between multiple locally coherent epistemological stances (Hammer, Elby, Scherr, & Redish, 2005). Our case study of Judy, an undergraduate engineering major, begins our long-term effort at uniting these two bodies of literature.
Ayush Gupta, Brian A. Danielak, Andrew Elby
An established body of literature shows that a student's affect can be linked to her epistemological stance [1]. In this literature, the epistemology is generally taken as a belief or stance toward a discipline, and the affective stance applies broadly to a discipline or classroom culture. A second, emerging line of research, however, shows that a student in a given discipline can shift between multiple locally coherent epistemological stances [2]. To begin uniting these two bodies of literature, toward the long-term goal of incorporating affect into fine-grained models of in-the-moment cognitive dynamics, we present a case study of "Judy", an undergraduate engineering major. We argue that a fine-grained aspect of Judy's affect, her annoyance at a particular kind of homework problem, stabilizes a context-dependent epistemological stance she displays, about an unbridgeable gulf she perceives to exist between real and ideal circuits.
Sara Kothari, Ayush Gupta
Healthcare systems continuously generate vast amounts of electronic health records (EHRs), commonly stored in the Fast Healthcare Interoperability Resources (FHIR) standard. Despite the wealth of information in these records, their complexity and volume make it difficult for users to retrieve and interpret crucial health insights. Recent advances in Large Language Models (LLMs) offer a solution, enabling semantic question answering (QA) over medical data, allowing users to interact with their health records more effectively. However, ensuring privacy and compliance requires edge and private deployments of LLMs. This paper proposes a novel approach to semantic QA over EHRs by first identifying the most relevant FHIR resources for a user query (Task1) and subsequently answering the query based on these resources (Task2). We explore the performance of privately hosted, fine-tuned LLMs, evaluating them against benchmark models such as GPT-4 and GPT-4o. Our results demonstrate that fine-tuned LLMs, while 250x smaller in size, outperform GPT-4-family models by 0.55% in F1 score on Task1 and by 42% in METEOR score on Task2. Additionally, we examine advanced aspects of LLM usage, including sequential fine-tuning, model self-evaluation (narcissistic evaluation), and the impact of training data size on performance. The models and datasets are available at https://huggingface.co/genloop
Benjamin W. Dreyfus, Erin Ronayne Sohr, Ayush Gupta, Andrew Elby
Quantum mechanics can seem like a departure from everyday experience of the physical world, but constructivist theories assert that learners build new ideas from their existing ones. To explore how students can navigate this tension, we examine video of a focus group completing a tutorial about the "particle in a box." In reasoning about the properties of a quantum particle, the students bring in elements of a classical particle ontology, evidenced by students' language and gestures. This reasoning, however, is modulated by metacognitive moments when the group explicitly considers whether classical intuitions apply to the quantum system. The students find some cases where they can usefully apply classical ideas to quantum physics, and others where they explicitly contrast classical and quantum mechanics. Negotiating this boundary with metacognitive awareness is part of the process of building quantum intuitions. Our data suggest that (some) students bring productive intellectual resources to this negotiation.
Ayush Gupta, Brian A. Danielak, Andrew Elby
Many prominent lines of research on students' reasoning and conceptual change within the learning sciences and physics education research have not attended to the role of learners' affect or emotions in the dynamics of their conceptual reasoning. This is despite evidence from psychology, cognitive science, and neuroscience that emotions are deeply integrated with cognition, and despite documented associations in education research between emotions and academic performance. The few studies that have aimed to integrate emotions within models of learners' cognition have mostly done so at a coarse grain size. In this manuscript, toward the long-term goal of incorporating emotions into fine-grained models of in-the-moment cognitive dynamics, we present a case study of Judy, an undergraduate electrical engineering and physics major. We argue that a fine-grained aspect of Judy's affect, her annoyance at a particular kind of homework problem, stabilizes a context-dependent epistemological stance she displays, about an unbridgeable gulf she perceives to exist between real and ideal circuits.
Luke D. Conlin, Ayush Gupta, Rachel E. Scherr, David Hammer
We investigate the dynamics of student behaviors (posture, gesture, vocal register, visual focus) and the substance of their reasoning during collaborative work on inquiry-based physics tutorials. Scherr has characterized student activity during tutorials as observable clusters of behaviors separated by sharp transitions, and has argued that these behavioral modes reflect students' epistemological framing of what they are doing, i.e., their sense of what is taking place with respect to knowledge. We analyze students' verbal reasoning during several tutorial sessions using the framework of Russ, and find a strong correlation between certain behavioral modes and the scientific quality of students' explanations. We suggest that this is due to a dynamic coupling of how students behave, how they frame an activity, and how they reason during that activity. This analysis supports the earlier claims of a dynamic between behavior and epistemology. We discuss implications for research and instruction.
Shree Singhi, Aayan Yadav, Aayush Gupta, Shariar Ebrahimi, Parisa Hassanizadeh
As AI-generated sensitive images become more prevalent, identifying their source is crucial for distinguishing them from real images. Conventional image watermarking methods are vulnerable to common transformations like filters, lossy compression, and screenshots, often applied during social media sharing. Watermarks can also be forged or removed if models are open-sourced or leaked, since images can be re-watermarked. To address these limitations, we have developed a three-part framework for secure, transformation-resilient AI content provenance detection. We develop an adversarially robust, state-of-the-art perceptual hashing model, DinoHash, derived from DINOv2, which is robust to common transformations like filters, compression, and crops. Additionally, we integrate a Multi-Party Fully Homomorphic Encryption (MP-FHE) scheme into our proposed framework to ensure the protection of both user queries and registry privacy. Furthermore, we improve previous work on AI-generated media detection; this approach is useful in cases where the content is absent from our registry. DinoHash significantly improves average bit accuracy by 12% over state-of-the-art watermarking and perceptual hashing methods while maintaining superior true positive rate (TPR) and false positive rate (FPR) tradeoffs across various transformations. Our AI-generated media detection results show a 25% improvement in classification accuracy over existing algorithms on commonly used real-world AI image generators. By combining perceptual hashing, MP-FHE, and an AI content detection model, our proposed framework provides better robustness and privacy compared to previous work.
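The bit-accuracy metric used to compare DinoHash against watermarking baselines is simply the fraction of matching bits between the hash of an original image and the hash of its transformed copy. A minimal sketch (the hash strings here are made up for illustration):

```python
def bit_accuracy(h1, h2):
    """Fraction of matching bits between two equal-length perceptual
    hashes -- the transformation-robustness metric reported above."""
    if len(h1) != len(h2):
        raise ValueError("hashes must be the same length")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)

# Hash of an image vs. hash of its compressed copy, differing in 2 of 8 bits:
print(bit_accuracy("10110100", "10100101"))  # 0.75
```

A robust perceptual hash keeps this number near 1.0 across filters, compression, and crops, so that a registry lookup still matches the original fingerprint.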
Aayush Gupta, Arpit Bhayani
Web proxies such as NGINX commonly rely on least-recently-used (LRU) eviction, which is size-agnostic and can thrash under periodic bursts and mixed object sizes. We introduce Cold-RL, a learned eviction policy for NGINX that replaces LRU's forced-expire path with a dueling Deep Q-Network served by an ONNX sidecar within a strict microsecond budget. On each eviction, Cold-RL samples the K least-recently-used objects, extracts six lightweight features (age, size, hit count, inter-arrival time, remaining TTL, and last origin RTT), and requests a bitmask of victims; a hard timeout of 500 microseconds triggers immediate fallback to native LRU. Policies are trained offline by replaying NGINX access logs through a cache simulator with a simple reward: a retained object earns one point if it is hit again before TTL expiry. We compare against LRU, LFU, size-based, adaptive LRU, and a hybrid baseline on two adversarial workloads. With a 25 MB cache, Cold-RL raises hit ratio from 0.1436 to 0.3538, a 146 percent improvement over the best classical baseline; at 100 MB, from 0.7530 to 0.8675, a 15 percent gain; and at 400 MB it matches classical methods (about 0.918). Inference adds less than 2 percent CPU overhead and keeps 95th percentile eviction latency within budget. To our knowledge, this is the first reinforcement learning eviction policy integrated into NGINX with strict SLOs.
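The eviction hook's control flow, sampling K LRU candidates, asking the policy for a victim bitmask, and falling back to native LRU on timeout, can be sketched as follows. This is a simplified illustration, not the C module: the object representation, the toy size-based policy, and the function names are assumptions.

```python
def choose_victims(cache_lru_order, k, policy, timeout_us=500):
    """Sketch of the eviction hook: sample the K least-recently-used
    objects, request a victim bitmask from the learned policy, and fall
    back to plain LRU if the policy times out or selects nothing."""
    candidates = cache_lru_order[:k]           # K coldest objects first
    try:
        mask = policy(candidates, timeout_us)  # e.g. ONNX sidecar call
    except TimeoutError:
        return [candidates[0]]                 # native LRU fallback
    victims = [obj for obj, bit in zip(candidates, mask) if bit]
    return victims or [candidates[0]]

lru = [{"key": "a", "size": 10}, {"key": "b", "size": 900},
       {"key": "c", "size": 5}]

# Toy stand-in policy: evict large objects (the real policy is a DQN
# over the six features listed above).
size_policy = lambda cands, budget: [c["size"] > 100 for c in cands]
print([v["key"] for v in choose_victims(lru, k=3, policy=size_policy)])  # ['b']
```

The hard fallback is what makes the design safe to deploy: the learned policy can only improve on LRU, never stall the eviction path past its SLO.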
Luke Conlin, Ayush Gupta, David Hammer
There are ongoing divisions in the learning sciences between perspectives that treat cognition as occurring within individual minds and those that treat it as irreducibly distributed or situated in material and social contexts. We contend that accounts of individual minds as complex systems are theoretically continuous with distributed and situated cognition. On this view, the difference is a matter of the scale of the dynamics of interest, and the choice of scale can be informed by data. In this paper, we propose heuristics for empirically determining the scale of the relevant cognitive dynamics. We illustrate these heuristics in two contrasting cases, one in which the evidence supports attributing cognition to a group of students and one in which the evidence supports attributing cognition to an individual.
Aayush Gupta, Aditya Gulati, Himanshu, Lakshya LNU
Human shape and clothing estimation has gained significant prominence in various domains, including online shopping, fashion retail, augmented reality (AR), virtual reality (VR), and gaming. The visual representation of human shape and clothing has become a focal point for computer vision researchers in recent years. This paper presents a comprehensive survey of the major works in the field, focusing on four key aspects: human shape estimation, fashion generation, landmark detection, and attribute recognition. For each of these tasks, the survey examines recent advancements, discusses their strengths and limitations, and highlights qualitative differences in approaches and outcomes. By exploring the latest developments in human shape and clothing estimation, this survey aims to provide a comprehensive understanding of the field and inspire future research in this rapidly evolving domain.