Sanjana Vijay Ganesh, Yanzhao Wu, Gaowen Liu, Ramana Kompella, Ling Liu
Object tracking is an important functionality of edge video analytics systems and services. Multi-object tracking (MOT) detects moving objects and tracks their locations frame by frame as real scenes are captured into a video. However, it is well known that real-time object tracking on the edge poses critical technical challenges, especially on edge devices with heterogeneous computing resources. This paper examines the performance issues and edge-specific optimization opportunities for object tracking. We show that even a well-trained and optimized MOT model may still suffer from random frame dropping when edge devices have insufficient computational resources. We present several edge-specific performance optimization strategies, collectively coined EMO, to speed up real-time object tracking, ranging from window-based to similarity-based optimization. Extensive experiments on popular MOT benchmarks demonstrate that our EMO approach is competitive with representative on-device object tracking methods in terms of run-time performance and tracking accuracy. EMO is released on GitHub at https://github.com/git-disl/EMO.
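The similarity-based optimization mentioned above can be illustrated with a minimal sketch: if the current frame is nearly identical to the previous one, the full tracking pass can be skipped and the last set of tracks reused. Mean absolute pixel difference is used here as a stand-in similarity measure; EMO's actual metric and threshold may differ.

```python
import numpy as np

def should_skip(prev_frame, frame, threshold=0.98):
    """Similarity-based frame skipping (sketch).

    Computes a crude similarity score from the mean absolute pixel
    difference between consecutive frames; if the frames are similar
    enough, the expensive tracking pass is skipped.
    """
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64)).mean()
    similarity = 1.0 - diff / 255.0
    return similarity >= threshold
```

On a static scene this skips most frames, freeing the device to keep up with the incoming stream; any motion drops the similarity score and forces a full tracking pass.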
Tushin Mallick, Ashish Kundu, Ramana Kompella
The advent of quantum computing poses significant threats to classical public-key cryptographic primitives such as RSA and elliptic-curve cryptography. As many critical network and security protocols depend on these primitives for key exchange and authentication, there is an urgent need to understand their quantum vulnerability and assess the progress made towards integrating post-quantum cryptography (PQC). This survey provides a detailed examination of nine widely deployed protocols - TLS, IPsec, BGP, DNSSEC, SSH, QUIC, OpenID Connect, OpenVPN, and Signal Protocol - analysing their cryptographic foundations, quantum risks, and the current state of PQC migration. We find that TLS and Signal lead the transition with hybrid post-quantum key exchange already deployed at scale, while IPsec and SSH have standardised mechanisms but lack widespread production adoption. DNSSEC and BGP face the most significant structural barriers, as post-quantum signature sizes conflict with fundamental protocol constraints. Across all protocols, key exchange proves consistently easier to migrate than authentication, and protocol-level limitations such as message size and fragmentation often dominate over raw algorithm performance. We also discuss experimental deployments and emerging standards that are shaping the path towards a quantum-resistant communication infrastructure.
Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, Ramana Kompella
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs and social networks), where an underlying network connects the units of the population. A good sample is therefore one that is representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes for estimating certain graph properties (e.g., triangle count), much less is known about estimating multiple graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH). The proposed framework samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state. We then show how to produce unbiased estimators for various graph properties from the sample. Because the graph analysis algorithms run on a sample instead of the whole population, the runtime complexity of these algorithms is kept under control. Moreover, because the estimators of graph properties are unbiased, the approximation error is kept under control. Finally, we show the performance of the proposed framework on various types of graphs, such as social graphs, among others.
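The one-pass sample-and-hold idea can be sketched as follows. The specific sampling probability p, hold probability q, and the Horvitz-Thompson edge-count estimator shown here follow the general sample-and-hold pattern and are illustrative, not the paper's exact construction.

```python
import random

def gsh_sample(edge_stream, p=0.1, q=1.0):
    """One-pass sample-and-hold over an edge stream (sketch).

    An edge incident to an already-sampled node is held with
    probability q; otherwise it is sampled with probability p.
    Each kept edge records its inclusion probability so that
    inverse-probability weighting yields unbiased estimates.
    """
    sampled_nodes = set()
    sample = []  # list of ((u, v), inclusion probability)
    for u, v in edge_stream:
        prob = q if (u in sampled_nodes or v in sampled_nodes) else p
        if random.random() < prob:
            sample.append(((u, v), prob))
            sampled_nodes.update((u, v))
    return sample

def estimate_edge_count(sample):
    # Horvitz-Thompson estimator: each sampled edge contributes
    # the inverse of the probability with which it was kept.
    return sum(1.0 / prob for _, prob in sample)
```

The per-edge state is just the set of touched nodes plus the retained edges, which is what keeps the framework's memory footprint small relative to the full graph.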
Emir Dervisevic, Amina Tankovic, Ehsan Fazel, Ramana Kompella, Peppino Fazio, Miroslav Voznak, Miralem Mehic
Secure communication makes the widespread use of telecommunication networks and services possible. With the constant progress of computing and mathematics, new cryptographic methods are being diligently developed. Quantum Key Distribution (QKD) is a promising technology that provides an Information-Theoretically Secure (ITS) solution to the secret-key agreement problem between two remote parties. QKD networks based on trusted repeaters are built to provide service to a larger number of parties at arbitrary distances. They function as an add-on technology to traditional networks, generating, managing, distributing, and supplying ITS cryptographic keys. Since key resources are limited, integrating QKD network services into critical infrastructures necessitates effective key management. As a result, this paper provides a comprehensive review of QKD network key management approaches. They are analyzed to facilitate the identification of potential strategies and accelerate the future development of QKD networks.
Shahrooz Pouryousef, Hassan Shapourian, Alireza Shabani, Ramana Kompella, Don Towsley
Aug 30, 2023 · quant-ph
Existing classical optical network infrastructure cannot be immediately used for quantum network applications due to photon loss. The first step towards enabling quantum networks is the integration of quantum repeaters into optical networks. However, the expenses and intrinsic noise inherent in quantum hardware underscore the need for an efficient deployment strategy that optimizes the allocation of quantum repeaters and memories. In this paper, we present a comprehensive framework for network planning that efficiently distributes quantum repeaters across existing infrastructure, with the objective of maximizing quantum network utility within an entanglement distribution network. We apply our framework to several cases, including a preliminary illustration of a dumbbell network topology and the real-world SURFnet and ESnet topologies. We explore the effect of quantum memory multiplexing within quantum repeaters, as well as the influence of memory coherence time on quantum network utility. We further examine the effects of different fairness assumptions on network planning, uncovering their impacts on real-time network performance.
Charles Fleming, Ashish Kundu, Ramana Kompella
The proliferation of autonomous AI agents within enterprise environments introduces a critical security challenge: managing access control for emergent, novel tasks for which no predefined policies exist. This paper introduces an advanced security framework that extends the Task-Based Access Control (TBAC) model by using a Large Language Model (LLM) as an autonomous, risk-aware judge. This model makes access control decisions not only based on an agent's intent but also by explicitly considering the inherent risk associated with target resources and the LLM's own model uncertainty in its decision-making process. When an agent proposes a novel task, the LLM judge synthesizes a just-in-time policy while also computing a composite risk score for the task and an uncertainty estimate for its own reasoning. High-risk or high-uncertainty requests trigger more stringent controls, such as requiring human approval. This dual consideration of external risk and internal confidence allows the model to enforce a more robust and adaptive version of the principle of least privilege, paving the way for safer and more trustworthy autonomous systems.
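The dual gating on external risk and internal confidence can be sketched in a few lines. The threshold values and decision labels below are illustrative assumptions; the paper's scoring functions and policy vocabulary are not specified here.

```python
def access_decision(resource_risk, model_uncertainty,
                    risk_threshold=0.7, uncertainty_threshold=0.3):
    """Gate a just-in-time policy on two independent signals (sketch):
    the composite risk score of the target resources and the judge's
    own uncertainty about its reasoning. Either signal exceeding its
    threshold escalates the request to a human."""
    if resource_risk > risk_threshold or model_uncertainty > uncertainty_threshold:
        return "require_human_approval"
    return "grant_scoped_access"
```

Because the two checks are disjunctive, a confident judgment about a dangerous resource and an uncertain judgment about a benign one are both escalated, which is what makes the resulting least-privilege enforcement conservative.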
Zebo Yang, Ali Ghubaish, Raj Jain, Ramana Kompella, Hassan Shapourian
In entanglement distribution networks, communication between two nodes necessitates the generation of end-to-end entanglement by entanglement swapping at intermediate nodes. Efficiently creating end-to-end entanglements over long distances is a key objective. In our prior study on asynchronous routing, we enhanced these entanglement rates by leveraging solely the local knowledge of the entanglement links of a node. This was achieved by creating a tree structure, particularly a destination-oriented directed acyclic graph (DODAG) or a spanning tree, eliminating synchronous operations and conserving unused entanglement links. In this article, we present a multi-tree approach with multiple DODAGs designed to improve end-to-end entanglement rates in large-scale networks, specifically catering to a range of network topologies, including grids and barbells, as well as realistic topologies found in research testbeds like ESnet and Internet2. Our simulations show a marked improvement in end-to-end entanglement rates for specific topologies compared to the single-tree method. This study underscores the promise of asynchronous routing schemes in quantum networks, highlighting the effectiveness of asynchronous routing across different network topologies and proposing a superior routing tactic.
Shahrooz Pouryousef, Eneet Kaur, Hassan Shapourian, Don Towsley, Ramana Kompella, Reza Nejabati
Scalable distributed quantum computing (DQC) has motivated the design of multiple quantum data-center (QDC) architectures that overcome the limitations of single quantum processors through modular interconnection. While these architectures adopt fundamentally different design philosophies, their relative performance under realistic quantum hardware constraints remains poorly understood. In this paper, we present a systematic benchmarking study of four representative QDC architectures (QFly, BCube, Clos, and Fat-Tree), quantifying their impact on distributed quantum circuit execution latency, resource contention, and scalability. Focusing on quantum-specific effects absent from classical data-center evaluations, we analyze how optical-loss-induced Einstein-Podolsky-Rosen (EPR) pair generation delays, coherence-limited entanglement retry windows, and contention from teleportation-based non-local gates shape end-to-end execution performance. Across diverse circuit workloads, we evaluate how architectural properties such as path diversity, path length, and shared Bell-State Measurement (BSM) resources interact with optical-switch insertion loss and reconfiguration delay. Our results show that distributed quantum performance is jointly shaped by topology, scheduling policies, and physical-layer parameters, and that these factors interact in nontrivial ways. Together, these insights provide quantitative guidance for the design of scalable and high-performance quantum data-center architectures for DQC.
Tushin Mallick, Cristina Nita-Rotaru, Ashish Kundu, Ramana Kompella
Classification techniques can be used to analyze system behaviors, network protocols, and cryptographic primitives based on identifiable traits. While useful for defense, such classification can also be leveraged by attackers to infer system configurations, detect vulnerabilities, and tailor attacks such as denial-of-service, key recovery, or downgrade attacks. In this paper, we study the feasibility of classifying post-quantum (PQ) algorithms by analyzing implementations of key exchange and digital signatures, their use within secure protocols, and their integration into SNARK generation libraries. Unlike traditional cryptography, PQ algorithms have larger memory requirements and variable computational costs. Our research examines two post-quantum cryptography libraries, liboqs and CIRCL, evaluating TLS, SSH, QUIC, OpenVPN, and OpenID Connect (OIDC) across Windows, Ubuntu, and macOS. We also analyze pysnark and lattice_zksnark for SNARK generation and verification on Ubuntu. Experimental results show that (1) classical and PQ key exchange and signature algorithms can be distinguished with accuracies of 98% and 100%; (2) specific PQ algorithms can be identified with 97% accuracy for key exchange and 86% for signatures; (3) implementations of the same algorithm in liboqs and CIRCL are distinguishable with up to 100% accuracy; and (4) within CIRCL, PQ and hybrid key exchange implementations can be distinguished with 97% accuracy. For secure protocols, we can determine whether key exchange is classical or PQ and identify the PQ algorithm used. SNARK generation and verification in pysnark and lattice_zksnark are distinguishable with 100% accuracy. We demonstrate real-world applicability by identifying PQ-enabled TLS domains in the Tranco dataset and integrating our methods into QUARTZ, an open-source risk and threat analyzer by Cisco.
Saaket Agashe, Jayanth Srinivasa, Gaowen Liu, Ramana Kompella, Xin Eric Wang
Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that require the acquisition of novel reasoning patterns or domain-specific knowledge. To address this, we propose Context Bootstrapped Reinforcement Learning (CBRL), which augments RLVR training by stochastically prepending few-shot demonstrations to training prompts. The injection probability follows a curriculum that starts high to bootstrap early exploration, then anneals to zero so the model must ultimately succeed without assistance. This forces the policy to internalize reasoning patterns from the demonstrations rather than relying on them at test time. We validate CBRL across two model families and five Reasoning Gym tasks. Our results demonstrate that CBRL consistently improves success rate, provides better exploration efficiency, and is algorithm-agnostic. We further demonstrate CBRL's practical applicability on Q, a domain-specific programming language that diverges significantly from mainstream language conventions.
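The injection curriculum can be made concrete with a small sketch. The initial probability, the linear annealing schedule, and the function names below are illustrative assumptions; CBRL's exact schedule may differ.

```python
import random

def injection_prob(step, total_steps, p0=0.9, anneal_frac=0.8):
    """Curriculum for demonstration injection (illustrative): start
    at p0 and anneal linearly to zero over the first anneal_frac of
    training, so late-stage rollouts receive no assistance."""
    horizon = anneal_frac * total_steps
    return max(0.0, p0 * (1.0 - step / horizon))

def build_prompt(question, demos, step, total_steps, rng=random):
    """Stochastically prepend few-shot demonstrations to the prompt."""
    if rng.random() < injection_prob(step, total_steps):
        return "\n\n".join(demos + [question])
    return question
```

Early in training most prompts carry demonstrations, bootstrapping exploration; once the probability reaches zero, every rollout must succeed unassisted, matching the test-time condition.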
Charles Fleming, Ramana Kompella, Peter Bosch, Vijoy Pandey
As Large Language Model (LLM) based Multi-Agent Systems (MAS) evolve from experimental pilots to complex, persistent ecosystems, the limitations of direct agent-to-agent communication have become increasingly apparent. Current architectures suffer from fragmented context, stochastic hallucinations, rigid security boundaries, and inefficient topology management. This paper introduces Cognitive Fabric Nodes (CFN), a novel middleware layer that creates an omnipresent "Cognitive Fabric" between agents. Unlike traditional message queues or service meshes, CFNs are not merely pass-through mechanisms; they are active, intelligent intermediaries. Central to this architecture is the elevation of Memory from simple storage to an active functional substrate that informs four other critical capabilities: Topology Selection, Semantic Grounding, Security Policy Enforcement, and Prompt Transformation. We propose that each of these functions be governed by learning modules utilizing Reinforcement Learning (RL) and optimization algorithms to improve system performance dynamically. By intercepting, analyzing, and rewriting inter-agent communication, the Cognitive Fabric ensures that individual agents remain lightweight while the ecosystem achieves coherence, safety, and semantic alignment. We evaluate the effectiveness of the CFN on the HotPotQA and MuSiQue datasets in a multi-agent environment and demonstrate that the CFN improves performance by more than 10% on both datasets over direct agent-to-agent communication.
Chenliang Tian, Zebo Yang, Raj Jain, Ramana Kompella, Reza Nejabati, Eneet Kaur, Aiman Erbad, Mohamed Abdallah, Mounir Hamdi
Mar 29, 2026 · quant-ph
Scalable quantum networks must support concurrent entanglement requests, yet existing routing protocols fail when users compete for shared repeater resources, wasting fragile quantum states. This paper presents RADAR-Q, a resource-aware decentralized routing protocol embedding real-time resource contention into path selection. Unlike prior designs requiring global coordination or central anchors, RADAR-Q makes intelligent local decisions balancing path length and fidelity, instantaneous quantum memory availability, and intermediate Bell-State Measurement (BSM) operations. By identifying the Nearest Common Ancestor (NCA) within a DODAG hierarchy, RADAR-Q localizes entanglement swapping close to communicating users, avoiding unnecessary central detours and reducing BSM chain length and decoherence exposure. We evaluate RADAR-Q on grid and random topologies against synchronous and root-centric asynchronous baselines. Results show RADAR-Q achieves aggregate throughputs 2.5x and 7.6x higher than synchronized and root-centric designs, respectively. While baselines suffer catastrophic fidelity collapse below the 0.5 threshold under high load, RADAR-Q consistently maintains end-to-end fidelity above 0.76, ensuring pairs remain usable. Furthermore, RADAR-Q exhibits near-perfect fairness (Jain's Fairness Index 96-98%) and retains over 50% of its ideal throughput under stringent 1.0 ms coherence times. These findings establish contention-aware decentralized routing as a scalable foundation for multi-tenant quantum networks.
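The fairness metric cited above is the standard Jain's Fairness Index, which can be computed directly from per-user throughputs:

```python
def jains_fairness(throughputs):
    """Jain's Fairness Index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all n users receive equal throughput and
    approaches 1/n when a single user captures everything.
    """
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))
```

An index of 96-98%, as reported for RADAR-Q, means the aggregate throughput is spread almost evenly across competing users rather than captured by a few well-placed ones.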
Yuzhang Shang, Bingxin Xu, Gaowen Liu, Ramana Kompella, Yan Yan
Model quantization, which aims to compress deep neural networks and accelerate inference speed, has greatly facilitated the deployment of cumbersome models on mobile and edge devices. Prior quantization methods commonly assume that training data is available. In practice, however, this assumption cannot always be fulfilled due to privacy and security concerns, rendering these methods inapplicable in real-life situations. Thus, data-free network quantization has recently received significant attention in neural network compression. Causal reasoning provides an intuitive way to model causal relationships and eliminate data-driven correlations, making causality an essential component of analyzing data-free problems. However, causal formulations of data-free quantization are inadequate in the literature. To bridge this gap, we construct a causal graph to model the data generation and discrepancy reduction between the pre-trained and quantized models. Inspired by this causal understanding, we propose the Causality-guided Data-free Network Quantization method, Causal-DFQ, to eliminate the reliance on data by approaching an equilibrium of causality-driven intervened distributions. Specifically, we design a content-style-decoupled generator, synthesizing images conditioned on the relevant and irrelevant factors; we then propose a discrepancy reduction loss to align the intervened distributions of the pre-trained and quantized models. It is worth noting that our work is the first attempt at introducing causality into the data-free quantization problem. Extensive experiments demonstrate the efficacy of Causal-DFQ. The code is available at https://github.com/42Shawn/Causal-DFQ.
Yanzhao Wu, Ling Liu, Ramana Kompella
Deep Neural Network (DNN) object detectors are widely deployed in many mission-critical systems for real-time video analytics at the edge, such as autonomous driving and video surveillance. A common performance requirement in these mission-critical edge services is near real-time latency of online object detection on edge devices. However, even with well-trained DNN object detectors, online detection quality at the edge may deteriorate for a number of reasons, such as limited capacity to run DNN object detection models on heterogeneous edge devices, and detection quality degradation due to random frame dropping when the detection processing rate is significantly slower than the incoming video frame rate. This paper addresses these problems by exploiting multi-model, multi-device detection parallelism for fast object detection in edge systems with heterogeneous edge devices. First, we analyze the performance bottleneck of running a well-trained DNN model at the edge for real-time online object detection. Using offline detection as a reference model, we examine the root cause by analyzing the mismatch among the incoming video streaming rate, the video processing rate for object detection, and the output rate for real-time detection visualization. Second, we study performance optimizations that exploit multi-model detection parallelism. We show that the model-parallel detection approach can effectively speed up the detection processing rate (FPS), minimizing the disparity with the incoming video frame rate on heterogeneous edge devices. We evaluate the proposed approach using SSD300 and YOLOv3 on benchmark videos of different stream rates. The results show that exploiting multi-model detection parallelism can speed up the online object detection processing rate and deliver near real-time object detection performance for efficient video analytics at the edge.
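The rate mismatch at the heart of the frame-dropping problem, and why pooling devices helps, can be sketched with simple steady-state arithmetic. This is an idealized model assuming perfect load balancing and no dispatch overhead, not the paper's exact analysis.

```python
def dropped_fraction(stream_fps, detect_fps):
    """Fraction of frames dropped in steady state when the detector
    processes frames slower than the camera produces them."""
    if detect_fps >= stream_fps:
        return 0.0
    return 1.0 - detect_fps / stream_fps

def effective_fps(device_fps):
    """Aggregate processing rate under multi-model, multi-device
    parallelism (idealized: frames are dispatched with no overhead)."""
    return sum(device_fps)
```

For example, a single device detecting at 12 FPS against a 30 FPS stream drops 60% of frames, whereas pooling three heterogeneous devices at 12, 10, and 9 FPS yields 31 FPS of aggregate capacity, enough to keep pace with the stream.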
Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri
While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal concepts and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework for language-based TR. Instead of reasoning over the original context, we adopt a latent representation, the temporal graph (TG), which enhances the learning of TR. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. Our experiments confirm that the TG-translation capability learned on our dataset transfers to other TR tasks and benchmarks. On top of that, we teach the LLM to perform deliberate reasoning over the TGs via Chain-of-Thought (CoT) bootstrapping and graph data augmentation. We observe that these strategies, which maintain a balance between usefulness and diversity, yield more reliable CoTs and final results than vanilla CoT distillation.
Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek
Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional method of employing fixed prompts suffers from distributional shifts that negatively impact generalizability to unseen samples. This paper introduces prompt diffusion, which uses a diffusion model to gradually refine the prompts and obtain a customized prompt for each sample. Specifically, we first optimize a collection of prompts to obtain an overfitted prompt per sample. Then, we propose a prompt diffusion model within the prompt space, enabling the training of a generative transition process from a random prompt to its overfitted prompt. As we cannot access the label of a test image during inference, our model gradually generates customized prompts solely from random prompts using our trained prompt diffusion model. Our prompt diffusion is generic, flexible, and modality-agnostic, making it a simple plug-and-play module that can be seamlessly embedded into existing prompt learning methods for textual, visual, or multi-modal prompt learning. Our diffusion model uses a fast ODE-based sampling strategy to optimize test sample prompts in just five steps, offering a good trade-off between performance improvement and computational efficiency. For all prompt learning methods tested, adding prompt diffusion yields more robust results for base-to-new generalization, cross-dataset generalization, and domain generalization in classification tasks tested over 15 diverse datasets.
Fatih Ilhan, Selim Furkan Tekin, Tiansheng Huang, Gaowen Liu, Ramana Kompella, Greg Eisenhauer, Yingyan Celine Lin, Calton Pu, Ling Liu
Fine-tuning pre-trained large language models (LLMs) has become a common practice for personalized natural language understanding (NLU) applications on downstream tasks and domain-specific datasets. However, there are two main challenges: (i) limited and/or heterogeneous data for fine-tuning due to proprietary data confidentiality or privacy requirements, and (ii) varying computation resources available across participating clients such as edge devices. This paper presents FedHFT - an efficient and personalized federated fine-tuning framework to address both challenges. First, we introduce a mixture of masked adapters to handle resource heterogeneity across participating clients, enabling high-performance collaborative fine-tuning of pre-trained language model(s) across multiple clients in a distributed setting, while keeping proprietary data local. Second, we introduce a bi-level optimization approach to handle non-iid data distribution based on masked personalization and client clustering. Extensive experiments demonstrate significant performance and efficiency improvements over various natural language understanding tasks under data and resource heterogeneity compared to representative heterogeneous federated learning methods.
Charles Fleming, Luca Muscariello, Vijoy Pandey, Ramana Kompella
Large Language Models (LLMs) have demonstrated remarkable performance improvements and the ability to learn domain-specific languages (DSLs), including APIs and tool interfaces. This capability has enabled the creation of AI agents that can perform preliminary computations and act through tool calling, which is now being standardized via protocols like MCP. However, LLMs face fundamental limitations: their context windows cannot grow indefinitely, restricting their memory and computational capacity. Agent collaboration emerges as essential for solving increasingly complex problems, mirroring how computational systems rely on different types of memory to scale. The "Internet of Agents" (IoA) represents the communication stack that enables agents to scale by distributing computation across collaborating entities. Current network architectural stacks (OSI and TCP/IP) were designed for data delivery between hosts and processes, not for agent collaboration with semantic understanding. To address this gap, we propose two new layers: an Agent Communication Layer (L8) and an Agent Semantic Layer (L9). L8 formalizes the structure of communication, standardizing message envelopes, speech-act performatives (e.g., REQUEST, INFORM), and interaction patterns (e.g., request-reply, publish-subscribe), building on protocols like MCP. The proposed L9 layer: (1) formalizes semantic context discovery and negotiation, (2) provides semantic grounding by binding terms to semantic context, and (3) semantically validates incoming prompts and performs disambiguation as needed. Furthermore, L9 introduces primitives for coordination and consensus, allowing agents to achieve alignment on shared states, collective goals, and distributed beliefs. Together, these layers provide the foundation for scalable, distributed agent collaboration, enabling the next generation of multi-agentic systems.
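A standardized L8 message envelope, as described above, might look like the following sketch. The field names and defaults are assumptions for illustration, not a published specification.

```python
from dataclasses import dataclass

@dataclass
class AgentMessage:
    """Illustrative L8 envelope: a speech-act performative plus the
    routing and correlation metadata needed for interaction patterns
    such as request-reply and publish-subscribe."""
    performative: str       # e.g. "REQUEST", "INFORM"
    sender: str             # originating agent identifier
    receiver: str           # target agent or topic
    content: str            # payload, e.g. a serialized tool call
    conversation_id: str = ""  # correlates messages in one exchange
    reply_with: str = ""       # token a reply must echo back
```

Separating the envelope (L8) from the payload's meaning (L9) is what lets the semantic layer ground, validate, and disambiguate content independently of how messages are routed.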
Jiapeng Zhao, Stéphane Vinet, Amir Minoofar, Michael Kilzer, Lucas Wang, Galan Moody, Vijoy Pandey, Ramana Kompella, Reza Nejabati
Apr 23, 2026 · quant-ph
Quantum networks are a keystone of the quantum internet. However, existing implementations remain largely confined to static point-to-point links due to the absence of a switching paradigm capable of dynamically routing fragile quantum entanglement without introducing decoherence. Here, we propose the Universal Quantum Switch, a foundational building block allowing on-demand, non-blocking, and encoding-agnostic routing of quantum information, as well as seamless modality conversion between disparate quantum platforms. We develop a prototype in thin-film lithium niobate and experimentally demonstrate robust switching with ≤ 4% decoherence via thermo-optic modulation, along with high-speed electro-optic switching of arbitrary entangled states at 1 MHz. Moreover, we show that our platform can support reconfiguration speeds up to 1 GHz. To our knowledge, this work represents the first demonstration of multi-node dynamic entanglement distribution at these speeds. Complementing these experimental results, we project the architecture's scalability, showing dimension-independent decoherence, and provide a scalable, interoperable building block for heterogeneous quantum network fabrics.
Xin Jin, Charalampos Katsis, Fan Sang, Jiahao Sun, Ashish Kundu, Ramana Kompella
Edge computing is a paradigm that shifts data processing services to the network edge, where data are generated. While such an architecture provides faster processing and response, among other benefits, it also raises critical security issues and challenges that must be addressed. This paper discusses the security threats and vulnerabilities emerging from the edge network architecture, spanning from the hardware layer to the system layer. We further discuss privacy and regulatory compliance challenges in such networks. Finally, we argue for a holistic approach to analyzing edge network security posture, one that considers knowledge from each layer.