Nannan Wu, Qianwen Chao, Yanzhen Chen, Weiwei Xu, Chen Liu, Dinesh Manocha, Wenxin Sun, Yi Han, Xinran Yao, Xiaogang Jin
We present a real-time cloth animation method for dressing virtual humans of various shapes and poses. Our approach formulates clothing deformation as a high-dimensional function of body shape and pose parameters. To accelerate the computation, our formulation factorizes the clothing deformation into two independent components: the deformation introduced by body pose variation (Clothing Pose Model) and the deformation from body shape variation (Clothing Shape Model). Furthermore, we sample and cluster poses spanning the entire pose space and use those clusters to efficiently calculate the anchoring points. We also introduce a sensitivity-based distance measurement to both find nearby anchoring points and evaluate their contributions to the final animation. Given a query shape and pose of the virtual agent, we synthesize the resulting clothing deformation by blending the Taylor expansion results of nearby anchoring points. Compared to previous methods, our approach is more general and can add the shape dimension to any clothing pose model. Furthermore, we can animate clothing represented with tens of thousands of vertices at 50+ FPS on a CPU. Moreover, our example database is more representative and can be generated in parallel, thereby saving training time. We also conduct a user evaluation and show that our method improves a user's perception of dressed virtual agents in an immersive virtual environment compared to a conventional linear blend skinning method.
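As a rough illustration of the runtime synthesis step, the sketch below blends first-order Taylor expansions of nearby anchoring points under a sensitivity-weighted distance. The anchor data layout, field names, and inverse-distance weighting are assumptions made for this example, not the paper's implementation.

```python
# A minimal sketch, assuming each anchor stores a precomputed deformation `u`,
# its Jacobian `J` w.r.t. the pose parameters, and a per-parameter sensitivity
# vector `s` (all hypothetical names).
import numpy as np

def blend_deformation(theta, anchors, k=4, eps=1e-8):
    """Blend first-order Taylor expansions of the k nearest anchoring points."""
    # Sensitivity-weighted distance from the query pose to each anchor pose.
    dists = np.array([np.linalg.norm(a["s"] * (theta - a["theta"])) for a in anchors])
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)       # inverse-distance weighting
    weights /= weights.sum()
    # First-order Taylor expansion around each nearby anchor, then blend.
    return sum(w * (anchors[i]["u"] + anchors[i]["J"] @ (theta - anchors[i]["theta"]))
               for w, i in zip(weights, nearest))
```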
Carl Schissler, Dinesh Manocha
We present a new sound rendering pipeline that can generate plausible sound propagation effects for interactive dynamic scenes. Our approach combines ray-tracing-based sound propagation with reverberation filters using robust automatic reverb parameter estimation, driven by impulse responses computed at a low sampling rate. We propose a unified spherical harmonic representation of directional sound in both the propagation and auralization modules and use this formulation to perform a constant number of convolution operations for any number of sound sources while rendering spatial audio. In comparison to previous geometric acoustic methods, we achieve a speedup of over an order of magnitude while delivering audio quality comparable to high-quality convolution rendering algorithms. As a result, our approach is the first capable of rendering plausible dynamic sound propagation effects on commodity smartphones.
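The following first-order ambisonic sketch conveys the core idea behind the constant convolution count: every source is encoded into a shared set of spherical-harmonic channels by cheap weighted adds, so the number of convolutions depends only on the SH order, never on the source count. The encoding convention and decode filters here are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_sources(sources, filters):
    """sources: list of (signal, azimuth, elevation) tuples;
    filters: four equal-length decode filters, one per SH channel."""
    n = max(len(sig) for sig, _, _ in sources)
    sh = np.zeros((4, n))                       # W, Y, Z, X channels (ACN order)
    for sig, az, el in sources:                 # encode: O(#sources) adds, no convolutions
        w = [1.0, np.sin(az) * np.cos(el), np.sin(el), np.cos(az) * np.cos(el)]
        for c in range(4):
            sh[c, :len(sig)] += w[c] * sig
    # decode: a constant number of convolutions (one per SH channel)
    return sum(fftconvolve(sh[c], filters[c]) for c in range(4))
```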
Hao Tian, Changbo Wang, Dinesh Manocha, Xinyu Zhang
We present a new approach to transfer grasp configurations from prior example objects to novel objects. We assume the novel and example objects have the same topology and similar shapes. We perform 3D segmentation on these objects using geometric and semantic shape characteristics. We compute a grasp space for each part of the example object using active learning. We build a bijective contact mapping between corresponding model parts and compute the corresponding grasps for novel objects. Finally, we assemble the individual parts and use local replanning to adjust grasp configurations while maintaining stability and satisfying physical constraints. Our approach is general: it can handle all kinds of objects represented as meshes or point clouds and works with a variety of robotic hands.
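To suggest what per-part contact transfer can look like under the paper's same-topology assumption, the sketch below expresses contacts in the example part's normalized bounding-box coordinates and re-instantiates them on the novel part. This is a simplified stand-in for the bijective contact mapping, and all names are hypothetical.

```python
import numpy as np

def transfer_contacts(example_part, novel_part, contacts):
    """example_part/novel_part: (N,3)/(M,3) vertex arrays; contacts: (K,3) points."""
    lo_e, hi_e = example_part.min(0), example_part.max(0)
    lo_n, hi_n = novel_part.min(0), novel_part.max(0)
    # Normalize contacts into [0,1]^3 on the example part, then map to the novel part.
    mapped = lo_n + (contacts - lo_e) / (hi_e - lo_e) * (hi_n - lo_n)
    # Snap each mapped contact to the nearest vertex of the novel part.
    idx = np.linalg.norm(novel_part[None] - mapped[:, None], axis=-1).argmin(1)
    return novel_part[idx]
```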
Tanmay Randhavane, Aniket Bera, Emily Kubin, Kurt Gray, Dinesh Manocha
We present a data-driven algorithm for generating gaits of virtual characters with varying dominance traits. Our formulation utilizes a user study to establish a data-driven dominance mapping between gaits and dominance labels. We use this dominance mapping to generate walking gaits for virtual characters that exhibit a variety of dominance traits while interacting with the user. Furthermore, we extract gait features based on known criteria from the visual perception and psychology literature that can be used to identify the dominance level of any walking gait. We validate our mapping and the perceived dominance traits with a second user study in an immersive virtual environment. Our gait dominance classification algorithm can classify the dominance traits of gaits with ~73% accuracy. We also present an application of our approach that simulates interpersonal relationships between virtual characters. To the best of our knowledge, ours is the first practical approach to classifying gait dominance and generating dominance traits in virtual characters.
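A hedged sketch of the feature-then-classify pipeline follows. Features like stride length, head tilt, and arm swing are in the spirit of the perception criteria the paper cites, but the exact feature set, joint indices, and classifier here are illustrative placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def gait_features(joints):
    """joints: (T, J, 3) per-frame 3D joint positions; joint indices are assumed."""
    HEAD, NECK, L_FOOT, R_FOOT, L_HAND, R_HAND = 0, 1, 10, 11, 6, 7
    stride = np.abs(joints[:, L_FOOT, 2] - joints[:, R_FOOT, 2]).max()
    head_tilt = (joints[:, HEAD] - joints[:, NECK])[:, 1].mean()  # vertical head offset
    arm_swing = joints[:, [L_HAND, R_HAND]].std(axis=0).sum()     # hand motion volume
    return np.array([stride, head_tilt, arm_swing])

# Usage: fit on labeled gaits, then predict a dominance label for a new gait.
# X = np.stack([gait_features(g) for g in gaits]); clf = SVC().fit(X, labels)
```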
Andrew Best, Sahil Narang, Dinesh Manocha
We present a novel approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments. Sense-Plan-Ask, or SPA, extends prior work in propositional planning and natural language processing to enable agents to plan with uncertain information and to leverage question-and-answer dialogue with other agents and avatars to obtain the information needed to complete their goals. The agents are additionally able to respond to questions from avatars and other agents using natural language, enabling real-time multi-agent, multi-avatar communication environments. Our algorithm can simulate tens of virtual agents at interactive rates as they interact, move, communicate, plan, and replan. We find that our algorithm incurs only a small runtime cost and enables agents to complete their goals more effectively than agents without the ability to leverage natural-language communication. We demonstrate quantitative results on a set of simulated benchmarks and detail the results of a preliminary user study conducted to evaluate the plausibility of the virtual interactions generated by SPA. Overall, we find that participants prefer SPA to prior techniques in 84% of responses, with significant benefits in the plausibility of natural-language interactions and the positive impact of those interactions.
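A toy sketch of the plan-or-ask idea: when a precondition's truth value is unknown, the agent emits a question instead of committing to a plan branch. The knowledge representation, action tuple, and phrasing below are illustrative, not SPA's actual formulation.

```python
UNKNOWN = None

def next_action(agent, goal):
    for precond in goal["preconditions"]:
        value = agent["beliefs"].get(precond, UNKNOWN)
        if value is UNKNOWN:
            # Missing knowledge: ask another agent/avatar instead of guessing.
            return ("ask", f"Do you know whether {precond}?")
        if value is False:
            return ("replan", precond)          # find a plan that establishes it
    return ("execute", goal["action"])

agent = {"beliefs": {"door_open": True}}
goal = {"action": "enter_room", "preconditions": ["door_open", "room_clear"]}
print(next_action(agent, goal))  # ('ask', 'Do you know whether room_clear?')
```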
Venkatraman Narayanan, Bala Murali Manoghar, Vishnu Sashank Dorbala, Dinesh Manocha, Aniket Bera
We present ProxEmo, a novel end-to-end emotion prediction algorithm for socially aware robot navigation among pedestrians. Our approach predicts the perceived emotions of a pedestrian from walking gaits, which are then used for emotion-guided navigation that takes into account social and proxemic constraints. To classify emotions, we propose a multi-view skeleton graph convolution-based model that works with a commodity camera mounted on a moving robot. Our emotion recognition is integrated into a mapless navigation scheme and makes no assumptions about the environment or the pedestrians' motion. It achieves a mean average emotion prediction precision of 82.47% on the Emotion-Gait benchmark dataset, outperforming current state-of-the-art algorithms for emotion recognition from 3D gaits. We highlight its benefits for navigation in indoor scenes using a Clearpath Jackal robot.
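The building block of such a model is a graph convolution over the skeleton. The single layer below uses standard symmetric normalization over an assumed joint adjacency matrix; the layer sizes and normalization are common choices for a sketch, not ProxEmo's exact architecture.

```python
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    def __init__(self, adj, in_dim, out_dim):
        """adj: (J, J) float adjacency matrix of the skeleton graph."""
        super().__init__()
        a = adj + torch.eye(adj.shape[0])               # add self-loops
        d = a.sum(1).rsqrt()                            # D^{-1/2}
        self.register_buffer("a_norm", d[:, None] * a * d[None, :])
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                               # x: (batch, joints, in_dim)
        return torch.relu(self.lin(self.a_norm @ x))    # symmetric-normalized propagation
```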
Zherong Pan, Min Liu, Xifeng Gao, Kai Xu, Dinesh Manocha
We present a method to jointly find the globally optimal topology and trajectory for planar linkages. Planar linkage structures can generate complex end-effector trajectories using only a single rotational actuator, which is very useful for building low-cost robots. We address the problem of searching for the optimal topology and geometry of these structures. However, since topology changes are non-smooth and non-differentiable, conventional gradient-based searches cannot be used. We formulate this problem as a mixed-integer convex programming (MICP) problem, for which a global optimum can be found using the branch-and-bound (BB) algorithm. Compared to existing methods, our experiments show that the proposed approach finds complex linkage structures more efficiently and generates end-effector trajectories more accurately.
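To make the BB search concrete, here is a compact branch-and-bound over a toy mixed-integer linear program. The real formulation encodes linkage topology with binary variables inside a convex program; this toy instance does not, and serves only to illustrate the relax-branch-prune loop.

```python
import math
from scipy.optimize import linprog

def branch_and_bound(c, A, b, bounds, int_vars, best=(math.inf, None)):
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)     # solve the convex relaxation
    if not res.success or res.fun >= best[0]:
        return best                                     # prune: infeasible or dominated
    frac = [i for i in int_vars if abs(res.x[i] - round(res.x[i])) > 1e-6]
    if not frac:
        return (res.fun, res.x)                         # all-integer: new incumbent
    i, xi = frac[0], res.x[frac[0]]
    for lo, hi in ((bounds[i][0], math.floor(xi)), (math.ceil(xi), bounds[i][1])):
        if lo > hi:
            continue                                    # empty branch
        bnds = list(bounds)
        bnds[i] = (lo, hi)
        best = branch_and_bound(c, A, b, bnds, int_vars, best)
    return best

# Toy instance: minimize -x0 - x1 subject to 2*x0 + x1 <= 4.5, with x0 integer.
print(branch_and_bound([-1, -1], [[2, 1]], [4.5], [(0, 3), (0, 3)], int_vars=[0]))
```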
Qingyang Tan, Zherong Pan, Lin Gao, Dinesh Manocha
We address the problem of accelerating thin-shell deformable object simulations through dimension reduction. We present a new algorithm to embed the high-dimensional configuration space of deformable objects in a low-dimensional feature space, where object configurations and feature points have an approximate one-to-one mapping. Our key techniques are a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies and a new mesh embedding approach based on a physics-inspired loss term. We have applied our approach to accelerate high-resolution thin-shell simulations of cloth-like materials, where the configuration space has tens of thousands of degrees of freedom. We show that our physics-inspired embedding approach leads to higher accuracy than prior mesh embedding methods. Finally, we show that the temporal evolution of the mesh in the feature space can also be learned using a recurrent neural network (RNN), leading to fully learnable physics simulators. After training, our learned simulator runs 500-10,000x faster, and its accuracy is high enough for robot manipulation tasks.
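The schematic below shows the overall shape of such a learned simulator: an encoder/decoder pair with a recurrent stepper that advances the embedding in time. All layer sizes are placeholders, and the dense encoder here does not reproduce the paper's graph CNN on the mesh.

```python
import torch
import torch.nn as nn

class LatentSimulator(nn.Module):
    def __init__(self, n_verts, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_verts * 3, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, n_verts * 3))
        self.step = nn.GRUCell(latent, latent)          # learned time evolution

    def rollout(self, verts0, n_steps):
        """verts0: (batch, n_verts, 3) initial mesh state."""
        z = self.enc(verts0.flatten(-2))                # embed into the feature space
        frames = []
        for _ in range(n_steps):
            z = self.step(z, z)                         # advance one step in feature space
            frames.append(self.dec(z).unflatten(-1, (-1, 3)))
        return torch.stack(frames, dim=-3)              # (batch, n_steps, n_verts, 3)
```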
Cheng Li, Min Tang, Ruofeng Tong, Ming Cai, Jieyi Zhao, Dinesh Manocha
We present a novel parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and for handling very high-resolution meshes. To accelerate implicit integration, we describe new parallel algorithms for sparse matrix-vector multiplication (SpMV) and for dynamic matrix assembly on a multi-GPU workstation. Our algorithms use a novel work queue generation scheme for a fat-tree GPU interconnect topology. Furthermore, we present a novel collision handling scheme that uses spatial hashing for discrete and continuous collision detection along with a non-linear impact zone solver. Our parallel schemes distribute the computation and storage overhead among multiple GPUs and enable us to perform almost interactive simulation on complex cloth meshes, which can hardly be handled on a single GPU due to memory limitations. We have evaluated the performance on two multi-GPU workstations (with 4 and 8 GPUs, respectively) using cloth meshes with 0.5-1.65M triangles. Our approach reliably handles collisions and generates vivid wrinkles and folds at 2-5 fps, which is significantly faster than prior cloth simulation systems. We observe almost linear speedups with respect to the number of GPUs.
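A much-simplified row-partitioned SpMV across devices is sketched below to suggest how the per-GPU distribution can look (it assumes one visible GPU per block). The paper's actual scheme also builds work queues tuned for a fat-tree interconnect, which this sketch omits.

```python
import torch

def multi_gpu_spmv(row_blocks, x):
    """row_blocks: list of sparse (m_i, n) tensors, one per GPU; x: dense (n,) vector."""
    ys = []
    for i, block in enumerate(row_blocks):
        dev = torch.device(f"cuda:{i}")
        # Each GPU multiplies its block of rows against a local copy of x.
        ys.append(torch.sparse.mm(block.to(dev), x.to(dev).unsqueeze(1)))
    # Gather the partial results back onto one device and concatenate.
    return torch.cat([y.to("cuda:0") for y in ys]).squeeze(1)
```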
Feixiang Lu, Zongdai Liu, Xibin Song, Dingfu Zhou, Wei Li, Hui Miao, Miao Liao, Liangjun Zhang, Bin Zhou, Ruigang Yang, Dinesh Manocha
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving. Our approach combines the strengths of deep learning with traditional part-based deformable model representations to produce high-quality 3D models in the presence of severe occlusions. We present a new part-based deformable vehicle model that is used for instance segmentation, and we automatically generate a dataset that contains dense correspondences between 2D images and 3D models. We also present a novel end-to-end deep neural network to predict dense 2D/3D mappings and highlight its benefits. Based on the dense mapping, we are able to compute precise 6-DoF poses and 3D reconstruction results at almost interactive rates on a commodity GPU. We have integrated these algorithms with an autonomous driving system. In practice, our method outperforms the state-of-the-art methods on all major vehicle parsing tasks: 2D instance segmentation by 4.4 points (mAP), 6-DoF pose estimation by 9.11 points, and 3D detection by 1.37 points. Moreover, we have released all of the source code, the dataset, and the trained models on GitHub.
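Once dense 2D-to-3D correspondences are available, a 6-DoF pose can be recovered with a standard PnP solve. This OpenCV-based sketch stands in for the pose stage and assumes a known camera intrinsic matrix K; it is not the paper's implementation.

```python
import cv2
import numpy as np

def pose_from_dense_mapping(pts3d, pts2d, K):
    """pts3d: (N,3) model points; pts2d: (N,2) matched pixels; K: (3,3) intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> 3x3 rotation matrix
    return R, tvec
```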
Aniket Bera, Tanmay Randhavane, Emily Kubin, Husam Shaik, Kurt Gray, Dinesh Manocha
We present a data-driven algorithm to model and predict the socio-emotional impact of groups on observers. Psychological research finds that highly entitative (i.e., cohesive and uniform) groups induce threat and unease in observers. Our algorithm models realistic trajectory-level behaviors to classify and map the motion-based entitativity of crowds. This mapping is based on a statistical scheme that dynamically learns pedestrian behavior and computes the resulting entitativity-induced emotion from group motion characteristics. We also present a novel interactive multi-agent simulation algorithm to model entitative groups and conduct a VR user study to validate the socio-emotional predictive power of our algorithm. We further show that model-generated high-entitativity groups do induce more negative emotions than low-entitativity groups.
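An illustrative trajectory-level uniformity measure is sketched below: entitativity is scored from how similar group members' speeds and headings are. The exact features and weights in the paper's statistical scheme differ; these are placeholders.

```python
import numpy as np

def entitativity_score(trajectories, dt=0.1):
    """trajectories: (N, T, 2) positions of N pedestrians over T timesteps."""
    vel = np.diff(trajectories, axis=1) / dt
    speed = np.linalg.norm(vel, axis=-1)                    # (N, T-1)
    heading = np.arctan2(vel[..., 1], vel[..., 0])
    # Lower variance across members -> more uniform motion -> higher entitativity.
    speed_uniformity = 1.0 / (1.0 + speed.std(axis=0).mean())
    heading_uniformity = 1.0 / (1.0 + heading.std(axis=0).mean())
    return 0.5 * (speed_uniformity + heading_uniformity)    # in (0, 1]
```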
Qingyang Tan, Zherong Pan, Breannan Smith, Takaaki Shiratori, Dinesh Manocha
We present a robust learning algorithm to detect and handle collisions in 3D deforming meshes. Our collision detector is represented as a bilevel deep autoencoder with an attention mechanism that identifies colliding mesh sub-parts. We use a numerical optimization algorithm, guided by the network, to resolve penetrations. Our learned collision handler can resolve collisions for unseen, high-dimensional meshes with thousands of vertices. To obtain stable network performance in such large and unseen spaces, we progressively insert new collision data based on errors in the network's inferences. We automatically label these data using an analytical collision detector and progressively fine-tune our detection networks. We evaluate our method on complex 3D meshes from several datasets with different shapes and topologies, including datasets corresponding to dressed and undressed human poses, cloth simulations, and human hand poses acquired using multiview capture systems. Our approach outperforms supervised learning methods and achieves 93.8-98.1% accuracy compared with the ground truth computed by analytic methods. Compared to prior learning methods, our approach results in a 5.16-25.50% lower false negative rate in collision checking and a 9.65-58.91% higher success rate in collision handling.
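A schematic version of the progressive data-insertion loop follows: samples where the learned detector disagrees with an analytic collision checker are auto-labeled and added to the training set before fine-tuning. `network`, `analytic_collide`, `sampler`, and their methods are stand-ins for the paper's components, not real APIs.

```python
def progressive_training(network, analytic_collide, sampler, rounds=10, batch=256):
    dataset = []
    for _ in range(rounds):
        meshes = sampler(batch)                                 # draw unseen configurations
        hard = [m for m in meshes
                if network.predict(m) != analytic_collide(m)]   # inference errors
        dataset += [(m, analytic_collide(m)) for m in hard]     # auto-label them
        network.finetune(dataset)                               # fine-tune on all data so far
    return network
```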
Tianrui Guan, Zhenpeng He, Ruitao Song, Dinesh Manocha, Liangjun Zhang
We present a terrain traversability mapping and navigation system (TNS) for autonomous excavator applications in unstructured environments. We use an efficient approach to extract terrain features from RGB images and 3D point clouds and incorporate them into a global map for planning and navigation. Our system can adapt to changing environments and update the terrain information in real time. Moreover, we present a novel dataset, the Complex Worksite Terrain (CWT) dataset, which consists of RGB images from construction sites with seven categories based on navigability. Our novel algorithms improve the mapping accuracy over previous state-of-the-art (SOTA) methods by 4.17-30.48% and reduce the MSE on the traversability map by 13.8-71.4%. We have combined our mapping approach with planning and control modules in an autonomous excavator navigation system and observe a 49.3% improvement in the overall success rate. Based on TNS, we demonstrate the first autonomous excavator that can navigate through unstructured environments consisting of deep pits, steep hills, rock piles, and other complex terrain features.
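The sketch below suggests how per-cell navigability classes and point-cloud slope could be fused into a single traversability grid. The class cost table, grid layout, and blending weight are illustrative assumptions, not TNS's values.

```python
import numpy as np

NAV_COST = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0])  # 7 navigability classes (assumed)

def traversability_map(class_grid, height_grid, cell=0.5, slope_weight=0.5):
    """class_grid: (H,W) int class ids; height_grid: (H,W) terrain heights in meters."""
    gy, gx = np.gradient(height_grid, cell)
    slope = np.clip(np.hypot(gx, gy), 0.0, 1.0)        # normalized slope magnitude
    # Blend semantic navigability with geometric slope into one cost per cell.
    return (1 - slope_weight) * NAV_COST[class_grid] + slope_weight * slope
```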
Senthil Hariharan Arul, Dinesh Manocha
We present decentralized collision avoidance algorithms for quadrotor swarms operating under uncertain state estimation. Our approach exploits the differential flatness property and feedforward linearization to approximate the quadrotor dynamics and perform reciprocal collision avoidance. We account for uncertainty in position and velocity by formulating the collision constraints as chance constraints, which describe a set of velocities that avoid collisions with a specified confidence level. We present two methods for formulating and solving these chance constraints: the first assumes a Gaussian noise distribution, while the second extends it to the non-Gaussian case using a Gaussian Mixture Model (GMM). We reformulate the linear chance constraints into equivalent deterministic constraints, which are used within an MPC framework to compute a local collision-free trajectory for each quadrotor. We evaluate the proposed algorithm in simulations on benchmark scenarios and highlight its benefits over prior methods. We observe that both the Gaussian and non-Gaussian methods provide improved collision avoidance over the deterministic method. On average, the Gaussian method requires ~5 ms to compute a local collision-free trajectory, while our non-Gaussian method is computationally more expensive and requires ~9 ms on average in scenarios with 4 agents.
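The Gaussian case admits a standard deterministic reformulation: a linear chance constraint P(a^T v <= b) >= 1 - eps with v ~ N(mu, Sigma) holds if and only if a^T mu + Phi^{-1}(1 - eps) * sqrt(a^T Sigma a) <= b. A direct sketch of this tightening (not the paper's full MPC pipeline):

```python
import numpy as np
from scipy.stats import norm

def tightened_constraint(a, b, mu, Sigma, eps=0.05):
    """True iff the linear chance constraint holds with confidence 1 - eps."""
    margin = norm.ppf(1.0 - eps) * np.sqrt(a @ Sigma @ a)   # uncertainty back-off
    return a @ mu + margin <= b
```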
Shiguang Liu, Dinesh Manocha
Sound, as a crucial sensory channel, plays a vital role in improving the realism and immersion of a virtual environment, second only to vision in importance. Sound can provide important cues such as directionality and spatial size. This paper gives a broad overview of research on sound simulation in virtual reality, games, multimedia, and computer-aided design. We first survey various sound synthesis methods, including harmonic synthesis, texture synthesis, spectral analysis, and physics-based synthesis. Then, we summarize popular sound propagation techniques, namely wave-based methods, geometric methods, and hybrid methods. Next, we review sound rendering methods. We then discuss the latest deep-learning-based sound simulation approaches. Finally, we point out future directions for this field. To the best of our knowledge, this is the first attempt to provide a comprehensive summary of sound research in the field of computer graphics.
Yu-Ping Wang, Zi-Xin Zou, Cong Wang, Yue-Jiang Dong, Lei Qiao, Dinesh Manocha
Data loss caused by an unreliable network seriously impacts the results of remote visual SLAM systems. In our experiments, a loss of less than 1 second of data can cause a visual SLAM algorithm to lose tracking. We present a novel buffering method, ORBBuf, to reduce the impact of data loss on remote visual SLAM systems. We model buffering as an optimization problem by introducing a similarity metric between frames. To solve it, we present an efficient greedy-like algorithm that discards the frames with the least impact on the quality of the SLAM results. We implement our ORBBuf method on ROS, a widely used middleware framework. Through an extensive evaluation on real-world scenarios and tens of gigabytes of datasets, we demonstrate that ORBBuf can be applied to different state-estimation algorithms (DSO and VINS-Fusion), different sensor data (both monocular and stereo images), different scenes (both indoor and outdoor), and different network environments (both WiFi and 4G networks). Our experimental results indicate that network losses indeed affect SLAM results and that our ORBBuf method can reduce the RMSE by up to 50x compared with the Drop-Oldest and Random buffering methods.
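A greedy buffer in the spirit of ORBBuf is sketched below: when the buffer overflows, it discards the interior frame whose removal least reduces inter-frame dissimilarity, so visually redundant frames go first. The `similarity` callable is a placeholder for the ORB-feature-based metric in the paper.

```python
def push_frame(buffer, frame, similarity, capacity=30):
    """buffer: list of frames; similarity: callable (frame, frame) -> [0, 1]."""
    buffer.append(frame)
    if len(buffer) <= capacity:
        return
    # Cost of dropping interior frame i: high similarity to both neighbors
    # means the frame is redundant, i.e., cheap to drop.
    costs = [2.0 - similarity(buffer[i - 1], buffer[i])
                 - similarity(buffer[i], buffer[i + 1])
             for i in range(1, len(buffer) - 1)]
    del buffer[1 + costs.index(min(costs))]   # keep the endpoints, drop the cheapest
```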
Qiaoyun Wu, Xiaoxi Gong, Kai Xu, Dinesh Manocha, Jingxuan Dong, Jun Wang
We present a target-driven navigation system to improve mapless visual navigation in indoor scenes. Our method takes a multi-view observation of a robot and a target as input at each time step and provides a sequence of actions that move the robot to the target without relying on odometry or GPS at runtime. The system is learned by optimizing a combined objective encompassing three key designs. First, we propose that an agent conceive the next observation before making an action decision; this is achieved by learning a variational generative module from expert demonstrations. Second, we propose predicting static collisions in advance, as an auxiliary task to improve safety during navigation. Third, to alleviate the training data imbalance problem of termination action prediction, we introduce a separate target checking module rather than augmenting the navigation policy with a termination action. These three designs contribute to improved training data efficiency, static collision avoidance, and navigation generalization performance, yielding a novel target-driven mapless navigation system. Through experiments on a TurtleBot, we provide evidence that our model can be integrated into a robotic system and navigate in the real world. Videos and models can be found in the supplementary material.
Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
We present a method for improving the quality of synthetic room impulse responses (RIRs) for far-field speech recognition. We bridge the gap between the fidelity of synthetic RIRs and real RIRs using our novel TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world sub-band room equalization on the translated synthetic RIR. Our overall approach improves the quality of synthetic RIRs by compensating for low-frequency wave effects similar to those in real RIRs. We evaluate the performance of the improved synthetic RIRs on a far-field speech dataset, augmented by convolving the LibriSpeech clean speech dataset [1] with RIRs and adding background noise. We show that far-field speech augmented using our improved synthetic RIRs reduces the word error rate by up to 19.9% on the Kaldi far-field automatic speech recognition benchmark [2].
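The far-field augmentation step itself is a convolution plus additive noise. This sketch shows that step for one utterance, assuming `rir` is a (possibly translated) room impulse response and `noise` is a background recording at the same sampling rate; the SNR handling is a common convention, not necessarily the paper's.

```python
import numpy as np
from scipy.signal import fftconvolve

def augment(speech, rir, noise, snr_db=20.0):
    far_field = fftconvolve(speech, rir)[: len(speech)]   # apply room acoustics
    noise = noise[: len(far_field)]
    # Scale the noise so the result has the requested signal-to-noise ratio.
    gain = np.sqrt((far_field ** 2).mean() / ((noise ** 2).mean() * 10 ** (snr_db / 10)))
    return far_field + gain * noise
```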
Hui Miao, Feixiang Lu, Zongdai Liu, Liangjun Zhang, Dinesh Manocha, Bin Zhou
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS). Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters. First, to deal with multi-view data scarcity, we propose a part-assisted novel view synthesis algorithm for data augmentation. We train a part-based texture inpainting network in a self-supervised manner and then render the textured model into the background image with the target 6-DoF pose. Second, to handle various camera parameters, we present a new method that produces dense mappings between image pixels and 3D points to perform robust 2D/3D vehicle parsing. Third, we build the first CVIS dataset for benchmarking, which contains more than 1540 annotated images (14,017 instances) from real-world traffic scenarios. We combine these novel algorithms and datasets to develop a robust approach for 2D/3D vehicle parsing for CVIS. In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation by 4.5%, 4.3%, and 2.9%, respectively. More details and results are included in the supplement. To facilitate future research, we will release the source code and the dataset on GitHub.
Niall L. Williams, Aniket Bera, Dinesh Manocha
We present a novel redirected walking controller based on alignment that allows the user to explore large and complex virtual environments, while minimizing the number of collisions with obstacles in the physical environment. Our alignment-based redirection controller, ARC, steers the user such that their proximity to obstacles in the physical environment matches the proximity to obstacles in the virtual environment as closely as possible. To quantify a controller's performance in complex environments, we introduce a new metric, Complexity Ratio (CR), to measure the relative environment complexity and characterize the difference in navigational complexity between the physical and virtual environments. Through extensive simulation-based experiments, we show that ARC significantly outperforms current state-of-the-art controllers in its ability to steer the user on a collision-free path. We also show through quantitative and qualitative measures of performance that our controller is robust in complex environments with many obstacles. Our method is applicable to arbitrary environments and operates without any user input or parameter tweaking, aside from the layout of the environments. We have implemented our algorithm on the Oculus Quest head-mounted display and evaluated its performance in environments with varying complexity. Our project website is available at https://gamma.umd.edu/arc/.
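A minimal version of the alignment idea is sketched below: sample directions around the user, measure the distance to the nearest obstacle in both environments, and score how well the two match. The distance oracles and the steering-gain computation that ARC derives from this signal are abstracted away here.

```python
import numpy as np

def alignment(user_phys, user_virt, dist_phys, dist_virt, n_dirs=16):
    """dist_phys/dist_virt: callables mapping (position, direction) -> free distance."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Mismatch between physical and virtual proximity along each sampled direction.
    mismatch = [abs(dist_phys(user_phys, d) - dist_virt(user_virt, d)) for d in dirs]
    return -float(np.mean(mismatch))    # higher (closer to 0) = better aligned
```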