Jack W. O'Sullivan, Anil Palepu, Khaled Saab, Wei-Hung Weng, Yong Cheng, Emily Chu, Yaanik Desai, Aly Elezaby, Daniel Seung Kim, Roy Lan, Wilson Tang, Natalie Tapaskar, Victoria Parikh, Sneha S. Jain, Kavita Kulkarni, Philip Mansfield, Dale Webster, Juraj Gottweis, Joelle Barral, Mike Schaekermann, Ryutaro Tanno, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Euan Ashley, Tao Tu
The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI system optimized for diagnostic dialogue, to augment and support clinical decision-making in this challenging context. We curated a real-world dataset of 204 complex cases from a subspecialist cardiology practice, including results for electrocardiograms, echocardiograms, cardiac MRI, genetic tests, and cardiopulmonary stress tests. We developed a ten-domain evaluation rubric used by subspecialists to evaluate the quality of diagnosis and clinical management plans produced by general cardiologists or AMIE, the latter enhanced with web-search and self-critique capabilities. AMIE was rated superior to general cardiologists for 5 of the 10 domains (with preference ranging from 9% to 20%), and equivalent for the rest. Access to AMIE's response improved cardiologists' overall response quality in 63.7% of cases while lowering quality in just 3.4%. Cardiologists' responses with access to AMIE were superior to cardiologists' responses without access to AMIE for all 10 domains. Qualitative examinations suggest that AMIE and general cardiologists could complement each other, with AMIE being thorough and sensitive and general cardiologists being concise and specific. Overall, our results suggest that specialized medical LLMs have the potential to augment general cardiologists' capabilities by bridging gaps in subspecialty expertise, though further research and validation are essential for wide clinical utility.
Alexander H. Liu, Tao Tu, Hung-yi Lee, Lin-shan Lee
In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to the phoneme sequences of speech utterances. This is achieved by proper temporal segmentation, which makes the representations phoneme-synchronized, and proper phonetic clustering, which keeps the total number of distinct representations close to the number of phonemes. The mapping between the distinct representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech demonstrated that the learned representations for vowels have relative locations in latent space that closely parallel those shown in the IPA vowel chart defined by linguistics experts. With less than 20 minutes of annotated speech, our method outperformed existing methods on phoneme recognition and is able to synthesize intelligible speech that beats our baseline model.
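The two ingredients named in this abstract, clustering frames into a small set of distinct representations and segmenting them in time, can be illustrated with a generic toy sketch. The abstract does not specify SeqRQ-AE's actual mechanism, so everything here is an assumption for illustration: the function names `nearest_code` and `quantize_and_segment` are hypothetical, the codebook is given rather than learned, and merging consecutive identical codes stands in for temporal segmentation.

```python
# Toy illustration only: nearest-codebook quantization followed by
# merging of consecutive repeated codes. This is NOT the SeqRQ-AE
# implementation; all names and the procedure itself are assumptions.

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def quantize_and_segment(frames, codebook):
    """Quantize each frame, then collapse runs of identical codes.

    Returns (per-frame codes, segment-level code sequence).
    """
    codes = [nearest_code(f, codebook) for f in frames]
    segments = []
    for c in codes:
        # A new segment starts whenever the code changes.
        if not segments or segments[-1] != c:
            segments.append(c)
    return codes, segments


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical two-entry codebook
    frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 1.1], [0.1, 0.1]]
    print(quantize_and_segment(frames, codebook))
```

In this toy version the segment-level sequence is much shorter than the frame sequence, which is the property that lets a small codebook behave like a phoneme-rate symbol stream.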
Qiong Ma, Zhi-Rong Lin, Tao Tu, Guang-Can Guo, Guo-Ping Guo
We propose a new method that uses gapped graphene as a barrier to confine electrons in gapless graphene and form a good quantum dot, which can be realized on an oxygen-terminated SiO$_{2}$ substrate that is partly H-passivated. In particular, we use ferromagnetic insulators deposited on top of the barrier, which give rise to a spin-related energy spectrum and transport properties. Compared to the complexity of etched quantum dots in graphene, the setup suggested here is a promising candidate for practical applications.
Guo-Ping Guo, Zhi-Rong Lin, Xiao-Peng Li, Tao Tu, Guang-Can Guo
Aug 12, 2008 · quant-ph
We propose a scalable scheme to implement quantum computation in a graphene nanoribbon. It is shown that an electron or hole can be naturally localized in each zigzag region of a graphene nanoribbon with a sequence of Z-shaped structures, without exploiting any confining gate. A one-dimensional chain of graphene quantum dots is formed in such a nanoribbon, where electron or hole spins can be encoded as qubits. The coupling interaction between neighboring graphene quantum dots is found to be of the always-on Heisenberg type. By applying the bang-bang control strategy and the decoherence-free subspace encoding method, universal quantum computation is argued to be realizable with present techniques.
Fei-Yun Zhu, Zhi-Cheng Zhu, Tao Tu, Hua Tu, Guang-Can Guo, Guo-Ping Guo
We study the time evolution of two-electron spin states in a double quantum-dot system, which includes a nearby quantum point contact (QPC) as a measurement device. We find that the QPC-measurement-induced decoherence occurs on a time scale of microseconds. We also find that enhanced QPC measurement will trap the system in its initial spin states, which is consistent with the quantum Zeno effect.
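The trapping behaviour mentioned above can be illustrated with the standard textbook toy model of the quantum Zeno effect. This is not the paper's QPC model: it assumes an ideal two-level system undergoing Rabi oscillation at angular frequency `omega`, interrupted by equally spaced ideal projective measurements, and the function name `survival_probability` is hypothetical.

```python
import math

# Textbook toy model of the quantum Zeno effect (an assumption for
# illustration, not the double-dot/QPC model of the paper): between
# measurements, the probability of remaining in the initial state
# after time dt is cos^2(omega * dt / 2).

def survival_probability(omega, total_time, n_measurements):
    """Probability that every one of n equally spaced projective
    measurements finds the system in its initial state."""
    dt = total_time / n_measurements
    p_single = math.cos(omega * dt / 2.0) ** 2
    return p_single ** n_measurements


if __name__ == "__main__":
    omega = math.pi  # chosen so that one full interval flips the state
    for n in (1, 2, 10, 100, 1000):
        print(n, survival_probability(omega, 1.0, n))
```

As the number of measurements grows, the survival probability approaches one, which is the sense in which more frequent measurement keeps the system in its initial state.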
Xiao-Jie Hao, Tao Tu, Yong-Jie Zhao, Guang-Can Guo, H. W. Jiang, Guo-Ping Guo
We carry out a numerical study of quantum Hall ferromagnetism in a two-subband system using a set of experimental parameters from a recent experiment [X. C. Zhang, I. Martin, and H. W. Jiang, Phys. Rev. B \textbf{74}, 073301 (2006)]. Employing the self-consistent local density approximation for the growth-direction wave function and Hartree-Fock theory for the pseudospin anisotropy energy, we are able to account for the easy-axis and easy-plane quantum Hall ferromagnetism observed at total filling factors $\nu = 3$ and $\nu = 4$, respectively. Our study provides some insight into how the anisotropy energy, which depends strongly on the distribution of the growth-direction wave functions, determines the symmetry of the quantum Hall ferromagnets.
Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
Ling-Jun Wang, Gang Cao, Tao Tu, Hai-Ou Li, Cheng Zhou, Xiao-Jie Hao, Zhan Su, Guang-Can Guo, Guo-Ping Guo, Hong-Wen Jiang
We have developed an etching process to fabricate a quantum dot and a nearby single-electron transistor as a charge detector in single-layer graphene. The high charge sensitivity of the detector is used to probe Coulomb diamonds as well as the excited-state spectrum in the dot, even in the regime where the current through the quantum dot is too small to be measured by conventional transport means. The graphene-based quantum dot and integrated charge sensor serve as an essential building block to form a solid-state qubit in a nuclear-spin-free quantum world.
Anil Palepu, Vikram Dhillon, Polly Niravath, Wei-Hung Weng, Preethi Prasad, Khaled Saab, Ryutaro Tanno, Yong Cheng, Hanh Mai, Ethan Burns, Zainub Ajmal, Kavita Kulkarni, Philip Mansfield, Dale Webster, Joelle Barral, Juraj Gottweis, Mike Schaekermann, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Tao Tu
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care without specific fine-tuning to this challenging domain. To perform this evaluation, we curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases and mirroring the key information available to a multidisciplinary tumor board for decision-making (openly released with this work). We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery and hormonal therapy. To improve performance, we enhanced AMIE with the inference-time ability to perform web search retrieval to gather relevant and up-to-date clinical knowledge and refine its responses with a multi-stage self-critique pipeline. We compare the response quality of AMIE with that of internal medicine trainees, oncology fellows, and general oncology attendings under both automated and specialist clinician evaluations. In our evaluations, AMIE outperformed trainees and fellows, demonstrating the potential of the system in this challenging and important domain. We further demonstrate through qualitative examples how systems such as AMIE might facilitate conversational interactions to assist clinicians in their decision making. However, AMIE's performance was overall inferior to that of attending oncologists, suggesting that further research is needed prior to consideration of prospective uses.
Qiong Ma, Tao Tu, Zhi-Rong Lin, Guang-Can Guo, Guo-Ping Guo
We study the conductance spectrum of graphene quantum dots, both single and multiple cases. The single electron tunneling phenomenon is investigated and the periodicity, amplitude and line shape of the Coulomb blockade oscillations at low temperatures are obtained. Further, we discuss the transport behavior when multiple dots are assembled in array and find a phase transition of conductance spectra from individual Coulomb blockade to collective Coulomb blockade.
Zhan Su, Tao Tu, Gang Cao, Guang-Can Guo, Guo-Ping Guo
We propose an approach to reconstruct two-electron spin qubit states in semiconductor quantum dots by employing tomographic techniques. This procedure exploits the combination of fast gate operations on electron spins trapped in dots and dynamical nuclear polarization of the underlying Ga and As nuclei. The presented method can be an important tool in solid-state quantum computation for the complete characterization of qubit states and their correlations.
Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang
Articulated 3D reconstruction has valuable applications in various domains, yet it remains costly and demands intensive work from domain experts. Recent advancements in template-free learning methods show promising results with monocular videos. Nevertheless, these approaches necessitate a comprehensive coverage of all viewpoints of the subject in the input video, thus limiting their applicability to casually captured videos from online sources. In this work, we study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete. We propose DreaMo that jointly performs shape reconstruction while solving the challenging low-coverage regions with view-conditioned diffusion prior and several tailored regularizations. In addition, we introduce a skeleton generation strategy to create human-interpretable skeletons from the learned neural bones and skinning weights. We conduct our study on a self-collected internet video collection characterized by incomplete view coverage. DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation. Extensive qualitative and quantitative studies validate the efficacy of each proposed component, and show existing methods are unable to solve correct geometry due to the incomplete view coverage.
Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun
We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation. Unlike previous methods, which aggregate 2D features into 3D voxels without considering geometry, ImGeoNet learns to induce geometry from multi-view images to alleviate the confusion arising from voxels of free space, and during the inference phase, only images from multiple views are required. In addition, a powerful pre-trained 2D feature extractor can be leveraged by our representation, leading to more robust performance. To evaluate the effectiveness of ImGeoNet, we conduct quantitative and qualitative experiments on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200. The results demonstrate that ImGeoNet outperforms the current state-of-the-art multi-view image-based method, ImVoxelNet, on all three datasets in terms of detection accuracy. In addition, ImGeoNet shows great data efficiency by achieving results comparable to ImVoxelNet with 100 views while utilizing only 40 views. Furthermore, our studies indicate that our proposed image-induced geometry-aware representation can enable image-based methods to attain detection accuracy superior to that of the seminal point cloud-based method, VoteNet, in two practical scenarios: (1) scenarios where point clouds are sparse and noisy, such as in ARKitScenes, and (2) scenarios involving diverse object classes, particularly classes of small objects, as is the case in ScanNet200.
Guo-Ping Guo, Zhi-Rong Lin, Tao Tu, Hai-Ou Li, Chang-Ling Zou, Xi-Feng Ren, Guang-Can Guo
Apr 23, 2009 · quant-ph
We develop an architecture for distributed quantum computation using a quantum bus of plasmonic circuits and spin qubits in self-assembled quantum dots. Deterministic quantum gates between two distant spin qubits can be realized by using an adiabatic approach in which quantum dots couple with highly detuned plasmon modes in a metallic nanowire. The plasmonic quantum bus offers a robust and scalable platform for quantum optics experiments and the development of on-chip quantum networks composed of various quantum nodes, such as quantum dots, molecules and nanoparticles.
Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-yi Lee
Recently, end-to-end multi-speaker text-to-speech (TTS) systems have achieved success in situations where large amounts of high-quality speech and the corresponding transcriptions are available. However, laborious paired data collection processes prevent many institutes from building high-performance multi-speaker TTS systems. In this work, we propose a semi-supervised learning approach for multi-speaker TTS. A multi-speaker TTS model can learn from untranscribed audio via the proposed encoder-decoder framework with discrete speech representation. The experimental results demonstrate that with only an hour of paired speech data, regardless of whether the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices. We found that the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy. In addition, our analysis reveals that different speaker characteristics of the paired data have an impact on the effectiveness of semi-supervised TTS.
Tao Tu, Yuan-Jui Chen, Cheng-chieh Yeh, Hung-yi Lee
End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text and speech data. However, laborious data collection remains difficult for at least 95% of the languages in the world, which hinders the development of TTS in different languages. In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available. We show that such TTS can be effectively constructed by transferring knowledge from a high-resource (source) language. Since the model trained on the source language cannot be directly applied to the target language due to input space mismatch, we propose a method to learn a mapping between source and target linguistic symbols. Benefiting from this learned mapping, pronunciation information can be preserved throughout the transfer procedure. Preliminary experiments show that we only need around 15 minutes of paired data to obtain a relatively good TTS system. Furthermore, analytic studies demonstrated that the automatically discovered mapping correlates well with phonetic expertise.
Guo-Ping Guo, Hui Zhang, Yong Hu, Tao Tu, Guang-Can Guo
Feb 10, 2008 · quant-ph
Realization of controllable interaction between distant qubits is one of the major problems in scalable solid-state quantum computing. We study a superconducting transmission line resonator (TLR) as a tunable dispersive coupler for double-dot molecules. A general interaction Hamiltonian of $n$ two-electron spin-based qubits and the TLR is presented, where the double-dot qubits are biased in the large-detuning region and the TLR is always empty and only virtually excited. Our analysis of the main decoherence sources indicates that various major quantum operations can be reliably implemented with current technology.
Zhi-Rong Lin, Guo-Ping Guo, Tao Tu, Fei-Yun Zhu, Guang-Can Guo
Apr 20, 2008 · quant-ph
We propose an efficient method to generate cluster states in spatially separated double quantum dots with a superconducting transmission line resonator (TLR). When the detuning between the transition frequency of the double-dot qubits and the frequency of the full-wave mode in the TLR satisfies certain conditions, an Ising-like operator between two arbitrary separated qubits can be achieved. Even when the main noise sources are included, it is shown that high-fidelity cluster states could be generated in this solid-state system in just one step.
Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Seneviratne, Paul Gamble, Chris Kelly, Nathaneal Scharli, Aakanksha Chowdhery, Philip Mansfield, Blaise Aguera y Arcas, Dale Webster, Greg S. Corrado, Yossi Matias, Katherine Chou, Juraj Gottweis, Nenad Tomasev, Yun Liu, Alvin Rajkomar, Joelle Barral, Christopher Semturs, Alan Karthikesalingam, Vivek Natarajan
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
Xiao-Jie Hao, Guo-Ping Guo, Hai-Ou Li, Cheng Zhou, Gang Cao, Guang-Can Guo, Wayne Y. Fung, Zhongqing Ji, Wei Lu, Tao Tu
We experimentally study the electrical transport properties of a Ge/Si core/shell nanowire device with two superconducting leads in the Coulomb blockade regime. Anomalous zero-field magnetoconductance peaks are observed for the first time at the gate voltages where Coulomb blockade oscillation peaks are present. Several lines of evidence indicate that this feature is due to Andreev-reflection-enhanced phase-coherent single-hole tunneling through the quantum dot, which can be suppressed by an external magnetic field without destroying the superconducting states in the electrodes.