Yong Cheng
In this paper, we examine the limit of applicability of Gödel's first incompleteness theorem ($\sf G1$ for short). We first define the notion "$\sf G1$ holds for the theory $T$". This paper is motivated by the following question: can we find a theory with a minimal degree of interpretation for which $\sf G1$ holds? To approach this question, we first examine the following question: is there a theory $T$ such that Robinson's $\mathbf{R}$ interprets $T$ but $T$ does not interpret $\mathbf{R}$ (i.e. $T$ is weaker than $\mathbf{R}$ w.r.t. interpretation) and $\sf G1$ holds for $T$? In this paper, we show, based on Jeřábek's work using some model theory, that there are many such theories. We prove that for each recursively inseparable pair $\langle A,B\rangle$, we can construct an r.e. theory $U_{\langle A,B\rangle}$ such that $U_{\langle A,B\rangle}$ is weaker than $\mathbf{R}$ w.r.t. interpretation and $\sf G1$ holds for $U_{\langle A,B\rangle}$. As a corollary, we answer a question from Albert Visser. Moreover, we prove that for any Turing degree $\mathbf{0}< \mathbf{d}<\mathbf{0}^{\prime}$, there is a theory $T$ with Turing degree $\mathbf{d}$ such that $\sf G1$ holds for $T$ and $T$ is weaker than $\mathbf{R}$ w.r.t. Turing reducibility. As a corollary, based on Shoenfield's work using some recursion theory, we show that there is no theory with a minimal degree of Turing reducibility for which $\sf G1$ holds.
Yong Cheng, Vincent K. N. Lau
Distributed power control over interference-limited networks has received increasing interest over the past few years. Distributed solutions (such as iterative water-filling and gradient projection) have been intensively investigated under \emph{quasi-static} channels. However, as such distributed solutions involve iterative updating and explicit message passing, it is unrealistic to assume that the wireless channel remains unchanged during the iterations. Unfortunately, the behavior of those distributed solutions under \emph{time-varying} channels is in general unknown. In this paper, we investigate the distributed scaled gradient projection algorithm (DSGPA) in a $K$-pair multicarrier interference network under a finite-state Markov channel (FSMC) model. We analyze the \emph{convergence property} as well as the \emph{tracking performance} of the proposed DSGPA. Our analysis shows that the proposed DSGPA converges to a limit region rather than a single point under the FSMC model. We also show that the order of growth of the tracking errors is given by $\mathcal{O}\big(1 \big/ \bar{N}\big)$, where $\bar{N}$ is the \emph{average sojourn time} of the FSMC. Based on the analysis, we derive the \emph{tracking error optimal scaling matrices} via Markov decision process modeling. We show that the tracking error optimal scaling matrices can be implemented distributively at each transmitter. The numerical results show the superior performance of the proposed DSGPA over three baseline schemes, including the gradient projection algorithm with a constant stepsize.
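As a rough illustration of the scaled gradient projection idea named in the abstract above, the following sketch runs the update for a single transmitter over a fixed (quasi-static) channel. It is not the paper's DSGPA: the function names, the log-rate objective, the diagonal scaling `scale`, the constant `step`, and the sum-power budget are all assumptions made for illustration, and the time-varying FSMC channel is not modeled.

```python
import math


def project_scaled(y, scale, budget, iters=60):
    """Project y onto {x >= 0, sum(x) <= budget} in the norm weighted by 1/scale[i].

    Assumed (hypothetical) feasible set: a per-transmitter sum-power budget.
    """
    clipped = [max(0.0, v) for v in y]
    if sum(clipped) <= budget:
        return clipped
    # Bisect on the multiplier lam >= 0 so that sum(max(0, y_i - lam*scale_i)) = budget.
    lo, hi = 0.0, max(v / s for v, s in zip(y, scale))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = sum(max(0.0, v - lam * s) for v, s in zip(y, scale))
        if total > budget:
            lo = lam
        else:
            hi = lam
    # Using hi keeps the result inside the budget.
    return [max(0.0, v - hi * s) for v, s in zip(y, scale)]


def rate(p, gain, noise_plus_interf):
    """Sum of per-carrier log-rates (natural log), an assumed objective."""
    return sum(math.log(1.0 + g * x / s)
               for x, g, s in zip(p, gain, noise_plus_interf))


def sgp_step(p, gain, noise_plus_interf, scale, step, budget):
    """One scaled gradient ascent step followed by the scaled projection."""
    grad = [g / (s + g * x) for x, g, s in zip(p, gain, noise_plus_interf)]
    y = [x + step * d * gr for x, d, gr in zip(p, scale, grad)]
    return project_scaled(y, scale, budget)


def run_sgp(gain, noise_plus_interf, scale, step, budget, n_iter):
    """Iterate from the all-zero power allocation."""
    p = [0.0] * len(gain)
    for _ in range(n_iter):
        p = sgp_step(p, gain, noise_plus_interf, scale, step, budget)
    return p
```

With an identity scaling (`scale = [1.0, 1.0, 1.0]`) this reduces to ordinary gradient projection with a constant stepsize; a non-trivial `scale` changes both the step direction and the norm used in the projection.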
Yong Cheng
In this paper, we prove that: if $κ$ is supercompact and the $\mathsf{HOD}$ Hypothesis holds, then there is a proper class of regular cardinals in $V_κ$ which are measurable in $\mathsf{HOD}$. Woodin also proved this result. As a corollary, we prove Woodin's Local Universality Theorem. This work shows that under the assumption of the $\mathsf{HOD}$ Hypothesis and supercompact cardinals, large cardinals in $\mathsf{V}$ are reflected to be large cardinals in $\mathsf{HOD}$ in a local way, and reveals the huge difference between $\mathsf{HOD}$-supercompact cardinals and supercompact cardinals under the $\mathsf{HOD}$ Hypothesis.
Yong Cheng
In this paper we characterize the strong reflecting property for $L$-cardinals for all $ω_n$, characterize Harrington's Principle $HP(L)$ and its generalization and discuss the relationship between the strong reflecting property for $L$-cardinals and Harrington's Principle $HP(L)$.
Han Zhang, Jun Yang, Yun Zhou, Jianfeng Zheng, Yong Cheng, Bichao Bai, Guoxin Zhang, Yisheng Lv
Micro/nanoliter droplets enable versatile applications thanks to their tiny volume and substantial surface energy, a major advantage over bulk liquid. Yet reconciling elaborate manipulation with sufficient power remains a challenge. Here, we unleash the potential of our miniwatt aspirators, which pump up liquid and create droplets with the help of acoustic vortex beams, inspired by the significance of spirals as a power mechanism for most mollusks that live in water. These droplet aspirators produce very large interface deformations from small radiation pressures with orbital angular momentum generated by spiral-electrode transducers. The precise contactless manipulation of physical, chemical and biological objects at micrometric down to nanometric scales promises tremendous development in fields as diverse as microrobotics, nanoreactors, and nanoassemblies.
Yong Cheng
In this paper, we use Gödel's incompleteness theorem as a case study for investigating mathematical depth. We take for granted the widespread judgment by mathematical logicians that Gödel's incompleteness theorem is deep, and focus on the philosophical question of what its depth consists in. We focus on the methodological study of the depth of Gödel's incompleteness theorem, and propose three criteria to account for its depth: influence, fruitfulness, and unity. Finally, we give some explanations for our account of the depth of Gödel's incompleteness theorem.
Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, Yang Liu
Small perturbations in the input can severely distort intermediate representations and thus impact translation quality of neural machine translation (NMT) models. In this paper, we propose to improve the robustness of NMT models with adversarial stability training. The basic idea is to make both the encoder and decoder in NMT models robust against input perturbations by enabling them to behave similarly for the original input and its perturbed counterpart. Experimental results on Chinese-English, English-German and English-French translation tasks show that our approaches can not only achieve significant improvements over strong NMT systems but also improve the robustness of NMT models.
Yong Cheng, Vincent K. N. Lau, Yi Long
In network MIMO systems, channel state information is required at the transmitter side to multiplex users in the spatial domain. Since perfect channel knowledge is difficult to obtain in practice, \emph{limited feedback} is a widely accepted solution. The {\em dynamic number of cooperating BSs} and {\em heterogeneous path loss effects} of network MIMO systems pose new challenges for limited feedback design. In this paper, we propose a scalable limited feedback design for network MIMO systems with multiple base stations, multiple users and multiple data streams for each user. We propose a {\em limited feedback framework using per-cell product codebooks}, along with a {\em low-complexity feedback indices selection algorithm}. We show that the proposed per-cell product codebook limited feedback design can asymptotically achieve the same performance as the joint-cell codebook approach. We also derive an asymptotic \emph{per-user throughput loss} due to limited feedback with per-cell product codebooks. Based on that, we show that when the number of per-user feedback-bits $B_{k}$ is $\mathcal{O}\big( Nn_{T}n_{R}\log_{2}(ρg_{k}^{sum})\big)$, the system operates in the \emph{noise-limited} regime in which the per-user throughput is $\mathcal{O} \left( n_{R} \log_{2} \big( \frac{n_{R}ρg_{k}^{sum}}{Nn_{T}} \big) \right)$. On the other hand, when the number of per-user feedback-bits $B_{k}$ does not scale with the \emph{system SNR} $ρ$, the system operates in the \emph{interference-limited} regime where the per-user throughput is $\mathcal{O}\left( \frac{n_{R}B_{k}}{(Nn_{T})^{2}} \right)$. Numerical results show that the proposed design is flexible enough to accommodate a dynamic number of cooperating BSs and achieves much better performance than other baselines (such as the Givens rotation approach).
Yong Cheng, Ralf Schindler
Let $Z_2$, $Z_3$, and $Z_4$ denote $2^{\rm nd}$, $3^{\rm rd}$, and $4^{\rm th}$ order arithmetic, respectively. We let Harrington's Principle, {\sf HP}, denote the statement that there is a real $x$ such that every $x$--admissible ordinal is a cardinal in $L$. The known proofs of Harrington's theorem "$Det(Σ_1^1)$ implies $0^{\sharp}$ exists" are done in two steps: first show that $Det(Σ_1^1)$ implies {\sf HP}, and then show that {\sf HP} implies $0^{\sharp}$ exists. The first step is provable in $Z_2$. In this paper we show that $Z_2 \, + \, {\sf HP}$ is equiconsistent with ${\sf ZFC}$ and that $Z_3\, + \, {\sf HP}$ is equiconsistent with ${\sf ZFC} \, +$ there exists a remarkable cardinal. As a corollary, $Z_3\, + \, {\sf HP}$ does not imply $0^{\sharp}$ exists, whereas $Z_4\, + \, {\sf HP}$ does. We also study strengthenings of Harrington's Principle over $2^{\rm nd}$ and $3^{\rm rd}$ order arithmetic.
Yong Cheng
This paper belongs to the research on the limit of the first incompleteness theorem. Effectively inseparable theories (EI) can be viewed as an effective version of essentially undecidable theories (EU), and EI is stronger than EU. We examine the question: are there minimal effectively inseparable theories with respect to interpretability? We propose tEI, the theory version of EI. We first prove that there are no minimal tEI theories with respect to interpretability (i.e., for any tEI theory $T$, we can effectively find a theory which is tEI and strictly weaker than $T$ with respect to interpretability). By a theorem due to Marian B. Pour-El, tEI is equivalent to EI. Thus, there are no minimal EI theories with respect to interpretability. We also prove that there are no minimal finitely axiomatizable EI theories with respect to interpretability.
Yong Cheng
Let $Z_3$ denote $3^{rd}$ order arithmetic. Let Harrington's Principle, HP, denote the statement that there is a real $x$ such that every $x$--admissible ordinal is a cardinal in $L$. In this paper, assuming there exists a remarkable cardinal with a weakly inaccessible cardinal above it, we force a set model of $Z_3\, + \, {\sf HP}$ via set forcing without reshaping.
Yong Cheng
We give a survey of current research on Gödel's incompleteness theorems from the following three aspects: classifications of different proofs of Gödel's incompleteness theorems, the limit of the applicability of Gödel's first incompleteness theorem, and the limit of the applicability of Gödel's second incompleteness theorem.
Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, Wei Xu
While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.
Yong Cheng
This work is motivated by the problem of finding the limit of the applicability of the first incompleteness theorem ($\sf G1$). A natural question is: can we find a minimal theory for which $\sf G1$ holds? We examine the Turing degree structure of recursively enumerable (RE) theories for which $\sf G1$ holds and the interpretation degree structure of RE theories weaker than the theory $\mathbf{R}$ with respect to interpretation for which $\sf G1$ holds. We answer all questions that we posed in [2], and prove more results about them. It is known that there are no minimal essentially undecidable theories with respect to interpretation. We generalize this result and give some general characterizations which tell us under what conditions there are no minimal RE theories having some property with respect to interpretation.
Yong Cheng
In this paper, we aim to conceptually examine the relationship between logical incompleteness and concrete incompleteness, both of which study the incompleteness phenomenon. We argue for two main theses. Firstly, the current research on concrete incompleteness reveals both similarities and differences between logical incompleteness and concrete incompleteness. Similarities between them are not universal, and differences between them are essential. Secondly, concrete incompleteness is a higher order phenomenon over logical incompleteness. This verifies that Hilbert's concrete and intuitive proof theory provides us with essential new information from non-concrete and non-intuitive ideal proofs. We examine similarities between logical incompleteness and concrete incompleteness from two aspects: equivalences between logical incompleteness and concrete incompleteness, and the ubiquity of the incompleteness phenomenon in both logical incompleteness and concrete incompleteness. We examine differences between logical incompleteness and concrete incompleteness from five aspects: (1) the influence on Hilbert's program; (2) properties of independent sentences; (3) the intensionality problem; (4) the relationship with ordinal analysis; (5) the limit of provability.
Yong Cheng
Effectively inseparable pairs and their properties play an important role in the meta-mathematics of arithmetic and incompleteness. Different notions are introduced and shown in the literature to be equivalent to effective inseparability. We give a much simpler proof of these equivalences using the strong double recursion theorem. Then we prove some results about the application of effective inseparability in meta-mathematics.
Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
Yong Cheng, Ankur Bapna, Orhan Firat, Yuan Cao, Pidong Wang, Wolfgang Macherey
Multilingual neural machine translation models are trained to maximize the likelihood of a mix of examples drawn from multiple language pairs. The dominant inductive bias applied to these models is a shared vocabulary and a shared set of parameters across languages; the inputs and labels corresponding to examples drawn from different language pairs might still reside in distinct sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level. Our approach interpolates instances from different language pairs into joint `crossover examples' in order to encourage sharing input and output spaces across languages. To ensure better fusion of examples in multilingual settings, we propose several techniques to improve example interpolation across dissimilar languages under heavy data imbalance. Experiments on a large-scale WMT multilingual dataset demonstrate that our approach significantly improves quality on English-to-Many, Many-to-English and zero-shot translation tasks (from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets demonstrate the capability of our approach to improve model generalization to out-of-distribution multilingual examples. We also conduct qualitative and quantitative representation comparisons to analyze the advantages of our approach at the representation level.
Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein
In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, of which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs. We then discuss our approach, AdvAug, to train NMT models using the embeddings of virtual sentences in sequence-to-sequence learning. Experiments on Chinese-English, English-French, and English-German translation benchmarks show that AdvAug achieves significant improvements over the Transformer (up to 4.9 BLEU points), and substantially outperforms other data augmentation techniques (e.g. back-translation) without using extra corpora.
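The interpolated embedding space mentioned in the abstract above can be pictured with a mixup-style sketch. This is only an assumed illustration, not the AdvAug procedure itself: the function names, the Beta-distributed mixing weight, and the value `alpha=0.4` are hypothetical choices, and the adversarial sentence generation is omitted.

```python
import random


def interpolate_embeddings(emb_a, emb_b, lam):
    """Convex combination of two equal-length sequences of embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("sequences must be padded to the same length")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]
        for vec_a, vec_b in zip(emb_a, emb_b)
    ]


def sample_virtual_sentence(emb_a, emb_b, alpha=0.4, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha) and mix the two sentences.

    alpha is an assumed hyperparameter, not a value taken from the abstract.
    """
    lam = rng.betavariate(alpha, alpha)
    return lam, interpolate_embeddings(emb_a, emb_b, lam)
```

A model trained on such virtual sentences would consume the mixed embeddings in place of the embeddings of an observed sentence; how the corresponding targets are combined is left out of this sketch.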
Yong Cheng
In this work, we aim at understanding incompleteness in an abstract way via metamathematical properties of formal theories. We systematically examine the relationships between the following twelve important metamathematical properties of arithmetical theories: Rosser, EI (effectively inseparable), RI (recursively inseparable), TP (Turing persistent), EHU (essentially hereditarily undecidable), EU (essentially undecidable), Creative, $\mathbf{0}^{\prime}$ (theories with Turing degree $\mathbf{0}^{\prime}$), REW (all RE sets are weakly representable), RFD (all recursive functions are definable), RSS (all recursive sets are strongly representable), RSW (all recursive sets are weakly representable). For any two of these properties $P$ and $Q$, we examine whether $P$ implies $Q$.