"au:"Jiang Liu"" — arXiv2 Search

Showing 1–20 of 197 results

Cloud Ensemble Learning for Fault Diagnosis of Rolling Bearings with Stochastic Configuration Networks

Wei Dai, Jiang Liu, Lanhao Wang

Jul 2, 2023·cs.LG·PDF

Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is a challenging work to diagnose faults efficiently with a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in the few shot scenarios.

DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation

Jiang Liu, Chenqiang Gao, Deyu Meng, Alexander G. Hauptmann

Dec 18, 2017·cs.CV·PDF

In real-world crowd counting applications, the crowd densities vary greatly in spatial and temporal domains. A detection based counting method will estimate crowds accurately in low density scenes, while its reliability in congested areas is downgraded. A regression based approach, on the other hand, captures the general density information in crowded regions. Without knowing the location of each person, it tends to overestimate the count in low density areas. Thus, exclusively using either one of them is not sufficient to handle all kinds of scenes with varying densities. To address this issue, a novel end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density Estimation Network) is proposed. It can adaptively decide the appropriate counting mode for different locations on the image based on its real density conditions. DecideNet starts with estimating the crowd density by generating detection and regression based density maps separately. To capture inevitable variation in densities, it incorporates an attention module, meant to adaptively assess the reliability of the two types of estimations. The final crowd counts are obtained with the guidance of the attention module to adopt suitable estimations from the two kinds of density maps. Experimental results show that our method achieves state-of-the-art performance on three challenging crowd counting datasets.

An Optimization Approach to Jacobian Conjecture

Jiang Liu

Feb 19, 2020·math.GM·PDF

Let $n\geq 2$ and $\mathbb K $ be a number field of characteristic $0$. Jacobian Conjecture asserts for a polynomial map $\mathcal P$ from $\mathbb K ^n$ to itself, if the determinant of its Jacobian matrix is a nonzero constant in $\mathbb K $ then the inverse $\mathcal P^{-1}$ exists and is also a polynomial map. This conjecture was firstly proposed by Keller in 1939 for $\mathbb K ^n=\mathbb C^2$ and put in Smale's 1998 list of Mathematical Problems for the Next Century. This study is going to present a proof for the conjecture. Our proof is based on Dru{ż}kowski Map and Hadamard's Diffeomorphism Theorem, and additionally uses some optimization idea.

A Classical $π$ Machine and Grover's Algorithm

Jiang Liu

Jan 29, 2020·quant-ph·PDF

This paper studies a well-known $π$ machine illustrated by Fig.(1). It is shown that the $π$ machine can compute digits of $π$ if the ratio of block weights, $m_2/m_1$, satisfies certain conditions, and that dynamics of the $π$ machine is identical to that of Grover's algorithm [1] in quantum computing.

Mutual Adversarial Training: Learning together is better than going alone

Jiang Liu, Chun Pong Lau, Hossein Souri, Soheil Feizi, Rama Chellappa

Dec 9, 2021·cs.LG·PDF

Recent studies have shown that robustness to adversarial attacks can be transferred across networks. In other words, we can make a weak model more robust with the help of a strong teacher model. We ask if instead of learning from a static teacher, can models "learn together" and "teach each other" to achieve better robustness? In this paper, we study how interactions among models affect robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger space of adversarial samples, and find more robust feature spaces and decision boundaries. Through extensive experiments on CIFAR-10 and CIFAR-100, we demonstrate that MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks, bringing $\sim$8% accuracy gain to vanilla adversarial training (AT) under PGD-100 attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types, bringing as much as 13.1% accuracy gain to AT baselines against the union of $l_\infty$, $l_2$ and $l_1$ attacks. These results show the superiority of the proposed method and demonstrate that collaborative learning is an effective strategy for designing robust models.

M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction

Jiang Liu, Bobo Li, Xinran Yang, Na Yang, Hao Fei, Mingyao Zhang, Fei Li, Donghong Ji

Dec 5, 2024·cs.CL·PDF

Multimodal information extraction (IE) tasks have attracted increasing attention because many studies have shown that multimodal information benefits text information extraction. However, existing multimodal IE datasets mainly focus on sentence-level image-facilitated IE in English text, and pay little attention to video-based multimodal IE and fine-grained visual grounding. Therefore, in order to promote the development of multimodal IE, we constructed a multimodal multilingual multitask dataset, named M$^{3}$D, which has the following features: (1) It contains paired document-level text and video to enrich multimodal information; (2) It supports two widely-used languages, namely English and Chinese; (3) It includes more multimodal IE tasks such as entity recognition, entity chain extraction, relation extraction and visual grounding. In addition, our dataset introduces an unexplored theme, i.e., biography, enriching the domains of multimodal IE resources. To establish a benchmark for our dataset, we propose an innovative hierarchical multimodal IE model. This model effectively leverages and integrates multimodal information through a Denoised Feature Fusion Module (DFFM). Furthermore, in non-ideal scenarios, modal information is often incomplete. Thus, we designed a Missing Modality Construction Module (MMCM) to alleviate the issues caused by missing modalities. Our model achieved an average performance of 53.80% and 53.77% on four tasks in English and Chinese datasets, respectively, which set a reasonable standard for subsequent research. In addition, we conducted more analytical experiments to verify the effectiveness of our proposed module. We believe that our work can promote the development of the field of multimodal IE.

Instella: Fully Open Language Models with Stellar Performance

Jiang Liu, Jialian Wu, Xiaodong Yu, Yusheng Su, Prakamya Mishra, Gowtham Ramesh, Sudhanshu Ranjan, Chaitanya Manem, Ximeng Sun, Ze Wang, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum

Nov 13, 2025·cs.CL·PDF

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or partially open, limiting transparency and reproducibility. In this work, we introduce Instella, a family of fully open three billion parameter language models trained entirely on openly available data and codebase. Powered by AMD Instinct MI300X GPUs, Instella is developed through large-scale pre-training, general-purpose instruction tuning, and alignment with human preferences. Despite using substantially fewer pre-training tokens than many contemporaries, Instella achieves state-of-the-art results among fully open models and is competitive with leading open-weight models of comparable size. We further release two specialized variants: Instella-Long, capable of handling context lengths up to 128K tokens, and Instella-Math, a reasoning-focused model enhanced through supervised fine-tuning and reinforcement learning on mathematical tasks. Together, these contributions establish Instella as a transparent, performant, and versatile alternative for the community, advancing the goal of open and reproducible language modeling research.

TECM*: A Data-Driven Assessment to Reinforcement Learning Methods and Application to Heparin Treatment Strategy for Surgical Sepsis

Jiang Liu, Yujie Li, Chan Zhou, Yihao Xie, Qilong Sun, Xin Shu, Peiwei Li, Chunyong Yang, Yiziting Zhu, Jiaqi Zhu, Yuwen Chen, Bo An, Hao Wu, Bin Yi

Dec 2, 2025·cs.LG·PDF

Objective: Sepsis is a life-threatening condition caused by severe infection leading to acute organ dysfunction. This study proposes a data-driven metric and a continuous reward function to optimize personalized heparin therapy in surgical sepsis patients. Methods: Data from the MIMIC-IV v1.0 and eICU v2.0 databases were used for model development and evaluation. The training cohort consisted of abdominal surgery patients receiving unfractionated heparin (UFH) after postoperative sepsis onset. We introduce a new RL-based framework: converting the discrete SOFA score to a continuous cxSOFA for more nuanced state and reward functions; Second, defining "good" or "bad" strategies based on cxSOFA by a stepwise manner; Third, proposing a Treatment Effect Comparison Matrix (TECM), analogous to a confusion matrix for classification tasks, to evaluate the treatment strategies. We applied different RL algorithms, Q-Learning, DQN, DDQN, BCQ and CQL to optimize the treatment and comprehensively evaluated the framework. Results: Among the AI-derived strategies, the cxSOFA-CQL model achieved the best performance, reducing mortality from 1.83% to 0.74% with the average hospital stay from 11.11 to 9.42 days. TECM demonstrated consistent outcomes across models, highlighting robustness. Conclusion: The proposed RL framework enables interpretable and robust optimization of heparin therapy in surgical sepsis. Continuous cxSOFA scoring and TECM-based evaluation provide nuanced treatment assessment, showing promise for improving clinical outcomes and decision-support reliability.

Analytic Cancellation of Interference Terms and Closed-Form 1-Mode Marginals in Canonical Boson Sampling

Jiang Liu

Mar 1, 2026·quant-ph·PDF

Although the $k$-mode marginal distributions of Canonical Boson Sampling (CBS) are known to be computable in polynomial time, the physical mechanism driving this computational efficiency remains mathematically opaque. In this work, we provide a direct, bottom-up physical derivation of the exact 1-mode marginal distribution in CBS, computable in $\mathcal{O}(R^2)$ time, where $R$ is the total number of photons. We explicitly bridge this physical derivation with the mathematical theory of rank-1 matrix permanents, proving that multiphoton interference natively reduces to a symmetric polynomial scaled by a factorial bosonic bunching factor. Crucially, we demonstrate that our recursive combinatorial formulation circumvents the algorithmic overhead of characteristic function methods, entirely bypassing the need for polynomial interpolation or Fourier transforms. Finally, we apply this formula to identify macroscopic signatures of bunching, providing a rigorous, highly scalable metric for distinguishing genuine quantum interference from classical distinguishable-particle models using standard threshold detectors.

Code-MIE: A Code-style Model for Multimodal Information Extraction with Scene Graph and Entity Attribute Knowledge Enhancement

Jiang Liu, Ge Qiu, Hao Fei, Dongdong Xie, Jinbo Li, Fei Li, Chong Teng, Donghong Ji

Mar 21, 2026·cs.CL·PDF

With the rapid development of large language models (LLMs), more and more researchers have paid attention to information extraction based on LLMs. However, there are still some spaces to improve in the existing related methods. First, existing multimodal information extraction (MIE) methods usually employ natural language templates as the input and output of LLMs, which mismatch with the characteristics of information tasks that mostly include structured information such as entities and relations. Second, although a few methods have adopted structured and more IE-friendly code-style templates, they just explored their methods on text-only IE rather than multimodal IE. Moreover, their methods are more complex in design, requiring separate templates to be designed for each task. In this paper, we propose a Code-style Multimodal Information Extraction framework (Code-MIE) which formalizes MIE as unified code understanding and generation. Code-MIE has the following novel designs: (1) Entity attributes such as gender, affiliation are extracted from the text to guide the model to understand the context and role of entities. (2) Images are converted into scene graphs and visual features to incorporate rich visual information into the model. (3) The input template is constructed as a Python function, where entity attributes, scene graphs and raw text compose of the function parameters. In contrast, the output template is formalized as Python dictionaries containing all extraction results such as entities, relations, etc. To evaluate Code-MIE, we conducted extensive experiments on the M$^3$D, Twitter-15, Twitter-17, and MNRE datasets. The results show that our method achieves state-of-the-art performance compared to six competing baseline models, with 61.03\% and 60.49\% on the English and Chinese datasets of M$^3$D, and 76.04\%, 88.07\%, and 73.94\% on the other three datasets.

Constructive Incremental Learning for Fault Diagnosis of Rolling Bearings with Ensemble Domain Adaptation

Jiang Liu, Wei Dai

Aug 29, 2023·cs.AI·PDF

Given the prevalence of rolling bearing fault diagnosis as a practical issue across various working conditions, the limited availability of samples compounds the challenge. Additionally, the complexity of the external environment and the structure of rolling bearings often manifests faults characterized by randomness and fuzziness, hindering the effective extraction of fault characteristics and restricting the accuracy of fault diagnosis. To overcome these problems, this paper presents a novel approach termed constructive Incremental learning-based ensemble domain adaptation (CIL-EDA) approach. Specifically, it is implemented on stochastic configuration networks (SCN) to constructively improve its adaptive performance in multi-domains. Concretely, a cloud feature extraction method is employed in conjunction with wavelet packet decomposition (WPD) to capture the uncertainty of fault information from multiple resolution aspects. Subsequently, constructive Incremental learning-based domain adaptation (CIL-DA) is firstly developed to enhance the cross-domain learning capability of each hidden node through domain matching and construct a robust fault classifier by leveraging limited labeled data from both target and source domains. Finally, fault diagnosis results are obtained by a majority voting of CIL-EDA which integrates CIL-DA and parallel ensemble learning. Experimental results demonstrate that our CIL-DA outperforms several domain adaptation methods and CIL-EDA consistently outperforms state-of-art fault diagnosis methods in few-shot scenarios.

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

Feb 14, 2023·cs.CV·PDF

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input, and outputs a sequence of polygon vertices autoregressively. For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly, without any coordinate quantization error. In the experiments, PolyFormer outperforms the prior art by a clear margin, e.g., 5.40% and 4.52% absolute improvements on the challenging RefCOCO+ and RefCOCOg datasets. It also shows strong generalization ability when evaluated on the referring video segmentation task without fine-tuning, e.g., achieving competitive 61.5% J&F on the Ref-DAVIS17 dataset.

A Linear Algebra Formulation for Boolean Satisfiability Testing

Chengling Fang, Jiang Liu

Jan 10, 2017·cs.CC·PDF

In the article \The State of SAT", the authors asked whether a procedure dramatically different from DPLL can be found for handling unsatisfiable instances. This study proposes a new linear programming approach to address this issue efficiently. Our experiments showed that the new method works for many unsatisfiable instances. However, we must concede that this method should be incomplete; otherwise, it will imply P=co-NP.

A Lightweight Transfer Learning-Based State-of-Health Monitoring with Application to Lithium-ion Batteries in Autonomous Air Vehicles

Jiang Liu, Yan Qin, Wei Dai, Chau Yuen

Dec 9, 2025·cs.AI·PDF

Accurate and rapid state-of-health (SOH) monitoring plays an important role in indicating energy information for lithium-ion battery-powered portable mobile devices. To confront their variable working conditions, transfer learning (TL) emerges as a promising technique for leveraging knowledge from data-rich source working conditions, significantly reducing the training data required for SOH monitoring from target working conditions. However, traditional TL-based SOH monitoring is infeasible when applied in portable mobile devices since substantial computational resources are consumed during the TL stage and unexpectedly reduce the working endurance. To address these challenges, this paper proposes a lightweight TL-based SOH monitoring approach with constructive incremental transfer learning (CITL). First, taking advantage of the unlabeled data in the target domain, a semi-supervised TL mechanism is proposed to minimize the monitoring residual in a constructive way, through iteratively adding network nodes in the CITL. Second, the cross-domain learning ability of node parameters for CITL is comprehensively guaranteed through structural risk minimization, transfer mismatching minimization, and manifold consistency maximization. Moreover, the convergence analysis of the CITL is given, theoretically guaranteeing the efficacy of TL performance and network compactness. Finally, the proposed approach is verified through extensive experiments with a realistic autonomous air vehicles (AAV) battery dataset collected from dozens of flight missions. Specifically, the CITL outperforms SS-TCA, MMD-LSTM-DA, DDAN, BO-CNN-TL, and AS$^3$LSTM, in SOH estimation by 83.73%, 61.15%, 28.24%, 87.70%, and 57.34%, respectively, as evaluated using the index root mean square error.

LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset

Aizihaierjiang Yusufu, Jiang Liu, Kamran Aziz, Abidan Ainiwaer, Bobo Li, Fei Li, Donghong Ji, Aizierguli Yusufu

Apr 12, 2026·cs.CL·PDF

In recent years, aspect-based sentiment analysis (ABSA) has made rapid progress and shown strong practical value. However, existing research and benchmarks are largely concentrated on high-resource languages, leaving fine-grained sentiment extraction in low-resource languages under-explored. To address this gap, we constructed the first Low-resource languages Aspect-based Sentiment Quadruple dataset, named LASQ, which includes two low-resource languages: Uzbek and Uyghur. Secondly, it includes a fine-grained target-aspect-opinion-sentiment quadruple extraction task. To facilitate future research, we designed a grid-tagging model that integrates syntactic knowledge. This model incorporates part-of-speech (POS) and dependency knowledge into the model through our designed Syntax Knowledge Embedding Module (SKEM), thereby alleviating the lexical sparsity problem caused by agglutinative languages. Experiments on LASQ demonstrate consistent gains over competitive baselines, validating both the dataset's utility and the effectiveness of the proposed modeling approach.

One Model to Synthesize Them All: Multi-contrast Multi-scale Transformer for Missing Data Imputation

Jiang Liu, Srivathsa Pasumarthi, Ben Duffy, Enhao Gong, Keshav Datta, Greg Zaharchuk

Apr 28, 2022·eess.IV·PDF

Multi-contrast magnetic resonance imaging (MRI) is widely used in clinical practice as each contrast provides complementary information. However, the availability of each imaging contrast may vary amongst patients, which poses challenges to radiologists and automated image analysis algorithms. A general approach for tackling this problem is missing data imputation, which aims to synthesize the missing contrasts from existing ones. While several convolutional neural networks (CNN) based algorithms have been proposed, they suffer from the fundamental limitations of CNN models, such as the requirement for fixed numbers of input and output channels, the inability to capture long-range dependencies, and the lack of interpretability. In this work, we formulate missing data imputation as a sequence-to-sequence learning problem and propose a multi-contrast multi-scale Transformer (MMT), which can take any subset of input contrasts and synthesize those that are missing. MMT consists of a multi-scale Transformer encoder that builds hierarchical representations of inputs combined with a multi-scale Transformer decoder that generates the outputs in a coarse-to-fine fashion. The proposed multi-contrast Swin Transformer blocks can efficiently capture intra- and inter-contrast dependencies for accurate image synthesis. Moreover, MMT is inherently interpretable as it allows us to understand the importance of each input contrast in different regions by analyzing the in-built attention maps of Transformer blocks in the decoder. Extensive experiments on two large-scale multi-contrast MRI datasets demonstrate that MMT outperforms the state-of-the-art methods quantitatively and qualitatively.

Instruct2Attack: Language-Guided Semantic Adversarial Attacks

Jiang Liu, Chen Wei, Yuxiang Guo, Heng Yu, Alan Yuille, Soheil Feizi, Chun Pong Lau, Rama Chellappa

Nov 27, 2023·cs.CV·PDF

We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions. We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction. Compared to existing noise-based and semantic attacks, I2A generates more natural and diverse adversarial examples while providing better controllability and interpretability. We further automate the attack process with GPT-4 to generate diverse image-specific text instructions. We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses, and demonstrate great transferability among a variety of network architectures.

TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags

Jiang Liu, Donghong Ji, Jingye Li, Dongdong Xie, Chong Teng, Liang Zhao, Fei Li

Nov 1, 2022·cs.CL·PDF

So far, discontinuous named entity recognition (NER) has received increasing research attention and many related methods have surged such as hypergraph-based methods, span-based methods, and sequence-to-sequence (Seq2Seq) methods, etc. However, these methods more or less suffer from some problems such as decoding ambiguity and efficiency, which limit their performance. Recently, grid-tagging methods, which benefit from the flexible design of tagging systems and model architectures, have shown superiority to adapt for various information extraction tasks. In this paper, we follow the line of such methods and propose a competitive grid-tagging model for discontinuous NER. We call our model TOE because we incorporate two kinds of Tag-Oriented Enhancement mechanisms into a state-of-the-art (SOTA) grid-tagging model that casts the NER problem into word-word relationship prediction. First, we design a Tag Representation Embedding Module (TREM) to force our model to consider not only word-word relationships but also word-tag and tag-tag relationships. Concretely, we construct tag representations and embed them into TREM, so that TREM can treat tag and word representations as queries/keys/values and utilize self-attention to model their relationships. On the other hand, motivated by the Next-Neighboring-Word (NNW) and Tail-Head-Word (THW) tags in the SOTA model, we add two new symmetric tags, namely Previous-Neighboring-Word (PNW) and Head-Tail-Word (HTW), to model more fine-grained word-word relationships and alleviate error propagation from tag prediction. In the experiments of three benchmark datasets, namely CADEC, ShARe13 and ShARe14, our TOE model pushes the SOTA results by about 0.83%, 0.05% and 0.66% in F1, demonstrating its effectiveness.

An Optimum Algorithm for Quantum Search

Jiang Liu

Oct 7, 2020·quant-ph·PDF

This paper discusses an improvement to Grover's algorithm for searches where target states are Hamming weight eigenstates and search space is not ordered. It is shown that under these conditions search efficiency depends on the smaller number of 0's and 1's, not the total length, of binary string of target state, and that Grover's algorithm can be improved whenever number of 0's and number 1's are not equal. In particular, improvement can be exponential when number of 0's or number of 1's is very small relative to binary string length. One interesting application is that Dicke state preparation, which in Grover's algorithm is P on average, can be made poly-efficient in all cases. For decision making process, this improvement won't improve computation efficiency, but can make implementation much simpler.

The Geometric Unscented Kalman Filter

Chengling Fang, Jiang Liu, Songqing Ye, Ju Zhang

Sep 28, 2020·eess.SY·PDF

Many filters have been proposed in recent decades for the nonlinear state estimation problem. The linearization-based extended Kalman filter (EKF) is widely applied to nonlinear industrial systems. As EKF is limited in accuracy and reliability, sequential Monte-Carlo methods or particle filters (PF) can obtain superior accuracy at the cost of a huge number of random samples. The unscented Kalman filter (UKF) can achieve adequate accuracy more efficiently by using deterministic samples, but its weights may be negative, which might cause instability problem. For Gaussian filters, the cubature Kalman filter (CKF) and Gauss Hermit filter (GHF) employ cubature and respectively Gauss-Hermite rules to approximate statistic information of random variables and exhibit impressive performances in practical problems. Inspired by this work, this paper presents a new nonlinear estimation scheme named after geometric unscented Kalman filter (GUF). The GUF chooses the filtering framework of CKF for updating data and develops a geometric unscented sampling (GUS) strategy for approximating random variables. The main feature of GUS is selecting uniformly distributed samples according to the probability and geometric location similar to UKF and CKF, and having positive weights like PF. Through such way, GUF can maintain adequate accuracy as GHF with reasonable efficiency and good stability. The GUF does not suffer from the exponential increase of sample size as for PF or failure to converge resulted from non-positive weights as for high order CKF and UKF.

Code-MIE: A Code-style Model for Multimodal Information Extraction with Scene Graph and Entity Attribute Knowledge Enhancement

Jiang Liu, Ge Qiu, Hao Fei, Dongdong Xie, Jinbo Li, Fei Li, Chong Teng, Donghong Ji

Mar 21, 2026·cs.CL·PDF