Qingyi Wang, Shenhao Wang, Dingyi Zhuang, Haris Koutsopoulos, Jinhua Zhao
Recent studies have significantly improved the prediction accuracy of travel demand using graph neural networks. However, these studies largely ignored the uncertainty that inevitably exists in travel demand prediction. To fill this gap, this study proposes a framework of probabilistic graph neural networks (Prob-GNN) to quantify the spatiotemporal uncertainty of travel demand. The Prob-GNN framework is substantiated by deterministic and probabilistic assumptions and is empirically applied to predicting transit and ridesharing demand in Chicago. We found that the probabilistic assumptions (e.g., distribution tail and support) have a greater impact on uncertainty prediction than the deterministic ones (e.g., deep modules and depth). Among the family of Prob-GNNs, the GNNs with truncated Gaussian and Laplace distributions achieve the highest performance on the transit and ridesharing data. Prob-GNNs predict ridership uncertainty stably even under significant domain shifts, when the models are trained on pre-COVID data and tested across multiple periods during and after the COVID-19 pandemic. Prob-GNNs also reveal the spatiotemporal pattern of uncertainty, which is concentrated in the afternoon peak hours and in areas with large travel volumes. Overall, our findings highlight the importance of incorporating randomness into deep learning for spatiotemporal ridership prediction. Future research should continue to investigate versatile probabilistic assumptions to capture behavioral randomness and further develop methods to quantify uncertainty to build resilient cities.
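To make the "distribution tail, support" assumptions concrete, the sketch below writes the negative log-likelihoods a network could minimize if its output head parameterized, for each station and time step, either a Gaussian truncated at zero (non-negative support) or a Laplace distribution (heavier tails). The function names and the zero truncation point are illustrative assumptions, not the authors' implementation.

```python
import math

def _std_normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _std_normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """NLL of y under N(mu, sigma^2) truncated to [0, inf), i.e. non-negative demand."""
    if y < 0:
        raise ValueError("demand must be non-negative under zero truncation")
    z = (y - mu) / sigma
    # Renormalize by the mass the untruncated Gaussian places on [0, inf)
    mass = 1.0 - _std_normal_cdf(-mu / sigma)
    return -math.log(_std_normal_pdf(z) / sigma) + math.log(mass)

def laplace_nll(y: float, mu: float, b: float) -> float:
    """NLL of y under Laplace(mu, b): the penalty grows linearly, not quadratically, in |y - mu|."""
    return math.log(2.0 * b) + abs(y - mu) / b
```

For example, a 20-unit surprise (`y=20`, `mu=0`, scale 1) costs about 20.7 nats under the Laplace head but about 200.2 nats under the truncated Gaussian, which is one way the tail assumption changes what the network learns from extreme observations.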
Qingyi Wang, Yuebing Liang, Yunhan Zheng, Kaiyuan Xu, Jinhua Zhao, Shenhao Wang
Generative AI offers new opportunities for automating urban planning by creating site-specific urban layouts and enabling flexible design exploration. However, existing approaches often struggle to produce realistic and practical designs at scale. We therefore adapt a state-of-the-art Stable Diffusion model, extended with ControlNet, to generate high-fidelity satellite imagery conditioned on land use descriptions, infrastructure, and natural environments. To overcome data availability limitations, we spatially link satellite imagery with structured land use and constraint information from OpenStreetMap. Using data from three major U.S. cities, we demonstrate that the proposed diffusion model generates realistic and diverse urban landscapes by varying land-use configurations, road networks, and water bodies, facilitating cross-city learning and design diversity. We also systematically evaluate the impacts of varying language prompts and control imagery on the quality of satellite imagery generation. Our model achieves strong FID and KID scores and demonstrates robustness across diverse urban contexts. Qualitative assessments from urban planners and the general public show that generated images align closely with design descriptions and constraints, and are often preferred over real images. This work establishes a benchmark for controlled urban imagery generation and highlights the potential of generative AI as a tool for enhancing planning workflows and public engagement.
Qingyi Wang, Shenhao Wang, Yunhan Zheng, Hongzhou Lin, Xiaohu Zhang, Jinhua Zhao, Joan Walker
Classical demand modeling analyzes travel behavior using only low-dimensional numeric data (i.e., sociodemographics and travel attributes) but not high-dimensional urban imagery. However, travel behavior depends on factors represented by both numeric data and urban imagery, which calls for a synergetic framework that combines them. This study creates a theoretical framework of deep hybrid models with a crossing structure, consisting of a mixing operator and a behavioral predictor, that integrates the numeric and imagery data into a latent space. Empirically, the framework is applied to analyze travel mode choice, using the MyDailyTravel Survey from Chicago as the numeric inputs and satellite images as the imagery inputs. We found that, with our supervision-as-mixing design, deep hybrid models outperform both traditional demand models and recent deep learning models in predicting aggregate and disaggregate travel behavior. The latent space in deep hybrid models is interpretable, because it reveals meaningful spatial and social patterns. The deep hybrid models can also generate new urban images that do not exist in reality and interpret them with economic theory, for example by computing substitution patterns and social welfare changes. Overall, the deep hybrid models demonstrate the complementarity between low-dimensional numeric and high-dimensional imagery data, and between traditional demand modeling and recent deep learning. The framework generalizes the latent classes and variables in classical hybrid demand models to a latent space, and leverages the computational power of deep learning for imagery while retaining economic interpretability grounded in microeconomic foundations.
Xinling Li, Xiaotong Guo, Qingyi Wang, Gioele Zardini, Jinhua Zhao
Autonomous Mobility-on-Demand (AMoD) services offer an opportunity to improve passenger service while reducing pollution and energy consumption through effective vehicle coordination. A primary challenge in coordinating autonomous fleets is the inherent supply-demand imbalance. A key strategy for resolving it is vehicle rebalancing, which strategically directs idle vehicles to areas with anticipated future demand. Traditional research focuses on deterministic optimization using specific demand forecasts, but the unpredictable nature of demand calls for methods that can manage this uncertainty. This paper introduces Deep Uncertainty Robust Optimization (DURO), a framework for vehicle rebalancing in AMoD systems under uncertain demand that combines neural-network forecasting with robust optimization. DURO forecasts demand uncertainty intervals using a deep neural network and integrates these intervals into a robust optimization model. We assess DURO against various established models, including deterministic optimization with refined demand forecasts and Distributionally Robust Optimization (DRO). Based on real-world data from New York City (NYC), our findings show that DURO surpasses traditional deterministic models in accuracy and is on par with DRO, but with superior computational efficiency. The DURO framework is a promising approach for vehicle rebalancing in AMoD systems: it is shown to be effective in managing demand uncertainty, competitive in performance, and more computationally efficient than other optimization models.
Zhouhong Gu, Xiaoxuan Zhu, Yin Cai, Hao Shen, Xingzhou Chen, Qingyi Wang, Jialin Li, Xiaoran Shi, Haoran Guo, Wenxuan Huang, Hongwei Feng, Yanghua Xiao, Zheyu Ye, Yao Hu, Shaosheng Cao
Large language model based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution. However, current frameworks face critical challenges in system architecture design, cross-domain generalizability, and performance guarantees, particularly as task complexity and the number of agents increase. We introduce AgentGroupChat-V2, a novel framework that addresses these challenges through three core innovations: (1) a divide-and-conquer, fully parallel architecture that decomposes user queries into hierarchical task forest structures, enabling dependency management and distributed concurrent processing; (2) an adaptive collaboration engine that dynamically selects heterogeneous LLM combinations and interaction modes based on task characteristics; and (3) agent organization optimization strategies combined with divide-and-conquer approaches for efficient problem decomposition. Extensive experiments demonstrate AgentGroupChat-V2's superior performance across diverse domains, achieving 91.50% accuracy on GSM8K (exceeding the best baseline by 5.6 percentage points), 30.4% accuracy on competition-level AIME (nearly doubling other methods), and 79.20% pass@1 on HumanEval. Performance advantages become increasingly pronounced with higher task difficulty, particularly on Level 5 MATH problems, where improvements exceed 11 percentage points over state-of-the-art baselines. These results confirm that AgentGroupChat-V2 provides a comprehensive solution for building efficient, general-purpose LLM multi-agent systems with significant advantages in complex reasoning scenarios. Code is available at https://github.com/MikeGu721/AgentGroupChat-V2.
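The dependency management in innovation (1) can be pictured as level-by-level scheduling of a task graph. The sketch below is an illustration under assumptions, not AgentGroupChat-V2's code (see the linked repository for that): it uses Python's standard `graphlib.TopologicalSorter` to group hypothetical subtasks into "waves" whose members have no unmet dependencies and could therefore be dispatched to agents concurrently.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave depends only on tasks in earlier waves.

    `deps` maps each task to the set of tasks it depends on (hypothetical task names).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the "forest" is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose dependencies are finished
        waves.append(ready)                 # these could run in parallel
        sorter.done(*ready)                 # mark them finished to unlock their parents
    return waves
```

For a query split into `answer <- {sub1, sub2}`, `sub1 <- {leaf_a, leaf_b}`, and `sub2 <- {leaf_c}`, the leaves form the first wave, the two subtasks the second, and the final answer the third.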
Shuzheng Si, Qingyi Wang, Haozhe Zhao, Yuzhuo Bai, Guanqiao Chen, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-5.2 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Shenhao Wang, Jinhua Zhao
Short-term demand forecasting for on-demand ride-hailing services is one of the fundamental issues in intelligent transportation systems. However, previous travel demand forecasting research predominantly focused on improving prediction accuracy, ignoring fairness issues such as the systematic underestimation of travel demand in disadvantaged neighborhoods. This study investigates how to measure, evaluate, and enhance prediction fairness between disadvantaged and privileged communities in spatial-temporal demand forecasting of ride-hailing services. A two-pronged approach is taken to reduce demand prediction bias. First, we develop a novel deep learning model architecture, named socially aware neural network (SA-Net), that integrates socio-demographic and ridership information for fair demand prediction through an innovative socially aware convolution operation. Second, we propose a bias-mitigation regularization method that reduces the gap in mean percentage prediction error between groups. The experimental results, validated on real-world Chicago Transportation Network Company (TNC) data, show that the de-biasing SA-Net achieves better performance in both prediction accuracy and fairness. Specifically, the SA-Net improves prediction accuracy for both the disadvantaged and privileged groups compared with state-of-the-art models. When coupled with the bias-mitigation regularization method, the de-biasing SA-Net effectively bridges the mean percentage prediction error gap between the disadvantaged and privileged groups, and also protects disadvantaged regions against systematic underestimation of TNC demand. Our proposed de-biasing method can be adopted in many existing short-term travel demand estimation models and can be applied to various other spatial-temporal prediction tasks, such as crime incident prediction.
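The bias-mitigation idea can be sketched as an accuracy loss plus a penalty on the mean percentage error (MPE) gap between groups. This is a minimal sketch under assumptions: the exact SA-Net regularizer, its handling of zero-demand regions, and the weight `lam` are not given in the abstract, and the function names are hypothetical. In actual training the same expression would be written with tensor operations so gradients flow through it.

```python
from typing import Sequence

def mean_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of (pred - true) / true over positive targets; negative values mean underestimation."""
    errors = [(p - t) / t for t, p in zip(y_true, y_pred) if t > 0]
    return sum(errors) / len(errors) if errors else 0.0

def debiased_loss(y_true: Sequence[float], y_pred: Sequence[float],
                  disadvantaged: Sequence[bool], lam: float = 1.0) -> float:
    """MSE plus lam times the absolute MPE gap between disadvantaged and privileged regions."""
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    dis = [(t, p) for t, p, d in zip(y_true, y_pred, disadvantaged) if d]
    priv = [(t, p) for t, p, d in zip(y_true, y_pred, disadvantaged) if not d]
    mpe_dis = mean_percentage_error([t for t, _ in dis], [p for _, p in dis])
    mpe_priv = mean_percentage_error([t for t, _ in priv], [p for _, p in priv])
    return mse + lam * abs(mpe_dis - mpe_priv)
```

With four regions of true demand 10, where the two disadvantaged ones are predicted as 8 and the two privileged ones as 10, the MSE is 2.0 and the MPE gap is 0.2, so the regularized loss with `lam=1.0` is 2.2. The penalty is zero only when both groups are over- or underestimated by the same average percentage.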
Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, Yanchang Li, Zhibin Li, Guangming Liu, Kairui Liu, Lihao Liu, Weizhi Liu, Xiaoshun Liu, Yufei Liu, Yunfei Liu, Qiang Lu, Yuanfei Luo, Xiang Lv, Hongying Ma, Sai Ma, Lingxian Mi, Sha Sa, Hongxiang Shu, Lei Tian, Chengzhi Wang, Jiayu Wang, Kaijie Wang, Qingyi Wang, Renwen Wang, Tao Wang, Wei Wang, Xirui Wang, Chao Wei, Xuguang Wei, Zijun Xia, Zhaohao Xiao, Tingshuai Yan, Liyan Yang, Yifan Yang, Zhikai Yang, Zhong Yin, Li Yuan, Liuchun Yuan, Chi Zhang, Jinyang Zhang, Junhui Zhang, Linge Zhang, Zhenyi Zhang, Zheyu Zhang, Dongjie Zhu, Hang Li, Yangang Zhang
Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture for mobile robot navigation consisting of Astra-Global and Astra-Local. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self and goal localization using a hybrid topological-semantic graph as the global map, and it outperforms traditional visual place recognition methods. Astra-Local, a multitask network, handles local path planning and odometry estimation. Its 4D spatial-temporal encoder, trained through self-supervised learning, generates robust 4D features for downstream tasks. The planning head uses flow matching and a novel masked ESDF loss to minimize collision risk when generating local trajectories, and the odometry head integrates multi-sensor inputs via a transformer encoder to predict the relative pose of the robot. Deployed on real in-house mobile robots, Astra achieves a high end-to-end mission success rate across diverse indoor environments.
Team Seedance, Heyi Chen, Siyan Chen, Xin Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Tianheng Cheng, Xinqi Cheng, Xuyan Chi, Jian Cong, Jing Cui, Qinpeng Cui, Qide Dong, Junliang Fan, Jing Fang, Zetao Fang, Chengjian Feng, Han Feng, Mingyuan Gao, Yu Gao, Dong Guo, Qiushan Guo, Boyang Hao, Qingkai Hao, Bibo He, Qian He, Tuyen Hoang, Ruoqing Hu, Xi Hu, Weilin Huang, Zhaoyang Huang, Zhongyi Huang, Donglei Ji, Siqi Jiang, Wei Jiang, Yunpu Jiang, Zhuo Jiang, Ashley Kim, Jianan Kong, Zhichao Lai, Shanshan Lao, Yichong Leng, Ai Li, Feiya Li, Gen Li, Huixia Li, JiaShi Li, Liang Li, Ming Li, Shanshan Li, Tao Li, Xian Li, Xiaojie Li, Xiaoyang Li, Xingxing Li, Yameng Li, Yifu Li, Yiying Li, Chao Liang, Han Liang, Jianzhong Liang, Ying Liang, Zhiqiang Liang, Wang Liao, Yalin Liao, Heng Lin, Kengyu Lin, Shanchuan Lin, Xi Lin, Zhijie Lin, Feng Ling, Fangfang Liu, Gaohong Liu, Jiawei Liu, Jie Liu, Jihao Liu, Shouda Liu, Shu Liu, Sichao Liu, Songwei Liu, Xin Liu, Xue Liu, Yibo Liu, Zikun Liu, Zuxi Liu, Junlin Lyu, Lecheng Lyu, Qian Lyu, Han Mu, Xiaonan Nie, Jingzhe Ning, Xitong Pan, Yanghua Peng, Lianke Qin, Xueqiong Qu, Yuxi Ren, Kai Shen, Guang Shi, Lei Shi, Yan Song, Yinglong Song, Fan Sun, Li Sun, Renfei Sun, Yan Sun, Zeyu Sun, Wenjing Tang, Yaxue Tang, Zirui Tao, Feng Wang, Furui Wang, Jinran Wang, Junkai Wang, Ke Wang, Kexin Wang, Qingyi Wang, Rui Wang, Sen Wang, Shuai Wang, Tingru Wang, Weichen Wang, Xin Wang, Yanhui Wang, Yue Wang, Yuping Wang, Yuxuan Wang, Ziyu Wang, Guoqiang Wei, Wanru Wei, Di Wu, Guohong Wu, Hanjie Wu, Jian Wu, Jie Wu, Ruolan Wu, Xinglong Wu, Yonghui Wu, Ruiqi Xia, Liang Xiang, Fei Xiao, XueFeng Xiao, Pan Xie, Shuangyi Xie, Shuang Xu, Jinlan Xue, Shen Yan, Bangbang Yang, Ceyuan Yang, Jiaqi Yang, Runkai Yang, Tao Yang, Yang Yang, Yihang Yang, ZhiXian Yang, Ziyan Yang, Songting Yao, Yifan Yao, Zilyu Ye, Bowen Yu, Jian Yu, Chujie Yuan, Linxiao Yuan, Sichun Zeng, Weihong Zeng, Xuejiao Zeng, Yan Zeng, Chuntao Zhang, Heng Zhang, Jingjie Zhang, Kuo Zhang, Liang Zhang, Liying Zhang, Manlin Zhang, Ting Zhang, Weida Zhang, Xiaohe Zhang, Xinyan Zhang, Yan Zhang, Yuan Zhang, Zixiang Zhang, Fengxuan Zhao, Huating Zhao, Yang Zhao, Hao Zheng, Jianbin Zheng, Xiaozheng Zheng, Yangyang Zheng, Yijie Zheng, Jiexin Zhou, Jiahui Zhu, Kuan Zhu, Shenhan Zhu, Wenjia Zhu, Benhui Zou, Feilong Zuo
Shenhao Wang, Qingyi Wang, Jinhua Zhao
How to combine revealed preference (RP) and stated preference (SP) data to analyze travel behavior is an enduring question. This study presents a framework of multitask learning deep neural networks (MTLDNNs) to address it, and demonstrates that MTLDNNs are more generic than the traditional nested logit (NL) method because of their capacity for automatic feature learning and soft constraints. About 1,500 MTLDNN models are designed and applied to survey data collected in Singapore, covering the RP of four existing travel modes and an SP that adds autonomous vehicles (AV) as a new travel mode alongside those in the RP. We found that MTLDNNs consistently outperform six benchmark models, and in particular outperform the classical NL models by about 5% in prediction accuracy on both the RP and SP datasets. This performance improvement can be mainly attributed to the soft constraints specific to MTLDNNs, including their innovative architectural design and regularization methods, and much less to the generic capacity for automatic feature learning endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs are also interpretable. The empirical results show that AVs mainly substitute for driving, and that AV alternative-specific variables are more important than socio-economic variables in determining AV adoption. Overall, this study introduces a new MTLDNN framework to combine RP and SP, and demonstrates its theoretical flexibility and empirical power for prediction and interpretation. Future studies can design new MTLDNN architectures to reflect the specific characteristics of RP and SP and extend this work to other behavioral analyses.
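The multitask structure can be sketched as a shared hidden layer feeding two task-specific softmax heads: one over the four RP modes and one over the five SP modes (the RP modes plus AV). This pure-Python forward pass is a minimal illustration under assumptions: the layer sizes, random initialization, and class name `MTLDNNSketch` are hypothetical, and the paper's training procedure, regularization, and roughly 1,500 architecture variants are not reproduced.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)

def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights; one row of n_in weights per output unit."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def dense(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    return [sum(xi * wi for xi, wi in zip(x, row)) + bj for row, bj in zip(w, b)]

def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]

def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

class MTLDNNSketch:
    def __init__(self, n_features: int, n_hidden: int = 8, n_rp: int = 4, n_sp: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)  # shared by RP and SP tasks
        self.rp_head = init_layer(n_hidden, n_rp, rng)        # RP-specific: existing modes
        self.sp_head = init_layer(n_hidden, n_sp, rng)        # SP-specific: existing modes + AV

    def predict(self, x: List[float], task: str) -> List[float]:
        """Choice probabilities for one observation under the 'rp' or 'sp' task."""
        if task not in ("rp", "sp"):
            raise ValueError("task must be 'rp' or 'sp'")
        h = relu(dense(x, *self.shared))
        head = self.rp_head if task == "rp" else self.sp_head
        return softmax(dense(h, *head))
```

Sharing the hidden layer is what lets information from one dataset inform the other; the task-specific heads keep the RP and SP choice sets separate.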
Xiaotong Guo, Baichuan Mo, Qingyi Wang
In response to the Amazon Last-Mile Routing Challenge, Team Permission Denied proposes a hierarchical Travelling Salesman Problem (TSP) optimization with a customized cost matrix. The higher-level TSP solves for the zone sequence, while the lower-level TSP solves for the intra-zonal stop sequence. The cost matrix is modified to account for routing patterns beyond the shortest travel time. Lastly, post-processing edits the sequence to match commonly observed routing patterns; for example, when travel times are similar, drivers usually visit stops with more packages before those with fewer packages. The model is tested on 1,223 routes randomly selected from the training set and achieves a score of 0.0381. On the 13 routes in the given model apply set, the score is 0.0375.
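The two-level decomposition can be sketched as follows: first order the zones (here by their centroids), then order the stops inside each zone, starting from wherever the previous zone ended. This is a simplified sketch, not the team's solver: a greedy nearest-neighbor heuristic stands in for an exact or metaheuristic TSP solver, Euclidean distance stands in for the customized cost matrix, and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def nearest_neighbor_order(start: Point, items: Dict[str, Point]) -> List[str]:
    """Greedy TSP heuristic: repeatedly visit the closest unvisited item."""
    order, current, remaining = [], start, dict(items)
    while remaining:
        nxt = min(remaining, key=lambda k: math.dist(current, remaining[k]))
        order.append(nxt)
        current = remaining.pop(nxt)
    return order

def hierarchical_route(depot: Point, stops: Dict[str, Point], zone_of: Dict[str, str]) -> List[str]:
    # Group stops by zone
    zones: Dict[str, List[str]] = {}
    for stop, zone in zone_of.items():
        zones.setdefault(zone, []).append(stop)
    # Upper level: sequence zones by their centroids
    centroids = {
        z: (sum(stops[s][0] for s in members) / len(members),
            sum(stops[s][1] for s in members) / len(members))
        for z, members in zones.items()
    }
    zone_seq = nearest_neighbor_order(depot, centroids)
    # Lower level: sequence stops within each zone, continuing from the last stop visited
    route, current = [], depot
    for z in zone_seq:
        seq = nearest_neighbor_order(current, {s: stops[s] for s in zones[z]})
        route.extend(seq)
        current = stops[seq[-1]]
    return route
```

Swapping `math.dist` for a lookup into an adjusted cost matrix is where zone-transition penalties or observed driver preferences would enter.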
Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao
Urban design is a multifaceted process that demands careful consideration of site-specific constraints and collaboration among diverse professionals and stakeholders. The advent of generative artificial intelligence (GenAI) offers transformative potential by improving the efficiency of design generation and facilitating the communication of design ideas. However, most existing approaches are not well integrated with human design workflows. They often follow end-to-end pipelines with limited control, overlooking the iterative nature of real-world design. This study proposes a stepwise generative urban design framework that integrates multimodal diffusion models with human expertise to enable more adaptive and controllable design processes. Instead of generating design outcomes in a single end-to-end process, the framework divides the process into three key stages aligned with established urban design workflows: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. At each stage, multimodal diffusion models generate preliminary designs based on textual prompts and image-based constraints, which can then be reviewed and refined by human designers. We design an evaluation framework to assess the fidelity, compliance, and diversity of the generated designs. Experiments using data from Chicago and New York City demonstrate that our framework outperforms baseline models and end-to-end approaches across all three dimensions. This study underscores the benefits of multimodal diffusion models and stepwise generation in preserving human control and facilitating iterative refinements, laying the groundwork for human-AI interaction in urban design solutions.
Da Zhang, Qingyi Wang, Shaojie Song, Simiao Chen, Mingwei Li, Lu Shen, Siqi Zheng, Bofeng Cai, Shenhao Wang
Estimating the health benefits of reducing fossil fuel use through improved air quality provides an important rationale for carbon emissions abatement. Simulating pollution concentration is a crucial step of the estimation, but traditional approaches often rely on complicated chemical transport models that require extensive expertise and computational resources. In this study, we develop a novel and succinct machine learning framework that provides precise and robust estimates of annual average fine particle (PM2.5) concentration directly from a high-resolution fossil energy use data set. The accessibility and applicability of this framework show the great potential of machine learning approaches for integrated assessment studies. Applying the framework to Chinese data reveals highly heterogeneous health benefits of reducing fossil fuel use across sectors and regions in China, with a mean of $34/tCO2 and a standard deviation of $84/tCO2. Reducing rural and residential coal use offers the highest co-benefits, with a mean of $360/tCO2. Our findings call for careful policy design to maximize cost-effectiveness in the transition towards a carbon-neutral energy system.
Shenhao Wang, Qingyi Wang, Nate Bailey, Jinhua Zhao
While researchers increasingly use deep neural networks (DNN) to analyze individual choices, overfitting and interpretability issues remain obstacles in theory and practice. Using statistical learning theory, this study presents a framework to examine the tradeoff between estimation and approximation errors, and between prediction and interpretation losses. It operationalizes DNN interpretability in choice analysis by formulating the interpretation loss as the difference between the true and estimated choice probability functions. This study also uses statistical learning theory to upper bound the estimation error of both the prediction and interpretation losses in DNN, shedding light on why DNN does not suffer from overfitting. Three scenarios are then simulated to compare DNN with the binary logit model (BNL). We found that DNN outperforms BNL in terms of both prediction and interpretation in most scenarios, and that a larger sample size unleashes the predictive power of DNN but not of BNL. DNN is also used to analyze the choice of trip purposes and travel modes based on the National Household Travel Survey 2017 (NHTS2017) dataset. These experiments indicate that DNN can be used for choice analysis beyond the current practice of demand forecasting, because it has an inherent utility interpretation, the flexibility to accommodate various information formats, and the power to learn the utility specification automatically. DNN is both more predictive and more interpretable than BNL unless the modelers have complete knowledge about the choice task and the sample size is small. Overall, statistical learning theory can be a foundation for future studies in the non-asymptotic data regime or using high-dimensional statistical models in choice analysis, and the experiments show the feasibility and effectiveness of DNN for wide applications to policy and behavioral analysis.
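Because the interpretation loss compares an estimated choice probability function with the true one, it can only be computed when the true function is known, which is why simulated scenarios are needed. The sketch below measures the difference as a mean absolute gap over sample points, a Monte Carlo stand-in for an L1 distance; the choice of norm, the evaluation grid, and both logit functions are illustrative assumptions rather than the paper's exact definitions.

```python
import math
from typing import Callable, Sequence

ProbFn = Callable[[float], float]

def interpretation_loss(p_true: ProbFn, p_hat: ProbFn, xs: Sequence[float]) -> float:
    """Mean absolute gap between true and estimated choice probabilities at sample points xs."""
    return sum(abs(p_true(x) - p_hat(x)) for x in xs) / len(xs)

def logistic(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def p_true(x: float) -> float:
    """Hypothetical data-generating binary logit: P(choose 1 | travel time x)."""
    return logistic(1.0 - 0.5 * x)

def p_hat(x: float) -> float:
    """Hypothetical fitted model with slightly misestimated coefficients."""
    return logistic(0.9 - 0.45 * x)

xs = [i / 10 for i in range(101)]  # evaluation grid on [0, 10]
```

Unlike a prediction loss, which compares predictions with observed choices, this quantity compares two functions directly, so a model can predict well on a sample yet still carry a nonzero interpretation loss.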
Shenhao Wang, Qingyi Wang, Jinhua Zhao
While deep neural networks (DNNs) have been increasingly applied to choice analysis and show high predictive power, it is unclear to what extent researchers can interpret economic information from them. This paper demonstrates that DNNs can provide economic information as complete as that from classical discrete choice models (DCMs). The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by domain experts. However, the economic information obtained from DNNs can be unreliable because of three challenges associated with this automatic learning capacity: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strengths and challenges of DNNs, we estimated DNNs using a stated preference survey, extracted the full list of economic information from them, and compared it with that from DCMs. We found that economic information aggregated over trainings or over the population is more reliable than disaggregate information for individual observations or single trainings, and that even simple hyperparameter searching can significantly improve the reliability of the economic information extracted from DNNs. Future studies should investigate other regularizations and DNN architectures, better optimization algorithms, and robust DNN training methods to address DNNs' three challenges and provide more reliable economic information from DNN-based choice models.
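Several of the listed quantities are derivatives of the choice probability function, so they can be extracted from any model that outputs probabilities, including a black-box DNN, by finite differences. In the sketch below a binary logit with illustrative coefficients stands in for a trained model (an assumption, chosen so the answer can be checked analytically); VOT is computed as the ratio of the probability derivatives with respect to time and cost.

```python
import math
from typing import Callable

ProbFn = Callable[[float, float], float]

def p_drive(time: float, cost: float) -> float:
    """Hypothetical stand-in for a trained model: binary logit P(drive), illustrative coefficients."""
    v = 1.0 - 0.05 * time - 0.5 * cost
    return 1.0 / (1.0 + math.exp(-v))

def partial(f: ProbFn, time: float, cost: float, wrt: str, h: float = 1e-5) -> float:
    """Central finite difference of f with respect to 'time' or 'cost'."""
    if wrt == "time":
        return (f(time + h, cost) - f(time - h, cost)) / (2 * h)
    if wrt == "cost":
        return (f(time, cost + h) - f(time, cost - h)) / (2 * h)
    raise ValueError("wrt must be 'time' or 'cost'")

def value_of_time(f: ProbFn, time: float, cost: float) -> float:
    """VOT as the ratio of probability derivatives (cost units per unit of time)."""
    return partial(f, time, cost, "time") / partial(f, time, cost, "cost")

def elasticity(f: ProbFn, time: float, cost: float, wrt: str) -> float:
    """Point elasticity of the choice probability with respect to one attribute."""
    x = time if wrt == "time" else cost
    return partial(f, time, cost, wrt) * x / f(time, cost)
```

For this logit the VOT is 0.05 / 0.5 = 0.1 cost units per unit of time at every point. For a DNN the same ratio can vary from observation to observation, which is where local irregularity shows up and why averaging over the population or over trainings tends to be more stable.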
Guang Hua, Qingyi Wang, Dengpan Ye, Haijian Zhang
Power system frequency can be captured by digital recordings and extracted for comparison with a reference database for forensic time-stamp verification. This is known as the electric network frequency (ENF) criterion, enabled by the properties of random fluctuation and intra-grid consistency. In essence, this is a task of matching a short random sequence within a long reference, and the reliability of the criterion depends mainly on whether this match is unique and correct. In this paper, we comprehensively analyze the factors affecting the reliability of ENF matching, including the length of the test recording, the length of the reference, temporal resolution, and signal-to-noise ratio (SNR). For synthetic analysis, we incorporate the first-order autoregressive (AR) ENF model and propose an efficient time-frequency domain (TFD) noisy ENF synthesis method. Reliability analysis schemes are then proposed for both synthetic and real-world data. Through a comprehensive study, we reveal that while SNR is an important external factor in determining whether time-stamp verification is viable, the length of the test recording is the most important inherent factor, followed by the length of the reference. Temporal resolution, however, has little impact on the matching process.
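The matching task can be sketched as sliding a short noisy segment along a long reference and picking the offset with the smallest mean squared error, with the gap to the runner-up offset serving as a rough uniqueness margin. The reference here is a first-order autoregressive sequence; the MSE criterion, the AR coefficient, and the noise levels are assumptions, and the paper's TFD synthesis method and its reliability analysis schemes are not reproduced.

```python
import random
from typing import List, Tuple

def synthesize_ar1(n: int, phi: float = 0.95, sigma: float = 1.0, seed: int = 0) -> List[float]:
    """First-order autoregressive sequence x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def match_enf(test: List[float], reference: List[float]) -> Tuple[int, float]:
    """Slide test over reference; return the best offset and its MSE margin over the runner-up."""
    m = len(test)
    scores = []
    for k in range(len(reference) - m + 1):
        mse = sum((a - b) ** 2 for a, b in zip(test, reference[k:k + m])) / m
        scores.append((mse, k))
    scores.sort()
    best_mse, best_k = scores[0]
    return best_k, scores[1][0] - best_mse
```

A usage example: take samples 700 to 799 of a 2,000-sample reference, add Gaussian noise with standard deviation 0.1, and `match_enf` recovers offset 700 with a positive margin. Shortening the test segment or raising the noise level can be used to probe how quickly that margin shrinks.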
Dingyi Zhuang, Qingyi Wang, Yunhan Zheng, Xiaotong Guo, Shenhao Wang, Haris N Koutsopoulos, Jinhua Zhao
Transportation mode share analysis is important to various real-world transportation tasks, as it helps researchers understand the travel behaviors and choices of passengers. A typical example is predicting communities' travel mode shares from their sociodemographics (e.g., age and income) and the attributes of travel modes (e.g., travel cost and time). However, there have been only limited efforts to integrate the structure of the urban built environment, e.g., road networks, into mode share models to capture the impacts of the built environment. This task usually requires manual feature engineering or prior knowledge of urban design features. In this study, we propose deep hybrid models (DHM), which directly combine road networks and sociodemographic features as inputs for travel mode share analysis. Using graph embedding (GE) techniques, we enhance travel demand models with a more powerful representation of urban structures. In mode share prediction experiments in Chicago, the results demonstrate that DHM can provide valuable spatial insights into the sociodemographic structure, improving the performance of travel demand models in estimating different mode shares at the city level. Specifically, DHM improves the results by more than 20% while retaining the interpretation power of choice models, demonstrating its superiority in interpretability, prediction accuracy, and geographical insights.
Yuzhuo Bai, Shuzheng Si, Kangyang Luo, Qingyi Wang, Wenhao Li, Gang Chen, Fanchao Qi, Maosong Sun
Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and failing to capture fine-grained error types. In this paper, we introduce InFi-Check, a framework for interpretable and fine-grained fact-checking of LLM outputs. Specifically, we first propose a controlled data synthesis pipeline that generates high-quality data featuring explicit evidence, fine-grained error type labels, justifications, and corrections. Based on this, we further construct large-scale training data and a manually verified benchmark InFi-Check-FG for fine-grained fact-checking of LLM outputs. Building on these high-quality training data, we further propose InFi-Checker, which can jointly provide supporting evidence, classify fine-grained error types, and produce justifications along with corrections. Experiments show that InFi-Checker achieves state-of-the-art performance on InFi-Check-FG and strong generalization across various downstream tasks, significantly improving the utility and trustworthiness of factuality evaluation.
Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, Mojie Chi, Xuyan Chi, Jian Cong, Qinpeng Cui, Fei Ding, Qide Dong, Yujiao Du, Haojie Duanmu, Junliang Fan, Jiarui Fang, Jing Fang, Zetao Fang, Chengjian Feng, Yu Gao, Diandian Gu, Dong Guo, Hanzhong Guo, Qiushan Guo, Boyang Hao, Hongxiang Hao, Haoxun He, Jiaao He, Qian He, Tuyen Hoang, Heng Hu, Ruoqing Hu, Yuxiang Hu, Jiancheng Huang, Weilin Huang, Zhaoyang Huang, Zhongyi Huang, Jishuo Jin, Ming Jing, Ashley Kim, Shanshan Lao, Yichong Leng, Bingchuan Li, Gen Li, Haifeng Li, Huixia Li, Jiashi Li, Ming Li, Xiaojie Li, Xingxing Li, Yameng Li, Yiying Li, Yu Li, Yueyan Li, Chao Liang, Han Liang, Jianzhong Liang, Ying Liang, Wang Liao, J. H. Lien, Shanchuan Lin, Xi Lin, Feng Ling, Yue Ling, Fangfang Liu, Jiawei Liu, Jihao Liu, Jingtuo Liu, Shu Liu, Sichao Liu, Wei Liu, Xue Liu, Zuxi Liu, Ruijie Lu, Lecheng Lyu, Jingting Ma, Tianxiang Ma, Xiaonan Nie, Jingzhe Ning, Junjie Pan, Xitong Pan, Ronggui Peng, Xueqiong Qu, Yuxi Ren, Yuchen Shen, Guang Shi, Lei Shi, Yinglong Song, Fan Sun, Li Sun, Renfei Sun, Wenjing Tang, Boyang Tao, Zirui Tao, Dongliang Wang, Feng Wang, Hulin Wang, Ke Wang, Qingyi Wang, Rui Wang, Shuai Wang, Shulei Wang, Weichen Wang, Xuanda Wang, Yanhui Wang, Yue Wang, Yuping Wang, Yuxuan Wang, Zijie Wang, Ziyu Wang, Guoqiang Wei, Meng Wei, Di Wu, Guohong Wu, Hanjie Wu, Huachao Wu, Jian Wu, Jie Wu, Ruolan Wu, Shaojin Wu, Xiaohu Wu, Xinglong Wu, Yonghui Wu, Ruiqi Xia, Xin Xia, Xuefeng Xiao, Shuang Xu, Bangbang Yang, Jiaqi Yang, Runkai Yang, Tao Yang, Yihang Yang, Zhixian Yang, Ziyan Yang, Fulong Ye, Bingqian Yi, Xing Yin, Yongbin You, Linxiao Yuan, Weihong Zeng, Xuejiao Zeng, Yan Zeng, Siyu Zhai, Zhonghua Zhai, Bowen Zhang, Chenlin Zhang, Heng Zhang, Jun Zhang, Manlin Zhang, Peiyuan Zhang, Shuo Zhang, Xiaohe Zhang, Xiaoying Zhang, Xinyan Zhang, Xinyi Zhang, Yichi Zhang, Zixiang Zhang, Haiyu Zhao, Huating Zhao, Liming Zhao, Yian Zhao, Guangcong Zheng, Jianbin Zheng, Xiaozheng Zheng, Zerong Zheng, Kuan Zhu, Feilong Zuo