Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
arXiv:2603.28651v1 Announce Type: new Abstract: With the rapid progress of multimodal large language models (MLLMs), AI already performs well at literature retrieval and certain reasoning tasks, serving as a capable assistant to human researchers, yet it remains far from autonomous research. The fundamental reason is that current work on academic paper reasoning is largely confined to a search-oriented paradigm centered on pre-specified targets, with reasoning grounded in relevance retrieval, which struggles to support researcher-style full-document understanding, reasoning, and verification. — Rongjin Li, Zichen Tang, Xianghe Wang, Xinyi Hu, Zhengyu Wang, Zhengyu Lu, Yiling Huang, Jiayuan Chen, Weisheng Tan, Jiacheng Liu, Zhongjun Yang, Haihong E

The Cognitive Divergence: AI Context Windows, Human Attention Decline, and the Delegation Feedback Loop
arXiv:2603.26707v1 Announce Type: cross Abstract: This paper documents and theorises a self-reinforcing dynamic between two measurable trends: the exponential expansion of large language model (LLM) context windows and the secular contraction of human sustained-attention capacity. We term the resulting asymmetry the Cognitive Divergence. AI context windows have grown from 512 tokens in 2017 to 2,000,000 tokens by 2026 (factor ~3,906; fitted lambda = 0.59/yr; doubling time ~14 months). Over the same period, human Effective Context Span (ECS) -- a token-equivalent measure derived from validated — Netanel Eliav (Machine Human Intelligence Lab)
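
The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the fit has the usual exponential form N(t) = N0 * exp(lambda * t), for which the doubling time is ln(2) / lambda. That model form is an assumption here, since the excerpt only gives the fitted value.

```python
import math

start_tokens = 512          # context window quoted for 2017
end_tokens = 2_000_000      # context window quoted for 2026
fitted_lambda = 0.59        # fitted growth rate per year, as quoted

# Ratio between the two quoted endpoints.
growth_factor = end_tokens / start_tokens

# Doubling time under an assumed exponential model: ln(2) / lambda, in months.
doubling_time_months = 12 * math.log(2) / fitted_lambda

print(f"growth factor: {growth_factor:.0f}")
print(f"doubling time: {doubling_time_months:.1f} months")
```

Both results are consistent with the quoted "factor ~3,906" and "doubling time ~14 months".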

Defend: Automated Rebuttals for Peer Review with Minimal Author Guidance
arXiv:2603.27360v1 Announce Type: new Abstract: Rebuttal generation is a critical component of the peer review process for scientific papers, enabling authors to clarify misunderstandings, correct factual inaccuracies, and guide reviewers toward a more accurate evaluation. We observe that Large Language Models (LLMs) often struggle to perform targeted refutation and maintain accurate factual grounding when used directly for rebuttal generation, highlighting the need for structured reasoning and author intervention. To address this, in this paper, we introduce DEFEND, an LLM-based tool designed t — Jyotsana Khatri, Manasi Patwardhan

Dual-Stage LLM Framework for Scenario-Centric Semantic Interpretation in Driving Assistance
arXiv:2603.27536v1 Announce Type: new Abstract: Advanced Driver Assistance Systems (ADAS) increasingly rely on learning-based perception, yet safety-relevant failures often arise without component malfunction, driven instead by partial observability and semantic ambiguity in how risk is interpreted and communicated. This paper presents a scenario-centric framework for reproducible auditing of LLM-based risk reasoning in urban driving contexts. Deterministic, temporally bounded scenario windows are constructed from multimodal driving data and evaluated under fixed prompt constraints and a close — Jean Douglas Carvalho, Hugo Taciro Kenji, Ahmad Mohammad Saber, Glaucia Melo, Max Mauro Dias Santos, Deepa Kundur

Physicochemical-Neural Fusion for Semi-Closed-Circuit Respiratory Autonomy in Extreme Environments
arXiv:2603.26697v1 Announce Type: cross Abstract: This paper introduces Galactic Bioware's Life Support System, a semi-closed-circuit breathing apparatus designed for integration into a positive-pressure firefighting suit and governed by an AI control system. The breathing loop incorporates a soda lime CO2 scrubber, a silica gel dehumidifier, and pure O2 replenishment with finite consumables. One-way exhaust valves maintain positive pressure while creating a semi-closed system in which outward venting gradually depletes the gas inventory. Part I develops the physicochemical foundations from fi — Phillip Kingston, Nicholas Johnston

Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
arXiv:2603.26676v1 Announce Type: cross Abstract: Current frontier AI safety evaluations emphasize static benchmarks, third-party annotations, and red-teaming. In this position paper, we argue that AI safety research should focus on human-centered evaluations that measure harmful capability uplift: the marginal increase in a user's ability to cause harm with a frontier model beyond what conventional tools already enable. We frame harmful capability uplift as a core AI safety metric, ground it in prior social science research, and provide concrete methodological guidance for systematic measurem — Michelle Vaccaro, Jaeyoon Song, Abdullah Almaatouq, Michiel A. Bakker

Quantification of Credal Uncertainty: A Distance-Based Approach
arXiv:2603.27270v1 Announce Type: new Abstract: Credal sets, i.e., closed convex sets of probability measures, provide a natural framework to represent aleatoric and epistemic uncertainty in machine learning. Yet how to quantify these two types of uncertainty for a given credal set, particularly in multiclass classification, remains underexplored. In this paper, we propose a distance-based approach to quantify total, aleatoric, and epistemic uncertainty for credal sets. Concretely, we introduce a family of such measures within the framework of Integral Probability Metrics (IPMs). The resulting — Xabier Gonzalez-Garcia, Siu Lun Chau, Julian Rodemann, Michele Caprio, Krikamol Muandet, Humberto Bustince, Sébastien Destercke, Eyke Hüllermeier, Yusuf Sale

Concerning Uncertainty -- A Systematic Survey of Uncertainty-Aware XAI
arXiv:2603.26838v1 Announce Type: new Abstract: This paper surveys uncertainty-aware explainable artificial intelligence (UAXAI), examining how uncertainty is incorporated into explanatory pipelines and how such methods are evaluated. Across the literature, three recurring approaches to uncertainty quantification emerge (Bayesian, Monte Carlo, and Conformal methods), alongside distinct strategies for integrating uncertainty into explanations: assessing trustworthiness, constraining models or explanations, and explicitly communicating uncertainty. Evaluation practices remain fragmented and larg — Helena Löfström, Tuwe Löfström, Anders Hjort, Fatima Rabia Yapicioglu

Bitboard version of Tetris AI
arXiv:2603.26765v1 Announce Type: new Abstract: The efficiency of game engines and policy optimization algorithms is crucial for training reinforcement learning (RL) agents in complex sequential decision-making tasks, such as Tetris. Existing Tetris implementations suffer from low simulation speeds, suboptimal state evaluation, and inefficient training paradigms, limiting their utility for large-scale RL research. To address these limitations, this paper proposes a high-performance Tetris AI framework based on bitboard optimization and improved RL algorithms. First, we redesign the Tetris game — Xingguo Chen, Pingshou Xiong, Zhenyu Luo, Mengfei Hu, Xinwen Li, Yongzhou Lü, Guang Yang, Chao Li, Shangdong Yang
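
As a rough illustration of what a bitboard representation can look like for Tetris, the sketch below stores each board row as one integer, so collision checks and line clears reduce to bitwise operations. The excerpt does not describe the paper's actual layout. The 10-cell width, the row-per-integer encoding, and the function names `collides` and `lock_and_clear` are all assumptions made for this example.

```python
WIDTH = 10                    # assumed board width (standard Tetris)
FULL_ROW = (1 << WIDTH) - 1   # bitmask with all ten cells filled


def collides(board, piece_rows, top):
    """Return True if the piece overlaps a filled cell or extends past the bottom.

    board is a list of row bitmasks (index 0 is the top row);
    piece_rows holds the piece's row bitmasks, already shifted to its column.
    """
    for offset, mask in enumerate(piece_rows):
        row = top + offset
        if row >= len(board) or board[row] & mask:
            return True
    return False


def lock_and_clear(board, piece_rows, top):
    """OR the piece into the board, remove full rows, return (new_board, cleared)."""
    merged = list(board)
    for offset, mask in enumerate(piece_rows):
        merged[top + offset] |= mask
    kept = [row for row in merged if row != FULL_ROW]
    cleared = len(merged) - len(kept)
    # Empty rows are added at the top to keep the board height constant.
    return [0] * cleared + kept, cleared


# Hypothetical position: bottom row is missing its two rightmost cells (bits 0 and 1).
board = [0, 0, 0, 0b1111111100]
piece = [0b0000000011]        # a two-cell horizontal piece filling that gap
fits = not collides(board, piece, top=3)
new_board, cleared = lock_and_clear(board, piece, top=3)
```

Each row test here is a single AND or comparison rather than a loop over cells, which is the general reason bitboards tend to speed up simulation.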

On the Relationship between Bayesian Networks and Probabilistic Structural Causal Models
arXiv:2603.27406v1 Announce Type: new Abstract: In this paper, the relationship between probabilistic graphical models, in particular Bayesian networks, and causal diagrams, also called structural causal models, is studied. Structural causal models are deterministic models, based on structural equations or functions, that can be provided with uncertainty by adding independent, unobserved random variables to the models, equipped with probability distributions. One question that arises is whether a Bayesian network that has been obtained from expert knowledge or learnt from data can be mapped to a pr — Peter J. F. Lucas, Eleanora Zullo, Fabio Stella

PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision
arXiv:2603.28183v1 Announce Type: new Abstract: Multimodal Large Language Models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed-loop of "perception, recognition, decision-making." We constructed a high-quality multitask EM dataset, PReD-1.3M, and an evaluation benchmark, PReD-Bench. The dataset e — Zehua Han, Jing Xiao, Yiqi Duan, Mengyu Xiang, Yuheng Ji, Xiaolong Zheng, Chenghanyu Zhang, Zhendong She, Junyu Shen, Dingwei Tan, Shichu Sun, Zhou Cong, Mingxuan Liu, Fengxiang Wang, Jinping Sun, Yangang Sun

Dogfight Search: A Swarm-Based Optimization Algorithm for Complex Engineering Optimization and Mountainous Terrain Path Planning
arXiv:2603.28046v1 Announce Type: new Abstract: Dogfight is a tactical behavior of cooperation between fighters. Inspired by this, this paper proposes a novel metaphor-free metaheuristic algorithm called Dogfight Search (DoS). Unlike traditional algorithms, DoS draws its algorithmic framework from this inspiration, but its search mechanism is constructed based on the displacement integration equations in kinematics. Through experimental validation on CEC2017 and CEC2022 benchmark test functions, 10 real-world constrained optimization problems and mountainous terrain path planning tasks, DoS signifi — Yujing Sun, Jie Cai, Xingguo Xu, Yuansheng Gao, Lei Zhang, Kaichen Ouyang, Zhanyu Liu

Exploring Cultural Variations in Moral Judgments with Large Language Models
arXiv:2506.12433v2 Announce Type: cross Abstract: Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs mirror variations in moral attitudes reported by the World Values Survey (WVS) and the Pew Research Center's Global Attitudes Survey (PEW). We compare smaller monolingual and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct). Using log-probability-base — Hadi Mohammadi, Ayoub Bagheri

Bridge-RAG: An Abstract Bridge Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
arXiv:2603.26668v1 Announce Type: cross Abstract: As an important paradigm for enhancing the generation quality of Large Language Models (LLMs), retrieval-augmented generation (RAG) faces two challenges: retrieval accuracy and computational efficiency. This paper presents a novel RAG framework called Bridge-RAG. To overcome the accuracy challenge, we introduce the concept of an abstract to bridge query entities and document chunks, providing robust semantic understanding. We organize the abstracts into a tree structure and design a multi-level retrieval strategy to ensure the inclusi — Zihang Li, Wenjun Liu, Yikun Zong, Jiawen Tao, Siying Dai, Songcheng Ren, Zirui Liu, Yanbing Jiang, Tong Yang
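
The title names a cuckoo filter, but the excerpt is cut off before it explains how Bridge-RAG uses one. For readers unfamiliar with the data structure, here is a minimal textbook-style sketch of a cuckoo filter using partial-key cuckoo hashing. It is not the paper's design. The class name, the parameters, and the example keys are all hypothetical.

```python
import hashlib
import random


class CuckooFilter:
    """Approximate set membership with deletion support (illustrative only)."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-derived index in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt_index(self, index, fp):
        # XOR with the fingerprint's hash; applying it twice returns the original index.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) & self.mask

    def _locate(self, item):
        fp = self._hash(b"fp:" + item.encode()) & 0xFFFF   # 16-bit fingerprint
        i1 = self._hash(item.encode()) & self.mask
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: evict a resident fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False   # filter is too full; the last evicted fingerprint is dropped

    def contains(self, item):
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


entity_filter = CuckooFilter()
for entity in ["transformer", "cuckoo hashing", "retrieval"]:   # hypothetical keys
    entity_filter.insert(entity)
```

Lookups touch at most two buckets. An inserted key is always found unless an insert has failed, so in normal use the only error is an occasional false positive. Unlike a plain Bloom filter, entries can be deleted.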

Compliance-Aware Predictive Process Monitoring: A Neuro-Symbolic Approach
arXiv:2603.26948v1 Announce Type: new Abstract: Existing approaches for predictive process monitoring are sub-symbolic, meaning that they learn correlations between descriptive features and a target feature fully based on data, e.g., predicting the surgical needs of a patient based on historical events and biometrics. However, such approaches fail to incorporate domain-specific process constraints (knowledge), e.g., surgery can only be planned if the patient was released more than a week ago, limiting the adherence to compliance and providing less accurate predictions. In this paper, we presen — Fabrizio De Santis, Gyunam Park, Wil M. P. van der Aalst

Stress Classification from ECG Signals Using Vision Transformer
arXiv:2603.26721v1 Announce Type: cross Abstract: Vision Transformers have shown tremendous success in numerous computer vision applications; however, they have not been exploited for stress assessment using physiological signals such as the electrocardiogram (ECG). To get the maximum benefit from the vision transformer for multilevel stress assessment, in this paper, we transform the raw ECG data into 2D spectrograms using the short-time Fourier transform (STFT). These spectrograms are divided into patches for feeding to the transformer encoder. We also perform experiments with 1D CNN and Re — Zeeshan Ahmad, Naimul Khan
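
The preprocessing described here (a 1D signal turned into an STFT spectrogram, then split into patches) can be sketched in plain Python. The window length, hop size, and patch size below are hypothetical choices, and a synthetic sine wave stands in for ECG data. None of these values come from the paper.

```python
import cmath
import math


def stft_magnitude(signal, window_size=16, hop=8):
    """Magnitude spectrogram as a 2D list: rows are frequency bins, columns are frames."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    num_bins = window_size // 2 + 1   # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(num_bins):
            # Direct DFT of the windowed frame at bin k.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            spectrum.append(abs(total))
        frames.append(spectrum)
    # Transpose from (time, frequency) to (frequency, time).
    return [list(column) for column in zip(*frames)]


def to_patches(image, patch_size):
    """Split a 2D list into flattened, non-overlapping square patches.

    Rows and columns that do not fill a whole patch are discarded.
    """
    rows = len(image) - len(image) % patch_size
    cols = len(image[0]) - len(image[0]) % patch_size
    patches = []
    for r in range(0, rows, patch_size):
        for c in range(0, cols, patch_size):
            patches.append([image[r + i][c + j]
                            for i in range(patch_size)
                            for j in range(patch_size)])
    return patches


# Synthetic stand-in for an ECG trace: an 8 Hz sine sampled at 64 Hz for 2 seconds.
sample_rate = 64
signal = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(128)]
spectrogram = stft_magnitude(signal)
patches = to_patches(spectrogram, patch_size=4)
```

Each flattened patch would then typically be linearly projected to a token embedding before the transformer encoder. A real pipeline would use an FFT library rather than the direct DFT shown here.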
