MALDOC: A Modular Red-Teaming Platform for Document Processing AI Agents

MALDOC evaluates document-processing agents against document-layer attacks by combining multi-view extraction, risk-aware planning, controlled injection, and agentic propagation analysis.

Ashish Raj Shekhar* · Priyanuj Bordoloi* · Shiven Agarwal* · Yash Shah · Sandipan De · Vivek Gupta

Arizona State University · *Equal contribution

Preprint · 2026

MALDOC system overview
Figure 1: System overview of MALDOC, covering multi-view extraction, risk-aware planning, injection mechanisms, and agentic propagation scoring.

Document-Layer Red-Teaming

Document-processing AI agents can be compromised by attacks that exploit discrepancies between rendered PDF content and machine-readable representations. MALDOC provides a modular pipeline to generate document-layer adversarial PDFs and measure downstream failures in agent workflows.

Key result: under the default planner, MALDOC achieves an 86% attack success rate (ASR) while preserving human-visible fidelity, with task degradation accounting for 72% of successes.

Multi-Stage Attack Generation

The system is organized into four attack stages plus propagation evaluation. Each stage emits structured artifacts for reproducibility and ablation studies.
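The staged design can be sketched as a minimal pipeline in which each stage consumes the previous stage's output and emits a structured artifact. This is an illustrative sketch only: the function names, artifact fields, and the choice of mechanism below are assumptions, not MALDOC's actual API.

```python
# Minimal sketch of a staged attack pipeline where every stage emits a
# structured artifact for reproducibility and ablations.
# Stage names and artifact fields are hypothetical placeholders.
from typing import Callable

Stage = Callable[[dict], dict]

def extract(doc: dict) -> dict:
    # Multi-view extraction: collect rendered-text and raw-layer views.
    return {"stage": "extract", "views": ["rendered", "raw"], **doc}

def plan(art: dict) -> dict:
    # Risk-aware planning: choose an injection mechanism (placeholder choice).
    return {"stage": "plan", "mechanism": "HTI", "parent": art}

def inject(art: dict) -> dict:
    # Controlled injection: apply the planned edit to the machine-readable layer.
    return {"stage": "inject", "edited": True, "parent": art}

def propagate(art: dict) -> dict:
    # Propagation evaluation: record downstream agent deviations.
    return {"stage": "propagate", "signals": [], "parent": art}

PIPELINE: list[Stage] = [extract, plan, inject, propagate]

def run(doc: dict) -> list[dict]:
    artifacts, current = [], doc
    for stage in PIPELINE:
        current = stage(current)
        artifacts.append(current)  # persist every stage artifact for ablations
    return artifacts
```

Persisting every intermediate artifact, rather than only the final adversarial PDF, is what makes per-stage ablation studies possible.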

MALDOC pipeline overview
Top-half view of the MALDOC pipeline: extraction, planning, and injection.

Evaluation Setup

MALDOC is evaluated on the Document Understanding Dataset (DuDE), covering finance, healthcare, and education documents. Agents are implemented with LangGraph to simulate realistic workflows.

  • 6 domain-specific agents with 10 functional tools.
  • Task types: QA, key-field extraction, and summarization.
  • Attacks tested under HTI, FGR, and VOI mechanisms.
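The setup above can be captured as a small configuration object. Only the counts, task types, and mechanism acronyms come from the text; the agent and tool identifiers, and the assumption of one clean plus one adversarial trace per combination, are placeholders.

```python
# Illustrative evaluation configuration mirroring the setup above.
# Agent/tool identifiers are placeholders; only the counts are given.
EVAL_CONFIG = {
    "dataset": "DuDE",
    "domains": ["finance", "healthcare", "education"],
    "agents": [f"agent_{i}" for i in range(6)],   # 6 domain-specific agents
    "tools": [f"tool_{i}" for i in range(10)],    # 10 functional tools
    "tasks": ["qa", "key_field_extraction", "summarization"],
    "mechanisms": ["HTI", "FGR", "VOI"],
}

def num_runs(cfg: dict) -> int:
    # Assumes one clean + one adversarial trace per (agent, task, mechanism).
    return 2 * len(cfg["agents"]) * len(cfg["tasks"]) * len(cfg["mechanisms"])
```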
Agentic propagation scoring overview
Bottom-half view highlighting domain-specific agents and propagation metrics.

End-to-End Walkthrough

The interactive demo allows users to upload PDFs, configure attack settings, and compare clean versus adversarial execution traces. This supports reproducible evaluation across models.

Demo: creation view Demo: stage timeline view Demo: evaluation setup Demo: propagation metrics
Figure 2: End-to-end demo walkthrough (cropped into four panels).

Propagation Metrics & Stealth

Attack success is defined as any propagation signal that deviates from the clean baseline. MALDOC reports an aggregated 86% attack success rate (ASR), with task degradation accounting for 72% of successes. Among successful attacks, ASR decomposes into QA-only (21.5%), workflow-only (41.0%), and QA+workflow (33.5%) propagation. Workflow deviations dominate: 74.5% of successful attacks involve Tool Misfire or state drift.
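As a consistency check on these figures (assuming the decomposition percentages are shares of successful attacks), the workflow-involvement number follows directly from the two categories that touch the workflow:

```python
# Shares of successful attacks by propagation category (from the text).
qa_only, workflow_only, qa_and_workflow = 21.5, 41.0, 33.5

# Attacks involving any workflow deviation = workflow-only + QA+workflow.
workflow_involved = workflow_only + qa_and_workflow  # 41.0 + 33.5 = 74.5

# Overall aggregates reported alongside the decomposition.
asr = 0.86                    # aggregated attack success rate
task_degradation_share = 0.72 # share of successes that degrade the task
```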

Stealth is enforced with SSIM-based visual invariance (SSIM = 1.0). In a human spot-check of 30 document pairs, annotators reported no visible differences in 97% of cases.
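The visual-invariance check can be sketched as an SSIM comparison over rendered page images. The version below is a simplified single-window SSIM in plain NumPy; MALDOC's actual implementation may use a windowed library routine, and the rendering step (PDF page to grayscale array) is omitted.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM between two grayscale page renders in [0, 1]."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(
        ((2 * mx * my + c1) * (2 * cov + c2))
        / ((mx**2 + my**2 + c1) * (vx + vy + c2))
    )

# Identical renders score exactly 1.0; any visible edit lowers the score,
# so enforcing SSIM = 1.0 requires the adversarial PDF to render
# pixel-identically to the clean original.
```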

Quantitative Highlights

The grid below collects key tables and plots from the paper, including mechanism-specific QA-F1 degradation, Tool Misfire rates, and detection performance.

Table 1: Semantic edit strategy to injection channel mapping Table 2: Original-document (O-DOC) performance Table 3: Adversarial-document (M-DOC) performance Table 4: Planner-agnostic ASR Table 5: Stealthiness evaluation
Table 1: mapping of semantic edit strategies to injection channels. Table 2: original-document (O-DOC) performance. Table 3: adversarial-document (M-DOC) performance. Table 4: planner-agnostic ASR. Table 5: stealthiness evaluation.

BibTeX

@misc{maldoc2026,
  title   = {MALDOC: A Modular Red-Teaming Platform for Document Processing AI Agents},
  author  = {Shekhar, Ashish Raj and Bordoloi, Priyanuj and Agarwal, Shiven and Shah, Yash and De, Sandipan and Gupta, Vivek},
  year    = {2026},
  note    = {Preprint},
  url     = {https://github.com/shekharashishraj/MalDoc}
}