MMFormalizer: Multimodal Autoformalization in the Wild

*Equal contribution.
1The University of Hong Kong, 2University of Michigan, Ann Arbor, 3University of Edinburgh, 4University of California, Santa Barbara, 5Ohio State University, 6University of California, Los Angeles,

Case

The MMFormalizer Pipeline

MMFormalizer follows a recursive decomposition-and-grounding strategy. It systematically breaks down complex multimodal physics problems into formalizable components and resolves logical dependencies through a tree-based search.

Abstract

Autoformalization faces fundamental challenges "in the wild" due to the multimodal nature of the physical world. We propose MMFORMALIZER, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. Our framework recursively constructs formal propositions in LEAN from perceptually grounded primitives. To evaluate this, we introduce PHYX-AF, a new benchmark covering classical mechanics, relativity, quantum mechanics, and thermodynamics. Results show that frontier models like GPT-5 and Gemini-3-Pro achieve the highest accuracy, while geometry remains the most challenging domain.

Recursive Grounding Dynamics

MMFormalizer Process

······

Benchmark Analysis

PhyX-AF Statistics

Our benchmark covers 4 major physical domains and 12 sub-categories. The dataset is curated to ensure high-quality multimodal alignment between visual diagrams and formal Lean code.

Experimental Evaluation

Multi-Dimensional Performance

Performance Radar Chart

BibTeX

@article{xiong2026mmformalizer,
  title={MMFormalizer: Multimodal Autoformalization in the Wild},
  author={Xiong, Jing and Han, Qi and Hsieh, Yunta and Shen, Hui and Xin, Huajian and Tao, Chaofan and Zhao, Chenyang and Zhang, Hengyuan and Wu, Taiqiang and Zhang, Zhen and others},
  journal={arXiv preprint},
  year={2026}
}

This website template is adapted from Nerfies and PhyX.