SkillTester: Benchmarking Utility and Security of Agent Skills

Published in arXiv:2603.28815, 2026

SkillTester is a benchmark system for evaluating the utility and security of agent skills. It pairs baseline and with_skill executions, adds a separate security probe suite, and normalizes the raw execution artifacts into interpretable benchmark outcomes.
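To make the paired-execution idea concrete, here is a minimal Python sketch. It is not SkillTester's actual scoring code: the record type `PairedRun`, the functions `utility_gain` and `security_pass_rate`, and the choice of a simple pass-rate difference are all assumptions made for illustration. The real normalization is defined in the paper and the project repository.

```python
from dataclasses import dataclass


@dataclass
class PairedRun:
    """Hypothetical record: one task run without the skill and with it."""
    task_id: str
    baseline_passed: bool
    with_skill_passed: bool


def utility_gain(runs: list[PairedRun]) -> float:
    """Pass rate with the skill minus the baseline pass rate (assumed metric)."""
    if not runs:
        raise ValueError("need at least one paired run")
    baseline_rate = sum(r.baseline_passed for r in runs) / len(runs)
    with_skill_rate = sum(r.with_skill_passed for r in runs) / len(runs)
    return with_skill_rate - baseline_rate


def security_pass_rate(probe_results: dict[str, bool]) -> float:
    """Fraction of security probes passed (assumed metric, scored separately)."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    return sum(probe_results.values()) / len(probe_results)


def summarize(runs: list[PairedRun], probe_results: dict[str, bool]) -> dict[str, float]:
    """Collapse raw outcomes into two headline numbers (illustrative only)."""
    return {
        "utility_gain": utility_gain(runs),
        "security_pass_rate": security_pass_rate(probe_results),
    }
```

Because both executions in a pair use the same task, the difference in pass rates can be attributed to the skill rather than to task difficulty. Keeping the security score separate means a skill that helps on tasks but fails probes is not masked by a single blended number.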

The public service is available at skilltester.ai, and the full benchmark pipeline is maintained in the project repository.

Figure: SkillTester dashboard.

Recommended citation: Leye Wang, Zixing Wang, and Anjie Xu. (2026). "SkillTester: Benchmarking Utility and Security of Agent Skills." arXiv:2603.28815.