AI Agent Evaluation Runner 🕵🏻‍♂️

Evaluate an AI agent on a subset of validation questions from the General AI Assistants (GAIA) Benchmark.

Note: This space runs on a minimal setup, so answering the questions takes time. The agent reports only the final answer.

AI Assistant