Tool
OpenAI Evals
Framework for evaluating LLMs and AI systems with standardized benchmarks and custom test suites.
evaluation, benchmarks, testing, llm
More resources
Honcho
Memory library for building stateful AI agents, with continual learning and persistent memory management.
OpenViking
Context database for AI agents. File system paradigm for agent memory with self-evolving capabilities.
Supermemory
Memory engine and API for AI agents. Fast, scalable memory-as-a-service for persistent agent context.