How to Test MCP Servers

Comprehensive Guide to Testing Model Context Protocol Servers

Prerequisites & Learning Path

This guide assumes you've completed the Quick Start and understand basic MCP testing concepts. If you're new to MCP Aegis, start there first.

Focus here: Advanced testing patterns, YAML and programmatic approaches, production validation strategies, and comprehensive server testing.

MCP Aegis provides comprehensive testing capabilities for Model Context Protocol servers. This guide covers advanced testing strategies including YAML declarative testing, programmatic validation, pattern matching, performance testing, and production-ready validation workflows. All examples reference real tools from the included example servers (filesystem, multi‑tool, stateful session, API testing) and demonstrate patterns used in production MCP deployments.

Architecture Overview

Section 1 of 7: Architecture Overview

Why this matters: A clear mental model of the MCP handshake & tool surface lets you design tests that catch orchestration failures early (before they manifest as opaque agent prompts or silent tool omissions).

MCP in AI Agent Systems

The Model Context Protocol (MCP) standardises JSON‑RPC 2.0 over stdio so agents can safely enumerate & invoke tools. Aegis automates validation of each lifecycle phase and the structural guarantees required for reliable orchestration:

  • initialize: Client declares intent & capabilities
  • initialized: Server confirms readiness / negotiated features
  • tools/list: Enumerate complete, schema‑rich tool inventory
  • tools/call: Deterministic execution producing human + structured outputs
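
On the wire, these phases map to four JSON-RPC messages. The sketch below is illustrative only (field values such as the protocol version and client identity are assumptions, not Aegis requirements); Aegis sends and validates this handshake for you before any test assertions run.

javascript
// Illustrative JSON-RPC 2.0 messages for the lifecycle phases listed above.
// Exact capability and version values vary by client and server.
const initialize = {
  jsonrpc: '2.0', id: 1, method: 'initialize',
  params: {
    protocolVersion: '2025-03-26',                      // negotiated protocol revision (example value)
    capabilities: {},                                   // client capabilities
    clientInfo: { name: 'mcp-aegis', version: '1.0.0' } // illustrative client identity
  }
};

const initialized = { jsonrpc: '2.0', method: 'notifications/initialized' }; // notification: no id, no response

const listTools = { jsonrpc: '2.0', id: 2, method: 'tools/list', params: {} };

const callTool = {
  jsonrpc: '2.0', id: 3, method: 'tools/call',
  params: { name: 'read_file', arguments: { path: './data/hello.txt' } }
};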

AI Agent Integration Flow

AI Agent → MCP Client → MCP Server → Tools/Services

Aegis validates startup, handshake, tool discovery, execution results & error semantics end‑to‑end.

Common AI Agent Tools

  • Data Retrieval: Database queries, API calls, file system access
  • Content Generation: Text processing, template rendering, document creation
  • External Services: Email, notifications, third-party API integration
  • Analysis Tools: Data processing, calculations, validations

Tool Testing Patterns

Section 2 of 7: Tool Testing Patterns

Why this matters: High‑signal tests catch schema drift, brittle naming, or non‑deterministic outputs before agents hallucinate tool capabilities or retry loops degrade performance. (Note: Description length ≥20 chars is a production recommendation—demo tools like read_file are shorter.)

YAML vs Programmatic Approaches

  • Ideal use: YAML for declarative request/response validation & pattern matching; programmatic for conditional logic, loops, and multi-step workflows
  • Strength: YAML is readable, non-code, with rich patterns (50+); programmatic offers full JS power and dynamic assertions
  • Performance assertions: built-in performance.maxResponseTime in YAML; custom timing logic (e.g. Date.now()) in programmatic tests
  • Best for agents: YAML for protocol conformance & schema coverage; programmatic for workflow orchestration & stateful scenarios
  • Buffer hygiene: automatic per test case in YAML; programmatic tests require beforeEach(() => client.clearAllBuffers())

Mix both styles: YAML for broad coverage + targeted programmatic tests for complex flows.

Minimal Agent Test Template

YAML (Discovery + Call)

yaml
description: "Agent sanity tests"
tests:
  - it: "lists tools"
    request: { jsonrpc: '2.0', id: 't1', method: 'tools/list', params: {} }
    expect:
      response:
        result:
          tools: 'match:not:arrayLength:0'
      stderr: 'toBeEmpty'

  - it: "executes a tool"
    request:
      jsonrpc: '2.0'
      id: 't2'
      method: 'tools/call'
      params:
        name: 'read_file'
        arguments: { path: './data/hello.txt' }
    expect:
      response:
        result:
          content:
            - type: 'text'
              text: 'match:contains:Hello'
          isError: false
      stderr: 'toBeEmpty'

Programmatic (Node test runner)

javascript
import { test, before, after, beforeEach } from 'node:test';
import { connect } from 'mcp-aegis';
import assert from 'node:assert/strict';

let client;
before(async () => client = await connect('./aegis.config.json'));
after(async () => client && await client.disconnect());
beforeEach(() => client.clearAllBuffers()); // critical

test('lists tools', async () => {
  const tools = await client.listTools();
  assert.ok(Array.isArray(tools) && tools.length > 0);
});

test('executes tool', async () => {
  const r = await client.callTool('read_file', { path: './data/hello.txt' });
  assert.equal(r.isError, false);
  assert.ok(r.content[0].text.includes('Hello'));
});

Includes buffer hygiene, production naming conventions, and pattern usage.

See full real-world examples in examples/filesystem-server – e.g. filesystem-execution-only.test.mcp.yml.

Tool Discovery & Schema Standards

Each production tool SHOULD provide: stable snake_case name, descriptive ≥20 char description (example sandbox tools like read_file are intentionally shorter), and a JSON Schema with type: object, properties (each with type), and (when applicable) a required array. These guarantees let AI agents enumerate capabilities, generate valid arguments and explain failures.

  • Naming: ^[a-z][a-z0-9_]*$ (no camelCase / spaces)
  • Description Length: ≥ 20 chars (production target) – shorten only in minimal demo servers
  • Schema Completeness: All parameters documented under properties
  • Required Integrity: Every field listed in required exists in properties
  • Deterministic Output Shape: Consistent content[] object structure (avoid shape drift)

Tip: Enforce these constraints with a single YAML test using match:arrayElements: & regex patterns—then rely on programmatic tests only for advanced conditional logic.

Format Legend: YAML tests excel at broad request/response validation with powerful pattern operators. Programmatic (JavaScript) tests shine for complex branching, loops, performance timing, and multi‑step orchestration.

yaml
description: "AI Agent Tool Discovery"
tests:
  - it: "should discover all agent tools"
    request:
      jsonrpc: "2.0"
      id: "discover"
      method: "tools/list"
      params: {}
    expect:
      response:
        jsonrpc: "2.0"
        id: "discover"
        result:
          tools:
            match:arrayElements:
              name: "match:type:string"
              description: "match:type:string"
              inputSchema:
                type: "object"
                properties: "match:type:object"
                required: "match:type:array"
      stderr: "toBeEmpty"

  - it: "should have well-documented tool descriptions"
    request:
      jsonrpc: "2.0"
      id: "descriptions"
      method: "tools/list"
      params: {}
    expect:
      response:
        result:
          tools:
            match:arrayElements:
              description: "match:regex:.{20,}"  # At least 20 chars
      stderr: "toBeEmpty"

Programmatic Tool Schema Validation

Comprehensive tool validation for AI agent compatibility (run with the Node.js test runner, Jest, or Mocha). Prefer the connect() helper unless you need delayed start.
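
A minimal sketch, assuming the connect()/listTools() helpers shown in the template above; the 20-character description check follows the production target below and should be relaxed for the minimal demo servers.

javascript
import { test, before, after, beforeEach } from 'node:test';
import assert from 'node:assert/strict';
import { connect } from 'mcp-aegis';

let client;
before(async () => { client = await connect('./aegis.config.json'); });
after(async () => { if (client) await client.disconnect(); });
beforeEach(() => client.clearAllBuffers());

test('every tool meets agent-facing schema standards', async () => {
  const tools = await client.listTools();
  assert.ok(tools.length > 0, 'expected at least one tool');

  for (const tool of tools) {
    // Naming: stable snake_case identifiers
    assert.match(tool.name, /^[a-z][a-z0-9_]*$/, `bad tool name: ${tool.name}`);
    // Description: production target of >= 20 chars (relax for minimal demo servers)
    assert.ok((tool.description || '').length >= 20, `short description: ${tool.name}`);
    // Schema completeness: object schema with documented properties
    assert.equal(tool.inputSchema.type, 'object');
    assert.equal(typeof tool.inputSchema.properties, 'object');
    // Required integrity: every required field exists in properties
    for (const field of tool.inputSchema.required ?? []) {
      assert.ok(field in tool.inputSchema.properties, `required field missing from properties: ${field}`);
    }
  }
});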

Context-Aware Tool Testing

Use real example tools to validate authentic behavior paths and prevent “works in mock, fails in prod” regressions. Provided tool sets include:

  • read_file (filesystem server)
  • calculator, text_processor (multi‑tool server)
  • data_validator, file_manager (multi‑tool server)

The snippet below exercises text_processor (analyze action) validating textual metrics an agent may leverage for follow‑up reasoning.

yaml
- it: "should return analysis metrics for text_processor"
  request:
    jsonrpc: "2.0"
    id: "tp-analyze-1"
    method: "tools/call"
    params:
      name: "text_processor"
      arguments:
        action: "analyze"
        text: "Alpha line\nBeta line"
  expect:
    response:
      result:
        isError: false
        content:
          - type: "text"
            text: "match:contains:Characters:"
    stderr: "toBeEmpty"

Agent Behavior Validation

Section 3 of 7: Agent Behavior Validation

Multi-Step Tool Sequences

Demonstrate orchestration using existing example tools. We combine read_file (filesystem), text_processor (multi‑tool) and calculator (multi‑tool). Adjust sequencing to mirror your production chain.

javascript
test('multi-step workflow with real example tools', async () => {
  // Step 1: Read baseline file (assumes this file is present in working dir)
  const fileResult = await client.callTool('read_file', { path: './README.md' });
  assert.equal(fileResult.isError, false);
  const baseText = fileResult.content[0].text;

  // Step 2: Analyze text
  const analysis = await client.callTool('text_processor', { action: 'analyze', text: baseText.slice(0, 120) });
  assert.equal(analysis.isError, false);

  // Step 3: Derive simple numeric metric with calculator (length * 2)
  const calc = await client.callTool('calculator', { operation: 'multiply', a: baseText.length, b: 2 });
  assert.equal(calc.isError, false);
});

State Management Testing

The repository includes a real stateful example server: examples/stateful-session-server. It exposes a session_store tool supporting init, set, append, get, clear. Below is a focused YAML excerpt (full file: session-state.test.mcp.yml).

yaml
- it: "initializes a session"
    request:
      jsonrpc: "2.0"
      id: "sess-init"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "init"
          session_id: "demo-1"
    expect:
      response:
        result:
          isError: false
          content:
            - type: "text"
              text: "match:contains:initialized"
      stderr: "toBeEmpty"

  - it: "sets and appends values"
    request:
      jsonrpc: "2.0"
      id: "sess-set"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "set"
          session_id: "demo-1"
          key: "notes"
          value: "alpha"
    expect:
      response:
        result:
          isError: false
      stderr: "toBeEmpty"

  - it: "appends value"
    request:
      jsonrpc: "2.0"
      id: "sess-append"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "append"
          session_id: "demo-1"
          key: "notes"
          value: "-beta"
    expect:
      response:
        result:
          isError: false
      stderr: "toBeEmpty"

  - it: "retrieves combined value"
    request:
      jsonrpc: "2.0"
      id: "sess-get"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "get"
          session_id: "demo-1"
          key: "notes"
    expect:
      response:
        result:
          isError: false
          content:
            - type: "text"
              text: "match:contains:alpha-beta"
      stderr: "toBeEmpty"

Note

Programmatic test available: session-store.programmatic.test.js

Error Recovery Testing

Use a real tool path that produces result.isError: true without being a transport failure. In the multi‑tool server, dividing by zero with the calculator tool raises an internal error that is caught and wrapped as a logical tool error, which makes it ideal for exercising retry / remediation logic (see the sketch after the YAML below).

yaml
- it: "successful calculation"
    request:
      jsonrpc: "2.0"
      id: "calc-ok"
      method: "tools/call"
      params:
        name: "calculator"
        arguments:
          operation: "add"
          a: 2
          b: 3
    expect:
      response:
        result:
          isError: false
          content:
            - type: "text"
              text: "match:contains:Result:"
      stderr: "toBeEmpty"

  - it: "division by zero yields logical tool error"
    request:
      jsonrpc: "2.0"
      id: "calc-err"
      method: "tools/call"
      params:
        name: "calculator"
        arguments:
          operation: "divide"
          a: 10
          b: 0
    expect:
      response:
        result:
          isError: true
          content:
            - type: "text"
              text: "match:contains:Division by zero"
      stderr: "toBeEmpty"

Real-World Examples

Section 4 of 7: Real-World Examples

Real YAML Tests (multi-tool-server)

These are actual excerpts from examples/multi-tool-server/multi-tool.test.mcp.yml. They demonstrate tool discovery, success + error handling, regex pattern matching, and multi‑step validation. All tools (calculator, text_processor, data_validator, file_manager) are implemented in the example server.

yaml
description: "Multi-Tool Server (excerpt)"
tests:
  - it: "should list all available tools"
    request:
      jsonrpc: "2.0"
      id: "multi-1"
      method: "tools/list"
      params: {}
    expect:
      response:
        result:
          tools:
            match:arrayElements:
              name: "match:type:string"
              description: "match:type:string"
              inputSchema: "match:type:object"
          # Pattern based assertions let the suite stay stable if ordering changes
  - it: "should perform addition correctly"
    request:
      jsonrpc: "2.0"
      id: "calc-1"
      method: "tools/call"
      params:
        name: "calculator"
        arguments: { operation: "add", a: 15, b: 27 }
    expect:
      response:
        result:
          content:
            - type: "text"
              text: "Result: 42"
          isError: false
  - it: "should handle division by zero error"
    request:
      jsonrpc: "2.0"
      id: "calc-3"
      method: "tools/call"
      params:
        name: "calculator"
        arguments: { operation: "divide", a: 10, b: 0 }
    expect:
      response:
        result:
          isError: true
          content:
            - type: "text"
              text: "Division by zero"
  - it: "should validate correct email address"
    request:
      jsonrpc: "2.0"
      id: "valid-1"
      method: "tools/call"
      params:
        name: "data_validator"
        arguments: { type: "email", data: "[email protected]" }
    expect:
      response:
        result:
          content:
            - type: "text"
              text: "match:Valid email.*VALID"
          isError: false
  - it: "should list directory contents"
    request:
      jsonrpc: "2.0"
      id: "file-3"
      method: "tools/call"
      params:
        name: "file_manager"
        arguments: { action: "list", path: "../shared-test-data" }
    expect:
      response:
        result:
          content:
            - type: "text"
              text: "match:Files: .*hello\.txt.*"
          isError: false

Stateful Session Example (stateful-session-server)

The stateful-session-server demonstrates maintaining context across calls. Below excerpt shows creating and then retrieving session state. Use this pattern when validating agent memory or multi‑turn tool flows.

yaml
description: "Stateful session excerpt"
tests:
  - it: "initializes a session"
    request:
      jsonrpc: "2.0"
      id: "sess-init"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "init"
          session_id: "demo-1"
    expect:
      response:
        result:
          isError: false
          session_id: "demo-1"
          content:
            - type: "text"
              text: "match:contains:initialized"
  - it: "sets a value"
    request:
      jsonrpc: "2.0"
      id: "sess-set"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "set"
          session_id: "demo-1"
          key: "notes"
          value: "alpha"
    expect:
      response:
        result:
          isError: false
          session_id: "demo-1"
          content:
            - type: "text"
              text: "match:contains:Set"
  - it: "retrieves existing value"
    request:
      jsonrpc: "2.0"
      id: "sess-get"
      method: "tools/call"
      params:
        name: "session_store"
        arguments:
          action: "get"
          session_id: "demo-1"
          key: "notes"
    expect:
      response:
        result:
          isError: false
          session_id: "demo-1"
          content:
            - type: "text"
              text: "alpha"

Performance & Resource Testing

Section 5 of 7: Performance Testing

Why this matters: Latency & memory regressions silently degrade agent reasoning quality (timeouts, truncated context, tool avoidance). Early detection prevents brittle compensating prompt logic.

Response Time Testing (filesystem-server)

Ensure tools meet AI agent response time requirements. Use coarse time assertions to prevent flakiness—only enforce strict budgets for latency‑sensitive operations.

yaml
description: "Performance - Response Time"
tests:
  - it: "file read responds within 2s"
    request:
      jsonrpc: "2.0"
      id: "perf-read"
      method: "tools/call"
      params:
        name: "read_file"
        arguments:
          path: "./README.md"
    expect:
      performance:
        maxResponseTime: "2000ms"
      response:
        result:
          isError: false
      stderr: "toBeEmpty"

  - it: "text analysis completes under 3s"
    request:
      jsonrpc: "2.0"
      id: "perf-text"
      method: "tools/call"
      params:
        name: "text_processor"
        arguments:
          action: "analyze"
          text: "Short performance sample"
    expect:
      performance:
        maxResponseTime: "3000ms"
      response:
        result:
          isError: false
      stderr: "toBeEmpty"

Memory and Resource Testing (multi-tool-server)

Validate efficient resource usage for long-running AI agent sessions. Consider adding a control (baseline) measurement for comparison.

javascript
// Real memory efficiency test (shipping in examples/multi-tool-server)
// Notes:
//  * Uses lightweight 'calculator' tool for repeatable calls
//  * Periodically clears stderr to avoid buffer accumulation
//  * Optional GC hints if Node started with --expose-gc
//  * Adjust ITERATIONS / LIMIT_MB via env for CI tuning
import { test } from 'node:test';
import assert from 'node:assert/strict';

const ITERATIONS = parseInt(process.env.MEM_TEST_ITER || '120', 10);
const LIMIT_MB = parseInt(process.env.MEM_TEST_LIMIT_MB || '50', 10);

test('should manage resources efficiently for AI agents', async () => {
  if ((globalThis).gc) { (globalThis).gc(); } // pre-sample GC if available
  const memBefore = process.memoryUsage();
  for (let i = 0; i < ITERATIONS; i++) {
    const res = await client.callTool('calculator', { operation: 'add', a: i, b: i + 1 });
    assert.equal(res.isError, false);
    if (i % 10 === 0) {
      client.clearStderr();
      await new Promise(r => setTimeout(r, 0)); // yield for GC / event loop
    }
  }
  if ((globalThis).gc) { (globalThis).gc(); } // post-loop GC if available
  const memAfter = process.memoryUsage();
  const heapGrowthBytes = memAfter.heapUsed - memBefore.heapUsed;
  const heapGrowthMB = heapGrowthBytes / (1024 * 1024);
  assert.ok(heapGrowthMB < LIMIT_MB, `Memory growth should be under ${LIMIT_MB}MB (actual ${heapGrowthMB.toFixed(2)}MB)`);
});
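
As suggested above, a control (baseline) measurement makes the growth limit less host-dependent. A sketch reusing ITERATIONS and LIMIT_MB from the test above; the 5x multiplier is an illustrative threshold, not a value from the example server.

javascript
test('tool-call heap growth stays close to an idle baseline', async () => {
  const sampleHeapGrowthMB = async (fn) => {
    if (globalThis.gc) globalThis.gc();
    const before = process.memoryUsage().heapUsed;
    for (let i = 0; i < ITERATIONS; i++) await fn(i);
    if (globalThis.gc) globalThis.gc();
    return (process.memoryUsage().heapUsed - before) / (1024 * 1024);
  };

  // Control: event-loop yields only, no tool calls
  const baselineMB = await sampleHeapGrowthMB(() => new Promise(r => setImmediate(r)));

  // Treatment: repeated calls through the lightweight calculator tool
  const toolMB = await sampleHeapGrowthMB(async (i) => {
    const res = await client.callTool('calculator', { operation: 'add', a: i, b: 1 });
    assert.equal(res.isError, false);
  });

  // Flag runaway growth relative to the baseline, with LIMIT_MB as an absolute ceiling
  assert.ok(toolMB < Math.max(baselineMB * 5, LIMIT_MB),
    `heap grew ${toolMB.toFixed(2)}MB vs idle baseline ${baselineMB.toFixed(2)}MB`);
});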

Best Practices

Section 6 of 7: Best Practices

Why this matters: Strong conventions shrink prompt surface area, reduce retry loops and increase agent planning confidence.

✅ Agent-Friendly Tool Design

Problem: Generic naming forces LLM guesswork. Guidance: Express domain and operation explicitly.

yaml
# ✅ Good - Clear, specific tool names
tools:
  - name: "search_customer_data"
    description: "Search customer database with filters and pagination"
  - name: "generate_report"
    description: "Generate formatted reports from data sources"

# ❌ Bad - Vague, generic names  
tools:
  - name: "search"
    description: "Search stuff"
  - name: "process"
    description: "Process data"

✅ Comprehensive Error Information

Problem: Opaque errors trigger wasteful re‑planning. Guidance: Provide structured remediation hints.

javascript
// Real error handling examples using existing multi-tool server
import { test } from 'node:test';
import assert from 'node:assert/strict';

// 1. Logical validation failure (email format) returns isError:false but semantic INVALID marker in text
//    Pattern: agent can parse 'INVALID' substring to branch remediation.
test('invalid email returns semantic failure marker', async () => {
  const result = await client.callTool('data_validator', { type: 'email', data: 'not-an-email' });
  assert.equal(result.isError, false); // Validation tool encodes failure in content, not isError
  const txt = result.content[0].text;
  assert.match(txt, /Invalid email/i);
  assert.match(txt, /INVALID/);
});

// 2. Unsupported calculator operation triggers hard error (isError:true) with explanatory text
test('unsupported calculator operation surfaces hard error', async () => {
  const result = await client.callTool('calculator', { operation: 'power', a: 2, b: 3 });
  assert.equal(result.isError, true);
  assert.match(result.content[0].text, /Unsupported operation: power/);
});

// 3. Unknown tool demonstrates top-level routing error
test('unknown tool produces Unknown tool message', async () => {
  const result = await client.callTool('totally_missing_tool', {});
  assert.equal(result.isError, true);
  assert.match(result.content[0].text, /Unknown tool/);
});

✅ Structured Output for AI Processing

Problem: Free‑form text requires extra parsing. Guidance: Pair human text with machine‑friendly structured_data. The text_processor tool in the multi-tool-server now emits actionable metrics (chars, words, lines, lengths) alongside the human readable string.

yaml
- it: "should analyze text with structured metrics"
  request:
    jsonrpc: "2.0"
    id: "text-analyze"
    method: "tools/call"
    params:
      name: "text_processor"
      arguments: { action: "analyze", text: "Hello MCP Aegis" }
  expect:
    response:
      result:
        match:partial:
          content:
            - type: "text"
              text: "match:Characters: \d+, Words: \d+, Lines: 1"
          structured_data:
            action: "analyze"
            chars: "match:type:number"
            words: "match:type:number"
            lines: "match:type:number"
          isError: false"

✅ Context Preservation

Problem: Lost conversational state increases token spend. Guidance: Persist session + lightweight preference objects.

javascript
// Rewritten using real stateful tool: session_store
// Demonstrates preserving conversational context via explicit session state.
import { test } from 'node:test';
import assert from 'node:assert/strict';

test('should preserve context across tool calls (session_store)', async () => {
  // Initialize session
  const init = await client.callTool('session_store', { action: 'init', session_id: 'ctx-1' });
  assert.equal(init.isError, false);

  // Store preference keys (simulating context)
  await client.callTool('session_store', { action: 'set', session_id: 'ctx-1', key: 'preferences', value: 'format=detailed;lang=en' });

  // Append incremental conversational artifact
  await client.callTool('session_store', { action: 'append', session_id: 'ctx-1', key: 'history', value: 'User asked about charts.' });
  await client.callTool('session_store', { action: 'append', session_id: 'ctx-1', key: 'history', value: 'Requested drill-down.' });

  // Retrieve combined context
  const history = await client.callTool('session_store', { action: 'get', session_id: 'ctx-1', key: 'history' });
  assert.equal(history.isError, false);
  const text = history.content[0].text;
  assert.ok(text.includes('charts') && text.includes('drill-down'));
});

✅ AI Agent Compatibility Testing

Problem: Cross‑test buffer leakage causes nondeterministic flakes. Guidance: Enforce hygiene + validate multi‑platform suitability.

javascript
import { describe, test } from 'node:test';
import assert from 'node:assert/strict';

describe('AI Agent Compatibility (real tools)', () => {
  test('text analysis provides machine-usable metrics', async () => {
    const result = await client.callTool('text_processor', { action: 'analyze', text: 'Line one.\nLine two.' });
    assert.equal(result.isError, false);
    const text = result.content[0].text;
    assert.ok(/Characters:/i.test(text) && /Words:/i.test(text));
  });

  test('session-based contextual accumulation', async () => {
    await client.callTool('session_store', { action: 'init', session_id: 'compat-1' });
    await client.callTool('session_store', { action: 'append', session_id: 'compat-1', key: 'history', value: 'Discussed Q3 report.' });
    await client.callTool('session_store', { action: 'append', session_id: 'compat-1', key: 'history', value: ' Focus on revenue.' });
    const combined = await client.callTool('session_store', { action: 'get', session_id: 'compat-1', key: 'history' });
    assert.equal(combined.isError, false);
    assert.ok(combined.content[0].text.includes('Q3') && combined.content[0].text.includes('revenue'));
  });
});

MCP Server Testing Checklist

Section 7 of 7: Testing Checklist
  • Tool Discovery: All tools discoverable with snake_case names & (production) ≥20 char descriptions (demo tools may be shorter)
  • Schema Validation: Input schemas are complete and well-documented
  • Handshake: Successful initialize + initialized sequence prior to tool usage
  • Response Times: Tools respond within 2–5 seconds (or documented SLA) (See Performance Testing)
  • Error Handling: Errors provide actionable information for agents
  • Context Management: Tools maintain state across conversations
  • Structured Output: Responses include both human-readable and structured data
  • Concurrent Usage: Tools handle multiple agent requests simultaneously (see the sketch after this checklist)
  • Memory Efficiency: Resource usage remains stable during long sessions
  • Agent Compatibility: Works with major AI platforms (Claude, GPT, etc.)
  • Buffer Hygiene: Buffers cleared between tests (clearAllBuffers())
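
For the Concurrent Usage item above, a hedged sketch: it assumes the programmatic client can correlate concurrent in-flight requests by JSON-RPC id; if your client serializes calls, the same assertions still hold, just without overlap.

javascript
import { test } from 'node:test';
import assert from 'node:assert/strict';

test('handles concurrent tool calls without cross-talk', async () => {
  // Fire several calculator calls at once; each has distinct arguments
  const calls = Array.from({ length: 5 }, (_, i) =>
    client.callTool('calculator', { operation: 'add', a: i, b: i })
  );
  const results = await Promise.all(calls);

  results.forEach((res, i) => {
    assert.equal(res.isError, false);
    // Each response should reflect its own arguments (i + i), not another call's
    assert.ok(res.content[0].text.includes(String(i + i)), `result ${i} does not match its request`);
  });
});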

Common Debugging Scenarios

Real-world problems and their solutions when testing MCP servers:

🚨 Scenario: Agent Can't Discover Tools

Symptoms: Agent says "no tools available" or tries to call non-existent tools

Root Causes:

  • MCP handshake failed silently
  • tools/list returns empty array
  • Server crashes after handshake but before tool discovery

Debugging Commands:

bash
# Test handshake + tool discovery manually
aegis query --config config.json --debug
# Should show: handshake → tools/list → tool array

# Check for server crashes
aegis query --config config.json --verbose
# Look for process exit codes or stderr output

⚠️ Scenario: Tool Calls Return Empty Results

Symptoms: Tools execute but return empty or malformed content

Root Causes:

  • Tool logic errors not caught in basic tests
  • Invalid argument mapping from agent requests
  • Async operations not properly awaited

Debugging Test:

yaml
- it: "debug tool output structure"
  request:
    method: "tools/call"
    params: { name: "your_tool", arguments: { test_input: "debug" } }
  expect:
    response:
      result:
        content: "match:not:arrayLength:0"  # Not empty
        isError: false
  stderr: "match:not:contains:error"
🔄 Scenario: Flaky Tests in CI/CD

Symptoms: Tests pass locally but fail in CI, or pass/fail randomly

Root Causes:

  • Buffer bleeding between test cases
  • Race conditions in server startup
  • Environment differences (file paths, permissions)

Solution Pattern:

javascript
// In programmatic tests, always clear buffers
beforeEach(() => {
  client.clearAllBuffers(); // CRITICAL for stability
});

yaml
# For YAML tests, use unique IDs and proper timeouts
tests:
  - it: "isolated test with unique ID"
    request:
      id: "unique-test-1-{{timestamp}}"  # Prevent ID conflicts

📊 Scenario: Performance Degradation Over Time

Symptoms: First few tool calls fast, then progressively slower

Root Causes:

  • Memory leaks in server implementation
  • Unclosed resources (files, connections)
  • Event listener accumulation

Performance Test Pattern:

javascript
test('performance stability over multiple calls', async () => {
  const times = [];
  
  for (let i = 0; i < 10; i++) {
    const start = Date.now();
    await client.callTool('your_tool', { iteration: i });
    times.push(Date.now() - start);
  }
  
  // Response time should not degrade significantly
  const avg = times.reduce((a, b) => a + b) / times.length;
  const maxTime = Math.max(...times);
  assert.ok(maxTime < avg * 2, 'Performance degraded significantly');
});

Production-Ready MCP Server Testing

All testing patterns and examples have been validated with real integrations (component libraries, knowledge bases, data enrichment services, AI agents) and the example MCP servers included in this repository. These patterns ensure reliability with Claude, GPT and future MCP‑compatible platforms.

Related Documentation

What's Next?

Now that you understand AI agent testing patterns, here are recommended next steps to deepen your MCP testing expertise:

🎯 Expand Your Testing Skills

🚀 Production Deployment

Ready for Production?

You now have the knowledge to build robust test suites for AI agent MCP servers. The patterns and examples shown here are production-tested with real AI agent integrations.

Contributing & Community

Found a testing pattern that could help others? Consider contributing to the MCP Aegis project: