The SDK provides APIs to save eval results to MCPJam for visualization in the CI Evals dashboard. Results can be saved automatically via EvalTest/EvalSuite, or manually using the APIs below.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| MCPJAM_API_KEY | Yes | - | Your MCPJam workspace API key |
| MCPJAM_BASE_URL | No | https://sdk.mcpjam.com | MCPJam API base URL override |
Use MCPJAM_BASE_URL only when you need to override the default ingest host, such as internal development against a non-production backend. MCPJAM_API_KEY controls whether results are uploaded. Replay credential capture only happens when you provide serverReplayConfigs, agent, or mcpClientManager. MCP App widget snapshots in each result’s trace (for iframe replay in MCPJam) come from PromptResult.getWidgetSnapshots(), which is only populated when TestAgent was constructed with mcpClientManager.
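As a minimal sketch of how these variables could be read in your own tooling (the SDK reads them internally; the fallback shown assumes the default from the table above):

```typescript
// Illustrative only: the SDK resolves these itself.
const apiKey = process.env.MCPJAM_API_KEY; // required for uploads
const baseUrl = process.env.MCPJAM_BASE_URL ?? "https://sdk.mcpjam.com";
```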

Tool execution and passed

Uploaded iterations have a single boolean passed that drives pass rate on the dashboard. The SDK distinguishes:
  1. Structural pass — Your test returned true, or expected tool calls were satisfied (Inspector/UI flows use shared matching logic).
  2. Tool execution — The trace shows a real tool failure: MCP results with isError: true, timeline spans where a tool step ended in error, tool-result parts the UI would treat as errors, or a runner-level iterationError after a thrown tool step.
  • Default behavior — When the SDK derives passed from a trace (for example EvalTest / EvalSuite auto-save, PromptResult.toEvalResult(), createEvalRunReporter helpers, or Inspector suite runs), structural success is not enough if tool execution failed: the iteration is recorded as failed unless you opt out.
  • Opt out — Set failOnToolError: false on MCPJamReportingConfig (global for that reporter or auto-save), or pass it on specific helper options (addFromPrompt / recordFromRun / RunToEvalResultsOptions, etc.). Use this when you only care that the model invoked the right tools with the right arguments, not that every MCP call returned a success payload.
  • Manual reportEvalResults — If you build results[] yourself and set passed explicitly, MCPJam stores your values as-is. The execution gate applies only to code paths that compute passed from prompts, traces, and iterations.
  • Programmatic reuse — @mcpjam/sdk also exports finalizePassedForEval, traceIndicatesToolExecutionFailure, isCallToolResultError, and traceMessagePartIndicatesToolFailure for custom ingestion pipelines.
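A self-contained sketch of the execution gate described above (derivePassed, IterationLike, and the field names are hypothetical; the real logic lives behind exports like finalizePassedForEval and traceIndicatesToolExecutionFailure):

```typescript
// Hypothetical model of one uploaded iteration.
type IterationLike = {
  structuralPass: boolean;   // test returned true / expected tool calls matched
  toolErrorInTrace: boolean; // e.g. an MCP result with isError: true
};

function derivePassed(
  it: IterationLike,
  opts: { failOnToolError?: boolean } = {},
): boolean {
  // Opting out keeps structural success even when a tool call errored.
  if (opts.failOnToolError === false) return it.structuralPass;
  // Default (strict): a tool execution failure downgrades the iteration.
  return it.structuralPass && !it.toolErrorInTrace;
}
```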

reportEvalResults()

One-shot reporting. Sends all results in a single call. Throws on failure.
import { MCPClientManager, reportEvalResults } from "@mcpjam/sdk";

Signature

reportEvalResults(input: ReportEvalResultsInput): Promise<ReportEvalResultsOutput>

Example

const manager = new MCPClientManager({
  asana: {
    url: process.env.MCP_SERVER_URL!,
    refreshToken: process.env.MCP_REFRESH_TOKEN!,
    clientId: process.env.MCP_CLIENT_ID!,
    clientSecret: process.env.MCP_CLIENT_SECRET,
  },
});

await manager.connectToServer("asana");

const output = await reportEvalResults({
  suiteName: "Nightly",
  mcpClientManager: manager,
  results: [
    { caseTitle: "healthcheck", passed: true },
    { caseTitle: "tool-selection", passed: true, durationMs: 1200 },
    { caseTitle: "edge-case", passed: false, error: "Wrong tool called" },
  ],
  passCriteria: { minimumPassRate: 90 },
  ci: {
    branch: "main",
    commitSha: "abc123",
  },
});

console.log(`Run ${output.runId}: ${output.result}`);
// e.g. "Run <runId>: passed"
console.log(`${output.summary.passed}/${output.summary.total} passed`);

reportEvalResultsSafely()

Same as reportEvalResults(), but returns null instead of throwing on failure. Warnings are logged to the console.
import { MCPClientManager, reportEvalResultsSafely } from "@mcpjam/sdk";

Signature

reportEvalResultsSafely(input: ReportEvalResultsInput): Promise<ReportEvalResultsOutput | null>

Example

const manager = new MCPClientManager({
  asana: {
    url: process.env.MCP_SERVER_URL!,
    refreshToken: process.env.MCP_REFRESH_TOKEN!,
    clientId: process.env.MCP_CLIENT_ID!,
  },
});

await manager.connectToServer("asana");

const output = await reportEvalResultsSafely({
  suiteName: "Nightly",
  mcpClientManager: manager,
  results: [{ caseTitle: "healthcheck", passed: true }],
});

if (output) {
  console.log(`Reported: ${output.summary.passRate * 100}% pass rate`);
} else {
  console.log("Reporting failed (non-blocking)");
}
Use reportEvalResultsSafely() when you don’t want eval reporting failures to break your CI pipeline. Use reportEvalResults() (strict) when reporting is critical.

createEvalRunReporter()

Creates an incremental reporter for long-running processes. Results are buffered and flushed in batches (up to 200 results or 1MB per batch).
import { createEvalRunReporter } from "@mcpjam/sdk";

Signature

createEvalRunReporter(input: CreateEvalRunReporterInput): EvalRunReporter
CreateEvalRunReporterInput accepts the same replay source fields as reportEvalResults(): serverReplayConfigs, agent, and mcpClientManager.

EvalRunReporter Methods

| Method | Description |
| --- | --- |
| add(result) | Buffer a result (no network call) |
| record(result) | Buffer a result and auto-flush when the buffer is large |
| flush() | Upload all buffered results |
| finalize() | Flush remaining results and finalize the run |
| getBufferedCount() | Number of results in the buffer |
| getAddedCount() | Total results added (including flushed) |
| setExpectedIterations(count) | Set expected iteration count for progress tracking |
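The add()/record()/flush() contract can be sketched as follows (BufferSketch is illustrative, not the SDK implementation; the 200-result threshold mirrors the batch limit stated above, and the real reporter also flushes on a ~1MB payload limit):

```typescript
// Illustrative buffer: add() never flushes; record() flushes once the
// batch limit is reached; flush() drains whatever is buffered.
class BufferSketch<T> {
  private buf: T[] = [];
  readonly batches: T[][] = [];
  add(result: T): void {
    this.buf.push(result);
  }
  record(result: T, maxBatch = 200): void {
    this.buf.push(result);
    if (this.buf.length >= maxBatch) this.flush();
  }
  flush(): void {
    if (this.buf.length > 0) {
      this.batches.push(this.buf);
      this.buf = [];
    }
  }
  getBufferedCount(): number {
    return this.buf.length;
  }
}
```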

PromptResult Helpers

| Method | Description |
| --- | --- |
| addFromPrompt(promptResult, overrides?) | Convert a PromptResult and buffer it |
| recordFromPrompt(promptResult, overrides?) | Convert a PromptResult, buffer it, and auto-flush |

EvalTest/EvalSuite Run Helpers

| Method | Description |
| --- | --- |
| addFromRun(run, options) | Convert all iterations from an EvalTest run |
| recordFromRun(run, options) | Convert and auto-flush from an EvalTest run |
| addFromSuiteRun(suiteRun, options) | Convert all iterations from an EvalSuite run |
| recordFromSuiteRun(suiteRun, options) | Convert and auto-flush from an EvalSuite run |

Example

// Assumes `agent` was created with `mcpClientManager: manager`
const reporter = createEvalRunReporter({
  suiteName: "Integration Tests",
  passCriteria: { minimumPassRate: 85 },
  agent,
  ci: {
    branch: process.env.GITHUB_REF_NAME,
    commitSha: process.env.GITHUB_SHA,
  },
});

Example with manual replay source resolution

const reporter = createEvalRunReporter({
  suiteName: "Integration Tests",
  passCriteria: { minimumPassRate: 85 },
  mcpClientManager: manager,
});

// Add results as tests complete
await reporter.record({ caseTitle: "test-1", passed: true, durationMs: 500 });
await reporter.record({ caseTitle: "test-2", passed: false, error: "timeout" });
await reporter.record({ caseTitle: "test-3", passed: true });

// Finalize the run
const output = await reporter.finalize();
console.log(`${output.summary.passed}/${output.summary.total} passed`);

Replay credential sources

Authenticated HTTP evals can securely persist replay credentials for reruns and debugging. Manual reporting APIs resolve replay configs in this order:
  1. serverReplayConfigs
  2. agent.getServerReplayConfigs()
  3. mcpClientManager.getServerReplayConfigs()
Prefer passing agent or mcpClientManager directly. Use serverReplayConfigs only when you need a low-level override. If you also provide serverNames, inferred replay configs are filtered to those server IDs before upload. Explicit serverReplayConfigs are left unchanged.
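The fallback order above can be expressed as a small resolver (resolveReplayConfigs and ReplaySource are hypothetical names for illustration, not SDK exports):

```typescript
// Hypothetical sketch of replay-source resolution: an explicit override
// wins, then the agent's configs, then the client manager's.
type ReplaySource = {
  serverReplayConfigs?: object[];
  agent?: { getServerReplayConfigs?: () => object[] | undefined };
  mcpClientManager?: { getServerReplayConfigs: () => object[] };
};

function resolveReplayConfigs(input: ReplaySource): object[] | undefined {
  return (
    input.serverReplayConfigs ??
    input.agent?.getServerReplayConfigs?.() ??
    input.mcpClientManager?.getServerReplayConfigs()
  );
}
```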

Replay metadata for the MCPJam UI

Uploaded runs can show Replay this run / server-side MCP replay when the ingest payload includes derived serverReplayConfigs (stored as hasServerReplayConfig on the run). In practice:
  • HTTP MCP (url) — Replay configs are built for typical streamable HTTP connections. Stdio transports do not produce entries from MCPClientManager.getServerReplayConfigs(); use HTTP when you need dashboard replay.
  • TestAgent vs reporter — Putting mcpClientManager on TestAgent fills MCP App widget snapshots on PromptResult. The reporter (and one-shot report*) resolves server replay from its own agent / mcpClientManager fields. Pass agent or mcpClientManager into createEvalRunReporter as well; agent-only wiring can still upload traces and widgets but omit hasServerReplayConfig.
  • Teardown order — finalize() / one-shot reporting calls getServerReplayConfigs() against connected registrations. In afterAll, run await reporter.finalize() (or reportEvalResults) before await manager.disconnectAllServers(). Disconnecting first clears manager state and uploads without replay metadata.
LLM API keys are not stored on the run. Replaying in the MCPJam UI still requires provider keys (e.g. OpenRouter) in Settings for your suite’s models.
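A self-contained sketch of the teardown ordering (reporter and manager here are mocks that record call order; in real code they would be your EvalRunReporter and MCPClientManager, and teardown would be your afterAll hook):

```typescript
// Mocks that record the order of teardown calls. finalize() must run
// while servers are still connected so replay configs can be captured.
const calls: string[] = [];
const reporter = {
  finalize: async () => {
    calls.push("finalize");
  },
};
const manager = {
  disconnectAllServers: async () => {
    calls.push("disconnect");
  },
};

// afterAll-style teardown: upload first, then disconnect.
async function teardown(): Promise<string[]> {
  await reporter.finalize();
  await manager.disconnectAllServers();
  return calls;
}
```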

Using with PromptResult

// Pass `agent` or `mcpClientManager` when you need server replay metadata in MCPJam
const reporter = createEvalRunReporter({ suiteName: "Prompt Tests", agent });

const result = await agent.prompt("Add 2 and 3");
reporter.addFromPrompt(result, {
  caseTitle: "addition",
  passed: result.hasToolCall("add"),
});

const output = await reporter.finalize();

Using with EvalTest Runs

const reporter = createEvalRunReporter({ suiteName: "Full Suite" });

const test = new EvalTest({
  name: "addition",
  test: async (agent) => (await agent.prompt("Add 2+3")).hasToolCall("add"),
});

const run = await test.run(agent, { iterations: 10 });
await reporter.recordFromRun(run, { casePrefix: "addition" });

const output = await reporter.finalize();

uploadEvalArtifact()

Parses test artifacts (JUnit XML, Jest JSON, Vitest JSON) and reports the results to MCPJam.
import { uploadEvalArtifact } from "@mcpjam/sdk";

Signature

uploadEvalArtifact(input: UploadEvalArtifactInput): Promise<ReportEvalResultsOutput>

Supported Formats

| Format | Description |
| --- | --- |
| "junit-xml" | JUnit XML test reports |
| "jest-json" | Jest JSON output (--json flag) |
| "vitest-json" | Vitest JSON reporter output |
| "custom" | Custom parser via the customParser option |

Example

import { readFileSync } from "fs";

// Upload JUnit XML
await uploadEvalArtifact({
  suiteName: "CI Results",
  format: "junit-xml",
  artifact: readFileSync("test-results.xml", "utf-8"),
});

// Upload Jest JSON
await uploadEvalArtifact({
  suiteName: "Jest Results",
  format: "jest-json",
  artifact: readFileSync("jest-results.json", "utf-8"),
});

// Custom parser
await uploadEvalArtifact({
  suiteName: "Custom",
  format: "custom",
  artifact: myData,
  customParser: (data) => [
    { caseTitle: "test-1", passed: true },
    { caseTitle: "test-2", passed: false, error: "failed" },
  ],
});

Types

ReportEvalResultsInput

type ReportEvalResultsInput = MCPJamReportingConfig & {
  suiteName: string;
  results: EvalResultInput[];
  agent?: {
    getServerReplayConfigs?: () => MCPServerReplayConfig[] | undefined;
  };
  mcpClientManager?: MCPClientManager;
};

MCPJamReportingConfig

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| enabled | boolean | No | Enable/disable reporting (default: true) |
| apiKey | string | No | MCPJam API key (falls back to the MCPJAM_API_KEY env var) |
| baseUrl | string | No | MCPJam API base URL override (useful for internal development or tests) |
| suiteName | string | No | Suite name for the run |
| suiteDescription | string | No | Description of the suite |
| serverNames | string[] | No | MCP server names being tested |
| serverReplayConfigs | MCPServerReplayConfig[] | No | Advanced override for replay credential capture |
| notes | string | No | Free-form notes |
| passCriteria | { minimumPassRate: number } | No | Pass threshold (0-100) |
| failOnToolError | boolean | No | When not false, results derived from traces treat tool execution failures as failed iterations (default: strict). See Tool execution and passed |
| strict | boolean | No | Throw on upload errors (false = warn only) |
| externalRunId | string | No | Custom run ID (auto-generated if omitted) |
| framework | string | No | Test framework name (e.g., "jest", "vitest") |
| ci | EvalCiMetadata | No | CI/CD pipeline context |
| expectedIterations | number | No | Expected total iterations for progress tracking |
| agent | { getServerReplayConfigs?: () => MCPServerReplayConfig[] \| undefined } | No | Preferred replay source for manual reporting; use an agent created with mcpClientManager so results can include widgetSnapshots from PromptResult.toEvalResult() |
| mcpClientManager | MCPClientManager | No | Replay source when no agent is provided; does not populate widgetSnapshots unless your results[].widgetSnapshots or trace payloads already include them |

MCPServerReplayConfig

Advanced replay override. Most users should not construct this manually.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| serverId | string | Yes | MCP server identifier |
| url | string | Yes | MCP server URL |
| preferSSE | boolean | No | Prefer SSE transport for replay |
| accessToken | string | No | Static bearer token for replay |
| refreshToken | string | No | Refresh token for replay |
| clientId | string | No | OAuth client ID; required with refreshToken |
| clientSecret | string | No | OAuth client secret when needed for token refresh |

EvalCiMetadata

| Property | Type | Description |
| --- | --- | --- |
| provider | string | CI provider (e.g., "github", "gitlab") |
| pipelineId | string | Pipeline/workflow identifier |
| jobId | string | Job identifier |
| runUrl | string | URL to the CI run |
| branch | string | Git branch name |
| commitSha | string | Git commit SHA |

EvalResultInput

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| caseTitle | string | Yes | Test case title |
| passed | boolean | Yes | Whether the test passed |
| query | string | No | The prompt/query sent |
| durationMs | number | No | Test duration in ms |
| provider | string | No | LLM provider name |
| model | string | No | Model identifier |
| expectedToolCalls | EvalExpectedToolCall[] | No | Expected tool calls |
| actualToolCalls | EvalExpectedToolCall[] | No | Actual tool calls made |
| tokens | { input?, output?, total? } | No | Token usage |
| error | string | No | Error message |
| errorDetails | string | No | Detailed error info |
| trace | EvalTraceInput | No | Conversation trace |
| externalIterationId | string | No | Custom iteration ID |
| externalCaseId | string | No | Custom case ID |
| metadata | Record<string, string \| number \| boolean> | No | Custom metadata |
| isNegativeTest | boolean | No | Whether this is a negative test |
| widgetSnapshots | EvalWidgetSnapshotInput[] | No | MCP App HTML replay payloads (typically from getWidgetSnapshots() on PromptResult). Omitted when not using MCP Apps or when TestAgent had no mcpClientManager |

ReportEvalResultsOutput

| Property | Type | Description |
| --- | --- | --- |
| suiteId | string | Created/matched suite ID |
| runId | string | Created run ID |
| status | "completed" \| "failed" | Run status |
| result | "passed" \| "failed" | Pass/fail based on criteria |
| summary.total | number | Total iterations |
| summary.passed | number | Passed iterations |
| summary.failed | number | Failed iterations |
| summary.passRate | number | Pass rate (0.0 - 1.0) |
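Note the unit difference: summary.passRate is a fraction (0.0 - 1.0), while passCriteria.minimumPassRate is a percentage (0 - 100). A hypothetical helper relating the two (meetsCriteria is illustrative, not an SDK export):

```typescript
// Convert the fractional passRate to a percentage before comparing it
// against minimumPassRate (0-100).
function meetsCriteria(passRate: number, minimumPassRate: number): boolean {
  return passRate * 100 >= minimumPassRate;
}
```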