Atlas Graph Explorer
EvalRun
eval-run:gaia.claude-code.2025
benchmarks/eval-runs/gaia-claude-code.yaml
Attributes
  benchmarkId: benchmark:gaia
  testSetId: test-set:gaia-validation
  target: agent-version:claude-code@1.x
  targetId: agent-version:claude-code@1.x
  runAt: 2025-06-01T00:00:00Z
  runBy: @a5c-ai/team
  configHash: sha256:placeholder-claude-code-gaia
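The YAML file referenced above might contain a record along these lines. This is a hedged sketch reconstructed from the attributes shown on this page; the `id` and `type` keys are assumptions, and the real file's layout may differ.

```yaml
# Hypothetical sketch of benchmarks/eval-runs/gaia-claude-code.yaml,
# built only from the attributes listed above -- not the actual file contents.
id: eval-run:gaia.claude-code.2025
type: EvalRun
benchmarkId: benchmark:gaia
testSetId: test-set:gaia-validation
targetId: agent-version:claude-code@1.x
runAt: "2025-06-01T00:00:00Z"
runBy: "@a5c-ai/team"
configHash: sha256:placeholder-claude-code-gaia
```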
Outgoing edges (16)
  evaluated_by (1)
    benchmark:gaia · Benchmark · GAIA
  evaluates_target (1)
    agent-version:claude-code@1.x · AgentVersion
  for_benchmark (1)
    benchmark:gaia · Benchmark · GAIA
  judged_by (3)
    judge:gpt-4o-pairwise · Judge · GPT-4o pairwise preference judge
    judge:claude-3-5-sonnet-rubric · Judge · Claude 3.5 Sonnet rubric-based judge
    judge:exact-match · Judge · Exact-match programmatic judge
  produced_result (1)
    eval-result:mmlu.qwen-2-5-72b.001 · EvalResult
  scored_against_rubric (3)
    rubric:helpfulness-1-5 · Rubric · Helpfulness 1-5 rubric
    rubric:safety-3-axis · Rubric · Safety 3-axis rubric (harm, bias, refusal-appropriateness)
    rubric:code-quality · Rubric · Code-quality rubric
  uses_harness (5)
    eval-harness:inspect-ai · EvalHarness · Inspect AI
    eval-harness:helm · EvalHarness · Stanford HELM
    eval-harness:lm-eval-harness · EvalHarness · EleutherAI lm-evaluation-harness
    eval-harness:openai-evals · EvalHarness · OpenAI Evals
    eval-harness:promptfoo · EvalHarness · promptfoo
  uses_test_set (1)
    test-set:gaia-validation · TestSet · GAIA validation split
Incoming edges (1)
  belongs_to_eval_run (1)
    eval-result:gaia.claude-code.001 · EvalResult
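The edge listings above can be modeled as (source, relation, target) triples grouped by relation. The sketch below is illustrative only, not the Atlas Graph Explorer's actual API or storage format; the `outgoing` helper and the small `edges` sample are hypothetical.

```python
# Illustrative sketch (not the Atlas API): model the page's edge view as
# triples, then group a node's outgoing edges by relation label.
from collections import defaultdict

RUN = "eval-run:gaia.claude-code.2025"

# A subset of the outgoing edges shown on this page.
edges = [
    (RUN, "evaluated_by", "benchmark:gaia"),
    (RUN, "evaluates_target", "agent-version:claude-code@1.x"),
    (RUN, "judged_by", "judge:gpt-4o-pairwise"),
    (RUN, "judged_by", "judge:claude-3-5-sonnet-rubric"),
    (RUN, "judged_by", "judge:exact-match"),
    (RUN, "uses_test_set", "test-set:gaia-validation"),
]

def outgoing(node, triples):
    """Group a node's outgoing edges by relation, as the explorer does."""
    grouped = defaultdict(list)
    for src, rel, dst in triples:
        if src == node:
            grouped[rel].append(dst)
    return dict(grouped)

grouped = outgoing(RUN, edges)
print(len(grouped["judged_by"]))  # three judges, matching the (3) count above
```

Grouping by relation label is what produces the per-relation counts, e.g. judged_by (3) and uses_harness (5), in the edge panels above.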