Claude Code, Copilot CLI, and Codex as one verification site
jugeo-agents interfaces with coding-agent CLIs by treating each run as a structured evidence object rather than a blob of generated text. Patches, explanations, tool calls, file reads, and test results become local sections that can be compared, challenged, fused, or rejected.
Why coding-agent CLIs are special
Claude Code, Copilot CLI, and Codex sit at a particularly interesting boundary: they are still language models, but they also act like small software engineers. They inspect repositories, run shell commands, read files, propose edits, and sometimes execute tests. That means their outputs already come with a latent geometry of evidence.
JuGeo exploits that structure directly. A coding-agent run is not only "an answer"; it is a coordinate in a coding site with attached artifacts, overlap claims, and trust-carrying evidence channels.
- **Coordinate:** repository, task, files touched, model identity, and tool context.
- **Section:** proposed code, explanations, behavioral claims, and architectural assertions.
- **Evidence:** shell commands, file reads, grep/glob results, test runs, citations, and logs.
- **Trust:** pure proposal, repository-grounded analysis, tool-executed result, or tool-verified output.
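These four channels can be sketched as a minimal data model. Everything below (`TrustLevel`, `Evidence`, `LocalSection`, and the field names) is an illustrative assumption, not the actual jugeo-agents types:

```python
from dataclasses import dataclass, field
from enum import IntEnum

# Hypothetical sketch of the evidence channels -- not the real jugeo-agents API.
class TrustLevel(IntEnum):
    PURE_PROPOSAL = 0    # generated text with no grounding
    CITATION_BACKED = 1  # backed by repository citations
    TOOL_EXECUTED = 2    # a tool actually ran
    TOOL_VERIFIED = 3    # a tool ran and the check passed (e.g. tests)

@dataclass
class Evidence:
    kind: str            # "bash", "read_file", "grep", "test_run", ...
    trust: TrustLevel

@dataclass
class LocalSection:
    coordinate: dict     # repo, task, files touched, model identity
    claims: list         # behavioral and architectural assertions
    evidence: list = field(default_factory=list)

    def max_trust(self) -> TrustLevel:
        # A section with no attached evidence is a pure proposal.
        return max((e.trust for e in self.evidence),
                   default=TrustLevel.PURE_PROPOSAL)
```

The ordering on `TrustLevel` is what makes runs comparable: a claim's strength is bounded by the strongest evidence actually attached to it.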
The adapter layer
The concrete interface lives in jugeo-agents/src/jugeo_agents/adapters/coding_agents.py.
JuGeo implements three dedicated adapters that normalize external CLI behavior into a common shape:
| External system | Adapter | Captured evidence | Typical trust entry |
|---|---|---|---|
| Claude Code | ClaudeCodeAdapter | patches, file operations, bash/tool calls, test results, explanations | TOOL_VERIFIED when tools + passing tests are present |
| Copilot CLI | CopilotCLIAdapter | suggested edits, repository inspection, shell commands, explanations | TOOL_EXECUTED or CITATION_BACKED when grounded by inspection |
| Codex | CodexAdapter | candidate implementations, reasoning text, optional tool/test metadata | often a strong proposal unless paired with explicit execution evidence |
The key move is normalization. After transport through the adapter layer, these distinct
systems all become comparable CodeOutput / AgentOutput objects.
```python
CodeOutput(
    agent_name="claude-code",       # or "copilot-cli", "codex"
    agent_model="claude-sonnet-4",  # or "gpt-4.1", "o4-mini"
    code="...",
    explanation="...",
    files_modified=["src/parser.py"],
    tools_used=["bash:pytest", "grep", "read_file"],
    test_results={"passed": True, "exit_code": 0},
    citations=[...],
    metadata={...},
)
```
What JuGeo can ask once they are normalized
A single coding CLI cannot easily tell you whether its own patch should beat another model's patch. JuGeo can, because it compares outputs as local sections over the same site.
- Did Claude Code and Copilot CLI make contradictory claims about the same function?
- Did Codex claim a refactor preserves behavior without test evidence?
- Did a model say "all tests pass" even though no executed test result appears in the evidence?
- Are three agents agreeing because they independently verified a property, or because they echoed the same unsupported explanation?
This is the central idea: coding-agent outputs are not ranked by vibes or model prestige. They are checked under descent and ordered by trust.
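The third question above, a model claiming test success without executed evidence, can be sketched as a simple audit over the normalized fields. The field names mirror the CodeOutput shape shown earlier; the helper itself is hypothetical, not part of jugeo-agents:

```python
def audit_test_claim(output: dict) -> str:
    """Classify a 'tests pass' claim against the run's actual evidence.

    Hypothetical helper: demote the claim when no executed test result
    exists, verify it only when a test run is recorded as passing.
    """
    results = output.get("test_results")
    if results is None:
        return "demoted"        # claim made, but nothing was executed
    if results.get("passed"):
        return "tool_verified"  # tests ran and passed
    return "tool_executed"      # tests ran but do not support the claim
```

A transcript-only system would accept "all tests pass" at face value; here the claim is downgraded whenever the evidence channel is empty.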
A concrete orchestration example
Suppose the same refactoring task is sent to Claude Code, Copilot CLI, and Codex. Claude
Code edits parser.py and runs tests. Copilot CLI inspects the repository and
proposes a shorter equivalent rewrite. Codex offers a third implementation with a cleaner
API boundary but no execution evidence.
```python
from jugeo_agents.adapters.coding_agents import (
    ClaudeCodeAdapter,
    CopilotCLIAdapter,
    CodexAdapter,
    CodingAgentOrchestrator,
)

orch = CodingAgentOrchestrator()

orch.add_output(ClaudeCodeAdapter.from_response(
    code="def parse(items): return [x.strip() for x in items]",
    explanation="Normalizes whitespace and preserves order.",
    tools_used=["bash:pytest"],
    test_results={"passed": True, "exit_code": 0},
))

orch.add_output(CopilotCLIAdapter.from_response(
    code="def parse(items): return list(map(str.strip, items))",
    explanation="Equivalent map-based implementation.",
    tools_used=["grep", "read_file"],
))

orch.add_output(CodexAdapter.from_response(
    code="def parse(items): return [item.strip() for item in items if item]",
    explanation="Filters falsy inputs before normalization.",
))

section = orch.verify()
print(section.summary_text())
```
JuGeo then extracts code-level claims ("preserves order", "tests pass", "filters falsy inputs"), checks overlaps, demotes unsupported assertions, and computes a fused global section only if the outputs actually glue.
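The gluing step can be sketched as a pairwise consistency check over the extracted claims: sections fuse into a global section only when every shared claim agrees. This is a toy model of descent, not the jugeo-agents implementation:

```python
def glue_sections(sections):
    """Fuse per-agent claim maps into a global section, or refuse.

    Each section maps claim -> asserted truth value. Any disagreement
    on a shared claim means descent fails and no global section exists.
    """
    fused = {}
    for section in sections:
        for claim, value in section.items():
            if claim in fused and fused[claim] != value:
                return None  # contradictory overlap: outputs do not glue
            fused[claim] = value
    return fused
```

In the example above, Claude Code and Copilot CLI overlap on "preserves order" and agree, so they glue; a section asserting the opposite would block fusion entirely rather than being averaged away.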
Trust transport across coding agents
The adapters do not just rename fields. They encode different trust transport laws:
- Claude Code: strongest when tool usage and passing tests are both present.
- Copilot CLI: stronger when repository inspection or shell execution backs the claim.
- Codex: often a proposal until explicit execution or citation evidence appears.
This lets JuGeo distinguish three qualitatively different outputs: executable evidence, repository-grounded analysis, and pure proposal. Existing coding-agent systems tend to collapse those into a single transcript. JuGeo keeps them separate because they glue differently.
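The transport laws above amount to a per-agent mapping from evidence channels to a trust label. A minimal sketch, where the thresholds and labels are assumptions rather than the adapters' actual rules:

```python
def trust_entry(agent: str, tools_used=(), test_results=None, citations=()) -> str:
    """Illustrative trust transport per agent; not the real adapter logic."""
    ran_tools = bool(tools_used)
    tests_passed = bool(test_results and test_results.get("passed"))
    if agent == "claude-code" and ran_tools and tests_passed:
        return "TOOL_VERIFIED"      # executable evidence
    if agent == "copilot-cli" and ran_tools:
        return "TOOL_EXECUTED"      # repository-grounded analysis
    if citations:
        return "CITATION_BACKED"
    return "PURE_PROPOSAL"          # e.g. a Codex run with no execution evidence
```

The same surface output earns different trust depending on which agent produced it and what evidence traveled with it, which is exactly why the transcripts must stay separated.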