Why coding-agent CLIs are special

Claude Code, Copilot CLI, and Codex sit at a particularly interesting boundary: they are still language models, but they also act like small software engineers. They inspect repositories, run shell commands, read files, propose edits, and sometimes execute tests. That means their outputs already come with a latent geometry of evidence.

🤖 JuGeo exploits that structure directly. A coding-agent run is not only "an answer"; it is a coordinate in a coding site with attached artifacts, overlap claims, and trust-carrying evidence channels.

  • Coordinate: repository, task, files touched, model identity, and tool context.
  • Section: proposed code, explanations, behavioral claims, and architectural assertions.
  • Evidence: shell commands, file reads, grep/glob results, test runs, citations, and logs.
  • Trust: pure proposal, repository-grounded analysis, tool-executed result, or tool-verified output.
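
Concretely, the four components above can be pictured as fields on a single run object. This is an illustrative sketch only; the `AgentRun` name and field types are assumptions for this page, not JuGeo's actual classes:

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    coordinate: dict  # repository, task, files touched, model identity, tool context
    section: dict     # proposed code, explanations, behavioral claims
    evidence: list    # shell commands, file reads, test runs, citations, logs
    trust: str        # "pure_proposal" ... "tool_verified"

run = AgentRun(
    coordinate={"repo": "demo", "task": "refactor parse()"},
    section={"code": "def parse(items): ..."},
    evidence=[],
    trust="pure_proposal",  # no evidence attached yet
)
```

The point of the shape is that evidence and trust travel with the output instead of being buried in a transcript.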

The adapter layer

The concrete interface lives in jugeo-agents/src/jugeo_agents/adapters/coding_agents.py. JuGeo implements three dedicated adapters that normalize external CLI behavior into a common shape:

| External system | Adapter | Captured evidence | Typical trust entry |
| --- | --- | --- | --- |
| Claude Code | ClaudeCodeAdapter | patches, file operations, bash/tool calls, test results, explanations | TOOL_VERIFIED when tools and passing tests are both present |
| Copilot CLI | CopilotCLIAdapter | suggested edits, repository inspection, shell commands, explanations | TOOL_EXECUTED or CITATION_BACKED when grounded by inspection |
| Codex | CodexAdapter | candidate implementations, reasoning text, optional tool/test metadata | often a strong proposal unless paired with explicit execution evidence |

The key move is normalization. After passing through the adapter layer, outputs from these distinct systems become comparable CodeOutput / AgentOutput objects.

```python
CodeOutput(
    agent_name="claude-code",       # or "copilot-cli", "codex"
    agent_model="claude-sonnet-4",  # or "gpt-4.1", "o4-mini"
    code="...",
    explanation="...",
    files_modified=["src/parser.py"],
    tools_used=["bash:pytest", "grep", "read_file"],
    test_results={"passed": True, "exit_code": 0},
    citations=[...],
    metadata={...},
)
```

What JuGeo can ask once outputs are normalized

A single coding CLI cannot easily tell you whether its own patch should beat another model's patch. JuGeo can, because it compares outputs as local sections over the same site.

  • Did Claude Code and Copilot CLI make contradictory claims about the same function?
  • Did Codex claim a refactor preserves behavior without test evidence?
  • Did a model say "all tests pass" even though no executed test result appears in the evidence?
  • Are three agents agreeing because they independently verified a property, or because they echoed the same unsupported explanation?
💡 This is the central idea: coding-agent outputs are not ranked by vibes or model prestige. They are checked under descent and ordered by trust.

A concrete orchestration example

Suppose the same refactoring task is sent to Claude Code, Copilot CLI, and Codex. Claude Code edits parser.py and runs tests. Copilot CLI inspects the repository and proposes a shorter equivalent rewrite. Codex offers a third implementation with a cleaner API boundary but no execution evidence.

```python
from jugeo_agents.adapters.coding_agents import (
    ClaudeCodeAdapter,
    CopilotCLIAdapter,
    CodexAdapter,
    CodingAgentOrchestrator,
)

orch = CodingAgentOrchestrator()

orch.add_output(ClaudeCodeAdapter.from_response(
    code="def parse(items): return [x.strip() for x in items]",
    explanation="Normalizes whitespace and preserves order.",
    tools_used=["bash:pytest"],
    test_results={"passed": True, "exit_code": 0},
))

orch.add_output(CopilotCLIAdapter.from_response(
    code="def parse(items): return list(map(str.strip, items))",
    explanation="Equivalent map-based implementation.",
    tools_used=["grep", "read_file"],
))

orch.add_output(CodexAdapter.from_response(
    code="def parse(items): return [item.strip() for item in items if item]",
    explanation="Filters falsy inputs before normalization.",
))

section = orch.verify()
print(section.summary_text())
```

JuGeo then extracts code-level claims ("preserves order", "tests pass", "filters falsy inputs"), checks overlaps, demotes unsupported assertions, and computes a fused global section only if the outputs actually glue.
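
A minimal sketch of that overlap check, assuming claims have already been extracted as short strings. The claim vocabulary, the contradiction table, and the `glues` function are hypothetical illustrations, not JuGeo's API:

```python
# Hypothetical table of claim pairs that cannot both hold on an overlap.
CONTRADICTIONS = {
    ("preserves order", "reorders items"),
    ("filters falsy inputs", "keeps all inputs"),
}

def glues(claims_a: set[str], claims_b: set[str]) -> bool:
    """Two local sections glue only if no claim pair is a known contradiction."""
    return not any((a, b) in CONTRADICTIONS or (b, a) in CONTRADICTIONS
                   for a in claims_a for b in claims_b)

claude = {"preserves order", "tests pass"}
codex = {"filters falsy inputs"}
print(glues(claude, codex))                 # True: compatible on their overlap
print(glues(codex, {"keeps all inputs"}))   # False: contradictory behavioral claims
```

Note that in the worked example above, Codex's filtering of falsy inputs is exactly the kind of behavioral divergence this check is meant to surface before fusion.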

Trust transport across coding agents

The adapters do not just rename fields. They encode different trust transport laws:

  • Claude Code: strongest when tool usage and passing tests are both present.
  • Copilot CLI: stronger when repository inspection or shell execution backs the claim.
  • Codex: often a proposal until explicit execution or citation evidence appears.

This lets JuGeo distinguish three qualitatively different outputs: executable evidence, repository-grounded analysis, and pure proposal. Existing coding-agent systems tend to collapse those into a single transcript. JuGeo keeps them separate because they glue differently.

Where to go next