
How I Turned GitLab into a Coordination Layer for Autonomous AI Development Agents

yeTi 2026. 5. 14. 10:06

Lessons from building a multi-agent AI development workflow for a production project

TL;DR

Building a reliable AI coding agent is one engineering problem.

Building a reliable AI development workflow with multiple agents is another.

A single agent mostly struggles with execution quality.

Multiple agents introduce coordination problems:

  • task ownership
  • shared state visibility
  • race conditions
  • workspace contamination
  • lock recovery
  • operational governance

While building an autonomous development workflow for the sqlgen project, I learned that code generation was only one part of the problem.

The dominant challenge was coordination.

GitLab labels became the shared state machine that allowed independent agents to coordinate work safely. GitLab’s scoped labels are explicitly designed to support mutually exclusive workflow states, which makes them a practical coordination primitive for workflow orchestration. ([GitLab Docs][1])

The Goal

The original goal was straightforward.

I wanted engineering work inside the sqlgen project to move through an AI-assisted delivery workflow with minimal manual execution.

The target flow looked like this:

Issue discovered
→ planned
→ implemented
→ reviewed
→ tested
→ merged

The initial assumption was simple:

If the coding model is good enough, autonomous delivery becomes practical.

That assumption turned out to be incomplete.

Code generation solved only part of the problem.

Once multiple agents became involved, coordination became the dominant engineering challenge.

This Was Not a Single-Agent Problem

I was not building a coding assistant.

I was building a workflow where multiple agents had distinct responsibilities.

A simplified structure:

Human PM
   ↓
PM Bot
   ↓
Review Bot
   ↓
Dev Bot
   ↓
QA Bot
   ↓
Human Approval

Each agent had a narrower role.

That part was intentional.

Specialized agents are easier to reason about than one general-purpose autonomous actor.

But specialization creates a new requirement:

shared operational context.

A human team can rely on conversation, memory, and implicit understanding.

Independent agents cannot.

Task ownership, workflow progress, and execution state must be externally visible.

That made coordination state an explicit architectural concern.

Why GitLab?

A natural question:

Why use GitLab instead of building a dedicated orchestration service?

The answer was practical.

GitLab already provided several useful properties.

1. Existing Workflow Surface

The engineering workflow already lived in GitLab:

  • issues
  • merge requests
  • labels

That meant no additional operational UI was needed.

Agents could integrate into the workflow engineers were already using.

2. Shared Visibility

Humans and agents could observe the same workflow state.

This matters operationally.

A coordination system that only agents understand becomes difficult to debug.

GitLab gave immediate human inspectability.

An engineer could look at an issue and immediately understand where work was stuck.

3. Simple Polling Model

The initial MVP used a cron-based automation model.

Example:

find issues with workflow::dev-ready

This approach was intentionally simple.

No event bus.
No dedicated orchestration queue.
No new infrastructure.

For an MVP, operational simplicity mattered more than architectural purity.

4. Explicit State Representation

Scoped labels gave a lightweight way to encode workflow lifecycle state.

Example:

workflow::pm-ready
workflow::dev-running
workflow::review-ready

Because labels within the same scope are mutually exclusive, workflow transitions become naturally enforceable. ([GitLab Docs][1])

That significantly reduced coordination ambiguity.

The architectural tradeoff was intentional:

Instead of introducing a separate orchestration system, I reused the existing engineering control plane.

GitLab as a Shared State Machine

The workflow state model looked like this:

workflow::pm-ready
workflow::pm-running
workflow::dev-ready
workflow::dev-running
workflow::review-ready
workflow::qa-ready
workflow::done
workflow::failed

Example lifecycle:

Issue created
→ workflow::pm-ready

PM Bot claims task
→ workflow::pm-running

Planning complete
→ workflow::dev-ready

Dev Bot claims task
→ workflow::dev-running

Implementation complete
→ workflow::review-ready

This solved a critical coordination problem.

Agents no longer depended on hidden internal context.

Workflow state became:

  • explicit
  • queryable
  • observable

GitLab was no longer just storing code.

It was acting as the coordination layer for distributed autonomous workers.

First Working MVP

The initial MVP worked under normal execution conditions.

The execution flow looked like this:

1-minute cron poller
↓
Find issues labeled workflow::dev-ready
↓
Acquire workspace lock
↓
Mark issue workflow::dev-running
↓
Execute Codex implementation flow
↓
Create merge request
↓
Transition issue to workflow::review-ready

This was enough to validate the architectural direction.

But happy paths do not validate operational systems.

Failure behavior does.

What Actually Broke

The dominant failures were operational coordination failures rather than model capability failures.

1. Double Pickup

Without explicit claiming, multiple agents can observe the same available task.

Example:

Agent A sees workflow::dev-ready
Agent B sees workflow::dev-ready
Both begin execution

Classic race condition.

Humans resolve this socially.

Distributed workers do not.

The fix:

  • explicit task claiming
  • state transition before execution
  • locking
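The claiming logic behaves like a check-and-set guarded by a lock. An in-process sketch, with `ClaimRegistry` as an illustrative stand-in for the real workspace lock plus label transition:

```python
import threading

class ClaimRegistry:
    """In-process stand-in for the workspace lock: at most one agent
    may claim a given issue id. Illustrative only; the real system used
    a lock plus the workflow:: label transition."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def try_claim(self, issue_id: int) -> bool:
        # Check and claim under one lock, so two agents that observe
        # the same ready task cannot both win the race.
        with self._lock:
            if issue_id in self._claimed:
                return False
            self._claimed.add(issue_id)
            return True
```

The key property is atomicity: the check ("is it free?") and the write ("it's mine") happen as one step, so the Agent A / Agent B race above cannot occur.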

2. Dirty Workspace Contamination

A failed execution could leave behind:

  • modified files
  • temporary branches
  • partial generated output
  • broken local state

The next execution inherited polluted state.

This produced misleading failures.

The issue was not reasoning quality.

It was environment integrity.

The fix:

  • workspace isolation
  • cleanup contracts
  • pre-execution guards
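A pre-execution guard can be as simple as refusing to start on a dirty checkout. A sketch: `guard_workspace` and `porcelain_is_clean` are illustrative names, and the real cleanup contract also covered temporary branches and generated output:

```python
import subprocess

def porcelain_is_clean(porcelain_output: str) -> bool:
    """A workspace is clean when `git status --porcelain` prints nothing."""
    return porcelain_output.strip() == ""

def guard_workspace(repo_path: str) -> None:
    """Pre-execution guard (sketch): abort before claiming a task if the
    checkout still carries state from a failed run."""
    out = subprocess.run(
        ["git", "-C", repo_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    if not porcelain_is_clean(out):
        raise RuntimeError(f"dirty workspace at {repo_path}, refusing to run")
```

Failing loudly here turns a misleading downstream failure ("the model produced broken code") into an accurate upstream one ("the environment was contaminated").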

3. Cron Environment Drift

Manual execution succeeded.

Automated execution failed.

This is a classic operational issue.

Cron environments differ from interactive shells.

Common failures:

  • PATH mismatch
  • missing environment variables
  • CLI auth assumptions
  • host normalization issues

In practice, this surfaced as:

  • Codex working manually but failing in automation
  • glab targeting the wrong host
  • executables missing during scheduled execution
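The mitigation is an environment preflight at the top of the cron entry point: verify the binaries and variables the interactive shell silently provided. A sketch; the specific names an agent would check are deployment-specific:

```python
import os
import shutil

def preflight(required_bins, required_env):
    """Environment preflight for cron runs (sketch): report which
    binaries and variables the cron environment lacks, instead of
    letting a tool fail halfway through an execution."""
    missing = []
    missing += [f"binary: {b}" for b in required_bins if shutil.which(b) is None]
    missing += [f"env: {v}" for v in required_env if v not in os.environ]
    return missing
```

Running this first and failing with the full `missing` list converts "Codex works manually but fails in automation" from a mystery into a one-line diagnosis.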

These are not glamorous problems.

But production automation usually fails on operational details, not architecture diagrams.

4. Stale Locks

Locks prevent concurrent execution.

But failed runs can leave stale locks behind.

Result:

lock exists
→ no new work claimed
→ workflow silently stalls

Without recovery logic, the system appears healthy while doing nothing.

The fix:

  • lock TTL
  • stale lock detection
  • cleanup recovery

Human-Governed Autonomy

A design correction emerged during implementation.

Full autonomy is not the immediate objective.

A more practical operational model is:

human-governed autonomy

Humans remain responsible for:

  • defining goals
  • approving critical changes
  • resolving ambiguity
  • production governance

Agents handle:

  • execution
  • repetitive workflow progression
  • structured implementation tasks

This boundary preserves automation benefits while reducing operational risk.
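That boundary can itself be enforced in the state machine: agents may advance any state except the terminal one, which only a human transition may set. A sketch, assuming the state names from earlier; the exact placement of the boundary is illustrative:

```python
# States an agent is permitted to write; workflow::done is reserved
# for the human approval step. The exact boundary is an assumption.
AGENT_ALLOWED_TARGETS = {
    "workflow::pm-running", "workflow::dev-ready", "workflow::dev-running",
    "workflow::review-ready", "workflow::qa-ready", "workflow::failed",
}

def agent_may_set(target_state: str) -> bool:
    """Governance boundary (sketch): agents progress the workflow,
    humans close it."""
    return target_state in AGENT_ALLOWED_TARGETS
```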

Key Engineering Lesson

Single-agent reliability asks:

How do I make one agent execute correctly?

Multi-agent workflow reliability asks:

How do independent agents coordinate safely?

These are different engineering problems.

The second problem looks much closer to distributed systems engineering than prompt engineering.

Because the failure modes are familiar:

  • shared state consistency
  • ownership conflicts
  • stale resources
  • operational recovery
  • workflow observability

Reliable agents are useful.

Reliable coordination is essential.
