| ์ผ | ์ | ํ | ์ | ๋ชฉ | ๊ธ | ํ |
|---|---|---|---|---|---|---|
| 1 | 2 | |||||
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| 24 | 25 | 26 | 27 | 28 | 29 | 30 |
| 31 |
- ์ถ์ฒ
- ํ๊ธฐ
- ์๊ณ ๋ฆฌ์ฆ
- ์ฝ๋ฉํ ์คํธ
- zabbix
- ์ก์
- MariaDB
- ์ฝํ
- ๊ฐ์ ์๋ฐ๊ฒฌ
- ๊ฐ๋ฐ
- ๊ณต๋ต
- ๊ฐ์
- ๊ฐ๋ฐ์
- ์ด์
- ๋ฐฉ๋ฒ
- ์๋น ์ค
- ์ฟ ํค๋ฐ
- ํด๊ฒฐ
- ์ค๋ผํด
- oracle
- ๊ฐ์
- ๋๋งํฌ
- ๋ชจ๋ํฐ๋ง
- ์ฟ ํค
- error
- ์๋ฐ
- java
- solve
- ํ๋ฌ์ค
- window
- Today
- Total
์ก๋์ฌ๋
Why Prompt Engineering Alone Fails in LLM Systems (And How to Fix It with Convergence) ๋ณธ๋ฌธ
Why Prompt Engineering Alone Fails in LLM Systems (And How to Fix It with Convergence)
yeTi 2026. 4. 13. 16:53
Lessons learned from building a real-world LLM coding agent with local models
๐ TL;DR
- LLMs are non-deterministic โ same input, different outputs
- Pipeline architectures amplify failure probabilities
- Prompt engineering improves outputs but cannot guarantee reliability
- The real solution is not better prompts, but convergence systems
1. Problem: You Can't Even Get Stable Outputs
I wanted to build a local LLM-powered coding assistant.
So I set up:
- Mac Studio
- Ollama
- Claude Code CLI
- qwen3.5
Then I tried the simplest possible task:
Build a simple API
But the results were unstable:
- Sometimes no output at all
- Sometimes excessive file exploration (over-exploration)
- Sometimes the task never completed
The problem wasn't correctness.
The problem was that I couldn't reliably get results at all.
2. Observation: Small Tasks Work
After multiple attempts, I noticed a pattern:
Local LLMs perform much better on small, well-defined tasks.
For example:
- Implementing a single function
- Fixing a specific bug
- Tasks with clear input/output
This led to an important insight:
โBreak the problem down into smaller pieces.โ
3. Approach: Role Decomposition
Instead of one large prompt, I split the task into stages:
[Analyze] → [Design] → [Implement]

Each step:
- Has a narrow scope
- Produces structured output
- Can be validated
This significantly improved success rates (in manual runs).
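To make the idea concrete, here is a minimal sketch of what narrowly scoped, structured stages can look like. The prompts, the JSON shapes, and the `call_llm` helper are illustrative assumptions, not the exact setup used in this post.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a single model call (e.g. via Ollama).
    Hypothetical helper; swap in your own client."""
    raise NotImplementedError

def analyze(task: str) -> dict:
    # Narrow scope: only restate the requirements, as JSON we can validate.
    prompt = (
        "Summarize the following task as JSON with keys "
        "'goal' and 'constraints'. Task: " + task
    )
    return json.loads(call_llm(prompt))

def design(analysis: dict) -> dict:
    # Narrow scope: list the functions to implement, nothing else.
    prompt = (
        "Given this analysis, return JSON with a 'functions' list "
        "(name, signature, purpose): " + json.dumps(analysis)
    )
    return json.loads(call_llm(prompt))

def implement(design_doc: dict) -> str:
    # Narrow scope: produce code for the agreed design only.
    prompt = "Implement exactly these functions:\n" + json.dumps(design_doc)
    return call_llm(prompt)
```

Each function has one job and one expected output shape, which is what makes later validation possible.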
4. Scaling Up: Pipeline Automation
Naturally, the next step was:
โLetโs automate this workflow.โ
So I built a pipeline:
User Input
โ
[Analyze] โ [Design] โ [Implement]
โ
Final Output5. Problem โ The Pipeline Breaks Easily
After automation, new issues appeared:
- Sometimes it works
- Sometimes it completely fails
The key issue:
A single failure breaks the entire pipeline.
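Here is roughly what that fragile automation looks like, again as a sketch using the hypothetical stage functions from the earlier decomposition: every call trusts the previous one blindly.

```python
def run_pipeline(task: str) -> str:
    # Naive automation: every step assumes the previous one succeeded.
    analysis = analyze(task)         # if this returns malformed JSON...
    design_doc = design(analysis)    # ...this step gets garbage in...
    return implement(design_doc)     # ...and the final code is unusable.

# One bad intermediate output (or one parse error) and the whole run fails:
# there is no validation between steps and no way to recover.
```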
6. Why Pipelines Fail
6.1 LLMs Are Non-Deterministic
Unlike traditional systems:
- Same input → same output (X)
- Same input → probabilistic output (O)
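You can see this on the local setup described above by sending the same prompt to the model twice through Ollama's HTTP API and comparing the responses. A minimal sketch, assuming Ollama is running on its default port with a locally pulled model; the `/api/generate` endpoint and the `response` field follow Ollama's documented API.

```python
import requests

def generate(prompt: str) -> str:
    # One non-streaming call to a local Ollama server (default port 11434).
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3.5",  # use whichever model tag you have pulled
            "prompt": prompt,
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

prompt = "Write a Python function that reverses a string."
first = generate(prompt)
second = generate(prompt)
print(first == second)  # Usually False: same input, different output.
```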
6.2 Probability Compounding
If step $i$ succeeds with probability $p_i$, the probability that the whole pipeline succeeds is:

$$P_{\text{total}} = p_1 \times p_2 \times p_3$$
As the number of steps increases, total success probability drops rapidly.
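A quick back-of-the-envelope check makes the drop-off visible. Assume, purely for illustration, that every step succeeds 90% of the time:

```python
p = 0.9  # hypothetical per-step success rate

for steps in (1, 3, 5, 10):
    print(f"{steps} steps -> {p ** steps:.3f}")
# 1 steps -> 0.900
# 3 steps -> 0.729
# 5 steps -> 0.590
# 10 steps -> 0.349
```

Three automated steps already succeed less often than a single one, which matches the "sometimes it works, sometimes it completely fails" behavior above.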
6.3 Manual vs Automated Execution
| Aspect | Manual | Automated |
|---|---|---|
| Human intervention | Yes | No |
| Error recovery | Possible | None |
| Progress condition | Partial success | Full success |
Pipelines require every step to succeed every time.
7. The Real Problem
Initially, I thought:
โWe need better prompts.โ
But the real issue was:
โHow do we handle failures?โ
This is not a prompt problem.
It is a system design problem.
8. Solution: Convergence System
Instead of a linear pipeline, I redesigned the system as a convergence loop.
LLM Call
โ
Validation
/ \
OK FAIL
โ โ
Accept Retry9. Implementation โ Retry + Validation
9.1 Retry Loop
```python
def run_with_retry(task_fn, validate_fn, max_retry=3):
    # Call the task up to max_retry times and accept the first result
    # that passes validation.
    for attempt in range(max_retry):
        result = task_fn()
        if validate_fn(result):
            return result
    # Every attempt failed validation: return the last result anyway
    # so the caller can inspect or log it.
    return result
```
9.2 Validation Example
````python
def validate_code(result):
    # Require at least one fenced code block in the model output.
    if "```" not in result:
        return False
    # Reject code the model left unfinished.
    if "TODO" in result:
        return False
    return True
````
9.3 Step Isolation
```python
# Each stage is a separate, narrowly scoped call.
analysis = analyze(user_input)     # step 1: understand the request
design_doc = design(analysis)      # step 2: plan the implementation
code = implement(design_doc)       # step 3: write the code
```
Each step is independently validated and recoverable.
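Putting the pieces together, each stage can be wrapped in the retry loop with its own validator. `validate_analysis` and `validate_design` are assumed helpers in the spirit of `validate_code` above; this is a sketch of the wiring, not the full agent.

```python
# Each stage gets its own retry budget and its own validator, so a bad
# design attempt is retried locally instead of poisoning the whole run.
analysis = run_with_retry(lambda: analyze(user_input), validate_analysis)
design_doc = run_with_retry(lambda: design(analysis), validate_design)
code = run_with_retry(lambda: implement(design_doc), validate_code)
```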
10. Results
After introducing convergence mechanisms:
- Reduced over-exploration
- Fewer pipeline failures
- More consistent outputs
The most important change:
The system started working by design, not by luck.
11. Final Takeaway
Prompt engineering matters.
But it is not enough for automation.
LLM systems are not about generating correct answers.
They are about controlling incorrect ones.