I recently needed to write complex state machine code for the financial calculators of a side project, and my first attempt with Claude didn’t yield the desired results. The problem was usually not Claude itself but my failure to guide it well enough. When we ask large language models (LLMs) like Claude to generate code, we often hope a single “prompt” will magically solve everything, and in practice that approach usually falls short. To get more consistent code output, we need to stop treating LLMs as simple chat tools and integrate them into a more structured engineering process. In this post, I’ll explain step by step how to get better results from Claude for code generation with subagent architectures, the CLAUDE.md standard, and effective context management.
These techniques allow us to leverage Claude’s capabilities to their full potential, especially in projects with complex business logic or those requiring multiple files. Instead of a single large prompt, breaking down the task into smaller, manageable parts and delegating each part to a separate “expert” agent both increases output quality and simplifies debugging.
What Are the Common Problems in Claude Code Generation?
One of the most common issues we face when requesting code generation from Claude is that the model fails to understand and correctly implement all requirements in one pass. Trying to convey every detail, constraint, and expected output in a single prompt often leads the model to overlook important points or misinterpret them. For example, when I asked it to write CRUD endpoints for an API, I noticed it left out the authentication mechanism or specific validation rules.
Another problem is the “context window” limit. In large projects or long conversations, supplying enough information can push us past Claude’s context window. This can cause the model to lose track of earlier instructions or code snippets, leading to inconsistent outputs. The model also sometimes “hallucinates,” meaning it uses functions or libraries that don’t actually exist, which can render the generated code non-functional. We need more modular and controlled approaches to overcome these problems.
How Does the Subagent Architecture Work with Claude?
The subagent architecture is based on the principle of dividing a large, complex task into smaller, collaborating agents, each with a specific area of expertise. This approach works like specialists in a software team: an architect designs the overall structure, a backend developer writes the APIs, and a frontend developer creates the user interface. When using subagents with Claude, we define a “main agent” responsible for the overall coordination and phasing of the task, and “subagents” that perform specific sub-tasks under the main agent’s instructions.
For instance, if I’m developing a web application, the main agent first analyzes the requirements, then assigns a “Database Agent” to design the database schema, a “Backend Agent” to write the API endpoints, and a “Frontend Agent” to create the UI components. Each subagent receives a more specific prompt for its area of expertise and sends its output back to the main agent. The main agent then combines these outputs, checks them for consistency, and sends feedback to the subagents for corrections if necessary. This structure keeps complexity manageable and tends to produce more focused, higher-quality output at each stage.
graph TD;
A["Main Agent (Task Manager)"] --> B{"Analyze Requirements"};
B --> C["Assign Database Agent"];
C --> D["Create Database Schema"];
D --> A;
A --> E["Assign Backend Agent"];
E --> F["Write API Endpoints"];
F --> A;
A --> G["Assign Frontend Agent"];
G --> H["Create UI Components"];
H --> A;
A -- "Combine & Verify Outputs" --> I["Final Code Output"];
This diagram illustrates the basic flow of the subagent architecture. Each agent works more efficiently with its own prompt and context management.
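To make the flow above concrete, here is a minimal sketch of the main-agent loop. It is an illustration under assumptions, not a reference implementation: `Subagent`, `run_pipeline`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real API call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model caller: takes (system_prompt, task) and returns text.
ModelFn = Callable[[str, str], str]


@dataclass
class Subagent:
    name: str
    system_prompt: str  # role-specific instructions for this agent

    def run(self, task: str, call_model: ModelFn) -> str:
        return call_model(self.system_prompt, task)


def run_pipeline(requirements: str, agents: list[Subagent], call_model: ModelFn) -> dict[str, str]:
    """Main-agent loop: run each subagent in order and collect its output.

    Each subagent sees the requirements plus the outputs produced so far,
    so the backend agent can build on the schema, and so on.
    """
    outputs: dict[str, str] = {}
    for agent in agents:
        prior = "\n\n".join(f"[{name}]\n{text}" for name, text in outputs.items())
        task = f"Requirements:\n{requirements}\n\nPrior outputs:\n{prior or '(none)'}"
        outputs[agent.name] = agent.run(task, call_model)
    return outputs


def fake_model(system_prompt: str, task: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"<output of: {system_prompt}>"


agents = [
    Subagent("database", "Design the database schema."),
    Subagent("backend", "Write the API endpoints."),
    Subagent("frontend", "Create the UI components."),
]
results = run_pipeline("A blog with users and posts.", agents, fake_model)
print(list(results))
```

In a real setup, `call_model` would wrap the actual API client, and the main agent would add a verification pass over `results` before producing the final output.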
Why is the CLAUDE.md Standard Important and How is it Used?
CLAUDE.md is not a separate markup language: it is an ordinary Markdown file that Claude Code loads into context automatically at the start of a session, which makes it the natural place for clear, structured instructions about a project or task. By writing it in a consistent structure, we not only tell Claude what to do but also specify the output format, the constraints, and even how it should approach the problem. This has been very useful for me, especially in complex code generation tasks, because it reduces ambiguity in the instructions.
The primary benefit of a well-structured CLAUDE.md is that it improves the readability and consistency of prompts. It also helps the model follow instructions and produce output in the desired format. The file itself is free-form Markdown with no required schema; the layout I use for agent instructions has the following sections:
# ROLE: Defines the agent’s role and area of expertise. For example, “You are a Python FastAPI expert.”
# CONSTRAINTS: Specifies the task’s constraints and rules. “Use only Python 3.10+, leverage the asyncio library.”
# GOAL: Explains the ultimate objective the agent needs to achieve. “Create CRUD API endpoints for a user-based blog.”
# INPUT: Shows what the agent expects as input. “You will receive model definitions in JSON format from the user.”
# OUTPUT: Details the format and content of the agent’s output. “Your output should include a runnable main.py file and the necessary requirements.txt.”
# STEPS: Lists the steps to be followed to complete the task. “First, define the SQLAlchemy models, then create the Pydantic schemas, and then write the endpoints.”
# EXAMPLE: Provides an example showing the expected output format. This section helps the model visualize what it needs to produce.
By using this structure, especially when preparing a CLAUDE.md file for my “Backend Agent,” I found that the model generated the desired API faster and with fewer errors. It works less as a list of commands and more as a guide that shapes how the model approaches the task.
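Since I keep one such file per agent, it can help to generate them from data rather than by hand. The sketch below renders the sections listed above in a fixed order. The function name `render_claude_md` and the exact heading layout are my own assumptions; nothing here is required by Claude.

```python
SECTION_ORDER = ["ROLE", "CONSTRAINTS", "GOAL", "INPUT", "OUTPUT", "STEPS", "EXAMPLE"]


def render_claude_md(sections: dict[str, str]) -> str:
    """Render the given sections in a fixed order, skipping any that are missing."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"Unknown sections: {sorted(unknown)}")
    parts = [
        f"# {name}\n{sections[name].strip()}"
        for name in SECTION_ORDER
        if name in sections
    ]
    return "\n\n".join(parts) + "\n"


backend_brief = render_claude_md({
    "ROLE": "You are a Python FastAPI expert.",
    "GOAL": "Create CRUD API endpoints for a user-based blog.",
    "CONSTRAINTS": "Use only Python 3.10+ and the asyncio library.",
})
print(backend_brief)
```

Rejecting unknown section names catches typos such as "CONSTRAINT" early, before a malformed brief reaches a subagent.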
Practical Approaches for Effective Context Management
When working with LLMs like Claude, context management is critical, especially in large projects or during long interactions. Exceeding the context window limit can cause the model to forget important information and produce inconsistent outputs. Therefore, managing context wisely reduces costs and improves output quality. I have a few practical approaches I apply in my own projects.
First, I cut unnecessary information. Instead of sending the entire conversation history or the entire project with every interaction, I give the model only the information that matters for the current task. For example, there’s no point in sending all the Order or Product models to a subagent that is developing only the User model. I use a simple token counter and assemble the context per task, pulling in only the relevant parts.
Second, I summarize. Instead of sending long conversation histories or large code files directly, I create concise summaries of them and send those to the model. By defining a “Summarizer Agent,” we can condense the main takeaways from previous interactions or code blocks and add them to the main context. This helps the model stay on topic while using the context window efficiently.
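A rough sketch of the summarization idea, assuming messages are dicts with "role" and "content" keys: older messages are collapsed into a single summary message and the most recent ones are kept verbatim. `compact_history` and `naive_summarizer` are hypothetical names, the "user" role for the summary is an arbitrary choice, and the placeholder summarizer only keeps the first sentence of each message where a real Summarizer Agent would call the model.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_recent: int = 4,
) -> list[Message]:
    """Replace everything except the last `keep_recent` messages with one summary."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "user",
        "content": f"Summary of earlier conversation:\n{summarize(older)}",
    }
    return [summary, *recent]


def naive_summarizer(older: list[Message]) -> str:
    """Placeholder: keeps the first sentence of each message.
    A real Summarizer Agent would call the model here instead."""
    return " ".join(m["content"].split(".")[0].strip() + "." for m in older)


history = [{"role": "user", "content": f"Message {i}. Extra detail."} for i in range(6)]
compacted = compact_history(history, naive_summarizer, keep_recent=2)
print(len(compacted))
```

Passing the summarizer in as a callable keeps the trimming logic testable without any API access.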
Third, I use the “Retrieval-Augmented Generation” (RAG) pattern. This means integrating an external knowledge base (such as a vector database) so the model can access information that is missing from its training data or has changed since. For instance, when developing a new module for a production ERP, instead of feeding the codebase of the existing modules directly to Claude, I create vector representations of critical functions and structures and retrieve the matching ones for each prompt. This way, Claude can focus only on the relevant code snippets and produce more accurate and better-integrated output. This method is very effective, especially in large and constantly changing codebases.
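The retrieval step can be illustrated without any external service. The sketch below is a toy: `embed` uses plain word counts as a stand-in for a real embedding model, and the `codebase` dict stands in for a vector database. The file names and function bodies are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, snippets: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the top_k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda name: cosine(q, embed(snippets[name])), reverse=True)
    return ranked[:top_k]


# Hypothetical snippets standing in for indexed parts of an existing codebase.
codebase = {
    "invoice.py": "def create_invoice(order, customer): compute invoice total and tax",
    "stock.py": "def reserve_stock(product, quantity): reserve warehouse stock",
    "auth.py": "def login(user, password): verify password and issue token",
}
relevant = retrieve("add a discount to the invoice total", codebase, top_k=1)
print(relevant)  # → ['invoice.py']
```

Only the retrieved snippets are then placed in the prompt, instead of the whole codebase.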
# Example: A simple context management function
def manage_context(messages, max_tokens=4000):
    """Drop the oldest messages until the history fits the token budget."""
    def count(message):
        # Rough proxy: whitespace-separated words, not real model tokens.
        return len(message["content"].split())

    messages = list(messages)  # work on a copy so the caller's list is untouched
    current_tokens = sum(count(m) for m in messages)

    # A simple strategy: remove the oldest messages first.
    # A more sophisticated approach: preserve important messages, summarize code, etc.
    while current_tokens > max_tokens and len(messages) > 1:
        current_tokens -= count(messages.pop(0))

    # If the one remaining message is still too large, warn instead of failing silently.
    if current_tokens > max_tokens:
        print("Warning: Context window is still too large, important information may be lost.")
    return messages

# Usage example
# conversation_history = [
#     {"role": "user", "content": "Initial prompt."},
#     {"role": "assistant", "content": "Initial response."},
#     # ... more messages
# ]
# optimized_history = manage_context(conversation_history)
This simple manage_context function demonstrates a basic elimination strategy. In real applications, this logic can be much more complex and may involve content analysis, summarization, or RAG techniques.
Prompt Engineering Tips for Code Generation
Getting high-quality code output from Claude depends heavily on prompt engineering. It’s not enough to say what we want; we also need to specify how we want it, under what constraints, and how the model should work through the problem. Here are a few tips I’ve picked up from my own experience.
First, providing detailed and specific instructions is crucial. Instead of saying “Make a website,” we should say, “Using Python FastAPI, create a REST API connected to a PostgreSQL database, with JWT-based authentication, and CRUD endpoints for users and blog posts.” Specifying the expected file structure, the libraries to use (e.g., Pydantic for model validation, SQLAlchemy for the ORM), and even the name of the main.py file also helps the model produce more accurate output.
Second, I use the chain-of-thought technique. Instead of asking Claude to write the code directly, I ask it to break the task into parts, think through each step, and explain it. For example, I give instructions like “First, think about the database schema, then write the Pydantic models, then design the API endpoints, and finally implement them.” This lets me follow the model’s problem-solving process and catch errors early.
Third, using positive and negative constraints together yields effective results. Saying “Add these features” (positive) matters, but so does specifying constraints like “Do not use X library” or “Be aware of Y security vulnerability” (negative), which help the model avoid undesirable behavior. In a client project where we needed to integrate with a specific legacy system, stating that certain new libraries must not be used kept Claude from adding unnecessary dependencies.
Fourth, providing examples (few-shot prompting) helps the model understand the expected output format and style. Given a small function or a sample Pydantic model, Claude can refer back to it when generating similar structures. This is especially useful when I want code that follows a specific coding standard or architectural pattern; a few simple examples convey it far better than a written description.
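The four tips combine naturally into a single prompt template. The sketch below is one possible way to assemble them. `build_prompt` and its section wording are my own assumptions, not a format Claude requires.

```python
def build_prompt(
    task: str,
    steps: list[str],
    must: list[str],
    must_not: list[str],
    examples: list[str],
) -> str:
    """Assemble a code-generation prompt from the four ingredients above."""
    def bullets(items: list[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    shots = "\n\n".join(f"Example {i}:\n{ex}" for i, ex in enumerate(examples, start=1))
    return (
        f"Task:\n{task}\n\n"
        f"Work through these steps in order, explaining each before writing code:\n{numbered}\n\n"
        f"Requirements:\n{bullets(must)}\n\n"
        f"Do not:\n{bullets(must_not)}\n\n"
        f"Follow the style of these examples:\n{shots}\n"
    )


prompt = build_prompt(
    task="Using Python FastAPI, create a REST API with CRUD endpoints for users and blog posts.",
    steps=[
        "Think about the database schema",
        "Write the Pydantic models",
        "Design and implement the API endpoints",
    ],
    must=["JWT-based authentication", "PostgreSQL via SQLAlchemy"],
    must_not=["Add dependencies beyond FastAPI, SQLAlchemy, and Pydantic"],
    examples=["class UserOut(BaseModel):\n    id: int\n    email: str"],
)
print(prompt)
```

Keeping the ingredients as separate arguments makes it easy to reuse the same constraints and examples across several subagents while changing only the task.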
What are the Trade-offs of Using Subagents and CLAUDE.md?
While subagent architectures and the use of CLAUDE.md improve efficiency and quality in Claude’s code generation, they also come with certain trade-offs, as with any engineering approach. When implementing these approaches, the project’s size, complexity, and available resources should be considered.
The most obvious advantage is complexity management. Breaking a large task into smaller, manageable parts tends to make each part more focused and less error-prone. Additionally, since each subagent has its own context and role, the model is less likely to “hallucinate” or drift off topic. When I used this method to automate complex business workflows in a production ERP, I observed that the dependencies between different modules were handled better.
However, one disadvantage of this approach is that it introduces extra management overhead. Coordinating multiple agents, preparing separate prompts and CLAUDE.md files for each agent, can require more effort initially. The main agent’s responsibility to ensure communication between subagents and integration of outputs is an engineering task in itself. This can be seen as over-engineering, especially for small and simple projects.
Another trade-off is latency and cost. Making multiple API calls (a separate call for each subagent) can increase the total processing time compared to a single large call. Furthermore, re-preparing and sending context with each call increases total token consumption and therefore cost. In scenarios that require real-time or high-volume code generation, these additional costs need to be evaluated carefully. In my side project, I kept them at an acceptable level by optimizing the order of calls and trimming context aggressively.
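A quick back-of-the-envelope calculation shows where the extra cost comes from. All numbers below are made up for illustration: the per-token prices are hypothetical (check your provider’s current pricing), and the token counts are round figures, not measurements from my project.

```python
def call_cost(input_tokens: int, output_tokens: int, in_price: float, out_price: float) -> float:
    """Cost of one call in dollars; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical prices per million tokens.
IN_PRICE, OUT_PRICE = 3.0, 15.0

# One large call: 12,000 tokens in, 6,000 tokens out.
single = call_cost(12_000, 6_000, IN_PRICE, OUT_PRICE)

# Three subagents: each re-sends a 4,000-token shared context plus its own
# 3,000-token brief, and produces 2,000 tokens of output.
subagents = sum(call_cost(4_000 + 3_000, 2_000, IN_PRICE, OUT_PRICE) for _ in range(3))

# One more main-agent call to merge the three outputs:
# shared context plus 6,000 tokens of subagent output in, 6,000 tokens out.
merge = call_cost(4_000 + 6_000, 6_000, IN_PRICE, OUT_PRICE)

print(f"single call:       ${single:.3f}")
print(f"subagents + merge: ${subagents + merge:.3f}")
```

With these assumed figures the subagent route costs a bit more than double the single call, mostly because the shared context is sent four times and the merged output is generated again. Shrinking the shared context is therefore the first lever to pull.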
Finally, integration challenges should not be overlooked. Combining code snippets generated by subagents and ensuring they are compatible can sometimes require more manual correction than expected. Conflicts can arise in the integration layer, especially when different subagents produce code with different assumptions. Therefore, the main agent’s ability to combine these outputs and perform consistency checks becomes critical.
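One cheap consistency check the main agent can run before merging is to verify that what one generated module imports from another actually exists there. The sketch below uses Python’s standard `ast` module; `defined_names` and `missing_imports` are hypothetical helper names, and the check only covers top-level functions and classes (not constants or re-exports).

```python
import ast


def defined_names(source: str) -> set[str]:
    """Top-level functions and classes defined in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def missing_imports(consumer_source: str, provider_module: str, provider_source: str) -> set[str]:
    """Names the consumer imports from provider_module that the provider never defines."""
    available = defined_names(provider_source)
    wanted: set[str] = set()
    for node in ast.walk(ast.parse(consumer_source)):
        if isinstance(node, ast.ImportFrom) and node.module == provider_module:
            wanted.update(alias.name for alias in node.names)
    return wanted - available


# Output of a "Database Agent" and a "Backend Agent" that made different assumptions.
models_py = "class User:\n    pass\n"
api_py = "from models import User, Post\n\ndef list_posts():\n    return []\n"

print(missing_imports(api_py, "models", models_py))  # → {'Post'}
```

A non-empty result is a concrete piece of feedback the main agent can send back to the responsible subagent instead of attempting a manual fix.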
Conclusion: Smarter Code Development with Claude
Viewing Claude as merely a prompt box prevents us from fully utilizing its code generation potential. By adopting engineering approaches such as subagent architectures, the CLAUDE.md standard, and careful context management, we can significantly improve the quality, consistency, and reliability of the code output we receive from Claude. These techniques, especially in complex and multi-part projects, transform LLMs from mere “assistants” into true “teammates.”
Of course, these approaches come with a learning curve and an initial investment, but in the long run they save time and help us build more maintainable codebases with fewer errors. In my own projects, after adopting these methods, I’ve seen Claude produce solid, accurate output not just for simple scripts but also for complex APIs and even microservice architectures. The next step is to try these approaches in your own projects and see how far they take Claude’s code generation.