How do I start with context engineering and what tools should I use?

I stopped writing fancy prompts to get value from large language models and started designing the model's context window with engineering principles. To start with context engineering, you first need to understand how large language models work, and then optimize the context window. In my experience, using visualization tools like Mermaid and special commands like Callout helps to optimize the context window.

What are the advantages and disadvantages of context engineering?

In my experience, the advantage of context engineering is that it enables more accurate and consistent results from large language models. However, its disadvantage is that it requires a lot of time and effort to optimize the context window. Also, incorrectly designing the context window can cause the model to make wrong decisions. For this reason, I believe that context engineering should be applied carefully.

What should I do if there is an error in context engineering?

If something goes wrong, first re-examine the context window: check what was actually sent to the model, in what order, and how much of it was relevant. Then re-read the model's documentation and review its examples. In my experience, errors usually come from a poorly designed context window or from running the model with the wrong parameters, so both the context design and the model settings deserve a careful look.

Is context engineering replacing prompt engineering?

I believe that context engineering is replacing prompt engineering. As large language models evolve and context windows grow, the impact of prompt engineering decreases. In my experience, context engineering provides more accurate and consistent results from large language models. However, prompt engineering can still be useful in some situations. For this reason, I believe it is necessary to know and apply both methods.

Prompt Engineering is Dead, Long Live Context Engineering: The Model's

The era of whispering fancy adjectives and magic words to large language models (LLMs) to get miraculous outputs is over. Today, the key to success in AI integrations is not how creatively you write a “prompt,” but rather which data, in what order and format, you feed into the model’s limited memory window (context window). I call this context engineering, and it is a true software and system architecture discipline that replaces prompt writing.

As someone with over 20 years in system and backend development, I can tell you: nearly all the problems we encounter when integrating AI into enterprise systems stem not from “wrong prompts,” but from “dirty and uncontrolled context.” Whatever you put in front of the model, it will process. If you indiscriminately dump the entire database in front of it, you’ll inflate the bill and confuse the model, which leads to incorrect decisions and hallucinations.

Why is prompt engineering becoming insufficient?

Prompt engineering was a temporary solution that worked when models had small context windows and their reasoning abilities were in their infancy. Back then, guiding the model with templates like “Think step-by-step” or “You are a financial expert” made a difference because these words were the only guidance the model had. However, as models evolved and context windows reached hundreds of thousands of tokens, the impact of these word games significantly diminished.

The biggest problem we face today is not the prompt itself, but the “lost in the middle” phenomenon that comes with larger context windows. Research and our own field tests point to the same pattern: LLMs tend to focus on information at the beginning and end of the context they are given, and they often overlook the mass of data in the middle. If you write a fancy prompt and then indiscriminately paste 50 pages of documentation below it, there’s a very high chance the model will miss a critical business rule written somewhere in the middle.
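One simple way to work with this tendency is to reorder already-ranked chunks so the strongest ones sit at the edges of the context. The sketch below is only an illustration: the function name `order_for_edges` and the sample chunks are mine, and it assumes the input list is already sorted most-relevant first.

```python
def order_for_edges(chunks_by_relevance):
    """Place the most relevant chunks at the start and end of the context.

    `chunks_by_relevance` is assumed to be sorted most-relevant first.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: rank 0 goes to the front, rank 1 to the back, and so on,
        # so the least relevant chunks end up in the middle.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


ranked = ["rule A (critical)", "rule B", "note C", "note D", "appendix E"]
ordered = order_for_edges(ranked)
print(ordered)
# ['rule A (critical)', 'note C', 'appendix E', 'note D', 'rule B']
```

Whether this helps depends on the model and the task, so treat it as something to measure rather than a guarantee.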

What is context engineering and what does it aim for?

Context engineering is the process of filtering, structuring, prioritizing, and packaging the data to be sent to the model in the most optimized way. Its goal is to provide the model with only the most refined information necessary to complete the current task, thereby reducing latency, lowering costs, and maximizing output quality. This is not an art of words, but a design of data pipelines.

When designing a good context, we don’t leave the data in its raw form. We clean rows from databases, logs, or documents, removing unnecessary noise (boilerplate code, repetitive headers, redundant metadata fields). Then, we convert this data into Markdown or structured JSON format, which the model can parse most quickly and accurately.
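A minimal sketch of that cleaning step might look like this. The field names in `NOISE_FIELDS` and the sample record are hypothetical; in a real system you would decide per data source what counts as noise.

```python
import json

# Hypothetical metadata fields treated as noise in this sketch
NOISE_FIELDS = {"etag", "internal_id", "updated_by", "schema_version"}


def clean_record(record):
    """Drop noisy metadata fields and empty values from one raw row."""
    return {
        key: value
        for key, value in record.items()
        if key not in NOISE_FIELDS and value not in (None, "", [], {})
    }


def to_markdown(title, record):
    """Render a cleaned record as a small Markdown section."""
    lines = [f"## {title}"]
    lines += [f"- {key}: {value}" for key, value in record.items()]
    return "\n".join(lines)


raw = {"order_id": "A-17", "status": "delayed", "etag": "9f3c", "notes": "", "internal_id": 88412}
cleaned = clean_record(raw)
print(to_markdown("Order", cleaned))  # Markdown variant
print(json.dumps(cleaned))            # structured JSON variant
```

Here five raw fields shrink to the two that carry meaning for the task, and the same cleaned record can be emitted as Markdown or JSON.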

graph TD;
A["Raw Data Sources"] --> B["Data Cleaning & Chunking"]
B --> C["Metadata Enrichment"]
C --> D["Vector Database (RAG)"]
D --> E["Context Re-ranking (Reranker)"]
E --> F["LLM Context Window"]

How is context designed in RAG architectures?

The biggest mistake made when building Retrieval-Augmented Generation (RAG) systems is directly feeding the first 5-10 results from the vector database to the model. If you rely solely on cosine similarity to create context, you might overwhelm the model with repetitive or completely irrelevant data. This increases token costs and confuses the model.
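To make the repetition problem concrete, here is a small sketch that drops retrieved hits that nearly repeat one already kept. It assumes each hit is a dict with `text` and `embedding` keys and uses a threshold of 0.95; both are my assumptions, not values from any particular vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drop_near_duplicates(hits, max_similarity=0.95):
    """Keep hits in ranked order, skipping any that nearly repeat a kept one."""
    kept = []
    for hit in hits:
        if all(cosine(hit["embedding"], k["embedding"]) < max_similarity for k in kept):
            kept.append(hit)
    return kept


hits = [
    {"text": "Refunds take 5 days.", "embedding": [1.0, 0.0]},
    {"text": "Refunds take five days.", "embedding": [0.99, 0.05]},  # near copy
    {"text": "Shipping is free over $50.", "embedding": [0.0, 1.0]},
]
unique_hits = drop_near_duplicates(hits)
print([h["text"] for h in unique_hits])
```

The second hit is skipped because it is almost identical to the first, so the context carries two distinct facts instead of three chunks.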

When designing context in a RAG architecture, we must follow these steps:

Semantic Chunking: Splitting documents only by character limits (e.g., every 1000 characters) breaks semantic integrity. Instead, we should use intelligent chunking strategies that follow paragraphs, Markdown headings, or code blocks.
Metadata Enrichment: Each data chunk should be tagged with information such as which document it belongs to, its creation date, and its authorization level. The model should be able to read the context of the information presented to it from these tags.
Re-ranking: The results returned from the vector database should be passed through a reranker model (e.g., Cohere or BAAI reranker) that optimizes keyword and semantic alignment, selecting the top 3 most relevant results.
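The three steps above can be sketched roughly as follows. Everything here is illustrative: `chunk_by_headings`, `rerank`, and the metadata keys are names I made up, and `keyword_overlap` is a toy scoring function standing in for a real reranker model, which you would plug in through `score_fn`.

```python
import re


def chunk_by_headings(markdown_text, source, created, access_level):
    """Split a Markdown document at headings and tag each chunk with metadata."""
    chunks = []
    # Split before every line that starts with 1-6 '#' characters and a space
    for section in re.split(r"(?m)^(?=#{1,6} )", markdown_text):
        section = section.strip()
        if not section:
            continue
        first_line = section.splitlines()[0]
        heading = first_line.lstrip("#").strip() if section.startswith("#") else ""
        chunks.append({
            "text": section,
            "metadata": {
                "source": source,
                "heading": heading,
                "created": created,
                "access_level": access_level,
            },
        })
    return chunks


def keyword_overlap(query, text):
    """Toy relevance score: share of query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, chunks, score_fn, top_k=3):
    """Re-score retrieved chunks with `score_fn` and keep the best `top_k`."""
    return sorted(chunks, key=lambda c: score_fn(query, c["text"]), reverse=True)[:top_k]


doc = "Intro text\n# Returns\nRefund policy for returns\n## Shipping\nShipping takes five days"
chunks = chunk_by_headings(doc, source="policy.md", created="2024-01-01", access_level="public")
best = rerank("refund policy", chunks, keyword_overlap, top_k=1)
print(best[0]["metadata"]["heading"])
```

One limitation to be aware of: this splitter also breaks on `# ` comment lines inside code blocks, so documents with embedded code need a slightly smarter parser.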

How to manage token economics and context window limits?

The growth of context windows does not mean we can fill them freely. Token costs grow at least linearly with context size, and the compute spent on self-attention can grow quadratically with sequence length, so latency rises along with the bill. Sending 100,000 tokens with every request in a production system will quickly drain your wallet and ruin the user experience by pushing up Time to First Token (TTFT). Therefore, dynamically managing context size is a critical system engineering task.
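A quick back-of-the-envelope calculation shows how fast this adds up. The request volume and the price of $3 per million input tokens are hypothetical numbers chosen for the arithmetic, not any provider's actual pricing.

```python
def monthly_input_cost(tokens_per_request, requests_per_day, usd_per_million_tokens, days=30):
    """Estimate monthly spend on input tokens alone."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 10,000 requests per day at $3 per million input tokens
full = monthly_input_cost(100_000, 10_000, 3.0)
trimmed = monthly_input_cost(8_000, 10_000, 3.0)
print(f"full context:    ${full:,.0f} per month")     # $90,000
print(f"trimmed context: ${trimmed:,.0f} per month")  # $7,200
```

Under these assumptions, trimming each request from 100,000 to 8,000 tokens cuts input spend from $90,000 to $7,200 a month, before counting any latency gains.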

To manage this, we use “Prompt Caching” mechanisms. If the system instructions and fixed documents we send do not change between requests, we cache them with API providers that support it (e.g., Anthropic or OpenAI) to gain significant cost and speed advantages in subsequent requests. Additionally, when storing user history, we should use a “sliding window” approach: keep only the last N messages in the context and condense older messages into a single compact summary block.

# Simple sliding window and summarization logic for context optimization
def count_tokens(text):
    # Rough stand-in (about 4 characters per token); swap in your model's real tokenizer
    return max(1, len(text) // 4)

def build_context(user_history, current_query, max_tokens=4000):
    context = []
    current_tokens = count_tokens(current_query)

    # Prioritize the latest messages by iterating backward
    for message in reversed(user_history):
        msg_tokens = count_tokens(message["content"])
        if current_tokens + msg_tokens <= max_tokens:
            context.insert(0, message)
            current_tokens += msg_tokens
        else:
            # Older messages no longer fit: stand in for them with one summary block.
            # The text below is a placeholder; a real system would call a summary service.
            context.insert(0, {"role": "system", "content": "[Summary of old conversations...]"})
            break

    context.append({"role": "user", "content": current_query})
    return context
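For the prompt caching side, the part you control is the layout of the request. To my knowledge, provider caches match on an identical leading portion of the prompt, so unchanging content should come first and per-request content last; check your provider's documentation for the exact rules. The sketch below makes no API calls, and `assemble_prompt` and `prefix_fingerprint` are hypothetical helpers for checking that the prefix really stays stable.

```python
import hashlib


def assemble_prompt(system_instructions, fixed_documents, recent_messages, current_query):
    """Return (stable_prefix, dynamic_suffix) with unchanging content first."""
    stable_prefix = "\n\n".join([system_instructions, *fixed_documents])
    dynamic_suffix = "\n\n".join([*recent_messages, current_query])
    return stable_prefix, dynamic_suffix


def prefix_fingerprint(stable_prefix):
    """Short hash of the prefix; if it changes between requests, expect a cache miss."""
    return hashlib.sha256(stable_prefix.encode("utf-8")).hexdigest()[:12]


system = "You are a support assistant for the returns desk."
docs = ["# Return policy\nItems can be returned within 30 days."]

prefix_1, suffix_1 = assemble_prompt(system, docs, ["user: hi"], "Where is my refund?")
prefix_2, suffix_2 = assemble_prompt(system, docs, ["user: hello"], "Can I return shoes?")

print(prefix_fingerprint(prefix_1) == prefix_fingerprint(prefix_2))  # True
```

Logging this fingerprint per request is a cheap way to catch accidental prefix changes, such as a timestamp slipped into the system instructions.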

How should state and memory management be handled in an LLM agent architecture?

When designing autonomous agents, the agent needs to remember its past actions and their outcomes. However, if this “memory” grows uncontrollably, the agent will eventually get lost in its own loops. Memory management is one of the most complex areas of context engineering.

In agent architectures, we store state data in fast and persistent data stores like Redis or PostgreSQL. Instead of sending the entire history to the agent at each step, we design a minimalist JSON object representing the agent’s current “state.” For example, if we are designing an e-commerce return agent, the agent’s context should contain only a clean state object like this, rather than the entire chat history:

{
  "current_step": "verify_invoice",
  "invoice_id": "INV-2026-0042",
  "verification_status": "pending_user_signature",
  "attempts": 2
}

This structured data allows the agent to focus directly on business logic without getting confused about what to do next.
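As a rough sketch of how such a state object could be stored and fed back into the context: the `ReturnState` class and helper names are mine, and a plain dictionary stands in for Redis or PostgreSQL so the example runs on its own.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReturnState:
    current_step: str
    invoice_id: str
    verification_status: str
    attempts: int = 0


# In-memory stand-in for Redis or PostgreSQL in this sketch
STATE_STORE = {}


def save_state(session_id, state):
    """Persist the state as a JSON string keyed by session."""
    STATE_STORE[session_id] = json.dumps(asdict(state))


def load_state(session_id):
    """Rebuild the state object from its stored JSON."""
    return ReturnState(**json.loads(STATE_STORE[session_id]))


def render_state_for_context(state):
    """Text block that goes into the agent's context instead of the full history."""
    return "Current state:\n" + json.dumps(asdict(state), indent=2)


state = ReturnState("verify_invoice", "INV-2026-0042", "pending_user_signature", attempts=2)
save_state("session-1", state)

loaded = load_state("session-1")
loaded.attempts += 1  # the agent retried the verification step
save_state("session-1", loaded)
print(render_state_for_context(load_state("session-1")))
```

At each step the agent sees only this small block, while the full conversation stays in the store for auditing.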

Context engineering application in a real production ERP

In a manufacturing ERP, we designed an AI assistant for operators to analyze machine error codes and automatically open work orders for maintenance teams. In our initial attempts, we sent all sensor logs and historical maintenance documents from the machine to the LLM in their raw form. The result was a complete disaster: the model constantly made incorrect fault diagnoses, and each query took seconds.

To solve the problem, we stopped tweaking the prompt and redesigned the context pipeline from scratch. First, we normalized the sensor data and added only anomalous values to the context. Then we indexed the maintenance documents by error code and retrieved only the paragraphs matching the current error code.

# ERP Context Preparation Pipeline Example
# get_active_anomalies and query_maintenance_db are project-specific helpers (not shown here)
def prepare_operator_context(machine_id, error_code):
    # 1. Get only active and anomalous sensor data (Reduce noise)
    telemetry = get_active_anomalies(machine_id)

    # 2. Filter historical maintenance records specific to the error code
    history = query_maintenance_db(error_code, limit=2)

    # 3. Combine the context in Markdown format, which the model understands best.
    #    Lines are joined explicitly so headings start at column 0; an indented
    #    "#" line would be read as a code block rather than a Markdown heading.
    return "\n".join([
        f"# MACHINE STATUS: {machine_id}",
        f"Active Anomalies: {telemetry}",
        "",
        f"# RELEVANT MAINTENANCE HISTORY FOR ERROR {error_code}:",
        f"{history}",
    ])

After this structural change, the model’s accuracy in diagnosis significantly increased, and we reduced token consumption by roughly a third.
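To give an idea of what the anomaly filter behind `get_active_anomalies` could do, here is a simplified sketch. The sensor names and normal ranges are invented for illustration and are not the values from the real system.

```python
# Hypothetical normal operating ranges per sensor: (low, high)
NORMAL_RANGES = {
    "spindle_temp_c": (20.0, 85.0),
    "vibration_mm_s": (0.0, 4.5),
    "oil_pressure_bar": (2.0, 6.0),
}


def filter_anomalies(readings, normal_ranges=NORMAL_RANGES):
    """Return only readings outside their normal range; unknown sensors are skipped."""
    anomalies = {}
    for sensor, value in readings.items():
        if sensor not in normal_ranges:
            continue
        low, high = normal_ranges[sensor]
        if not (low <= value <= high):
            anomalies[sensor] = value
    return anomalies


readings = {"spindle_temp_c": 97.2, "vibration_mm_s": 3.1, "oil_pressure_bar": 5.0}
anomalies = filter_anomalies(readings)
print(anomalies)  # {'spindle_temp_c': 97.2}
```

Three readings go in and one comes out, which is the kind of reduction that keeps the model focused on the actual fault.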

Conclusion

My clear stance is this: success in AI projects comes from data engineering, not wordplay. Put prompt writing aside; focus on filtering, prioritizing, and presenting data to the model in its most refined form. When you manage context correctly, even the most mediocre model can turn into a genius; when you pollute the context, even the most advanced model will only produce garbage for you.
