Introduction: The Challenge of API Integration at Scale

At Coherence, we’re building the infrastructure layer that enables products to embed intelligent, agentic chat into their apps. Our goal: get a fully functional AI chat interface running in your application with access to your backend APIs in under an hour.

That’s hard. Every customer has unique APIs, models, auth schemes, and logic. Our agents need to understand and safely interact with those APIs.

That’s where MCP comes in.

What Is MCP?

Model Context Protocol (MCP) is Anthropic’s open standard for connecting AI agents to external systems. It's a consistent abstraction for tool use, letting agents call real APIs safely and with structure.

The problem? Manually writing MCP servers for each customer doesn’t scale. So we automated it.

This post breaks down how we generate production-ready MCP servers using Claude Code and Daytona sandboxes.


Why MCP?

  • Standardized: Same format for any tool

  • Secure: Runs on isolated servers

  • Flexible: Handles stateless and stateful operations

  • Ecosystem: Part of a growing agentic tooling stack
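Concretely, "same format for any tool" refers to MCP's JSON-RPC 2.0 wire format. Here is a minimal sketch of a `tools/call` request (the tool name and arguments are invented for illustration):

```python
import json

# Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
# The tool name and arguments below are illustrative.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_products",
        "arguments": {"category": "books", "limit": 5},
    },
}

wire = json.dumps(tool_call)  # what actually travels over stdio or HTTP
```

Because every tool call has this shape, an agent that can emit one tool call can emit any of them.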

Our Architecture at 30,000 Feet

```
Frontend → Coherence SDK → Coherence Backend → LangGraph Agent
                                                     ↓
                                          Generated MCP Server
                                                     ↓
                                                 Your APIs
```

The generated MCP server is the bridge that maps the agent’s high-level intent to your actual backend. It handles auth, retries, schema translation, and tool registration.
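As an example of one of those responsibilities, the retry handling can be sketched as a simple exponential backoff wrapper (a simplified illustration, not the generated code itself; `fetch` stands in for any awaitable API request):

```python
import asyncio

# Simplified sketch of retry-with-backoff around a backend call.
async def call_with_retries(fetch, attempts=3, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            # exponential backoff: 0.5s, 1s, 2s, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
```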


Daytona: Secure, Ephemeral Environments

We use Daytona to create secure, disposable sandboxes for each generation task. These environments provide:

  • Security Isolation

  • Reproducibility

  • Resource Limits and Cleanup

  • Full Automation via SDK

```python
async def create_mcp_generation_environment():
    workspace = await daytona_client.create(
        image="our-mcp-generator:latest",  # valid image
        resources={
            "cpu": 2,
            "memory": 4,
            "disk": 3  # optional, but recommended
        },
        env={
            "CLAUDE_CODE_PATH": "/usr/local/bin/claude-code",
            "OUTPUT_DIR": "/workspace/generated"
        },
        auto_stop_interval=30,  # in minutes
    )
    return workspace
```

Claude Code CLI: The Brain of Our Operation

The second key piece is Anthropic's Claude Code CLI. While many know Claude as a chat interface, Claude Code is a powerful command-line tool designed specifically for software development tasks.

Why Claude Code CLI?

  1. Context Window Management: Handles large codebases intelligently

  2. Tool Use: Native support for file operations, code analysis, and more

  3. Deterministic Outputs: Consistent code generation patterns

  4. Error Recovery: Sophisticated retry and correction mechanisms

Our Integration Approach

We use Claude Code in a headless mode within our Daytona environments:

```bash
# Example of how we invoke Claude Code
claude-code \
  -p "OUR PROMPT" \
  --max-iterations 5
```
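From Python inside the sandbox, that invocation can be wrapped with `subprocess`. A sketch (the `binary` parameter is an illustrative convenience so the CLI name can be swapped out, e.g. for testing):

```python
import subprocess

# Sketch of driving the headless CLI from Python; the flags mirror the
# shell example above, and `binary` is parameterized for illustration.
def run_claude_code(prompt, max_iterations=5, binary="claude-code"):
    cmd = [binary, "-p", prompt, "--max-iterations", str(max_iterations)]
    return subprocess.run(cmd, capture_output=True, text=True)
```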

The Technical Deep Dive: MCP Generation Workflow

Now let's get into the meat of how this works. Our MCP generation pipeline consists of several stages:

1. API Specification Analysis

First, we analyze the customer's API documentation. The job here is to extract information about the available endpoints, the parameters they support, and the data they return. The input can be a structured spec, such as an OpenAPI document, or less structured material, such as plain-text descriptions or code snippets from the server. The Claude Code agent can work with a wide range of inputs, but the old truth about "garbage in, garbage out" always applies!

```python
def analyze_api_spec(spec_data):
    # Extract endpoints, parameters, auth requirements
    endpoints = extract_endpoints(spec_data)
    auth_scheme = detect_auth_pattern(spec_data)

    # Build a semantic understanding of the API
    api_context = {
        "endpoints": endpoints,
        "auth": auth_scheme,
        "patterns": detect_common_patterns(endpoints),
        "relationships": infer_resource_relationships(endpoints)
    }

    return api_context
```
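The helpers above are elided. For the OpenAPI case, `extract_endpoints` could be sketched along these lines (a minimal version that only walks the `paths` object):

```python
# Minimal sketch of endpoint extraction from a parsed OpenAPI document.
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def extract_endpoints(spec):
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method in HTTP_METHODS:
                endpoints.append({
                    "method": method.upper(),
                    "path": path,
                    "parameters": operation.get("parameters", []),
                })
    return endpoints
```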

2. Prompt Engineering for MCP Generation

This is where the magic happens. We've developed a sophisticated prompting framework that guides Claude Code to generate optimal MCP servers. Without revealing our secret sauce, here's the high-level approach:

```python
def build_generation_prompt(api_context):
    # Framework generates prompts with:
    # - API context and patterns
    # - MCP best practices
    # - Error handling requirements
    # - Performance optimizations
    # - Security constraints

    prompt = PromptTemplate(
        system_context=MCP_BEST_PRACTICES,
        api_details=api_context,
        constraints=SECURITY_REQUIREMENTS,
        examples=relevant_examples(api_context)
    )

    return prompt.render()
```

3. Iterative Generation and Validation

Claude Code doesn't just generate code - it iterates and improves:

```python
async def generate_mcp_server(workspace, prompt):
    # Initial generation
    await workspace.run_claude_code(prompt)

    # Validation loop
    for iteration in range(MAX_ITERATIONS):
        validation_result = await validate_generated_code(workspace)

        if validation_result.is_valid:
            break

        # Self-correction
        correction_prompt = build_correction_prompt(validation_result.errors)
        await workspace.run_claude_code(correction_prompt)

    return await workspace.get_generated_files()
```
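`build_correction_prompt` is intentionally simple. A sketch of the idea (the wording here is illustrative, not our production prompt):

```python
# Sketch: turn validation errors into a numbered correction prompt.
def build_correction_prompt(errors):
    lines = [
        "The generated MCP server failed validation.",
        "Fix the following issues without changing working tools:",
    ]
    for i, error in enumerate(errors, 1):
        lines.append(f"{i}. {error}")
    return "\n".join(lines)
```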

4. Testing and Verification

Every generated MCP server goes through rigorous testing:

```python
async def test_mcp_server(server_path, test_cases):
    # Spin up the MCP server
    server_process = await start_mcp_server(server_path)

    # Run test cases
    results = []
    for test in test_cases:
        result = await execute_mcp_command(
            server_process,
            test.tool_name,
            test.parameters
        )
        results.append(validate_response(result, test.expected))

    return TestReport(results)
```

Real-World Example: E-commerce API Integration

Let's walk through a concrete example. Imagine we're integrating with an e-commerce platform's API:

Input: OpenAPI Specification

```yaml
openapi: 3.0.0
paths:
  /products:
    get:
      parameters:
        - name: category
          in: query
          schema:
            type: string
        - name: limit
          in: query
          schema:
            type: integer
  /orders:
    post:
      security:
        - bearerAuth: []
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Order'
```
Generated MCP Server (Simplified)

In the example below, the Coherence SDK handles all the hard parts of passing authentication information in real time to the Coherence backend, the LangGraph agent, the MCP servers, and then your backend. You don't need to manage these security-critical transfers, and your users can chat with the same permissions and login they already have. It "just works!"

```python
# Auto-generated by Coherence MCP Generator
import asyncio
from mcp import MCPServer, Tool, ToolResult

class EcommerceMCPServer(MCPServer):
    def __init__(self, api_base_url, auth_token):
        super().__init__()
        self.api_base_url = api_base_url
        self.auth_token = auth_token

        # Register tools
        self.register_tool(self.search_products)
        self.register_tool(self.create_order)

    @Tool(
        name="search_products",
        description="Search for products in the catalog",
        parameters={
            "category": {"type": "string", "description": "Product category"},
            "limit": {"type": "integer", "description": "Max results", "default": 10}
        }
    )
    async def search_products(self, category=None, limit=10):
        params = {"limit": limit}
        if category:
            params["category"] = category

        response = await self.http_client.get(
            f"{self.api_base_url}/products",
            params=params
        )

        return ToolResult(
            success=True,
            data=response.json()
        )

    @Tool(
        name="create_order",
        description="Create a new order",
        parameters={
            "items": {"type": "array", "description": "Order items"},
            "shipping_address": {"type": "object", "description": "Shipping details"}
        }
    )
    async def create_order(self, items, shipping_address):
        response = await self.http_client.post(
            f"{self.api_base_url}/orders",
            json={"items": items, "shipping_address": shipping_address},
            headers={"Authorization": f"Bearer {self.auth_token}"}
        )

        return ToolResult(
            success=True,
            data=response.json()
        )
```

Performance and Scale Considerations

Generating MCP servers at scale requires careful optimization:

1. Caching and Reuse

```python
from functools import lru_cache

# We cache common patterns and components.
# Note: lru_cache arguments must be hashable, so the auth config is
# passed as an immutable key (e.g. a tuple) rather than a dict.
@lru_cache(maxsize=1000)
def get_auth_handler(auth_type, config):
    # Returns cached auth handler implementation
    ...
```

2. Parallel Generation

```python
import asyncio

# Generate multiple tools in parallel
async def generate_tools_parallel(tool_specs):
    tasks = [generate_single_tool(spec) for spec in tool_specs]
    return await asyncio.gather(*tasks)
```

3. Resource Management

  • Daytona workspaces are auto-scaled

  • Generation timeout limits prevent runaway processes

  • Automatic cleanup of failed generations and resource archiving for cost management

Lessons Learned and Best Practices

After generating our first batches of MCP servers, here's what we've learned:

1. Prompt Engineering is Everything

The quality of generated code is directly proportional to prompt quality. We maintain a library of prompt components that we compose for different scenarios.
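As a sketch of what that composition can look like (the component names and text here are invented, not our actual library):

```python
# Invented prompt components for illustration.
PROMPT_COMPONENTS = {
    "mcp_basics": "You are generating an MCP server in Python.",
    "error_handling": "Every tool must return structured errors.",
    "auth_bearer": "Read the bearer token from the environment.",
}

def compose_prompt(component_keys, api_summary):
    # Concatenate reusable components, then append the API context.
    sections = [PROMPT_COMPONENTS[key] for key in component_keys]
    sections.append(f"API under integration:\n{api_summary}")
    return "\n\n".join(sections)
```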

2. Validation is Non-Negotiable

Every generated server must pass:

  • Static type checking

  • Security scanning

  • Functional tests

  • Performance benchmarks
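A sketch of how that gate can be wired up (the tool names are examples; swap in whatever checkers your pipeline uses):

```python
import subprocess

# Run each validation stage as a subprocess and collect the failures.
# The commands below are examples, not our exact toolchain.
CHECKS = [
    ("static type checking", ["mypy", "server.py"]),
    ("security scanning", ["bandit", "-r", "server.py"]),
    ("functional tests", ["pytest", "tests/"]),
]

def run_validation(checks):
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures
```

A server ships only when `run_validation` returns an empty list.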

3. Human-in-the-Loop for Edge Cases

While automation handles 90% of cases, complex APIs benefit from human review. We've built tooling to make this review process efficient, and a UI that lets Coherence users view and edit their MCP code directly and deploy new versions whenever they want.

4. Version Control and Rollback

Generated servers are version-controlled with clear rollback procedures. This is critical when APIs change. In the Coherence UI, you can see a list of previous versions with their timestamps, status, and other info, generate new versions, and roll back at any time.

The Future: Where We're Heading

1. Self-Improving Generation

We're building systems where generated MCP servers learn from usage patterns and self-optimize.

2. Multi-Modal MCP

Beyond REST APIs - integrating with databases, message queues, and even UI automation.

3. Open Source Contributions

We're working with Anthropic to contribute improvements back to the MCP ecosystem.

Conclusion: The Power of Composable AI Infrastructure

The combination of Daytona's secure environments and Claude Code's generation capabilities has allowed us to solve what seemed like an intractable problem: making any API instantly accessible to AI agents.

This approach - using AI to build AI infrastructure - represents a new paradigm in software development. We're not just writing code; we're building systems that write code, with all the quality and security guarantees of human-written software.

If you're building in the AI space, I encourage you to think about similar multiplicative approaches. What manual processes in your workflow could be automated with the right combination of tools?

Try It Yourself

Interested in adding intelligent chat to your application? Check out Coherence - we handle all the complexity described above, so you can focus on building great products.

Want to experiment with MCP? Start with Anthropic's MCP documentation and try building a simple server manually first.

Building agentic computing or development environments? Daytona is revolutionizing how we think about containerized infrastructure.

Have questions or want to discuss MCP generation?

Find me on Twitter or HN. If you're solving similar problems, I'd love to hear your approach!

Tags:
  • AI Infrastructure
  • Claude Code
  • Daytona
  • MCP
  • Agentic Workflows
  • API Integration
  • DevOps
  • LLMOps
  • Code Generation
  • Secure Sandboxes