Multi-Model and Toolchain Integration
In-depth exploration of multi-AI model and toolchain integration in the HotelByte project, including model selection, cost optimization, performance comparison, and real-world application cases.

Introduction
In the HotelByte project, we realized that a single AI model cannot meet all scenario needs. Different tasks demand different capabilities: code generation needs precision, natural language understanding needs context, long document processing needs large context windows, and cost control calls for cheap, fast inference.
This article will dive deep into how we integrated multiple AI models (DeepSeek, GLM4.6, KIMI2, LongCat, MiniMax M2, Qwen) into a complete toolchain, as well as our model selection strategy and cost optimization solutions.
Model Matrix
Supported Models
| Model | Provider | Main Advantages | Use Cases | Cost Level |
|---|---|---|---|---|
| DeepSeek | DeepSeek | Cost-effective, good Chinese | Regular code generation, simple queries | 💰 Low |
| GLM4.6 | Zhipu | Strong overall capabilities | Architecture design, complex logic | 💰💰 Medium |
| KIMI2 | Moonshot | Long context | Document analysis, code review | 💰💰 Medium |
| LongCat | Meituan | Multimodal, fast speed | Quick iteration, real-time coding | 💰💰 Medium |
| MiniMax M2 | MiniMax | Strong reasoning | Complex problem solving | 💰💰💰 High |
| Qwen | Alibaba Cloud | Strong scaling | Batch processing, large-scale tasks | 💰💰💰 High |
Model Selection Strategy
Scenario-Model Mapping
We defined model selection rules for different development scenarios:
1. Code Generation
Recommended Models: DeepSeek → GLM4.6 → LongCat
Selection Logic:
```text
IF task == "simple code generation" AND budget_limit == "strict"
    THEN use DeepSeek
ELSE IF task == "medium complexity code" AND quality > cost
    THEN use GLM4.6
ELSE IF task == "real-time coding assistance" AND fast_response_required
    THEN use LongCat
ELSE IF task == "complex architecture design"
    THEN use MiniMax M2
END
```
2. Code Review
Recommended Models: KIMI2 → GLM4.6
Selection Logic:
- Long code files need long context → KIMI2 (200K+ tokens)
- Medium files → GLM4.6
- Small files → DeepSeek
3. Documentation Analysis
Recommended Models: KIMI2 → Qwen
Selection Logic:
- Need to understand large API documentation → KIMI2
- Batch analyze multiple documents → Qwen
4. Test Generation
Recommended Models: GLM4.6 → LongCat
Selection Logic:
- Need to understand business logic → GLM4.6
- Need fast generation of many tests → LongCat
5. Troubleshooting
Recommended Models: MiniMax M2 → GLM4.6
Selection Logic:
- Complex problems need deep reasoning → MiniMax M2
- Regular problems → GLM4.6
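The scenario-to-model mapping above can be sketched as a simple lookup with a cheap fallback. This is an illustrative sketch only: the `MODEL_ROUTES` table and `route()` helper are hypothetical names, not part of any real CCM API.

```python
# Illustrative scenario -> model routing table, mirroring the
# recommendations in sections 1-5 above. Names are hypothetical.
MODEL_ROUTES = {
    "simple_code_generation": "deepseek",
    "medium_code_generation": "glm-4.6",
    "realtime_coding":        "longcat",
    "architecture_design":    "minimax-m2",
    "long_code_review":       "kimi-k2",
    "document_analysis":      "kimi-k2",
    "batch_documents":        "qwen",
    "test_generation":        "glm-4.6",
    "troubleshooting":        "minimax-m2",
}

def route(scenario: str) -> str:
    """Pick a model for a scenario, falling back to the cheapest one."""
    return MODEL_ROUTES.get(scenario, "deepseek")
```

Keeping the rules in a flat table rather than nested conditionals makes them easy to audit and to log per request.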
Toolchain Integration
CCM (Claude Code Switch)
We use Claude Code Switch to manage multi-model switching.
Configuration File: .ccm_config
```bash
# CCM Configuration File

# Language setting
CCM_LANGUAGE=zh

# API Keys
DEEPSEEK_API_KEY=sk-your-deepseek-api-key
GLM_API_KEY=xxx
KIMI_API_KEY=xxx
LONGCAT_API_KEY=xxx
MINIMAX_API_KEY=xxx
QWEN_API_KEY=xxx

# Model Selection
DEEPSEEK_MODEL=deepseek-chat
GLM_MODEL=glm-4.6
KIMI_MODEL=kimi-k2-thinking
MINIMAX_MODEL=MiniMax-M2
```
Model Switching Commands
Manual Switching
```bash
# Switch to DeepSeek
ccm use deepseek

# Switch to GLM4.6
ccm use glm

# Switch to KIMI2
ccm use kimi

# View current model
ccm status
```
Cost Optimization Strategy
Cost Analysis
Model Cost Comparison (USD per million tokens)
| Model | Input | Output | Notes |
|---|---|---|---|
| DeepSeek | 0.14 | 0.28 | Cheapest |
| GLM4.6 | 0.50 | 1.00 | Good cost-performance |
| KIMI2 | 0.60 | 1.20 | Long context |
| LongCat | 0.40 | 0.80 | Fast speed |
| MiniMax M2 | 1.20 | 2.40 | Most expensive but powerful |
| Qwen | 0.80 | 1.60 | Scaling advantages |
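Per-request cost follows directly from this table: tokens times the per-million-token price. The sketch below is an illustrative helper (the `PRICES` dict and `estimate_cost()` are our own names, not a provider API).

```python
# (input, output) prices in USD per 1M tokens, from the table above.
PRICES = {
    "deepseek":   (0.14, 0.28),
    "glm-4.6":    (0.50, 1.00),
    "kimi-k2":    (0.60, 1.20),
    "longcat":    (0.40, 0.80),
    "minimax-m2": (1.20, 2.40),
    "qwen":       (0.80, 1.60),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed prices."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 10K-input / 2K-output request costs
# $0.00196 on DeepSeek vs. $0.0168 on MiniMax M2,
# an ~8.5x difference that motivates routing by task complexity.
```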
Monthly Cost Distribution (Typical Month)
```text
DeepSeek:    $120 (40%) - Simple code generation, regular queries
GLM4.6:       $75 (25%) - Architecture design, complex logic
KIMI2:        $45 (15%) - Code review, document analysis
LongCat:      $30 (10%) - Real-time coding
MiniMax M2:   $20  (7%) - Complex problem solving
Qwen:         $10  (3%) - Batch document processing
----------------------------------------
Total:       $300
```
Optimization Strategies
1. Smart Routing
Automatically select optimal model based on task complexity and type.
2. Caching
Cache answers to common questions.
3. Batch Processing
Use Qwen for batch document processing.
4. Context Compression
Compress unnecessary context before sending to large models.
5. Hierarchical Models
Use small models for preprocessing, large models for refinement.
Cost Optimization Effects
| Optimization Measure | Cost Savings | Implementation Difficulty |
|---|---|---|
| Smart Routing | 40% | Medium |
| Caching | 25% | Low |
| Batch Processing | 15% | Low |
| Context Compression | 10% | High |
| Hierarchical Models | 5% | High |
Because these measures overlap (a cached request cannot also be rerouted or compressed), the individual savings do not simply add up to 95%. In practice the combined effect roughly halved our bill: actual monthly cost dropped from $600 to $300 after optimization.
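Strategy 5 (hierarchical models) deserves a concrete shape even though its standalone saving is small. The idea: let a cheap model answer first and escalate to an expensive one only when the draft looks weak. The sketch below is illustrative; `draft_ok()` is a toy quality gate, and the model calls are plain callables standing in for real APIs.

```python
from typing import Callable

def draft_ok(answer: str) -> bool:
    """Toy quality gate; a real check might score the draft with a verifier."""
    return len(answer) > 20 and "I'm not sure" not in answer

def answer_with_escalation(prompt: str,
                           cheap: Callable[[str], str],
                           strong: Callable[[str], str]) -> str:
    """Try the cheap model first; fall back to the strong model on a weak draft."""
    draft = cheap(prompt)
    return draft if draft_ok(draft) else strong(prompt)
```

The expensive model is then paid for only on the fraction of requests that actually need it, which is also the intuition behind smart routing.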
Performance Comparison
Response Time (Average)
| Task Type | DeepSeek | GLM4.6 | KIMI2 | LongCat | MiniMax M2 | Qwen |
|---|---|---|---|---|---|---|
| Simple code generation | 2s | 3s | 4s | 1s | 5s | 4s |
| Complex architecture design | 8s | 6s | 7s | 5s | 7s | 8s |
| Code review (large file) | 15s | 10s | 8s | 12s | 9s | 10s |
| Document analysis | 20s | 15s | 12s | 18s | 14s | 16s |
| Troubleshooting | 12s | 8s | 10s | 9s | 8s | 11s |
Code Quality (Human Review Score)
| Task Type | DeepSeek | GLM4.6 | KIMI2 | LongCat | MiniMax M2 | Qwen |
|---|---|---|---|---|---|---|
| Simple code generation | 7/10 | 8/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Complex architecture design | 6/10 | 9/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Code review | 6/10 | 8/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Test generation | 7/10 | 9/10 | 8/10 | 7/10 | 8/10 | 8/10 |
| Document generation | 6/10 | 7/10 | 8/10 | 7/10 | 8/10 | 9/10 |
Real-World Application Cases
Case 1: Supplier Onboarding Automation
Task: Automate onboarding of new supplier Netstorming
Timeline: 8 minutes (previously 2-3 days)
Total Cost: ~$2 (Human cost: $200)
Case 2: Performance Issue Troubleshooting
Task: Troubleshoot 3% timeout rate on booking API
Timeline: 5 minutes (previously 2-4 hours)
Total Cost: ~$1.5 (Human cost: $300)
Case 3: Batch API Documentation Migration
Task: Migrate API documentation for 50 suppliers to new format
Timeline: 1.5 hours (previously 2-3 weeks)
Total Cost: ~$50 (Human cost: $5,000)
Best Practices Summary
1. Model Selection Principles
- ✅ Simple tasks use cheap models (DeepSeek)
- ✅ Complex tasks use powerful models (MiniMax M2, GLM4.6)
- ✅ Long documents use long-context models (KIMI2)
- ✅ Real-time tasks use fast models (LongCat)
- ✅ Batch tasks use scalable models (Qwen)
2. Cost Control
- ✅ Use smart routing
- ✅ Enable caching
- ✅ Batch processing
- ✅ Compress context
- ✅ Hierarchical model usage
3. Quality Assurance
- ✅ Multi-model verification for critical tasks
- ✅ Use powerful models for code review
- ✅ Monitor test coverage
- ✅ Human review final results
4. Workflow Integration
- ✅ Integrate model selection into OpenSpec
- ✅ Define clear switching rules
- ✅ Record model usage
- ✅ Regularly evaluate and optimize
Series Navigation
- From DeepSeek Copy-Paste to Claude Code
- Deep Claude Code Integration
- Multi-Model and Toolchain Integration ✅ (This article)
- OpenSpec-Driven Development
- AI Coding Best Practices