Multi-Model and Toolchain Integration

Multi-Model Integration

Introduction

In the HotelByte project, we found that no single AI model can cover every scenario. Different tasks demand different capabilities: code generation needs precision, natural language understanding needs context, long-document processing needs large context windows, and cost control calls for fast, efficient inference.

This article will dive deep into how we integrated multiple AI models (DeepSeek, GLM4.6, KIMI2, LongCat, MiniMax M2, Qwen) into a complete toolchain, as well as our model selection strategy and cost optimization solutions.

Model Matrix

Supported Models

| Model | Provider | Main Advantages | Use Cases | Cost Level |
| --- | --- | --- | --- | --- |
| DeepSeek | DeepSeek | Cost-effective, strong Chinese-language support | Regular code generation, simple queries | 💰 Low |
| GLM4.6 | Zhipu | Strong overall capabilities | Architecture design, complex logic | 💰💰 Medium |
| KIMI2 | Moonshot | Long context | Document analysis, code review | 💰💰 Medium |
| LongCat | Meituan | Multimodal, fast responses | Quick iteration, real-time coding | 💰💰 Medium |
| MiniMax M2 | MiniMax | Strong reasoning | Complex problem solving | 💰💰💰 High |
| Qwen | Alibaba Cloud | Strong scaling | Batch processing, large-scale tasks | 💰💰💰 High |

Model Selection Strategy

Scenario-Model Mapping

We defined model selection rules for different development scenarios:

1. Code Generation

Recommended Models: DeepSeek → GLM4.6 → LongCat

Selection Logic:

IF task == "simple code generation"
    AND budget_limit == "strict"
    THEN use DeepSeek
ELSE IF task == "medium complexity code"
    AND quality > cost
    THEN use GLM4.6
ELSE IF task == "real-time coding assistance"
    AND fast_response_required
    THEN use LongCat
ELSE IF task == "complex architecture design"
    THEN use MiniMax M2
END
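The pseudocode above can be sketched as a small routing function. This is a minimal illustration, not CCM's actual router; the task labels, flags, and fallback choice are our own assumptions.

```python
# Hypothetical router implementing the selection pseudocode above.
# Task labels and flags are illustrative, not part of any real API.
def pick_codegen_model(task: str, strict_budget: bool = False,
                       fast_response: bool = False) -> str:
    """Return a model name for a code-generation request."""
    if task == "simple" and strict_budget:
        return "DeepSeek"
    if task == "medium":
        return "GLM4.6"  # quality outweighs cost here
    if task == "realtime" and fast_response:
        return "LongCat"
    if task == "architecture":
        return "MiniMax M2"
    return "DeepSeek"  # cheapest sensible default
```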

2. Code Review

Recommended Models: KIMI2 → GLM4.6

Selection Logic:

  • Long code files need long context → KIMI2 (200K+ tokens)
  • Medium files → GLM4.6
  • Small files → DeepSeek
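The size-based rules above can be expressed as a simple function. The exact token cutoffs below are illustrative assumptions; only KIMI2's 200K+ window comes from the text.

```python
# Hypothetical size-based routing for code review.
# The 100K/10K cutoffs are illustrative; tune them to your codebase.
def pick_review_model(token_count: int) -> str:
    if token_count > 100_000:   # long files need KIMI2's 200K+ window
        return "KIMI2"
    if token_count > 10_000:    # medium files
        return "GLM4.6"
    return "DeepSeek"           # small files
```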

3. Documentation Analysis

Recommended Models: KIMI2 → Qwen

Selection Logic:

  • Need to understand large API documentation → KIMI2
  • Batch analyze multiple documents → Qwen

4. Test Generation

Recommended Models: GLM4.6 → LongCat

Selection Logic:

  • Need to understand business logic → GLM4.6
  • Need fast generation of many tests → LongCat

5. Troubleshooting

Recommended Models: MiniMax M2 → GLM4.6

Selection Logic:

  • Complex problems need deep reasoning → MiniMax M2
  • Regular problems → GLM4.6
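Scenarios 2 through 5 boil down to a lookup from scenario to an ordered model preference, which can live as data rather than branching code. The scenario keys below are our own hypothetical naming.

```python
# Scenario -> (primary, fallback) model preferences from the sections above.
# Dictionary keys are hypothetical identifiers, not a real API.
SCENARIO_MODELS = {
    "code_review":     ("KIMI2", "GLM4.6"),
    "doc_analysis":    ("KIMI2", "Qwen"),
    "test_generation": ("GLM4.6", "LongCat"),
    "troubleshooting": ("MiniMax M2", "GLM4.6"),
}

def preferred_models(scenario: str) -> tuple:
    """Return the ordered model preference for a scenario."""
    return SCENARIO_MODELS.get(scenario, ("DeepSeek",))
```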

Toolchain Integration

CCM (Claude Code Switch)

We use Claude Code Switch to manage multi-model switching.

Configuration File: .ccm_config

# CCM Configuration File

# Language setting
CCM_LANGUAGE=zh

# API Keys
DEEPSEEK_API_KEY=sk-your-deepseek-api-key
GLM_API_KEY=xxx
KIMI_API_KEY=xxx
LONGCAT_API_KEY=xxx
MINIMAX_API_KEY=xxx
QWEN_API_KEY=xxx

# Model Selection
DEEPSEEK_MODEL=deepseek-chat
GLM_MODEL=glm-4.6
KIMI_MODEL=kimi-k2-thinking
MINIMAX_MODEL=MiniMax-M2
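CCM parses this file itself; the loader below is only a sketch of how a KEY=value file in this shape could be read by your own tooling.

```python
# Minimal sketch: parse a .ccm_config-style KEY=value file.
# Handles only blank lines, '#' comments, and KEY=value pairs;
# CCM's real parser may support more.
def load_ccm_config(text: str) -> dict:
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```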

Model Switching Commands

Manual Switching

# Switch to DeepSeek
ccm use deepseek

# Switch to GLM4.6
ccm use glm

# Switch to KIMI2
ccm use kimi

# View current model
ccm status
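In scripts, the `ccm use` invocation can be generated from a task type. The task names and the mapping below are our own assumptions; only the `ccm use <model>` command itself comes from the section above.

```python
# Hypothetical task -> ccm alias mapping (aliases from the commands above).
TASK_ALIAS = {
    "codegen": "deepseek",
    "review":  "kimi",
    "docs":    "kimi",
    "tests":   "glm",
    "debug":   "minimax",
}

def ccm_switch_command(task: str) -> str:
    """Build the `ccm use` command line for a task type."""
    return f"ccm use {TASK_ALIAS.get(task, 'deepseek')}"
```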

Cost Optimization Strategy

Cost Analysis

Model Cost Comparison (USD per million tokens)

| Model | Input | Output | Notes |
| --- | ---: | ---: | --- |
| DeepSeek | 0.14 | 0.28 | Cheapest |
| GLM4.6 | 0.50 | 1.00 | Good cost-performance |
| KIMI2 | 0.60 | 1.20 | Long context |
| LongCat | 0.40 | 0.80 | Fast responses |
| MiniMax M2 | 1.20 | 2.40 | Most expensive but powerful |
| Qwen | 0.80 | 1.60 | Scaling advantages |

Monthly Cost Distribution (Typical Month)

DeepSeek:     $120 (40%) - Simple code generation, regular queries
GLM4.6:       $75  (25%) - Architecture design, complex logic
KIMI2:        $45  (15%) - Code review, document analysis
LongCat:      $30  (10%) - Real-time coding
MiniMax M2:   $20  (7%)  - Complex problem solving
Qwen:         $10  (3%)  - Batch document processing
----------------------------------------
Total:        $300
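Given the price table, per-request cost is simple arithmetic. The helper below hard-codes the per-million-token rates from the comparison table above.

```python
# Prices in USD per million tokens, copied from the cost table above.
PRICES = {
    "DeepSeek":   (0.14, 0.28),
    "GLM4.6":     (0.50, 1.00),
    "KIMI2":      (0.60, 1.20),
    "LongCat":    (0.40, 0.80),
    "MiniMax M2": (1.20, 2.40),
    "Qwen":       (0.80, 1.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, a DeepSeek call with one million tokens each way costs 0.14 + 0.28 = $0.42.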

Optimization Strategies

1. Smart Routing

Automatically select optimal model based on task complexity and type.

2. Caching

Cache answers to common questions.
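A minimal sketch of such a cache, assuming answers are keyed by model and exact prompt text; real deduplication might normalize or embed prompts instead.

```python
import hashlib

_answer_cache: dict = {}

def cached_ask(model: str, prompt: str, ask) -> str:
    """Return a cached answer if the same (model, prompt) was seen before.

    `ask` is whatever function actually calls the model API.
    """
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _answer_cache:
        _answer_cache[key] = ask(model, prompt)
    return _answer_cache[key]
```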

3. Batch Processing

Use Qwen for batch document processing.

4. Context Compression

Compress unnecessary context before sending to large models.
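One simple (and lossy) way to do this is to keep the head and tail of an over-long context and drop the middle. The sketch below uses character counts as a rough proxy for tokens.

```python
def compress_context(text: str, max_chars: int = 4000) -> str:
    """Keep the start and end of an over-long context, drop the middle."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]
```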

5. Hierarchical Models

Use small models for preprocessing, large models for refinement.
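The pattern is: let a cheap model produce a draft, then escalate to a stronger model only when the draft fails some quality gate. Everything below (the function names, the check) is an illustrative sketch, not our production code.

```python
def hierarchical_answer(prompt, draft_fn, refine_fn, needs_refine):
    """Cheap model drafts; the expensive model refines only when needed.

    draft_fn(prompt) and refine_fn(prompt, draft) wrap the two model calls;
    needs_refine(draft) is any quality gate (length check, lint, self-check).
    """
    draft = draft_fn(prompt)
    if needs_refine(draft):
        return refine_fn(prompt, draft)
    return draft
```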

Cost Optimization Effects

| Optimization Measure | Cost Savings | Implementation Difficulty |
| --- | --- | --- |
| Smart Routing | 40% | Medium |
| Caching | 25% | Low |
| Batch Processing | 15% | Low |
| Context Compression | 10% | High |
| Hierarchical Models | 5% | High |

These measures overlap rather than stack, so the individual percentages cannot simply be summed. In practice, the combined effect cut our monthly cost from $600 to $300, a reduction of about 50%.

Performance Comparison

Response Time (Average)

| Task Type | DeepSeek | GLM4.6 | KIMI2 | LongCat | MiniMax M2 | Qwen |
| --- | --- | --- | --- | --- | --- | --- |
| Simple code generation | 2s | 3s | 4s | 1s | 5s | 4s |
| Complex architecture design | 8s | 6s | 7s | 5s | 7s | 8s |
| Code review (large file) | 15s | 10s | 8s | 12s | 9s | 10s |
| Document analysis | 20s | 15s | 12s | 18s | 14s | 16s |
| Troubleshooting | 12s | 8s | 10s | 9s | 8s | 11s |

Code Quality (Human Review Score)

| Task Type | DeepSeek | GLM4.6 | KIMI2 | LongCat | MiniMax M2 | Qwen |
| --- | --- | --- | --- | --- | --- | --- |
| Simple code generation | 7/10 | 8/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Complex architecture design | 6/10 | 9/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Code review | 6/10 | 8/10 | 8/10 | 7/10 | 9/10 | 8/10 |
| Test generation | 7/10 | 9/10 | 8/10 | 7/10 | 8/10 | 8/10 |
| Document generation | 6/10 | 7/10 | 8/10 | 7/10 | 8/10 | 9/10 |

Real-World Application Cases

Case 1: Supplier Onboarding Automation

Task: Automate onboarding of new supplier Netstorming

Timeline: 8 minutes (previously 2-3 days)

Total Cost: ~$2 (vs. an estimated $200 in human labor)

Case 2: Performance Issue Troubleshooting

Task: Troubleshoot 3% timeout rate on booking API

Timeline: 5 minutes (previously 2-4 hours)

Total Cost: ~$1.5 (vs. an estimated $300 in human labor)

Case 3: Batch API Documentation Migration

Task: Migrate API documentation for 50 suppliers to new format

Timeline: 1.5 hours (previously 2-3 weeks)

Total Cost: ~$50 (vs. an estimated $5,000 in human labor)

Best Practices Summary

1. Model Selection Principles

  • ✅ Simple tasks use cheap models (DeepSeek)
  • ✅ Complex tasks use powerful models (MiniMax M2, GLM4.6)
  • ✅ Long documents use long-context models (KIMI2)
  • ✅ Real-time tasks use fast models (LongCat)
  • ✅ Batch tasks use scalable models (Qwen)

2. Cost Control

  • ✅ Use smart routing
  • ✅ Enable caching
  • ✅ Batch processing
  • ✅ Compress context
  • ✅ Hierarchical model usage

3. Quality Assurance

  • ✅ Multi-model verification for critical tasks
  • ✅ Use powerful models for code review
  • ✅ Monitor test coverage
  • ✅ Human review final results

4. Workflow Integration

  • ✅ Integrate model selection into OpenSpec
  • ✅ Define clear switching rules
  • ✅ Record model usage
  • ✅ Regularly evaluate and optimize

Series Navigation

  1. From DeepSeek Copy-Paste to Claude Code
  2. Deep Claude Code Integration
  3. Multi-Model and Toolchain Integration ✅ (This article)
  4. OpenSpec-Driven Development
  5. AI Coding Best Practices

Related Resources: