Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Customize ClickHouse Assistant chat with a semantic layer
The ClickHouse Assistant chat agent can be customized to understand your specific business logic, data structures, and domain knowledge through AGENTS.md—a special saved query that acts as a semantic layer over the agent’s system prompt.
By creating an AGENTS.md file, you can provide custom instructions that are injected at the start of every conversation to guide SQL query generation and data analysis based on your organization’s unique requirements, calculations, and conventions.
How it works
When you save a query named “AGENTS.md” (case-sensitive) in the Cloud Console:
- The ClickHouse Assistant chat agent automatically loads this file when a message is sent
- The content is placed within a structured content tag and injected into the agent’s system prompt
- The instructions are applied to all ClickHouse Assistant chat conversations in that service
Creating AGENTS.md
Create the saved query
- In the Cloud Console, create a new query
- Name it exactly: “AGENTS.md” (case-sensitive)
- Write your custom instructions in the query text editor (not actual SQL)
- Save the query
Add your instructions
Structure your instructions using clear, actionable language. Include:
- Business rules and calculations
- Data structure guidance
- Domain-specific terminology
- Common query patterns
- Performance optimization rules
Best practices
Treat context as a finite resource
Context is precious—every token depletes the agent’s “attention budget.” Like humans with limited working memory, language models experience performance degradation as context grows. This means finding the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.
Find the right altitude
Strike a balance between two extremes:
- Too specific: Hard coding brittle if-else logic that creates fragility and maintenance complexity
- Too vague: High-level guidance that fails to give concrete signals or falsely assumes shared context
The optimal altitude is specific enough to guide behavior effectively, yet flexible enough for the model to apply strong heuristics. Start with a minimal prompt on the best model available, then add clear instructions based on observed failure modes.
Organize with structured sections
Use XML tags or Markdown headers to create distinct, scannable sections:
<background_information>
Context about your data and domain
</background_information>
<calculation_rules>
Specific formulas and business logic
</calculation_rules>
<tool_guidance>
How to use specific ClickHouse features
</tool_guidance>
Provide diverse, canonical examples
Examples are the “pictures worth a thousand words.” Rather than stuffing every edge case into your prompt, curate a focused set of diverse examples that effectively portray expected behavior.
Keep it minimal yet complete
- Include only frequently-needed instructions
- Be concise—larger context degrades performance due to “context rot”
- Remove outdated or rarely-used rules
- Ensure sufficient information to guide desired behavior
Minimal doesn’t necessarily mean short. You need enough detail to ensure the agent adheres to expected behavior, just avoid unnecessary verbosity.
Example: Calculated Metrics from raw data
Guide the agent when metrics require specific calculations rather than direct column access:
<metric_calculations>
IMPORTANT: "active_sessions" is NOT a column. It must be calculated.
To calculate active sessions:
COUNT(DISTINCT session_id || '|' || user_id) AS active_sessions
This counts unique combinations of session and user identifiers.
When the user asks for "active sessions" or "session count", always use this formula:
SELECT
date,
COUNT(DISTINCT session_id || '|' || user_id) AS active_sessions
FROM events
GROUP BY date;
</metric_calculations>
Example: Business logic rules
Define domain-specific calculations and categorizations:
<business_rules>
Revenue Calculation:
- Exclude refunded transactions: WHERE transaction_status != 'refunded'
- Apply regional tax rates using CASE expressions
- Use MRR for subscriptions:
SUM(CASE
WHEN billing_cycle = 'monthly' THEN amount
WHEN billing_cycle = 'yearly' THEN amount / 12
ELSE 0
END) AS mrr
Traffic Source Classification:
Use CASE expression to categorize:
CASE
WHEN traffic_source IN ('google', 'bing', 'organic') THEN 'Organic Search'
WHEN traffic_source IN ('facebook', 'instagram', 'social') THEN 'Social Media'
WHEN traffic_source = 'direct' THEN 'Direct'
ELSE 'Other'
END AS source_category
Customer Segmentation:
- Enterprise: annual_contract_value >= 100000
- Mid-Market: annual_contract_value >= 10000 AND annual_contract_value < 100000
- SMB: annual_contract_value < 10000
Always include these categorizations when generating traffic or revenue reports.
</business_rules>
Example: Data structure quirks
Document unconventional data formats or legacy schema decisions:
<data_structure_notes>
The user_status column uses numeric codes, not strings:
- 1 = 'active'
- 2 = 'inactive'
- 3 = 'suspended'
- 99 = 'deleted'
When filtering or displaying user status, always use:
CASE user_status
WHEN 1 THEN 'active'
WHEN 2 THEN 'inactive'
WHEN 3 THEN 'suspended'
WHEN 99 THEN 'deleted'
END AS status_label
The product_metadata column contains JSON strings that must be parsed:
SELECT
product_id,
JSONExtractString(product_metadata, 'category') AS category,
JSONExtractInt(product_metadata, 'inventory_count') AS inventory
FROM products;
</data_structure_notes>
Example: domain terminology
Map business terms to technical implementation:
<terminology>
When users refer to "conversions", they mean:
- For e-commerce: transactions WHERE transaction_type = 'purchase'
- For SaaS: subscriptions WHERE subscription_status = 'active' AND first_payment_date IS NOT NULL
"Churn" is calculated as:
COUNT(DISTINCT user_id) WHERE last_active_date < today() - INTERVAL 90 DAY
AND previous_subscription_status = 'active'
"DAU" (Daily Active Users) means:
COUNT(DISTINCT user_id) WHERE activity_date = today()
"Qualified leads" must meet ALL criteria:
- lead_score >= 70
- company_size >= 50
- budget_confirmed = true
- contact_role IN ('Director', 'VP', 'C-Level')
</terminology>