AI Inference Templates
Define reusable, multi-message prompt files with variable declarations, system and user message blocks, and inline tool definitions. Reference them from any inference step with a single path.
Inline prompts work for simple cases. As prompts grow - multi-message conversations, tool definitions, reusable system instructions - move them into a template file. Reference it from any inference step with one line. Share it across workflows without duplication.
Referencing a template
Point `prompt` at a file path prefixed with `file:`:
```yaml
- name: extract
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: "file:internal/prompts/extract_entities.md"
    prompt_vars:
      document: "{{ fetch.content }}"
      language: "{{ vars.lang | default:'en' }}"
```

The engine loads the file, injects `prompt_vars` into the template expressions, and assembles the full message list before calling the model. The inference component behaves identically whether the prompt is inline or file-based.
Template file format
A prompt template is a Markdown file with two kinds of building blocks: YAML front matter that declares variables, and any number of `---system`, `---user`, and `---tool` section blocks.
```markdown
---
variable_name: default_value
---
---system
System message content here.
---user
First user message.
---user
{{ variable_name }}
```

Sections are processed in order. Multiple sections of the same type produce multiple messages of that type in the conversation sent to the model.
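To make the processing order concrete, here is a minimal Python sketch of a parser for this format. This is a hypothetical illustration, not the engine's implementation; the regex and return shapes are assumptions.

```python
import re

def parse_template(text):
    """Split a prompt template into front matter vars and ordered sections.

    Illustrative sketch of the documented format, not the engine's code.
    """
    front = {}
    m = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if m:
        for line in m.group(1).splitlines():
            key, _, value = line.partition(":")
            front[key.strip()] = value.strip().strip("'\"")
        text = text[m.end():]
    # Each '---system' / '---user' / '---tool' marker starts a new section;
    # order is preserved, so repeated types yield repeated messages.
    parts = re.split(r"^---(system|user|tool)\n", text, flags=re.MULTILINE)
    sections = [(kind, body.strip()) for kind, body in zip(parts[1::2], parts[2::2])]
    return front, sections

front, sections = parse_template(
    "---\nname: ''\n---\n"
    "---system\nBe terse.\n"
    "---user\nHi {{ name }}\n"
)
```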
Front matter: variable declarations
The YAML front matter declares the variables the template accepts, with optional defaults:
```yaml
---
content: ''
language: en
max_results: 5
---
```

Variables with empty defaults are required - the engine raises an error if they are missing from `prompt_vars`. Variables with non-empty defaults are optional; the default is used if the caller does not supply them.
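The required/optional rule can be sketched as follows (a hypothetical helper, not the engine's code):

```python
def resolve_vars(declared, supplied):
    """Merge caller-supplied prompt_vars over the template's front matter.

    Sketch of the documented rule: an empty default marks a required
    variable; a non-empty default makes it optional.
    """
    resolved = {}
    for name, default in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default not in ("", None):
            resolved[name] = default
        else:
            raise ValueError(f"missing required prompt var: {name}")
    return resolved

resolve_vars({"content": "", "language": "en"}, {"content": "hello"})
# → {'content': 'hello', 'language': 'en'}
```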
Inside section bodies, reference variables with standard template expressions:
```
{{ content }}
{{ language | upper }}
{{ max_results | default:10 }}
```

All Sintax modifiers work inside template files - `default`, `upper`, `json`, `key`, `filter`, and the rest.
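To see how a modifier chain composes, here is a toy Python emulation of a few of these modifiers. These are illustrative stand-ins only; Sintax's real semantics may differ.

```python
import json

# Minimal stand-ins for a few documented modifiers (illustrative only).
MODIFIERS = {
    "upper": lambda v, arg=None: str(v).upper(),
    "default": lambda v, arg=None: v if v not in ("", None) else arg,
    "json": lambda v, arg=None: json.dumps(v),
}

def apply_chain(value, chain):
    """Apply 'modifier:arg' steps left to right, like {{ x | a | b:c }}."""
    for step in chain:
        name, _, arg = step.partition(":")
        value = MODIFIERS[name](value, arg or None)
    return value

apply_chain("en", ["upper"])      # → 'EN'
apply_chain("", ["default:10"])   # → '10'
```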
Message sections
`---system`

The system message. Sets context, persona, and constraints for the model. There should typically be at most one, but the engine will concatenate multiple system sections if present.

```markdown
---system
You are an expert data extraction assistant.
You only respond with valid JSON. No prose, no explanation.
Language: {{ language }}
```

`---user`

A user message. Add as many as needed to build a few-shot conversation or a multi-turn prompt structure.

```markdown
---user
Here are three examples of correct output:
Input: "Acme Corp acquired Globex for $2.4B"
Output: {"acquirer":"Acme Corp","target":"Globex","value":"$2.4B"}
---user
Now extract entities from this text:
{{ content }}
```

The two `---user` blocks produce two separate user messages in the conversation, in order. This is the standard pattern for few-shot prompting: examples in the first message, the actual input in the last.
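Assuming an OpenAI-style message format, the template above would assemble into a list like this once variables are injected (illustrative only; the engine's exact wire format may differ):

```python
# Illustrative: the conversation assembled from the template above,
# with language='en' and the document text substituted for {{ content }}.
messages = [
    {"role": "system", "content": "You are an expert data extraction assistant.\n"
                                  "You only respond with valid JSON. No prose, no explanation.\n"
                                  "Language: en"},
    {"role": "user", "content": "Here are three examples of correct output:\n"
                                'Input: "Acme Corp acquired Globex for $2.4B"\n'
                                'Output: {"acquirer":"Acme Corp","target":"Globex","value":"$2.4B"}'},
    {"role": "user", "content": "Now extract entities from this text:\n<document text>"},
]
```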
Tool definitions
Add ---tool sections to define tools the model can call. Each section is a YAML tool definition - name, description, and a JSON Schema parameters block.
```markdown
---tool
name: web_search
description: |
  Search the web for up-to-date information.
  Use this when the answer requires recent data not in your training.
parameters:
  type: object
  required: [query]
  properties:
    query:
      type: string
      description: The search query. Be specific and concise.
```

Multiple `---tool` sections define multiple tools. The model sees all of them and picks which to call based on `tool_choice` (set in the inference step vars, not in the template).
Constraining the output schema
Tools are also the standard way to force structured output when the provider doesn't support native JSON mode. Define a single tool whose parameters match the shape you want, then set tool_choice: specific in the step - the model is forced to call it, producing exactly that structure.
```markdown
---tool
name: extract_keywords
description: |
  Use this tool to return the extracted search keywords.
  Always call this tool - do not reply with plain text.
parameters:
  type: object
  required: [context, language, queries]
  properties:
    context:
      type: string
      description: A focused summary of the search intent.
    language:
      type: string
      description: ISO 639-1 language code.
    queries:
      type: array
      items:
        type: object
        required: [keywords]
        properties:
          keywords:
            type: string
```

```yaml
- name: parse_intent
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: "file:internal/prompts/extract_keywords.md"
    tool_choice: specific
    prompt_vars:
      content: "{{ vars.user_message }}"
```

The tool call result is returned in the `tools` output as `{ tool_name: raw_json }` - ready for downstream steps to parse.
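Downstream, that raw JSON decodes directly. A sketch, assuming the `{ tool_name: raw_json }` shape described above (the sample values are invented):

```python
import json

# Shape of the step's tools output as documented; values are illustrative.
step_output = {
    "tools": {
        "extract_keywords": '{"context": "acme acquisition pricing", '
                            '"language": "en", '
                            '"queries": [{"keywords": "acme acquisition price"}]}'
    }
}

result = json.loads(step_output["tools"]["extract_keywords"])
first_query = result["queries"][0]["keywords"]  # → "acme acquisition price"
```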
Complete example
This template takes a user message, generates targeted web search queries, and forces the output into a structured schema via tool use.
```markdown
---
content: ''
---
---tool
name: web_search_keywords
description: |
  Use this tool to generate web search queries based on the user's current request.
  Extract the user's search intent and produce targeted queries.
  If the user provides explicit URLs, return those instead of generating queries.
parameters:
  type: object
  required: [context, language]
  properties:
    context:
      type: string
      description: |
        A focused summary of the current search intent. Include the specific question,
        relevant constraints, and what kind of information would be useful.
        3-8 sentences. Focus strictly on the current request.
    language:
      type: string
      description: ISO 639-1 language code detected from the user's message.
    queries:
      type: array
      description: 1-2 targeted search queries for the current topic.
      items:
        type: object
        required: [keywords]
        properties:
          keywords:
            type: string
            description: Concise search query (2-6 words).
    urls:
      type: array
      description: URLs explicitly provided by the user to read.
      items:
        type: object
        required: [url]
        properties:
          url:
            type: string
---user
Extract search intent from the user's current request.
Generate 1-2 targeted search queries. Prefer 1 unless the request has
distinctly different sub-topics. Each query must be 2-6 words.
If the user provided specific URLs, extract them into the `urls` array instead.
---user
{{ content }}
```

Referenced from a workflow step:
```yaml
- name: parse_intent
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "file:internal/skills/web_search_extract_keywords.md"
    tool_choice: specific
    prompt_vars:
      content: "{{ vars.user_message }}"
- name: search
  component: search:web
  vars:
    query: "{{ parse_intent.tools.web_search_keywords | from:'json' | key:'queries' | first | key:'keywords' }}"
```

Tips
Keep templates focused. One template, one job. A template that handles ten different prompt strategies is harder to test and harder to reuse than ten single-purpose templates.
Use front matter defaults for optional context. Callers that don't need a field shouldn't have to supply it. Set sensible defaults so the template degrades gracefully.
Version templates like code. Templates live in your repo, get reviewed in PRs, and travel with the workflow that uses them. A changed template is a changed behaviour - treat it that way.
Put examples in a separate user block. The few-shot pattern (examples in the first ---user, actual input in the last) is easier to read and easier to update than mixing examples with instructions in a single block.
The prompt_vars map supports the same template expressions as workflow vars - step outputs, conditionals, modifier chains. The full templating system is available at the call site, not just inside the template file.