AI Inference
Call any AI model from any workflow step. Multi-provider, tool use, extended thinking, dynamic prompts, and per-call cost tracking - all built in.
The engine treats every AI model call as a first-class workflow step. Configure the provider, compose the prompt from prior step outputs, set thinking depth, wire up tools - and get a structured result with full token and cost accounting attached, every time.
- `inference` - Call any configured model with a dynamic prompt. Tool use, extended thinking, and structured output built in.
- `inference:estimate` - Tokenise a prompt and return its cost in your currency - without calling the model.
- `inference:models` - List every model available through the configured providers at runtime.
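A minimal sketch of a model-listing step (this assumes `inference:models` needs no extra `vars`; the step name is illustrative):

```yaml
# list the models available at runtime - step name is illustrative
- name: available_models
  component: inference:models
```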
Multi-provider by design
Switch AI providers by changing one field. The engine routes calls, handles auth, and normalises responses - your workflow YAML stays identical.
```yaml
# swap this one field to change everything
provider: anthropic   # or: openai | ollama | openrouter
model: claude-opus-4-20250514
```

| Provider | What you get |
|---|---|
| Anthropic | Claude family - Opus, Sonnet, Haiku. Extended thinking supported. |
| OpenAI | GPT-4o, o1, o3, and every other model in the OpenAI catalogue. |
| Ollama | Any model running locally. Zero egress, zero per-token cost. |
| OpenRouter | 100+ models from a single endpoint. One config, instant access to the whole market. |
Provider credentials and routing live in PipelineConfig - never in workflow YAML.
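As an illustration only - the field names below are assumptions, not the actual PipelineConfig schema - provider credentials might be declared like this, keeping secrets out of workflow YAML entirely:

```yaml
# Hypothetical PipelineConfig sketch; check the PipelineConfig
# reference for the real schema. Credentials never go in workflows.
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
```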
Basic call
```yaml
- name: summarise
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: |
      Summarise the following article in three bullet points.

      {{ fetch.content }}
```

The `content` output holds the model's reply. Token counts and cost are attached automatically - no extra configuration needed.
Dynamic prompts
Prompts are template expressions. Every prior step output, every workflow variable, every upstream result is available.
```yaml
- name: analyse
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    system: |
      You are an expert analyst in {{ vars.domain }}.
      Today's date: {{ vars.date }}.
      Respond in {{ vars.language | default:'English' }}.
    prompt: |
      Customer segment: {{ segment.name }}
      Data: {{ records.output | json }}

      Identify the three most actionable insights.
```

Point `prompt` at a file path instead of an inline string to load prompt templates from disk - useful for long system prompts shared across workflows.
Extended thinking
Enable step-by-step reasoning before the model answers. Thinking tokens are tracked and costed separately.
```yaml
- name: reason
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    think: true
    think_level: 2 # 0 = off | 1 = basic | 2 = advanced
    max_thinking_tokens: 8000
    prompt: "{{ vars.complex_problem }}"
```

`think_level: 2` unlocks the full extended-thinking budget. The model works through the problem internally; you get the polished answer in `content`. Thinking token cost is reported separately in `CostInference.ThinkingTokens` - see Accounting.
Tool use
Define tools and let the model decide when to call them. Four choice strategies give you fine-grained control.
```yaml
- name: agent
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    tool_choice: auto
    prompt: "Find the current exchange rate for EUR/USD and calculate the cost in EUR."
    tools:
      search_web:
        description: Search the web for up-to-date information
        parameters:
          type: object
          properties:
            query:
              type: string
          required: [query]
```

| `tool_choice` | Behaviour |
|---|---|
| `auto` | Model picks which tool to call, or answers directly |
| `any` | Model must call at least one tool |
| `specific` | Model must call the first tool in the list |
| `none` | Tools are defined but the model cannot use them |
Tool call results are returned in the tools output as { tool_name: raw_json } - ready for a follow-up inference step or downstream processing.
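One possible follow-up step, assuming the tool output map is exposed under the previous step's name (the `agent.tools` reference and step names here follow the conventions shown above but are not verified API):

```yaml
# feed the raw tool results into a second inference call
- name: answer
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: |
      Tool results: {{ agent.tools | json }}

      Using the search results above, state the final cost in EUR.
```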
Cost estimation before calling
Use inference:estimate to tokenise a prompt and calculate the expected cost before the model call. Gate expensive calls in loops or batch workflows.
```yaml
- name: check_budget
  component: inference:estimate
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "{{ document.content }}"
    currency: EUR

- name: summarise
  component: inference
  if: "{{ check_budget.cost_input_converted | lt '0.05' }}"
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "Summarise this document: {{ document.content }}"
  else:
    - name: abort
      component: error
      vars:
        message: "Estimated cost {{ check_budget.cost_input_converted }} exceeds budget"
```

`inference:estimate` returns `tokens_input`, `cost_input_usd`, and `cost_input_converted` (in the currency you specify). Nothing is sent to the model - you get the number instantly.
Per-call cost tracking
Every inference call returns a SkillInferenceCost record attached to the component result. It breaks down token usage and cost by phase:
| Phase | Tokens | Provider cost | Client cost |
|---|---|---|---|
| Input | InputTokens | InputCostProvider | InputCostClient |
| Output | OutputTokens | OutputCostProvider | OutputCostClient |
| Thinking | ThinkingTokens | ThinkingCostProvider | ThinkingCostClient |
| Estimate | EstimateInputTokens | EstimateInputCostProvider | EstimateInputCostClient |
Provider cost is in the model's base currency (typically USD). Client cost is converted using the daily exchange rate for your configured currency. See Accounting for how to read these values from ComponentResult and ExposedComponentMetric.
Configuration reference
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name: `anthropic`, `openai`, `ollama`, `openrouter` |
| `model` | string | Model identifier - provider-specific (e.g. `claude-opus-4-20250514`, `gpt-4o`) |
| `prompt` | string | Prompt string or path to a prompt template file |
| `prompt_vars` | map | Extra variables injected into the prompt template |
| `system` | string | System prompt. Supports template expressions |
| `temperature` | float32 | Sampling temperature (0–2). Lower = more deterministic |
| `top_p` | float32 | Nucleus sampling threshold |
| `max_tokens` | int | Maximum tokens in the completion |
| `think` | bool | Enable extended thinking before answering |
| `think_level` | int | Thinking depth: 0 off, 1 basic, 2 advanced |
| `max_thinking_tokens` | int | Token budget for the thinking phase |
| `tools` | map | Tool name → definition (description + JSON Schema parameters) |
| `tool_choice` | string | Tool selection strategy: `auto`, `any`, `specific`, `none` |
| `currency` | string | Currency code for cost conversion (e.g. `EUR`, `GBP`) |
Providers like Anthropic require max_tokens to be set explicitly. The engine will return a clear error if a required field is missing.
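For Anthropic calls that means always setting the field explicitly, for example:

```yaml
- name: summarise
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    max_tokens: 1024   # required for Anthropic models
    prompt: "Summarise: {{ fetch.content }}"
```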