AI Inference
Call any AI model from any workflow step. Multi-provider, tool use, extended thinking, dynamic prompts, and per-call cost tracking - all built in.
The engine treats every AI model call as a first-class workflow step. Configure the provider, compose the prompt from prior step outputs, set thinking depth, wire up tools - and get a structured result with full token and cost accounting attached, every time.
- `inference` - Call any configured model with a dynamic prompt. Tool use, extended thinking, and structured output built in.
- `inference:estimate` - Tokenise a prompt and return its cost in your currency - without calling the model.
- `inference:models` - List every model available through the configured providers at runtime.
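A minimal sketch of a model-listing step (this assumes `inference:models` needs no extra `vars`; the step name is illustrative):

```yaml
# list the models available at runtime - step name is illustrative
- name: available_models
  component: inference:models
```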
Multi-provider by design
Switch AI providers by changing one field. The engine routes calls, handles auth, and normalises responses - your workflow YAML stays identical.
```yaml
# swap this one field to change everything
provider: anthropic   # or: openai | ollama | openrouter
model: claude-opus-4-20250514
```

| Provider | What you get |
|---|---|
| Anthropic | Claude family - Opus, Sonnet, Haiku. Extended thinking supported. |
| OpenAI | GPT-4o, o1, o3, and every other model in the OpenAI catalogue. |
| Ollama | Any model running locally. Zero egress, zero per-token cost. |
| OpenRouter | 100+ models from a single endpoint. One config, instant access to the whole market. |
Provider credentials and routing live in PipelineConfig - never in workflow YAML.
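As an illustration only - the field names below are assumptions, not the actual PipelineConfig schema - provider credentials might be declared like this, keeping secrets out of workflow YAML entirely:

```yaml
# Hypothetical PipelineConfig sketch; check the PipelineConfig
# reference for the real schema. Credentials never go in workflows.
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
```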
Basic call
```yaml
- name: summarise
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: |
      Summarise the following article in three bullet points.

      {{ fetch.content }}
```

The `content` output holds the model's reply. Token counts and cost are attached automatically - no extra configuration needed.
Dynamic prompts
Prompts are template expressions. Every prior step output, every workflow variable, every upstream result is available.
```yaml
- name: analyse
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    system: |
      You are an expert analyst in {{ vars.domain }}.
      Today's date: {{ vars.date }}.
      Respond in {{ vars.language | default:'English' }}.
    prompt: |
      Customer segment: {{ segment.name }}
      Data: {{ records.output | json }}

      Identify the three most actionable insights.
```

Point `prompt` at a file path instead of an inline string to load prompt templates from disk - useful for long system prompts shared across workflows.
Extended thinking
Enable step-by-step reasoning before the model answers. Thinking tokens are tracked and costed separately.
```yaml
- name: reason
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    think: true
    think_level: 2 # 0 = off | 1 = basic | 2 = advanced
    max_thinking_tokens: 8000
    prompt: "{{ vars.complex_problem }}"
```

`think_level: 2` unlocks the full extended-thinking budget. The model works through the problem internally; you get the polished answer in `content`. Thinking token cost is reported separately in `CostInference.ThinkingTokens` - see Accounting.
Tool use
Define tools and let the model decide when to call them. Four choice strategies give you fine-grained control.
```yaml
- name: agent
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    tool_choice: auto
    prompt: "Find the current exchange rate for EUR/USD and calculate the cost in EUR."
    tools:
      search_web:
        description: Search the web for up-to-date information
        parameters:
          type: object
          properties:
            query:
              type: string
          required: [query]
```

| `tool_choice` | Behaviour |
|---|---|
| `auto` | Model picks which tool to call, or answers directly |
| `any` | Model must call at least one tool |
| `specific` | Model must call the first tool in the list |
| `none` | Tools are defined but the model cannot use them |
Tool call results are returned in the tools output as { tool_name: raw_json } - ready for a follow-up inference step or downstream processing.
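One possible follow-up step, assuming the tool output map is exposed under the previous step's name (the `agent.tools` reference and step names here follow the conventions shown above but are not verified API):

```yaml
# feed the raw tool results into a second inference call
- name: answer
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: |
      Tool results: {{ agent.tools | json }}

      Using the search results above, state the final cost in EUR.
```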
Cost estimation before calling
Use inference:estimate to tokenise a prompt and calculate the expected cost before the model call. Gate expensive calls in loops or batch workflows.
```yaml
- name: check_budget
  component: inference:estimate
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "{{ document.content }}"
    currency: EUR

- name: summarise
  component: inference
  if: "{{ check_budget.cost_input_converted | lt '0.05' }}"
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "Summarise this document: {{ document.content }}"
  else:
    - name: abort
      component: error
      vars:
        message: "Estimated cost {{ check_budget.cost_input_converted }} exceeds budget"
```

`inference:estimate` returns `tokens_input`, `cost_input_usd`, and `cost_input_converted` (in the currency you specify). Nothing is sent to the model - you get the number instantly.
Per-call cost tracking
Every inference call returns a SkillInferenceCost record attached to the component result. It breaks down token usage and cost by phase:
| Phase | Tokens | Provider cost | Client cost |
|---|---|---|---|
| Input | InputTokens | InputCostProvider | InputCostClient |
| Output | OutputTokens | OutputCostProvider | OutputCostClient |
| Thinking | ThinkingTokens | ThinkingCostProvider | ThinkingCostClient |
| Estimate | EstimateInputTokens | EstimateInputCostProvider | EstimateInputCostClient |
Provider cost is in the model's base currency (typically USD). Client cost is converted using the daily exchange rate for your configured currency. See Accounting for how to read these values from ComponentResult and ExposedComponentMetric.
Configuration reference
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name: `anthropic`, `openai`, `ollama`, `openrouter` |
| `model` | string | Model identifier - provider-specific (e.g. `claude-opus-4-20250514`, `gpt-4o`) |
| `prompt` | string | Prompt string or path to a prompt template file |
| `prompt_vars` | map | Extra variables injected into the prompt template |
| `system` | string | System prompt. Supports template expressions |
| `temperature` | float32 | Sampling temperature (0–2). Lower = more deterministic |
| `top_p` | float32 | Nucleus sampling threshold |
| `max_tokens` | int | Maximum tokens in the completion |
| `think` | bool | Enable extended thinking before answering |
| `think_level` | int | Thinking depth: 0 off, 1 basic, 2 advanced |
| `max_thinking_tokens` | int | Token budget for the thinking phase |
| `tools` | map | Tool name → definition (description + JSON Schema parameters) |
| `tool_choice` | string | Tool selection strategy: `auto`, `any`, `specific`, `none` |
| `currency` | string | Currency code for cost conversion (e.g. `EUR`, `GBP`) |
Providers like Anthropic require max_tokens to be set explicitly. The engine will return a clear error if a required field is missing.
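For Anthropic calls that means always setting the field explicitly, for example:

```yaml
- name: summarise
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    max_tokens: 1024   # required for Anthropic models
    prompt: "Summarise: {{ fetch.content }}"
```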