
AI Inference

Call any AI model from any workflow step. Multi-provider support, tool use, extended thinking, dynamic prompts, and per-call cost tracking - all built in.

The engine treats every AI model call as a first-class workflow step. Configure the provider, compose the prompt from prior step outputs, set thinking depth, wire up tools - and get a structured result with full token and cost accounting attached, every time.


Multi-provider by design

Switch AI providers by changing one field. The engine routes calls, handles auth, and normalises responses - your workflow YAML stays identical.

# swap this one field to change everything
provider: anthropic   # or: openai | ollama | openrouter
model: claude-opus-4-20250514
| Provider | What you get |
|---|---|
| Anthropic | Claude family - Opus, Sonnet, Haiku. Extended thinking supported. |
| OpenAI | GPT-4o, o1, o3, and every other model in the OpenAI catalogue. |
| Ollama | Any model running locally. Zero egress, zero per-token cost. |
| OpenRouter | 100+ models from a single endpoint. One config, instant access to the whole market. |

Provider credentials and routing live in PipelineConfig - never in workflow YAML.
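As a sketch of what that separation looks like - the field names below are illustrative, not the exact PipelineConfig schema - credentials sit in engine configuration, keyed by provider, while workflow YAML only names the provider:

```yaml
# PipelineConfig (illustrative field names - check your deployment's schema)
inference:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}     # resolved from the environment, never committed
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
  ollama:
    base_url: http://localhost:11434  # local model server, no key required
```

Keeping keys out of workflow YAML means workflows can be shared, versioned, and promoted between environments without touching secrets.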


Basic call

- name: summarise
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: |
      Summarise the following article in three bullet points.

      {{ fetch.content }}

The content output holds the model's reply. Token counts and cost are attached automatically - no extra configuration needed.
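Downstream steps reference the reply like any other step output. For example, chaining the summarise step above into a follow-up call (a sketch - the second step is not part of the original example):

```yaml
- name: translate
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: |
      Translate the following summary into German:

      {{ summarise.content }}
```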


Dynamic prompts

Prompts are template expressions. Every prior step output, every workflow variable, every upstream result is available.

- name: analyse
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    system: |
      You are an expert analyst in {{ vars.domain }}.
      Today's date: {{ vars.date }}.
      Respond in {{ vars.language | default:'English' }}.
    prompt: |
      Customer segment: {{ segment.name }}
      Data: {{ records.output | json }}

      Identify the three most actionable insights.

Point prompt at a file path instead of an inline string to load prompt templates from disk - useful for long system prompts shared across workflows.
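A file-based prompt looks the same as an inline one - only the value changes. The path below is illustrative:

```yaml
- name: weekly_report
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: prompts/weekly_report.txt   # illustrative path - template loaded from disk
    prompt_vars:
      domain: finance                   # extra variables injected into the template
```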


Extended thinking

Enable step-by-step reasoning before the model answers. Thinking tokens are tracked and costed separately.

- name: reason
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    think: true
    think_level: 2          # 0 = off | 1 = basic | 2 = advanced
    max_thinking_tokens: 8000
    prompt: "{{ vars.complex_problem }}"

think_level: 2 unlocks the full extended thinking budget. The model works through the problem internally; you get the polished answer in content. Thinking token cost is reported separately in CostInference.ThinkingTokens - see Accounting.
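The polished answer flows downstream like any other output. A follow-up step (a sketch, not part of the original example) could reshape the reasoned result:

```yaml
- name: explain
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: |
      Rewrite the following solution for a non-expert audience:

      {{ reason.content }}
```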


Tool use

Define tools and let the model decide when to call them. Four choice strategies give you fine-grained control.

- name: agent
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    tool_choice: auto
    prompt: "Find the current exchange rate for EUR/USD and calculate the cost in EUR."
    tools:
      search_web:
        description: Search the web for up-to-date information
        parameters:
          type: object
          properties:
            query:
              type: string
          required: [query]

| tool_choice | Behaviour |
|---|---|
| auto | Model picks which tool to call, or answers directly |
| any | Model must call at least one tool |
| specific | Model must call the first tool in the list |
| none | Tools are defined but the model cannot use them |

Tool call results are returned in the tools output as { tool_name: raw_json } - ready for a follow-up inference step or downstream processing.
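One way to consume that output is a follow-up inference step. The template path below mirrors the { tool_name: raw_json } shape described above; the step itself is an illustrative sketch:

```yaml
- name: answer
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: |
      Using this search result:

      {{ agent.tools.search_web | json }}

      Answer the original question about the EUR/USD exchange rate.
```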


Cost estimation before calling

Use inference:estimate to tokenise a prompt and calculate the expected cost before the model call. Gate expensive calls in loops or batch workflows.

- name: check_budget
  component: inference:estimate
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "{{ document.content }}"
    currency: EUR

- name: summarise
  component: inference
  if: "{{ check_budget.cost_input_converted | lt '0.05' }}"
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "Summarise this document: {{ document.content }}"
  else:
    - name: abort
      component: error
      vars:
        message: "Estimated cost {{ check_budget.cost_input_converted }} exceeds budget"

inference:estimate returns tokens_input, cost_input_usd, and cost_input_converted (in the currency you specify). Nothing is sent to the model - you get the number instantly.


Per-call cost tracking

Every inference call returns a SkillInferenceCost record attached to the component result. It breaks down token usage and cost by phase:

| Phase | Tokens | Provider cost | Client cost |
|---|---|---|---|
| Input | InputTokens | InputCostProvider | InputCostClient |
| Output | OutputTokens | OutputCostProvider | OutputCostClient |
| Thinking | ThinkingTokens | ThinkingCostProvider | ThinkingCostClient |
| Estimate | EstimateInputTokens | EstimateInputCostProvider | EstimateInputCostClient |

Provider cost is in the model's base currency (typically USD). Client cost is converted using the daily exchange rate for your configured currency. See Accounting for how to read these values from ComponentResult and ExposedComponentMetric.


Configuration reference

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name: anthropic, openai, ollama, openrouter |
| model | string | Model identifier - provider-specific (e.g. claude-opus-4-20250514, gpt-4o) |
| prompt | string | Prompt string or path to a prompt template file |
| prompt_vars | map | Extra variables injected into the prompt template |
| system | string | System prompt. Supports template expressions |
| temperature | float32 | Sampling temperature (0–2). Lower = more deterministic |
| top_p | float32 | Nucleus sampling threshold |
| max_tokens | int | Maximum tokens in the completion |
| think | bool | Enable extended thinking before answering |
| think_level | int | Thinking depth: 0 off, 1 basic, 2 advanced |
| max_thinking_tokens | int | Token budget for the thinking phase |
| tools | map | Tool name → definition (description + JSON Schema parameters) |
| tool_choice | string | Tool selection strategy: auto, any, specific, none |
| currency | string | Currency code for cost conversion (e.g. EUR, GBP) |

Providers like Anthropic require max_tokens to be set explicitly. The engine will return a clear error if a required field is missing.
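For example, an Anthropic call with explicit limits set (the values are illustrative):

```yaml
- name: draft
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    max_tokens: 1024        # required explicitly for Anthropic models
    temperature: 0.2        # low temperature for more deterministic output
    prompt: "{{ vars.task }}"
```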
