
AI Inference Templates

Define reusable, multi-message prompt files with variable declarations, system and user message blocks, and inline tool definitions. Reference them from any inference step with a single path.

Inline prompts work for simple cases. As prompts grow - multi-message conversations, tool definitions, reusable system instructions - move them into a template file. Reference it from any inference step with one line. Share it across workflows without duplication.


Referencing a template

Point prompt at a file path with the file: prefix:

- name: extract
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: "file:internal/prompts/extract_entities.md"
    prompt_vars:
      document: "{{ fetch.content }}"
      language: "{{ vars.lang | default:'en' }}"

The engine loads the file, injects prompt_vars into the template expressions, and assembles the full message list before calling the model. The inference component behaves identically whether the prompt is inline or file-based.


Template file format

A prompt template is a Markdown file with two building blocks: a YAML front matter block that declares variables, and any number of ---system, ---user, and ---tool section blocks.

---
variable_name: default_value
---

---system
System message content here.
---user
First user message.
---user
{{ variable_name }}

Sections are processed in order. Multiple sections of the same type produce multiple messages of that type in the conversation sent to the model.
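For example, this minimal template (file contents are illustrative) produces a two-message conversation: one system message, then one user message containing the rendered variable:

```markdown
---
content: ''
---

---system
You are a concise summarisation assistant.
---user
Summarise the following text:

{{ content }}
```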


Front matter: variable declarations

The YAML front matter declares the variables the template accepts, with optional defaults:

---
content: ''
language: en
max_results: 5
---

Variables with empty defaults are required - the engine will raise an error if they are missing from prompt_vars. Variables with non-empty defaults are optional; the default is used if the caller does not supply them.
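For example, a template whose front matter declares content: '' and language: en requires content but not language. A caller can omit language and rely on the default (the step and template path below are illustrative):

```yaml
- name: summarise
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: "file:internal/prompts/summarise.md"
    prompt_vars:
      content: "{{ fetch.content }}"
      # language omitted - the front matter default 'en' applies
```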

Inside section bodies, reference variables with standard template expressions:

{{ content }}
{{ language | upper }}
{{ max_results | default:10 }}

All Sintax modifiers work inside template files - default, upper, json, key, filter, and the rest.


Message sections

---system

The system message. Sets context, persona, and constraints for the model. A template typically has at most one; if several are present, the engine concatenates them.

---system
You are an expert data extraction assistant.
You only respond with valid JSON. No prose, no explanation.
Language: {{ language }}

---user

A user message. Add as many as needed to build a few-shot conversation or a multi-turn prompt structure.

---user
Here are three examples of correct output:

Input: "Acme Corp acquired Globex for $2.4B"
Output: {"acquirer":"Acme Corp","target":"Globex","value":"$2.4B"}

---user
Now extract entities from this text:

{{ content }}

The two ---user blocks produce two separate user messages in the conversation, in order. This is the standard pattern for few-shot prompting: examples in the first message, the actual input in the last.


Tool definitions

Add ---tool sections to define tools the model can call. Each section is a YAML tool definition - name, description, and a JSON Schema parameters block.

---tool
name: web_search
description: |
  Search the web for up-to-date information.
  Use this when the answer requires recent data not in your training.
parameters:
  type: object
  required: [query]
  properties:
    query:
      type: string
      description: The search query. Be specific and concise.

Multiple ---tool sections define multiple tools. The model sees all of them and picks which to call based on tool_choice (set in the inference step vars, not in the template).
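A sketch of two tools in one template - both definitions are illustrative, not part of any built-in toolset:

```markdown
---tool
name: web_search
description: Search the web for up-to-date information.
parameters:
  type: object
  required: [query]
  properties:
    query:
      type: string
---tool
name: read_url
description: Fetch and read the content of a specific URL.
parameters:
  type: object
  required: [url]
  properties:
    url:
      type: string
```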

Constraining the output schema

Tools are also the standard way to force structured output when the provider doesn't support native JSON mode. Define a single tool whose parameters match the shape you want, then set tool_choice: specific in the step - the model is forced to call it, producing exactly that structure.

---tool
name: extract_keywords
description: |
  Use this tool to return the extracted search keywords.
  Always call this tool - do not reply with plain text.
parameters:
  type: object
  required: [context, language, queries]
  properties:
    context:
      type: string
      description: A focused summary of the search intent.
    language:
      type: string
      description: ISO 639-1 language code.
    queries:
      type: array
      items:
        type: object
        required: [keywords]
        properties:
          keywords:
            type: string
Referenced from a workflow step with tool_choice set to specific:

- name: parse_intent
  component: inference
  vars:
    provider: openai
    model: gpt-4o
    prompt: "file:internal/prompts/extract_keywords.md"
    tool_choice: specific
    prompt_vars:
      content: "{{ vars.user_message }}"

The tool call result is returned in the tools output as { tool_name: raw_json } - ready for downstream steps to parse.


Complete example

This template takes a user message, generates targeted web search queries, and forces the output into a structured schema via tool use.

---
content: ''
---

---tool
name: web_search_keywords
description: |
  Use this tool to generate web search queries based on the user's current request.
  Extract the user's search intent and produce targeted queries.
  If the user provides explicit URLs, return those instead of generating queries.
parameters:
  type: object
  required: [context, language]
  properties:
    context:
      type: string
      description: |
        A focused summary of the current search intent. Include the specific question,
        relevant constraints, and what kind of information would be useful.
        3-8 sentences. Focus strictly on the current request.
    language:
      type: string
      description: ISO 639-1 language code detected from the user's message.
    queries:
      type: array
      description: 1-2 targeted search queries for the current topic.
      items:
        type: object
        required: [keywords]
        properties:
          keywords:
            type: string
            description: Concise search query (2-6 words).
    urls:
      type: array
      description: URLs explicitly provided by the user to read.
      items:
        type: object
        required: [url]
        properties:
          url:
            type: string

---user
Extract search intent from the user's current request.

Generate 1-2 targeted search queries. Prefer 1 unless the request has
distinctly different sub-topics. Each query must be 2-6 words.

If the user provided specific URLs, extract them into the `urls` array instead.

---user
{{ content }}

Referenced from a workflow step:

- name: parse_intent
  component: inference
  vars:
    provider: anthropic
    model: claude-opus-4-20250514
    prompt: "file:internal/skills/web_search_extract_keywords.md"
    tool_choice: specific
    prompt_vars:
      content: "{{ vars.user_message }}"

- name: search
  component: search:web
  vars:
    query: "{{ parse_intent.tools.web_search_keywords | from:'json' | key:'queries' | first | key:'keywords' }}"

Tips

Keep templates focused. One template, one job. A template that handles ten different prompt strategies is harder to test and harder to reuse than ten single-purpose templates.

Use front matter defaults for optional context. Callers that don't need a field shouldn't have to supply it. Set sensible defaults so the template degrades gracefully.

Version templates like code. Templates live in your repo, get reviewed in PRs, and travel with the workflow that uses them. A changed template is a changed behaviour - treat it that way.

Put examples in a separate user block. The few-shot pattern (examples in the first ---user, actual input in the last) is easier to read and easier to update than mixing examples with instructions in a single block.

The prompt_vars map supports the same template expressions as workflow vars - step outputs, conditionals, modifier chains. The full templating system is available at the call site, not just inside the template file.
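For example (the classify step and its output fields are illustrative), prompt_vars can pull from earlier step outputs with modifier chains:

```yaml
prompt_vars:
  content: "{{ fetch.body | key:'text' }}"
  language: "{{ vars.lang | default:'en' }}"
  examples: "{{ classify.output | from:'json' | key:'samples' }}"
```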
