---
name: skill-builder
description: This skill should be used when the user asks to "create a new skill", "build a skill", "make a skill for", "scaffold a skill", "validate my skill", "check skill structure", or needs help setting up opencode skills with proper structure, YAML frontmatter, and validation.
license: MIT
compatibility: opencode
metadata:
  category: development
  version: "1.0.0"
  author: user
---

# Skill Builder

Rapidly scaffold, validate, and create well-structured opencode skills. This skill provides scripts and templates to ensure skills follow best practices and pass strict validation.

## Quick Start

Create a new skill:

```bash
# Basic skill
~/.config/opencode/skills/skill-builder/scripts/init-skill.sh my-skill

# Full skill with all resources
~/.config/opencode/skills/skill-builder/scripts/init-skill.sh my-skill --full

# Local skill in current project
~/.config/opencode/skills/skill-builder/scripts/init-skill.sh my-skill --local --full
```

Validate a skill:

```bash
~/.config/opencode/skills/skill-builder/scripts/validate-skill.sh ~/.config/opencode/skills/my-skill
```

## Skill Creation Workflow

Follow this process for consistent, high-quality skills:

### Step 1: Plan the Skill

Before creating, answer these questions:

#### Core Intent Questions

1. **What does it do?** - Core functionality in 1-2 sentences
2. **When is it used?** - Specific trigger phrases users will say
3. **What resources are needed?** - Scripts, references, or assets
4. **What output format is expected?** - How should the model respond?
5. **Should we set up test cases?** - For objectively verifiable outputs (file transforms, data extraction, code generation)

#### Interview and Research

Proactively ask questions about edge cases, input/output formats, example files, success criteria, and dependencies:

- What are common edge cases or error scenarios?
- What input formats will users provide?
- What output format does the user expect?
- Are there example files or data I can reference?
- What constitutes success for this skill?
- Are there any dependencies or prerequisites?

### Step 2: Scaffold the Skill

Use the init script with appropriate flags:

```bash
# Simple knowledge skill (no resources)
init-skill.sh docker-commands

# Skill with scripts for automation
init-skill.sh docker-helper --with-scripts

# Skill with detailed documentation
init-skill.sh docker-guide --with-refs

# Complete skill with all resources
init-skill.sh docker-toolkit --full
```

### Step 3: Edit SKILL.md

Fill in the critical fields:

**Frontmatter (REQUIRED):**
- `name`: Must match directory name, lowercase with hyphens
- `description`: 20-1024 chars, include trigger phrases in quotes

**Body (REQUIRED):**
- Brief description of purpose
- Common tasks with examples
- Important notes or caveats

### Step 4: Validate Strictly

Always validate before use:

```bash
validate-skill.sh ~/.config/opencode/skills/my-skill
```

This checks:
- ✓ Name matches directory
- ✓ Description is 20-1024 characters
- ✓ YAML frontmatter is valid
- ✓ No second-person writing
- ✓ Referenced files exist
- ✓ Scripts are executable
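
The first two checks can be sketched in a few lines of Python. This is an illustration only, not the logic of `validate-skill.sh`: the function name `check_name_and_description`, the `NAME_PATTERN` regex, and the message wording are assumptions that encode the rules stated above.

```python
import re
from pathlib import Path

# Assumed reading of "lowercase with hyphens": lowercase letters and digits
# in groups separated by single hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_name_and_description(skill_dir, name, description):
    """Return a list of human-readable problems (empty when both fields pass)."""
    problems = []
    dir_name = Path(skill_dir).name
    if not NAME_PATTERN.match(name):
        problems.append(f"'name' ({name}) must be lowercase alphanumeric with hyphens")
    if name != dir_name:
        problems.append(f"'name' ({name}) does not match directory name ({dir_name})")
    if not 20 <= len(description) <= 1024:
        problems.append(f"'description' must be 20-1024 characters, got {len(description)}")
    return problems
```

An empty list means both fields pass; each entry otherwise names one rule that was broken.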

### Step 5: Test and Iterate

Test the skill by asking opencode to use it:

```
"Use my-skill to..."
"Help me with [skill trigger phrase]"
```

## Progressive Disclosure

Organize content efficiently to minimize context usage:

### What Goes in SKILL.md Body

- Core purpose and overview
- Quick start examples
- Most common use cases
- Pointers to detailed resources

**Keep under 500 lines, ideally 1,500-2,000 words**
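
A quick way to check a draft against this budget is to count lines and words in the body. The sketch below is a hypothetical helper (`body_budget` is not part of the skill's scripts), and it treats 2,000 words as a soft ceiling, which is an assumption drawn from the guideline above.

```python
def body_budget(body_text, max_lines=500, max_words=2000):
    """Report line and word counts for a SKILL.md body against the budget."""
    lines = body_text.splitlines()
    words = body_text.split()
    return {
        "lines": len(lines),
        "words": len(words),
        # "Under 500 lines" is strict; the word ceiling is inclusive.
        "within_budget": len(lines) < max_lines and len(words) <= max_words,
    }
```

When `within_budget` is false, the usual remedy is to move detail into `references/` rather than to trim essentials.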

### What Goes in references/

- Detailed documentation
- Advanced techniques
- API references
- Troubleshooting guides
- Long examples

**Loaded only when referenced**

### What Goes in scripts/

- Executable utilities
- Validation tools
- Automation helpers
- Complex operations

**Executed without loading into context**

### What Goes in assets/

- Template files
- Configuration examples
- Boilerplate code
- Binary resources

**Used in output, not loaded into context**

## Testing and Evaluation

To ensure a skill works correctly, create test cases and iterate based on results.

### When to Create Test Cases

Skills with objectively verifiable outputs benefit from test cases:
- File transforms (converting formats)
- Data extraction (parsing, filtering)
- Code generation (scaffolding, templates)
- Fixed workflow steps (validation, checks)

Skills with subjective outputs often don't need formal test cases:
- Writing style guidance
- Art or design skills
- General advice

### Creating Test Prompts

After writing the skill draft, create 3 realistic test prompts, the kind of thing a real user would actually say:

1. **Common case:** The typical scenario
2. **Edge case:** Something tricky or unusual
3. **Varied phrasing:** Same intent, different words

Example test prompts for a Docker helper skill:
```
1. "Show me all running containers and their resource usage"
2. "My container keeps crashing, how do I debug it?"
3. "clean up all the stopped containers and unused images"
```

### The Iteration Loop

```
Draft → Test → Review → Improve → Repeat
```

1. **Draft the skill** - Initial SKILL.md with basic structure
2. **Test it** - Try the skill with realistic prompts
3. **Review outputs** - Check if outputs match expectations
4. **Improve** - Based on what didn't work well
5. **Repeat** - Until outputs are satisfactory

**When to stop iterating:**
- The user says they're happy
- Test case outputs meet expectations
- Further iterations are not producing meaningful progress

### Generalizing from Feedback

When improving the skill based on test results:

1. **Don't overfit to test cases** - The skill should work for similar prompts, not just the exact test cases
2. **Explain the "why"** - Instead of rigid "ALWAYS" or "NEVER" rules, explain the reasoning so the model understands
3. **Look for patterns** - If multiple tests have similar issues, fix the root cause, not individual symptoms
4. **Keep it lean** - Remove instructions that aren't helping; add clarity where needed

## Writing Patterns and Best Practices

### Defining Output Formats

Explicitly specify output formats when consistency matters:

```markdown
## Output Format

ALWAYS use this exact template:

# [Title]
## Overview
## Steps
## Examples
## Notes
```

### Examples Pattern

Include clear examples in skills. Format them consistently:

````markdown
## Example: Task Name

**Input:** User's request or input data
**Output:** Expected result or response

**Example 1:**
Input: "Create a Python function to reverse a string"
Output:
```python
def reverse_string(s):
    return s[::-1]
```
````

### Description Writing

The description determines when the skill triggers. Make it:

- **Specific** - Include concrete trigger phrases
- **Comprehensive** - Cover different ways users might ask
- **"Pushy"** - Encourage use when relevant (combat undertriggering)

```yaml
# ❌ Too vague
name: dashboard-helper
description: Helps create dashboards

# ❌ Still vague
name: dashboard-helper
description: How to build dashboards to display data

# ✅ Good - specific and "pushy"
name: dashboard-helper
description: How to build dashboards to display internal data. Make sure to use this skill whenever the user mentions dashboards, data visualization, internal metrics, or wants to display any kind of data, even if they don't explicitly ask for a "dashboard."
```

### Using Imperative Form

Prefer imperative form in instructions:

```markdown
# ❌ Don't use second person
"You should run this command first."
"You need to check the configuration."

# ✅ Use imperative form
"Run this command first."
"Check the configuration for errors."
```
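
Second-person phrasing is easy to miss in a long draft. The following sketch scans text for second-person pronouns; it is a rough heuristic under stated assumptions (the `find_second_person` name and the pronoun list are illustrative), not the detection used by `validate-skill.sh`.

```python
import re

# Whole-word match so that words such as "youthful" are not flagged.
SECOND_PERSON = re.compile(r"\b(you|your|yours|yourself)\b", re.IGNORECASE)


def find_second_person(text):
    """Return (line_number, line) pairs for lines using second-person pronouns."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SECOND_PERSON.search(line):
            hits.append((number, line.strip()))
    return hits
```

The scan does not distinguish instructions from quoted user prompts or fenced examples, so each hit still needs a manual look.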

### Explaining the Why

Explain reasoning instead of just dictating rules:

```markdown
# ❌ Rigid rule without context
ALWAYS validate JSON before parsing.

# ✅ Explained reasoning
Validate JSON before parsing. Unvalidated JSON can cause cryptic errors that are hard to debug. The validator catches syntax errors and provides helpful error messages with line numbers.
```

## Common Mistakes to Avoid

### ❌ Bad: Vague Description

```yaml
description: Helps with Docker tasks
```

### ✅ Good: Specific with Triggers

```yaml
description: This skill should be used when the user asks to "create a docker container", "build a docker image", "manage docker compose", or work with Docker workflows
```

### ❌ Bad: Second Person in Body

❌ **Don't write:** "You should run this command first."
❌ **Don't write:** "You need to check the logs."

### ✅ Good: Imperative Form

✅ **Do write:** "Run this command first."
✅ **Do write:** "Check the logs for errors."

### ❌ Bad: Everything in SKILL.md

```markdown
SKILL.md (8,000 words - all content)
```

### ✅ Good: Progressive Disclosure

```markdown
SKILL.md (1,800 words - essentials)
references/
  advanced.md (2,500 words)
  troubleshooting.md (1,200 words)
```

## Resources

### Templates

See `references/skill-templates.md` for copy-paste templates:
- Simple skill (knowledge only)
- Tool integration skill (with scripts)
- Workflow skill (multi-step)
- Documentation skill (heavy references)

### Helper Scripts

- `scripts/init-skill.sh` - Scaffold new skills
- `scripts/validate-skill.sh` - Strict validation

## Troubleshooting

### Validation Fails: Name Mismatch

**Error:** `'name' (my-skill) does not match directory name (my_skill)`

**Fix:** Ensure the `name` field in SKILL.md matches the directory name exactly, including hyphens vs underscores.

### Validation Fails: Description Too Short

**Error:** `'description' must be at least 20 characters`

**Fix:** Add more detail and specific trigger phrases. Example:
- Bad: "Docker helper"
- Good: "This skill should be used when the user asks to 'create a docker container', 'build an image', or manage Docker workflows"

### Skill Not Loading

1. Check that the file is named exactly `SKILL.md` (uppercase `SKILL`, lowercase `.md`)
2. Verify YAML frontmatter starts with `---`
3. Ensure `name` and `description` fields exist
4. Run validation to catch issues
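
Checks 2 and 3 can be approximated without any YAML library. The sketch below is a deliberately minimal splitter, assuming flat `key: value` lines; `split_frontmatter` is a hypothetical helper, nested keys such as `metadata:` are not parsed, and a real validator would use a proper YAML parser.

```python
def split_frontmatter(text):
    """Split SKILL.md text into (fields, body).

    Only flat `key: value` lines are read; indented (nested) lines are skipped
    and quoted values keep their quotes.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with '---'")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---'") from None
    fields = {}
    for line in lines[1:end]:
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # nested or non key-value line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body
```

A missing or empty `name` or `description` then shows up as `not fields.get("name")` or `not fields.get("description")`.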

### Permission Denied on Scripts

**Fix:** Make scripts executable:

```bash
chmod +x ~/.config/opencode/skills/my-skill/scripts/*.sh
```

## Integration with Projects

When run with the `--local` flag, `init-skill.sh`:

1. Creates skill in `.opencode/skills/` (project-local)
2. Checks for existing `AGENTS.md` in project root
3. Suggests creating one if missing

**Benefits of local skills:**
- Version controlled with project
- Team-shared
- Project-specific workflows

**Benefits of global skills:**
- Available across all projects
- Personal productivity tools
- Common utilities

## Description Optimization

The description field is the primary mechanism that determines whether a skill is used. After creating a skill, optimize the description for better triggering accuracy.

### What Makes a Good Description

1. **Comprehensive trigger coverage**
   - Include 3-5 specific trigger phrases in quotes
   - Cover different ways users might phrase the same request
   - Include both formal and casual phrasing

2. **"Pushy" encouragement**
   - Models tend to "undertrigger" skills (not use them when they should)
   - Encourage use with phrases like "Make sure to use this skill whenever..."

3. **Clear scope definition**
   - What the skill does
   - When to use it (specific contexts)
   - What it handles

### Good vs Bad Descriptions

```yaml
# ❌ Too vague - won't trigger reliably
description: Docker helper

# ❌ Passive - doesn't encourage use
description: Helps with Docker container management and image building

# ✅ Good - specific triggers and encouraging
description: This skill should be used when the user asks to "create a docker container", "build a docker image", "manage docker compose", or work with Docker workflows. Make sure to use this skill for any Docker-related tasks, container operations, or deployment configurations.
```

### Testing Description Effectiveness

After creating a skill:

1. Try various trigger phrases from the description
2. Note which ones activate the skill
3. If important phrases don't trigger, add them explicitly
4. Test edge cases that are similar but should NOT trigger the skill

### Iterative Refinement

While testing the skill:

1. **Note missed opportunities** - When should the skill have triggered but didn't?
2. **Add those triggers** - Update the description with missing phrases
3. **Test again** - Verify the new triggers work
4. **Watch for overtriggering** - If it triggers when it shouldn't, clarify the scope

## Validation Rules (Strict Mode)

The validator enforces these rules:

1. **Structure**
   - SKILL.md must exist
   - Must start with YAML frontmatter (`---`)
   - Body content required after frontmatter

2. **Frontmatter**
   - `name`: lowercase alphanumeric with hyphens, matches directory
   - `description`: 20-1024 characters
   - No empty fields

3. **Writing Style**
   - No second person ("You should...")
   - No "When to Use This Skill" section in body
   - Description should include quoted trigger phrases

4. **Resources**
   - Referenced files must exist
   - Scripts must be executable
   - No README.md or CHANGELOG.md files

5. **Warnings (Strict Mode = Errors)**
   - Vague descriptions
   - Missing trigger phrases
   - Unreferenced resource files
   - Unexecutable scripts
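
The "scripts must be executable" rule can be sketched as a check on file mode bits. This is an assumption about how such a check might work, not the validator's actual code; `find_non_executable_scripts` is a hypothetical name, and the sketch only looks at files directly inside `scripts/`.

```python
from pathlib import Path


def find_non_executable_scripts(skill_dir):
    """Return names of files in scripts/ that have no execute bit set."""
    scripts_dir = Path(skill_dir) / "scripts"
    if not scripts_dir.is_dir():
        return []
    return sorted(
        path.name
        for path in scripts_dir.iterdir()
        # 0o111 covers the execute bit for owner, group, and others.
        if path.is_file() and not (path.stat().st_mode & 0o111)
    )
```

Any name returned here is a candidate for the `chmod +x` fix described under Troubleshooting.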

## Best Practices

✅ **DO:**
- Use specific trigger phrases in description
- Keep SKILL.md lean, move details to references/
- Test scripts before including
- Validate after every change
- Use imperative form in body
- Include working code examples
- Create test cases for objectively verifiable skills
- Iterate based on test results
- Explain the "why" behind instructions (not just rigid rules)
- Generalize improvements from test feedback (don't overfit to specific test cases)
- Make descriptions "pushy" - encourage use when relevant
- Ask questions about edge cases and requirements before building

❌ **DON'T:**
- Put everything in one file
- Use second person writing
- Create auxiliary documentation files
- Skip validation
- Leave placeholder content
- Forget to make scripts executable
- Skip testing and iteration
- Overfit skills to specific test examples

## Test Case Structure

Create test cases to verify that a skill works correctly across different scenarios.

### When to Create Test Cases

Skills with objectively verifiable outputs benefit from formal test cases:
- File transforms (converting formats)
- Data extraction (parsing, filtering)
- Code generation (scaffolding, templates)
- Fixed workflow steps (validation, checks)

Skills with subjective outputs often do not need formal test cases:
- Writing style guidance
- Art or design skills
- General advice and best practices

### Test Case Format

Store test cases in `evals/evals.json`:

```json
{
  "skill_name": "my-skill",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "prompt": "Realistic user request with context",
      "expected_output": "Description of expected result",
      "assertions": [
        "Output file exists at expected location",
        "File contains specific content or pattern"
      ]
    }
  ]
}
```

### Required Fields

- **id**: Unique number for each test case
- **name**: Descriptive identifier (e.g., "common-case", "edge-case-with-large-file")
- **prompt**: The exact user request to test
- **expected_output**: Description of what should happen

### Optional Fields

- **assertions**: List of specific checks (text descriptions for manual verification)
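
Before running tests, it can help to confirm that every case carries the required fields. The sketch below checks an already-parsed `evals.json` document; `validate_evals` is a hypothetical helper, and the duplicate-id check is an assumption based on the "unique number" wording above.

```python
REQUIRED_FIELDS = ("id", "name", "prompt", "expected_output")


def validate_evals(data):
    """Return a list of problems found in a parsed evals.json document."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(data.get("evals", [])):
        for field in REQUIRED_FIELDS:
            if field not in case:
                problems.append(f"evals[{index}] is missing '{field}'")
        case_id = case.get("id")
        if case_id in seen_ids:
            problems.append(f"evals[{index}] reuses id {case_id}")
        elif case_id is not None:
            seen_ids.add(case_id)
    return problems
```

Typical use is `validate_evals(json.load(open("evals/evals.json")))`; `assertions` is left unchecked because it is optional.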

### Creating Test Cases

Use the helper script to scaffold test cases:

```bash
# Interactive test case creation
~/.config/opencode/skills/skill-builder/scripts/create-tests.sh ~/.config/opencode/skills/my-skill
```

Or create `evals/evals.json` manually following the template in `references/eval-templates.md`.

### Required Test Case Types

Every skill should have at least 3 test cases:

1. **Common Case**: The typical, straightforward scenario
2. **Edge Case**: Something tricky, unusual, or error-prone
3. **Varied Phrasing**: Same intent as case 1, but expressed differently

This ensures the skill works reliably across different ways users might ask for help.


## Writing Effective Test Prompts

Good test prompts are realistic and include context that real users would provide.

### Good Test Prompt Example

```
ok so my boss just sent me this xlsx file (its in my downloads, called something like 'Q4 sales final FINAL v2.xlsx') and she wants me to add a column that shows the profit margin as a percentage. The revenue is in column C and costs are in column D i think
```

**Why this is good:**
- Includes a file location and a realistic filename
- Mentions personal context ("my boss sent me")
- Has specific column references
- Uses casual, natural language
- Contains realistic uncertainty ("i think")

### Bad Test Prompt Example

```
Format this data
```

**Why this is bad:**
- Too vague - no context about what "format" means
- No file path or data type mentioned
- Does not test real-world usage
- Too short to trigger complex skills

### Test Prompt Guidelines

**Do:**
- Include file paths when relevant (`~/Downloads/`, `./src/`)
- Add personal context (job role, situation, urgency)
- Use column names, field names, or specific values
- Mix formal and casual language
- Include realistic typos or abbreviations
- Vary the length (some short, some detailed)

**Do not:**
- Use generic requests without specifics
- Make all prompts the same length
- Use only perfect grammar
- Test only obvious, clear-cut cases

### Examples by Skill Type

**File Transform Skill:**
- Good: "Convert the CSV at ./data/raw_export.csv to JSON and save it as ./output/clean_data.json. Make sure dates are in ISO format."
- Bad: "Convert CSV to JSON"

**Code Generation Skill:**
- Good: "Create a Python script that recursively finds all .log files in /var/log, compresses them if they're older than 30 days, and moves them to /archive. Handle permission errors gracefully."
- Bad: "Write a Python script"

**Workflow Skill:**
- Good: "I need to deploy my Node.js app to production. The repo is at ~/projects/myapp, it uses Docker, and I need to make sure the database migrations run before the new version starts. Also need to tag the release."
- Bad: "Deploy my app"


## The Testing Workflow

Follow this iterative process to improve the skill:

### The Iteration Loop

```
Draft → Test → Grade → Improve → Repeat
```

### Step 1: Draft the Skill

Create the initial SKILL.md with:
- Clear description with trigger phrases
- Core instructions in the body
- Any needed scripts or references

### Step 2: Create Test Cases

Write 3 test prompts following the guidelines above:
1. Common case (typical usage)
2. Edge case (tricky scenario)
3. Varied phrasing (different words, same intent)

Save them in `evals/evals.json`.

### Step 3: Run Tests

Execute the test workflow:

```bash
# Run through all test cases interactively
~/.config/opencode/skills/skill-builder/scripts/run-tests.sh ~/.config/opencode/skills/my-skill
```

This will:
1. Display each test prompt
2. Prompt for a manual test of it in opencode
3. Record the evaluation (pass/fail/skip)
4. Capture notes about issues
5. Save results to `evals/test-results.json`

### Step 4: Grade Outputs

Evaluate the skill's performance:

**Use the grading helper:**
```bash
~/.config/opencode/skills/skill-builder/scripts/grade-output.sh ~/.config/opencode/skills/my-skill
```

**Or grade manually by reviewing:**
- Did the output match expectations?
- Were there errors or omissions?
- Did the skill trigger appropriately?
- Was the output format correct?

See `references/grading-guide.md` for detailed evaluation criteria.

### Step 5: Improve the Skill

Based on test results, update SKILL.md:

**If specific issues found:**
- Add clarifying instructions
- Include examples for edge cases
- Fix incorrect guidance

**If patterns emerge across tests:**
- Generalize the solution
- Add helper scripts for common tasks
- Explain the "why" behind instructions

**General improvement principles:**
- Do not overfit to exact test wording
- Explain reasoning, not just rules
- Remove instructions that are not helping
- Add clarity where needed
- Extract repeated work into scripts/

### Step 6: Repeat

Run the tests again with the improved skill:

```bash
# Run tests again
~/.config/opencode/skills/skill-builder/scripts/run-tests.sh ~/.config/opencode/skills/my-skill
```

Compare results with previous runs and continue iterating.

### When to Stop Iterating

Stop when any of these are true:

1. **User is satisfied** - The skill meets their needs
2. **Outputs meet expectations** - All test cases pass consistently
3. **No meaningful progress** - Iterations are not improving results

Avoid endless tweaking. A skill that works well for 90% of cases is better than one still being perfected.


## Grading Guidelines

Evaluate skill outputs systematically to identify improvements.

### Evaluation Checklist

For each test case, assess:

**Correctness:**
- [ ] Output matches expected result
- [ ] No factual errors
- [ ] Logic is sound
- [ ] Edge cases handled appropriately

**Completeness:**
- [ ] All requested tasks completed
- [ ] No steps skipped
- [ ] Appropriate level of detail
- [ ] Relevant context included

**Format:**
- [ ] Output follows specified format (if any)
- [ ] Consistent with examples in skill
- [ ] Easy to read and understand

**Triggering:**
- [ ] Skill activated when appropriate
- [ ] Did not activate when inappropriate

**Efficiency:**
- [ ] No unnecessary steps
- [ ] Reasonable token usage
- [ ] Not overly verbose

### Identifying Patterns

Look for patterns across multiple test cases:

**Single test failure:**
- Likely a specific edge case
- Add targeted instruction or example
- Fix the specific issue

**Multiple test failures (same issue):**
- Indicates systemic problem
- Fix the root cause, not symptoms
- Consider adding a script to handle this

**Multiple test failures (different issues):**
- Skill may be too broad or unclear
- Clarify scope in description
- Add more specific instructions

### Decision Framework

**Fix the specific case when:**
- Unique edge case not covered
- One-time issue unlikely to recur
- Fix is simple and does not add complexity

**Generalize the solution when:**
- Same issue appears in 2+ tests
- Pattern suggests broader applicability
- Fix would benefit similar future requests

**Extract to script when:**
- Same multi-step process repeated
- Deterministic operation (not creative)
- Would save time on every invocation

### Grading Output Format

When grading, record:

```json
{
  "test_id": 1,
  "result": "pass|fail|partial",
  "issues": [
    "Description of issue 1",
    "Description of issue 2"
  ],
  "suggested_fix": "Brief description of improvement",
  "extract_script": false,
  "priority": "high|medium|low"
}
```

Use the `grade-output.sh` script to generate this structure interactively.
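
Once several records exist, a short summary makes the next iteration easier to plan. The sketch below assumes a list of records in the shape shown above; `summarize_grades` and the `fix_first` key are illustrative names, not output of `grade-output.sh`.

```python
from collections import Counter


def summarize_grades(records):
    """Count results and list test ids that did not pass and are high priority."""
    counts = Counter(record["result"] for record in records)
    urgent = [
        record["test_id"]
        for record in records
        if record["result"] != "pass" and record.get("priority") == "high"
    ]
    return {"counts": dict(counts), "fix_first": urgent}
```

The `fix_first` list pairs naturally with the decision framework: several ids sharing one issue suggest generalizing rather than patching each case.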


## Manual Description Optimization

Improve the skill description to trigger at the right times.

### The Description's Role

The description field determines when opencode loads the skill. It appears in the `skill` tool's list of available skills, and the agent decides whether to load the skill based on this description.

### Testing Description Effectiveness

**Step 1: Create Test Queries**

Generate 8 test queries - 4 that should trigger the skill, 4 that should not:

**Should-Trigger Queries (4):**
- Direct request using skill name
- Request using synonyms
- Casual phrasing
- Request needing skill but not naming it

**Should-Not-Trigger Queries (4):**
- Adjacent domain (similar but different)
- Ambiguous phrasing
- Query that touches on skill but needs different tool
- Completely unrelated request

**Example for a Docker skill:**

Should trigger:
1. "Create a docker container for my Node.js app"
2. "Build an image from this Dockerfile"
3. "How do I compose up my services?"
4. "I need to deploy this using containers"

Should not trigger:
1. "Install Docker on my machine" (installation vs usage)
2. "What is containerization?" (education vs hands-on)
3. "Show me Kubernetes commands" (different tool)
4. "Write a fibonacci function" (completely unrelated)

**Step 2: Test Each Query**

For each query:

1. Start a fresh opencode session
2. Type the query
3. Note whether the skill appears in available skills
4. If it appears, note whether it was loaded
5. Record results

**Step 3: Analyze Results**

**Missing triggers (should trigger but does not):**
- Add those phrases to description
- Use "pushy" language: "Make sure to use this skill whenever..."
- Include synonyms and variations

**False triggers (triggers when it should not):**
- Clarify scope with specific examples
- Add exclusions: "Use this for X, not for Y"
- Refine trigger phrases to be more specific
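
The two failure categories can be separated mechanically once the manual results are written down. The sketch below assumes each result is recorded as a `(query, should_trigger, did_trigger)` tuple; that record shape and the `analyze_triggers` name are assumptions for illustration.

```python
def analyze_triggers(results):
    """Split (query, should_trigger, did_trigger) results into misses and false triggers."""
    missed = [query for query, should, did in results if should and not did]
    false_triggers = [query for query, should, did in results if did and not should]
    correct = len(results) - len(missed) - len(false_triggers)
    return {"missed": missed, "false_triggers": false_triggers, "correct": correct}
```

Queries in `missed` point to phrases worth adding to the description; queries in `false_triggers` point to scope that needs tightening.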

**Step 4: Refine Description**

Update the description field in SKILL.md frontmatter:

```yaml
# ❌ Too vague
description: Helps with Docker tasks

# ❌ Better, but not pushy
description: Use for Docker container management and image building

# ✅ Good - specific and encouraging
description: This skill should be used when the user asks to "create a docker container", "build a docker image", "manage docker compose", or work with Docker workflows. Make sure to use this skill for any Docker-related tasks, container operations, or deployment configurations.
```

**Step 5: Re-test**

Run the 8 test queries again with the updated description.

**Step 6: Iterate**

Continue refining until:
- All should-trigger queries activate the skill
- All should-not-trigger queries do not activate it
- Description is still concise (under 1024 characters)

### Description Optimization Tips

**Be specific:**
- Include 3-5 concrete trigger phrases in quotes
- Use exact language users would type
- Mention file types or tools when relevant

**Be comprehensive:**
- Cover different ways to ask
- Include both formal and casual phrasing
- Account for synonyms

**Be "pushy":**
- Models tend to "undertrigger" skills (not use when they should)
- Use phrases like "Make sure to use this skill whenever..."
- Encourage use even if user does not explicitly ask

**Examples of Good Descriptions:**

```yaml
# File processing skill
description: This skill should be used when the user asks to "convert CSV to JSON", "transform this file", "parse this data", or work with file format conversions. Make sure to use this skill whenever the user mentions converting, transforming, or reformatting files, even if they do not specify the exact formats.

# API documentation skill
description: This skill should be used when the user asks to "document an API", "create API docs", "write endpoint documentation", or needs help with REST API documentation. Use for OpenAPI specs, endpoint descriptions, request/response examples, and API reference guides.

# Release workflow skill
description: This skill should be used when the user asks to "create a release", "publish a new version", "tag a release", or prepare software for deployment. Make sure to use this skill for any release-related tasks, version bumps, or deployment preparation.
```

### When to Optimize

Optimize the description:
- After initial skill creation
- When users report skill not triggering
- When skill triggers inappropriately
- After significant skill updates

Do not over-optimize - a description that works for 90% of cases is sufficient.


## Testing Scripts Reference

### Create Test Cases

```bash
# Interactive test case generation
~/.config/opencode/skills/skill-builder/scripts/create-tests.sh ~/.config/opencode/skills/my-skill
```

Generates `evals/evals.json` with 3 template test cases.

### Run Tests

```bash
# Execute test workflow
~/.config/opencode/skills/skill-builder/scripts/run-tests.sh ~/.config/opencode/skills/my-skill
```

Walks through each test case interactively and saves the results.

### Grade Outputs

```bash
# Interactive grading
~/.config/opencode/skills/skill-builder/scripts/grade-output.sh ~/.config/opencode/skills/my-skill
```

Provides a structured evaluation checklist.

### Quick Start with Tests

When creating a new skill with tests:

```bash
# Create skill with test infrastructure
~/.config/opencode/skills/skill-builder/scripts/init-skill.sh my-skill --full --with-tests

# Generate test cases
~/.config/opencode/skills/skill-builder/scripts/create-tests.sh ~/.config/opencode/skills/my-skill

# Edit evals/evals.json to customize prompts

# Run tests
~/.config/opencode/skills/skill-builder/scripts/run-tests.sh ~/.config/opencode/skills/my-skill
```

See `references/eval-templates.md` for copy-paste test case templates.