# Test Case Templates

Copy-paste templates for writing test cases (`evals.json`), organized by skill type.

## Table of Contents

1. [File Transform Skills](#file-transform-skills)
2. [Code Generation Skills](#code-generation-skills)
3. [Workflow Skills](#workflow-skills)
4. [Tool Integration Skills](#tool-integration-skills)
5. [Documentation Skills](#documentation-skills)
6. [Tips for Customizing Templates](#tips-for-customizing-templates)
7. [Validation Checklist](#validation-checklist)

---

## File Transform Skills

For skills that convert, parse, or reformat files.

### Example: CSV to JSON Converter

```json
{
  "skill_name": "csv-to-json",
  "description": "Converts CSV files to JSON format with data type handling",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format.",
      "expected_output": "A JSON file at ./output/sales.json containing an array of objects, each representing a row from the CSV with properly formatted dates.",
      "assertions": [
        "JSON file exists at ./output/sales.json",
        "File contains valid JSON array",
        "All dates are in ISO 8601 format (YYYY-MM-DD)",
        "Numeric fields (quantity, price) are numbers, not strings"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-large-file",
      "type": "edge",
      "prompt": "Convert a CSV file with 50,000 rows at ./data/export.csv to JSON. The file contains some rows with missing values in the 'email' column and special characters (emojis) in the 'notes' column.",
      "expected_output": "JSON file is created successfully, handling missing values as null or empty strings, and preserving special characters correctly.",
      "assertions": [
        "Large file is processed without memory errors",
        "Missing email values are handled (null or empty string)",
        "Special characters and emojis are preserved in output",
        "All 50,000 rows are converted"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-casual",
      "type": "variation",
      "prompt": "Hey, I've got this spreadsheet data/export.csv that I need to turn into JSON. Can you do that for me? Also, the dates are in MM/DD/YYYY format right now - can you make them proper ISO format?",
      "expected_output": "Same as common case: JSON file with ISO 8601 dates",
      "assertions": [
        "Skill triggers with casual language ('turn into', 'proper ISO format')",
        "Dates are converted from MM/DD/YYYY to ISO 8601",
        "Output has the same structure as the common case"
      ]
    }
  ]
}
```
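
Most of the assertions in the common case can be checked mechanically once the output file has been read. The following is a minimal sketch, not part of any eval framework: `is_iso_date` and `check_sales_json` are hypothetical helper names, and the field names (`date`, `quantity`, `price`) are taken from the headers in the example prompt.

```python
import json
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True for a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_sales_json(text):
    """Return the failed assertions for the text of the output file."""
    try:
        rows = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(rows, list):
        return ["top-level value is not a JSON array"]

    failures = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            failures.append(f"row {i}: not an object")
            continue
        if not is_iso_date(row.get("date")):
            failures.append(f"row {i}: date {row.get('date')!r} is not YYYY-MM-DD")
        for field in ("quantity", "price"):
            value = row.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                failures.append(f"row {i}: {field} {value!r} is not a number")
    return failures
```

To use it, read `./output/sales.json` as UTF-8 text and pass the contents to `check_sales_json`; an empty list means every checked assertion passed. The "file exists" assertion is covered by the read itself failing or succeeding.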

### Template: File Transform Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what file transformations this skill does]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Realistic request with specific file paths and formats]",
      "expected_output": "[What the transformed file should contain]",
      "assertions": [
        "Output file exists at expected location",
        "File is valid [format: JSON, XML, etc.]",
        "Data is correctly transformed",
        "Specific requirements met (encoding, formatting, etc.)"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Large files, special characters, missing data, malformed input]",
      "expected_output": "[Graceful handling or error with helpful message]",
      "assertions": [
        "Edge case is handled appropriately",
        "No data loss or corruption",
        "Error messages are helpful if transformation fails"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Same request with casual language, different wording]",
      "expected_output": "[Same as common case]",
      "assertions": [
        "Skill triggers with varied phrasing",
        "Output matches common case",
        "Implicit requirements are understood"
      ]
    }
  ]
}
```

---

## Code Generation Skills

For skills that generate code, scripts, or templates.

### Example: Python Script Generator

```json
{
  "skill_name": "python-script-gen",
  "description": "Generates Python scripts for file operations and data processing",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Create a Python script that recursively finds all .log files in /var/log, compresses them with gzip if they're older than 30 days, and moves them to /archive. Handle permission errors gracefully and log all actions to cleanup.log.",
      "expected_output": "A Python script that implements the log cleanup functionality with proper error handling, logging, and follows Python best practices.",
      "assertions": [
        "Script is syntactically valid Python",
        "Implements recursive file search",
        "Compresses files older than 30 days",
        "Handles permission errors gracefully",
        "Logs actions to cleanup.log"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-empty-directory",
      "type": "edge",
      "prompt": "Create a Python script that processes all CSV files in ./data and generates a summary report. What should it do if the directory is empty or doesn't exist?",
      "expected_output": "Script handles empty/non-existent directories gracefully with informative error messages and doesn't crash.",
      "assertions": [
        "Script checks if directory exists before processing",
        "Handles empty directory case gracefully",
        "Provides informative error message",
        "Returns appropriate exit code"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-brief",
      "type": "variation",
      "prompt": "Write me a python script to backup my photos. It should copy everything from ~/Pictures to ~/Backups/photos with today's date in the folder name. Skip duplicates if possible.",
      "expected_output": "Python backup script with timestamped folder and duplicate detection",
      "assertions": [
        "Skill works with brief, casual description",
        "Generates complete script without requiring clarification",
        "Handles date formatting for folder name",
        "Implements duplicate detection"
      ]
    }
  ]
}
```
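
The "syntactically valid Python" assertion is one of the few here that can be checked without running the generated script. A minimal sketch using the standard-library `ast` module follows; `is_valid_python` and `imported_modules` are hypothetical helper names, and checking imports is only a rough proxy for assertions such as "compresses files" (a script can import `gzip` and never call it).

```python
import ast


def is_valid_python(source):
    """True if the source parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def imported_modules(source):
    """Top-level names of every module imported anywhere in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names
```

For the log cleanup case, a grader might require `is_valid_python(script)` and then check that `"gzip"` and `"logging"` appear in `imported_modules(script)` before moving on to behavioral checks.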

### Template: Code Generation Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what code this skill generates]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Detailed request with specific requirements and constraints]",
      "expected_output": "[Description of the generated code]",
      "assertions": [
        "Code is syntactically valid",
        "Implements all requested features",
        "Follows language best practices",
        "Includes error handling where appropriate",
        "Is well-structured and readable"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Ambiguous requirements, missing data, error conditions]",
      "expected_output": "[Code handles edge cases or asks for clarification]",
      "assertions": [
        "Edge cases are handled appropriately",
        "Error conditions are managed",
        "Code doesn't crash on unexpected input"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Same request with minimal details or casual language]",
      "expected_output": "[Same quality as common case]",
      "assertions": [
        "Skill fills in reasonable defaults",
        "Generates complete solution",
        "Output quality matches detailed request"
      ]
    }
  ]
}
```

---

## Workflow Skills

For skills that guide multi-step processes.

### Example: Release Workflow

```json
{
  "skill_name": "release-workflow",
  "description": "Guides the complete software release process",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "I need to create a new release for my Node.js project. The repo is at ~/projects/myapp. We're currently at version 1.2.3 and this is a minor feature release (1.3.0). I need to update the version, create a changelog entry, commit, tag, and push to GitHub.",
      "expected_output": "Step-by-step guide covering: version bump in package.json, CHANGELOG.md update, commit creation, annotated tag, and push commands. Should provide copy-pasteable commands.",
      "assertions": [
        "All release steps are covered",
        "Commands are copy-pasteable",
        "Version numbers are consistent",
        "Validation steps are included",
        "Provides rollback guidance"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-dirty-worktree",
      "type": "edge",
      "prompt": "Help me release version 2.0.0 of my project. By the way, I have some uncommitted changes in my working directory that I'm not sure about.",
      "expected_output": "Workflow detects dirty worktree, suggests stashing or committing changes before proceeding with release. Provides commands to handle the situation.",
      "assertions": [
        "Detects uncommitted changes",
        "Warns about dirty worktree",
        "Provides options: stash, commit, or abort",
        "Doesn't proceed without addressing the issue"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-urgent",
      "type": "variation",
      "prompt": "Need to push out v2.1.0 ASAP. Hotfix for critical bug. What's the fastest way to get this released?",
      "expected_output": "Accelerated workflow prioritizing speed while maintaining essential safety checks",
      "assertions": [
        "Recognizes urgency from language ('ASAP', 'fastest')",
        "Still includes critical safety checks",
        "Prioritizes speed without skipping validation",
        "Provides streamlined command sequence"
      ]
    }
  ]
}
```
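
The "version numbers are consistent" assertion can be approximated by extracting every version string from the response and comparing it against the versions named in the prompt. This is a sketch under the assumption that versions are plain `MAJOR.MINOR.PATCH` strings with an optional `v` prefix; `bump`, `versions_mentioned`, and `versions_consistent` are hypothetical helper names.

```python
import re

SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)\b")


def bump(version, part):
    """Return the next version after bumping 'major', 'minor' or 'patch'."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def versions_mentioned(text):
    """Set of MAJOR.MINOR.PATCH strings in the text, with any 'v' prefix dropped."""
    return {".".join(m.groups()) for m in SEMVER.finditer(text)}


def versions_consistent(text, current, target):
    """True if the text names the target and no version other than current/target."""
    mentioned = versions_mentioned(text)
    return target in mentioned and mentioned <= {current, target}
```

For the common case, `bump("1.2.3", "minor")` gives the expected target `1.3.0`, and `versions_consistent(response, "1.2.3", "1.3.0")` fails if the response tags or commits a different version by mistake.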

### Template: Workflow Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what workflow this skill guides]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Standard workflow request with clear requirements]",
      "expected_output": "[Complete step-by-step guide with all necessary steps]",
      "assertions": [
        "All workflow steps are included",
        "Steps are in logical order",
        "Validation points are provided",
        "Commands are copy-pasteable where applicable",
        "Clear success criteria defined"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Workflow with complications, errors, or unusual state]",
      "expected_output": "[Workflow detects issues and provides guidance]",
      "assertions": [
        "Detects unusual states or errors",
        "Provides recovery options",
        "Doesn't proceed blindly",
        "Offers rollback or alternative paths"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Workflow request with urgency, casual language, or minimal details]",
      "expected_output": "[Same workflow adapted to context]",
      "assertions": [
        "Adapts to urgency level",
        "Works with minimal context",
        "Still provides complete guidance"
      ]
    }
  ]
}
```

---

## Tool Integration Skills

For skills that wrap command-line tools.

### Example: Docker Helper

```json
{
  "skill_name": "docker-helper",
  "description": "Streamlines Docker container and image management",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "I need to deploy my Node.js app using Docker. The Dockerfile is in ~/projects/myapp. Build an image tagged as myapp:v1.0, then run a container named 'myapp-prod' that maps port 3000 to the host. Make sure it restarts automatically if it crashes.",
      "expected_output": "Docker commands to build the image and run the container with specified configuration, including restart policy.",
      "assertions": [
        "Provides correct build command with tag",
        "Provides correct run command with port mapping",
        "Includes restart policy (--restart unless-stopped or always)",
        "Sets container name correctly",
        "Commands are copy-pasteable"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-port-conflict",
      "type": "edge",
      "prompt": "Run my Docker container on port 3000, but I think something might already be using that port on my machine. How do I check and handle this?",
      "expected_output": "Commands to check port usage, offer solutions (kill process, use different port, or map to different host port), and proceed accordingly.",
      "assertions": [
        "Detects potential port conflict",
        "Provides command to check port usage",
        "Offers multiple solutions",
        "Explains trade-offs of each option"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-cleanup",
      "type": "variation",
      "prompt": "Docker is taking up too much space. Clean up old stuff for me?",
      "expected_output": "Commands to clean up stopped containers, unused images, and build cache",
      "assertions": [
        "Understands implicit request from context",
        "Provides safe cleanup commands",
        "Warns about data loss where applicable",
        "Shows space savings after cleanup"
      ]
    }
  ]
}
```
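
The common-case assertions about the `docker run` command can be checked by tokenizing the command rather than matching substrings. A minimal sketch, assuming the command has already been extracted from the response as a single line; `parse_docker_run` and `check_common_case` are hypothetical helper names. The parser only knows the three options this test cares about and does not stop at the image name, so arguments passed to the container itself could be misread.

```python
import shlex

# Options this sketch understands, mapped to a key in the result.
OPTION_KEYS = {
    "--name": "name",
    "-p": "publish",
    "--publish": "publish",
    "--restart": "restart",
}


def parse_docker_run(command):
    """Collect the values of selected options from a 'docker run' command."""
    tokens = shlex.split(command)
    if tokens[:2] != ["docker", "run"]:
        raise ValueError("expected a 'docker run' command")
    found = {"name": [], "publish": [], "restart": []}
    i = 2
    while i < len(tokens):
        # Accept both '--name value' and '--name=value'.
        flag, has_value, value = tokens[i].partition("=")
        key = OPTION_KEYS.get(flag)
        if key is not None:
            if not has_value:
                i += 1
                if i >= len(tokens):
                    raise ValueError(f"{flag} is missing its value")
                value = tokens[i]
            found[key].append(value)
        i += 1
    return found


def check_common_case(command):
    """Map each checked assertion of the docker-helper common case to pass/fail."""
    found = parse_docker_run(command)
    # The container port is the last colon-separated field, before any /protocol.
    container_ports = [p.split("/")[0].split(":")[-1] for p in found["publish"]]
    return {
        "name": found["name"] == ["myapp-prod"],
        "port": "3000" in container_ports,
        "restart": found["restart"] in (["always"], ["unless-stopped"]),
    }
```

The restart check mirrors the assertion as written, so it accepts only `always` and `unless-stopped`.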

### Template: Tool Integration Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what tool this skill wraps]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Standard tool usage with specific options and requirements]",
      "expected_output": "[Correct command(s) with proper flags]",
      "assertions": [
        "Command syntax is correct",
        "All required flags are included",
        "Best practices are followed",
        "Commands are copy-pasteable",
        "Explains what each part does"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Tool usage with errors, conflicts, or unusual requirements]",
      "expected_output": "[Troubleshooting steps and solutions]",
      "assertions": [
        "Detects potential issues",
        "Provides diagnostic commands",
        "Offers multiple solutions",
        "Explains risks of each approach"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Casual request with vague requirements]",
      "expected_output": "[Tool commands with reasonable defaults]",
      "assertions": [
        "Fills in reasonable defaults",
        "Provides complete solution",
        "Explains assumptions made"
      ]
    }
  ]
}
```

---

## Documentation Skills

For skills that help create or review documentation.

### Example: README Generator

```json
{
  "skill_name": "readme-generator",
  "description": "Creates comprehensive README files for projects",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Create a README for my Python project. It's a CLI tool called 'file-organizer' that sorts files by type and date. The repo is at ~/projects/file-organizer. It supports Python 3.8+, uses click for CLI, and has features for: organizing by extension, organizing by date, dry-run mode, and config file support.",
      "expected_output": "Complete README with: title, description, installation, usage examples, features list, configuration, and contributing sections. Formatted in Markdown.",
      "assertions": [
        "README is well-structured with clear headings",
        "All requested sections are included",
        "Installation instructions are clear",
        "Usage examples show actual commands",
        "Features are listed comprehensively"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-minimal-info",
      "type": "edge",
      "prompt": "Write a README for my project. It's called 'utils' and it does some stuff with files.",
      "expected_output": "README template with placeholder sections and prompts for missing information. Asks clarifying questions or provides generic placeholders.",
      "assertions": [
        "Creates template structure despite minimal info",
        "Uses placeholders for missing details",
        "Suggests what information to add",
        "Doesn't invent features or functionality"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-casual",
      "type": "variation",
      "prompt": "Hey can you write a readme for my new js library? It's on npm as 'async-queue'. It helps manage async tasks with a queue so you don't overwhelm APIs. Pretty simple but useful.",
      "expected_output": "README with npm installation, basic usage example, and API overview",
      "assertions": [
        "Understands from package name and casual description",
        "Includes npm install instructions",
        "Provides JavaScript usage examples",
        "Explains the problem it solves"
      ]
    }
  ]
}
```
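
"All requested sections are included" can be checked by listing the Markdown headings in the generated README. The sketch below assumes ATX-style headings (`#`, `##`, ...) and skips fenced code blocks so that shell comments are not mistaken for headings; `headings` and `missing_sections` are hypothetical helper names.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker
HEADING = re.compile(r"#{1,6}\s+(.*?)\s*$")


def headings(markdown):
    """Lower-cased heading texts, ignoring lines inside fenced code blocks."""
    found = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            found.append(match.group(1).lower())
    return found


def missing_sections(markdown, required):
    """Required section names that appear in no heading (case-insensitive)."""
    found = headings(markdown)
    return [name for name in required if not any(name.lower() in h for h in found)]
```

For the common case, `missing_sections(readme, ["Installation", "Usage", "Features", "Configuration", "Contributing"])` should return an empty list. Matching is by substring, so a heading such as "Usage Examples" satisfies "Usage".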

### Template: Documentation Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what documentation this skill helps with]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Detailed request with project information and requirements]",
      "expected_output": "[Complete, well-structured documentation]",
      "assertions": [
        "Documentation is well-organized",
        "All required sections are included",
        "Examples are clear and relevant",
        "Formatting is consistent",
        "Language is clear and concise"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Vague request with minimal information]",
      "expected_output": "[Template with placeholders or clarifying questions]",
      "assertions": [
        "Creates structure despite limited info",
        "Uses appropriate placeholders",
        "Identifies missing information",
        "Doesn't invent false details"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Casual request with implied requirements]",
      "expected_output": "[Documentation meeting implicit needs]",
      "assertions": [
        "Understands implicit requirements",
        "Provides complete documentation",
        "Matches tone of request"
      ]
    }
  ]
}
```

---

## Tips for Customizing Templates

### Making Tests Realistic

**Bad:**
```
"Convert CSV to JSON"
```

**Good:**
```
"Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format."
```

### Assertions Should Be Checkable

**Vague:**
```json
"assertions": [
  "Output looks good"
]
```

**Specific:**
```json
"assertions": [
  "JSON file exists at ./output/sales.json",
  "All dates are in ISO 8601 format",
  "Numeric fields are numbers, not strings"
]
```

### Edge Cases to Consider

1. **Large data** - Files with 10,000+ rows
2. **Special characters** - Emojis, Unicode, unusual symbols
3. **Missing data** - Empty cells, null values
4. **Malformed input** - CSV with inconsistent column counts
5. **Permissions** - Read/write errors
6. **Conflicts** - Port conflicts, file already exists
7. **Empty input** - Zero rows, blank files
8. **Unexpected state** - Dirty git worktree, missing dependencies
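
Several of these edge cases need an input fixture that does not exist yet. As a rough sketch, the function below builds CSV text that combines large data, missing values, and special characters; `make_edge_case_csv` is a hypothetical helper, and the `id` column and the `example.com` addresses are made up for illustration (only `email` and `notes` come from the edge-case example earlier in this document).

```python
import csv
import io


def make_edge_case_csv(n_rows, missing_every=10):
    """Build CSV text with blank emails and emoji-laden notes.

    Every `missing_every`-th row has an empty email. The notes column
    contains an emoji, an accented letter, a comma and double quotes,
    so the CSV writer has to quote and escape it.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "email", "notes"])
    for i in range(1, n_rows + 1):
        email = "" if i % missing_every == 0 else f"user{i}@example.com"
        notes = f'row {i} 🚀 café, "quoted"'
        writer.writerow([i, email, notes])
    return buf.getvalue()
```

Writing `make_edge_case_csv(50_000)` to `./data/export.csv` with `encoding="utf-8"` produces an input for the large-file edge case. Because the output is deterministic, the expected number of missing emails is known in advance.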

### Variation Ideas

1. **Casual language** - "Hey, can you...", "I need to..."
2. **Abbreviated** - Minimal words, assumes context
3. **Urgent** - "ASAP", "quickly", "fastest way"
4. **Uncertain** - "I think", "maybe", "probably"
5. **Brief** - Single sentence, minimal details
6. **Verbose** - Extra context, backstory

---

## Validation Checklist

Before using your evals.json:

- [ ] Contains exactly 3 test cases (common, edge, variation)
- [ ] Each test has a unique id (1, 2, 3)
- [ ] Each test has a descriptive name
- [ ] Prompts are realistic and include specific details
- [ ] Expected outputs are clear
- [ ] Assertions are objectively checkable
- [ ] JSON is valid (run it through jq or a JSON linter)
- [ ] File is saved as evals/evals.json in the skill directory

Run this to confirm that the file parses as JSON (it does not check the other items):
```bash
jq . evals/evals.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
```
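
The structural items in the checklist can also be checked automatically. The following is a minimal sketch rather than an official validator: `validate_evals` is a hypothetical helper, and the required keys are inferred from the templates in this document.

```python
import json

REQUIRED_KEYS = {"id", "name", "type", "prompt", "expected_output", "assertions"}
EXPECTED_TYPES = ["common", "edge", "variation"]


def validate_evals(text):
    """Return a list of problems found in the text of an evals.json file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    evals = data.get("evals") if isinstance(data, dict) else None
    if not isinstance(evals, list) or not all(isinstance(e, dict) for e in evals):
        return ["'evals' must be an array of objects"]

    problems = []
    if len(evals) != 3:
        problems.append(f"expected exactly 3 test cases, found {len(evals)}")
    ids = [e.get("id") for e in evals]
    if any(ids.count(n) != 1 for n in (1, 2, 3)):
        problems.append(f"ids must be 1, 2 and 3, each used once; found {ids}")
    types = sorted(str(e.get("type")) for e in evals)
    if types != EXPECTED_TYPES:
        problems.append(f"types must be one each of {EXPECTED_TYPES}; found {types}")

    for e in evals:
        label = f"eval {e.get('id')!r}"
        missing = sorted(REQUIRED_KEYS - e.keys())
        if missing:
            problems.append(f"{label}: missing keys {missing}")
        for key in ("name", "prompt", "expected_output"):
            value = e.get(key)
            if key in e and (not isinstance(value, str) or not value.strip()):
                problems.append(f"{label}: {key} must be a non-empty string")
        assertions = e.get("assertions")
        if "assertions" in e and not (
            isinstance(assertions, list)
            and assertions
            and all(isinstance(a, str) and a.strip() for a in assertions)
        ):
            problems.append(f"{label}: assertions must be a non-empty list of strings")
    return problems
```

Pass it the contents of `evals/evals.json`; an empty list means the structural checks passed. Whether prompts are realistic and assertions are objectively checkable still needs a human read.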