# Test Case Templates Copy-paste templates for creating test cases based on skill type. ## Table of Contents 1. [File Transform Skills](#file-transform-skills) 2. [Code Generation Skills](#code-generation-skills) 3. [Workflow Skills](#workflow-skills) 4. [Tool Integration Skills](#tool-integration-skills) 5. [Documentation Skills](#documentation-skills) --- ## File Transform Skills For skills that convert, parse, or reformat files. ### Example: CSV to JSON Converter ```json { "skill_name": "csv-to-json", "description": "Converts CSV files to JSON format with data type handling", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format.", "expected_output": "A JSON file at ./output/sales.json containing an array of objects, each representing a row from the CSV with properly formatted dates.", "assertions": [ "JSON file exists at ./output/sales.json", "File contains valid JSON array", "All dates are in ISO 8601 format (YYYY-MM-DD)", "Numeric fields (quantity, price) are numbers, not strings" ] }, { "id": 2, "name": "edge-case-large-file", "type": "edge", "prompt": "Convert a CSV file with 50,000 rows at ./data/export.csv to JSON. The file contains some rows with missing values in the 'email' column and special characters (emojis) in the 'notes' column.", "expected_output": "JSON file is created successfully, handling missing values as null or empty strings, and preserving special characters correctly.", "assertions": [ "Large file is processed without memory errors", "Missing email values are handled (null or empty string)", "Special characters and emojis are preserved in output", "All 50,000 rows are converted" ] }, { "id": 3, "name": "varied-phrasing-casual", "type": "variation", "prompt": "Hey, I've got this spreadsheet data/export.csv that I need to turn into JSON. Can you do that for me? Also, the dates are in MM/DD/YYYY format right now - can you make them proper ISO format?", "expected_output": "Same as common case: JSON file with ISO 8601 dates", "assertions": [ "Skill triggers with casual language ('turn into', 'proper format')", "Implicit date formatting requirement is handled", "Output matches common case results" ] } ] } ``` ### Template: File Transform Skill ```json { "skill_name": "YOUR_SKILL_NAME", "description": "[Describe what file transformations this skill does]", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "[Realistic request with specific file paths and formats]", "expected_output": "[What the transformed file should contain]", "assertions": [ "Output file exists at expected location", "File is valid [format: JSON, XML, etc.]", "Data is correctly transformed", "Specific requirements met (encoding, formatting, etc.)" ] }, { "id": 2, "name": "edge-case", "type": "edge", "prompt": "[Large files, special characters, missing data, malformed input]", "expected_output": "[Graceful handling or error with helpful message]", "assertions": [ "Edge case is handled appropriately", "No data loss or corruption", "Error messages are helpful if transformation fails" ] }, { "id": 3, "name": "varied-phrasing", "type": "variation", "prompt": "[Same request with casual language, different wording]", "expected_output": "[Same as common case]", "assertions": [ "Skill triggers with varied phrasing", "Output matches common case", "Implicit requirements are understood" ] } ] } ``` --- ## Code Generation Skills For skills that generate code, scripts, or templates. ### Example: Python Script Generator ```json { "skill_name": "python-script-gen", "description": "Generates Python scripts for file operations and data processing", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "Create a Python script that recursively finds all .log files in /var/log, compresses them with gzip if they're older than 30 days, and moves them to /archive. Handle permission errors gracefully and log all actions to cleanup.log.", "expected_output": "A Python script that implements the log cleanup functionality with proper error handling, logging, and follows Python best practices.", "assertions": [ "Script is syntactically valid Python", "Implements recursive file search", "Compresses files older than 30 days", "Handles permission errors gracefully", "Logs actions to cleanup.log" ] }, { "id": 2, "name": "edge-case-empty-directory", "type": "edge", "prompt": "Create a Python script that processes all CSV files in ./data and generates a summary report. What should it do if the directory is empty or doesn't exist?", "expected_output": "Script handles empty/non-existent directories gracefully with informative error messages and doesn't crash.", "assertions": [ "Script checks if directory exists before processing", "Handles empty directory case gracefully", "Provides informative error message", "Returns appropriate exit code" ] }, { "id": 3, "name": "varied-phrasing-brief", "type": "variation", "prompt": "Write me a python script to backup my photos. It should copy everything from ~/Pictures to ~/Backups/photos with today's date in the folder name. Skip duplicates if possible.", "expected_output": "Python backup script with timestamped folder and duplicate detection", "assertions": [ "Skill works with brief, casual description", "Generates complete script without requiring clarification", "Handles date formatting for folder name", "Implements duplicate detection" ] } ] } ``` ### Template: Code Generation Skill ```json { "skill_name": "YOUR_SKILL_NAME", "description": "[Describe what code this skill generates]", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "[Detailed request with specific requirements and constraints]", "expected_output": "[Description of the generated code]", "assertions": [ "Code is syntactically valid", "Implements all requested features", "Follows language best practices", "Includes error handling where appropriate", "Is well-structured and readable" ] }, { "id": 2, "name": "edge-case", "type": "edge", "prompt": "[Ambiguous requirements, missing data, error conditions]", "expected_output": "[Code handles edge cases or asks for clarification]", "assertions": [ "Edge cases are handled appropriately", "Error conditions are managed", "Code doesn't crash on unexpected input" ] }, { "id": 3, "name": "varied-phrasing", "type": "variation", "prompt": "[Same request with minimal details or casual language]", "expected_output": "[Same quality as common case]", "assertions": [ "Skill fills in reasonable defaults", "Generates complete solution", "Output quality matches detailed request" ] } ] } ``` --- ## Workflow Skills For skills that guide multi-step processes. ### Example: Release Workflow ```json { "skill_name": "release-workflow", "description": "Guides the complete software release process", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "I need to create a new release for my Node.js project. The repo is at ~/projects/myapp. We're currently at version 1.2.3 and this is a minor feature release (1.3.0). I need to update the version, create a changelog entry, commit, tag, and push to GitHub.", "expected_output": "Step-by-step guide covering: version bump in package.json, CHANGELOG.md update, commit creation, annotated tag, and push commands. Should provide copy-pasteable commands.", "assertions": [ "All release steps are covered", "Commands are copy-pasteable", "Version numbers are consistent", "Validation steps are included", "Provides rollback guidance" ] }, { "id": 2, "name": "edge-case-dirty-worktree", "type": "edge", "prompt": "Help me release version 2.0.0 of my project. By the way, I have some uncommitted changes in my working directory that I'm not sure about.", "expected_output": "Workflow detects dirty worktree, suggests stashing or committing changes before proceeding with release. Provides commands to handle the situation.", "assertions": [ "Detects uncommitted changes", "Warns about dirty worktree", "Provides options: stash, commit, or abort", "Doesn't proceed without addressing the issue" ] }, { "id": 3, "name": "varied-phrasing-urgent", "type": "variation", "prompt": "Need to push out v2.1.0 ASAP. Hotfix for critical bug. What's the fastest way to get this released?", "expected_output": "Accelerated workflow prioritizing speed while maintaining essential safety checks", "assertions": [ "Recognizes urgency from language ('ASAP', 'fastest')", "Still includes critical safety checks", "Prioritizes speed without skipping validation", "Provides streamlined command sequence" ] } ] } ``` ### Template: Workflow Skill ```json { "skill_name": "YOUR_SKILL_NAME", "description": "[Describe what workflow this skill guides]", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "[Standard workflow request with clear requirements]", "expected_output": "[Complete step-by-step guide with all necessary steps]", "assertions": [ "All workflow steps are included", "Steps are in logical order", "Validation points are provided", "Commands are copy-pasteable where applicable", "Clear success criteria defined" ] }, { "id": 2, "name": "edge-case", "type": "edge", "prompt": "[Workflow with complications, errors, or unusual state]", "expected_output": "[Workflow detects issues and provides guidance]", "assertions": [ "Detects unusual states or errors", "Provides recovery options", "Doesn't proceed blindly", "Offers rollback or alternative paths" ] }, { "id": 3, "name": "varied-phrasing", "type": "variation", "prompt": "[Workflow request with urgency, casual language, or minimal details]", "expected_output": "[Same workflow adapted to context]", "assertions": [ "Adapts to urgency level", "Works with minimal context", "Still provides complete guidance" ] } ] } ``` --- ## Tool Integration Skills For skills that wrap command-line tools. ### Example: Docker Helper ```json { "skill_name": "docker-helper", "description": "Streamlines Docker container and image management", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "I need to deploy my Node.js app using Docker. The Dockerfile is in ~/projects/myapp. Build an image tagged as myapp:v1.0, then run a container named 'myapp-prod' that maps port 3000 to the host. Make sure it restarts automatically if it crashes.", "expected_output": "Docker commands to build the image and run the container with specified configuration, including restart policy.", "assertions": [ "Provides correct build command with tag", "Provides correct run command with port mapping", "Includes restart policy (--restart unless-stopped or always)", "Sets container name correctly", "Commands are copy-pasteable" ] }, { "id": 2, "name": "edge-case-port-conflict", "type": "edge", "prompt": "Run my Docker container on port 3000, but I think something might already be using that port on my machine. How do I check and handle this?", "expected_output": "Commands to check port usage, offer solutions (kill process, use different port, or map to different host port), and proceed accordingly.", "assertions": [ "Detects potential port conflict", "Provides command to check port usage", "Offers multiple solutions", "Explains trade-offs of each option" ] }, { "id": 3, "name": "varied-phrasing-cleanup", "type": "variation", "prompt": "Docker is taking up too much space. Clean up old stuff for me?", "expected_output": "Commands to clean up stopped containers, unused images, and build cache", "assertions": [ "Understands implicit request from context", "Provides safe cleanup commands", "Warns about data loss where applicable", "Shows space savings after cleanup" ] } ] } ``` ### Template: Tool Integration Skill ```json { "skill_name": "YOUR_SKILL_NAME", "description": "[Describe what tool this skill wraps]", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "[Standard tool usage with specific options and requirements]", "expected_output": "[Correct command(s) with proper flags]", "assertions": [ "Command syntax is correct", "All required flags are included", "Best practices are followed", "Commands are copy-pasteable", "Explains what each part does" ] }, { "id": 2, "name": "edge-case", "type": "edge", "prompt": "[Tool usage with errors, conflicts, or unusual requirements]", "expected_output": "[Troubleshooting steps and solutions]", "assertions": [ "Detects potential issues", "Provides diagnostic commands", "Offers multiple solutions", "Explains risks of each approach" ] }, { "id": 3, "name": "varied-phrasing", "type": "variation", "prompt": "[Casual request with vague requirements]", "expected_output": "[Tool commands with reasonable defaults]", "assertions": [ "Fills in reasonable defaults", "Provides complete solution", "Explains assumptions made" ] } ] } ``` --- ## Documentation Skills For skills that help create or review documentation. ### Example: README Generator ```json { "skill_name": "readme-generator", "description": "Creates comprehensive README files for projects", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "Create a README for my Python project. It's a CLI tool called 'file-organizer' that sorts files by type and date. The repo is at ~/projects/file-organizer. It supports Python 3.8+, uses click for CLI, and has features for: organizing by extension, organizing by date, dry-run mode, and config file support.", "expected_output": "Complete README with: title, description, installation, usage examples, features list, configuration, and contributing sections. Formatted in Markdown.", "assertions": [ "README is well-structured with clear headings", "All requested sections are included", "Installation instructions are clear", "Usage examples show actual commands", "Features are listed comprehensively" ] }, { "id": 2, "name": "edge-case-minimal-info", "type": "edge", "prompt": "Write a README for my project. It's called 'utils' and it does some stuff with files.", "expected_output": "README template with placeholder sections and prompts for missing information. Asks clarifying questions or provides generic placeholders.", "assertions": [ "Creates template structure despite minimal info", "Uses placeholders for missing details", "Suggests what information to add", "Doesn't invent features or functionality" ] }, { "id": 3, "name": "varied-phrasing-casual", "type": "variation", "prompt": "Hey can you write a readme for my new js library? It's on npm as 'async-queue'. It helps manage async tasks with a queue so you don't overwhelm APIs. Pretty simple but useful.", "expected_output": "README with npm installation, basic usage example, and API overview", "assertions": [ "Understands from package name and casual description", "Includes npm install instructions", "Provides JavaScript usage examples", "Explains the problem it solves" ] } ] } ``` ### Template: Documentation Skill ```json { "skill_name": "YOUR_SKILL_NAME", "description": "[Describe what documentation this skill helps with]", "evals": [ { "id": 1, "name": "common-case", "type": "common", "prompt": "[Detailed request with project information and requirements]", "expected_output": "[Complete, well-structured documentation]", "assertions": [ "Documentation is well-organized", "All required sections are included", "Examples are clear and relevant", "Formatting is consistent", "Language is clear and concise" ] }, { "id": 2, "name": "edge-case", "type": "edge", "prompt": "[Vague request with minimal information]", "expected_output": "[Template with placeholders or clarifying questions]", "assertions": [ "Creates structure despite limited info", "Uses appropriate placeholders", "Identifies missing information", "Doesn't invent false details" ] }, { "id": 3, "name": "varied-phrasing", "type": "variation", "prompt": "[Casual request with implied requirements]", "expected_output": "[Documentation meeting implicit needs]", "assertions": [ "Understands implicit requirements", "Provides complete documentation", "Matches tone of request" ] } ] } ``` --- ## Tips for Customizing Templates ### Making Tests Realistic **Bad:** ``` "Convert CSV to JSON" ``` **Good:** ``` "Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format." ``` ### Assertions Should Be Checkable **Vague:** ```json "assertions": [ "Output looks good" ] ``` **Specific:** ```json "assertions": [ "JSON file exists at ./output/sales.json", "All dates are in ISO 8601 format", "Numeric fields are numbers, not strings" ] ``` ### Edge Cases to Consider 1. **Large data** - Files with 10,000+ rows 2. **Special characters** - Emojis, Unicode, unusual symbols 3. **Missing data** - Empty cells, null values 4. **Wrong format** - CSV with inconsistent columns 5. **Permissions** - Read/write errors 6. **Conflicts** - Port conflicts, file already exists 7. **Empty input** - Zero rows, blank files 8. **Unexpected state** - Dirty git worktree, missing dependencies ### Variation Ideas 1. **Casual language** - "Hey, can you...", "I need to..." 2. **Abbreviated** - Minimal words, assumes context 3. **Urgent** - "ASAP", "quickly", "fastest way" 4. **Uncertain** - "I think", "maybe", "probably" 5. **Brief** - Single sentence, minimal details 6. **Verbose** - Extra context, backstory --- ## Validation Checklist Before using your evals.json: - [ ] Contains exactly 3 test cases (common, edge, variation) - [ ] Each test has unique id (1, 2, 3) - [ ] Each test has descriptive name - [ ] Prompts are realistic with specific details - [ ] Expected outputs are clear - [ ] Assertions are objectively checkable - [ ] JSON is valid (run through jq or json linter) - [ ] File is saved as evals/evals.json in skill directory Run this to validate: ```bash jq . evals/evals.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON" ```