# Test Case Templates

Copy-paste templates for creating test cases based on skill type.

## Table of Contents

1. [File Transform Skills](#file-transform-skills)
2. [Code Generation Skills](#code-generation-skills)
3. [Workflow Skills](#workflow-skills)
4. [Tool Integration Skills](#tool-integration-skills)
5. [Documentation Skills](#documentation-skills)

---

## File Transform Skills

For skills that convert, parse, or reformat files.

### Example: CSV to JSON Converter

```json
{
  "skill_name": "csv-to-json",
  "description": "Converts CSV files to JSON format with data type handling",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format.",
      "expected_output": "A JSON file at ./output/sales.json containing an array of objects, each representing a row from the CSV with properly formatted dates.",
      "assertions": [
        "JSON file exists at ./output/sales.json",
        "File contains valid JSON array",
        "All dates are in ISO 8601 format (YYYY-MM-DD)",
        "Numeric fields (quantity, price) are numbers, not strings"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-large-file",
      "type": "edge",
      "prompt": "Convert a CSV file with 50,000 rows at ./data/export.csv to JSON. The file contains some rows with missing values in the 'email' column and special characters (emojis) in the 'notes' column.",
      "expected_output": "JSON file is created successfully, handling missing values as null or empty strings, and preserving special characters correctly.",
      "assertions": [
        "Large file is processed without memory errors",
        "Missing email values are handled (null or empty string)",
        "Special characters and emojis are preserved in output",
        "All 50,000 rows are converted"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-casual",
      "type": "variation",
      "prompt": "Hey, I've got this spreadsheet data/export.csv that I need to turn into JSON. Can you do that for me? Also, the dates are in MM/DD/YYYY format right now - can you make them proper ISO format?",
      "expected_output": "Same as common case: JSON file with ISO 8601 dates",
      "assertions": [
        "Skill triggers with casual language ('turn into', 'proper format')",
        "Implicit date formatting requirement is handled",
        "Output matches common case results"
      ]
    }
  ]
}
```

### Template: File Transform Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what file transformations this skill does]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Realistic request with specific file paths and formats]",
      "expected_output": "[What the transformed file should contain]",
      "assertions": [
        "Output file exists at expected location",
        "File is valid [format: JSON, XML, etc.]",
        "Data is correctly transformed",
        "Specific requirements met (encoding, formatting, etc.)"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Large files, special characters, missing data, malformed input]",
      "expected_output": "[Graceful handling or error with helpful message]",
      "assertions": [
        "Edge case is handled appropriately",
        "No data loss or corruption",
        "Error messages are helpful if transformation fails"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Same request with casual language, different wording]",
      "expected_output": "[Same as common case]",
      "assertions": [
        "Skill triggers with varied phrasing",
        "Output matches common case",
        "Implicit requirements are understood"
      ]
    }
  ]
}
```

---

## Code Generation Skills

For skills that generate code, scripts, or templates.

### Example: Python Script Generator

```json
{
  "skill_name": "python-script-gen",
  "description": "Generates Python scripts for file operations and data processing",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Create a Python script that recursively finds all .log files in /var/log, compresses them with gzip if they're older than 30 days, and moves them to /archive. Handle permission errors gracefully and log all actions to cleanup.log.",
      "expected_output": "A Python script that implements the log cleanup functionality with proper error handling, logging, and follows Python best practices.",
      "assertions": [
        "Script is syntactically valid Python",
        "Implements recursive file search",
        "Compresses files older than 30 days",
        "Handles permission errors gracefully",
        "Logs actions to cleanup.log"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-empty-directory",
      "type": "edge",
      "prompt": "Create a Python script that processes all CSV files in ./data and generates a summary report. What should it do if the directory is empty or doesn't exist?",
      "expected_output": "Script handles empty/non-existent directories gracefully with informative error messages and doesn't crash.",
      "assertions": [
        "Script checks if directory exists before processing",
        "Handles empty directory case gracefully",
        "Provides informative error message",
        "Returns appropriate exit code"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-brief",
      "type": "variation",
      "prompt": "Write me a python script to backup my photos. It should copy everything from ~/Pictures to ~/Backups/photos with today's date in the folder name. Skip duplicates if possible.",
      "expected_output": "Python backup script with timestamped folder and duplicate detection",
      "assertions": [
        "Skill works with brief, casual description",
        "Generates complete script without requiring clarification",
        "Handles date formatting for folder name",
        "Implements duplicate detection"
      ]
    }
  ]
}
```

### Template: Code Generation Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what code this skill generates]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Detailed request with specific requirements and constraints]",
      "expected_output": "[Description of the generated code]",
      "assertions": [
        "Code is syntactically valid",
        "Implements all requested features",
        "Follows language best practices",
        "Includes error handling where appropriate",
        "Is well-structured and readable"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Ambiguous requirements, missing data, error conditions]",
      "expected_output": "[Code handles edge cases or asks for clarification]",
      "assertions": [
        "Edge cases are handled appropriately",
        "Error conditions are managed",
        "Code doesn't crash on unexpected input"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Same request with minimal details or casual language]",
      "expected_output": "[Same quality as common case]",
      "assertions": [
        "Skill fills in reasonable defaults",
        "Generates complete solution",
        "Output quality matches detailed request"
      ]
    }
  ]
}
```

---

## Workflow Skills

For skills that guide multi-step processes.

### Example: Release Workflow

```json
{
  "skill_name": "release-workflow",
  "description": "Guides the complete software release process",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "I need to create a new release for my Node.js project. The repo is at ~/projects/myapp. We're currently at version 1.2.3 and this is a minor feature release (1.3.0). I need to update the version, create a changelog entry, commit, tag, and push to GitHub.",
      "expected_output": "Step-by-step guide covering: version bump in package.json, CHANGELOG.md update, commit creation, annotated tag, and push commands. Should provide copy-pasteable commands.",
      "assertions": [
        "All release steps are covered",
        "Commands are copy-pasteable",
        "Version numbers are consistent",
        "Validation steps are included",
        "Provides rollback guidance"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-dirty-worktree",
      "type": "edge",
      "prompt": "Help me release version 2.0.0 of my project. By the way, I have some uncommitted changes in my working directory that I'm not sure about.",
      "expected_output": "Workflow detects dirty worktree, suggests stashing or committing changes before proceeding with release. Provides commands to handle the situation.",
      "assertions": [
        "Detects uncommitted changes",
        "Warns about dirty worktree",
        "Provides options: stash, commit, or abort",
        "Doesn't proceed without addressing the issue"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-urgent",
      "type": "variation",
      "prompt": "Need to push out v2.1.0 ASAP. Hotfix for critical bug. What's the fastest way to get this released?",
      "expected_output": "Accelerated workflow prioritizing speed while maintaining essential safety checks",
      "assertions": [
        "Recognizes urgency from language ('ASAP', 'fastest')",
        "Still includes critical safety checks",
        "Prioritizes speed without skipping validation",
        "Provides streamlined command sequence"
      ]
    }
  ]
}
```

### Template: Workflow Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what workflow this skill guides]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Standard workflow request with clear requirements]",
      "expected_output": "[Complete step-by-step guide with all necessary steps]",
      "assertions": [
        "All workflow steps are included",
        "Steps are in logical order",
        "Validation points are provided",
        "Commands are copy-pasteable where applicable",
        "Clear success criteria defined"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Workflow with complications, errors, or unusual state]",
      "expected_output": "[Workflow detects issues and provides guidance]",
      "assertions": [
        "Detects unusual states or errors",
        "Provides recovery options",
        "Doesn't proceed blindly",
        "Offers rollback or alternative paths"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Workflow request with urgency, casual language, or minimal details]",
      "expected_output": "[Same workflow adapted to context]",
      "assertions": [
        "Adapts to urgency level",
        "Works with minimal context",
        "Still provides complete guidance"
      ]
    }
  ]
}
```

---

## Tool Integration Skills

For skills that wrap command-line tools.

### Example: Docker Helper

```json
{
  "skill_name": "docker-helper",
  "description": "Streamlines Docker container and image management",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "I need to deploy my Node.js app using Docker. The Dockerfile is in ~/projects/myapp. Build an image tagged as myapp:v1.0, then run a container named 'myapp-prod' that maps port 3000 to the host. Make sure it restarts automatically if it crashes.",
      "expected_output": "Docker commands to build the image and run the container with specified configuration, including restart policy.",
      "assertions": [
        "Provides correct build command with tag",
        "Provides correct run command with port mapping",
        "Includes restart policy (--restart unless-stopped or always)",
        "Sets container name correctly",
        "Commands are copy-pasteable"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-port-conflict",
      "type": "edge",
      "prompt": "Run my Docker container on port 3000, but I think something might already be using that port on my machine. How do I check and handle this?",
      "expected_output": "Commands to check port usage, offer solutions (kill process, use different port, or map to different host port), and proceed accordingly.",
      "assertions": [
        "Detects potential port conflict",
        "Provides command to check port usage",
        "Offers multiple solutions",
        "Explains trade-offs of each option"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-cleanup",
      "type": "variation",
      "prompt": "Docker is taking up too much space. Clean up old stuff for me?",
      "expected_output": "Commands to clean up stopped containers, unused images, and build cache",
      "assertions": [
        "Understands implicit request from context",
        "Provides safe cleanup commands",
        "Warns about data loss where applicable",
        "Shows space savings after cleanup"
      ]
    }
  ]
}
```

### Template: Tool Integration Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what tool this skill wraps]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Standard tool usage with specific options and requirements]",
      "expected_output": "[Correct command(s) with proper flags]",
      "assertions": [
        "Command syntax is correct",
        "All required flags are included",
        "Best practices are followed",
        "Commands are copy-pasteable",
        "Explains what each part does"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Tool usage with errors, conflicts, or unusual requirements]",
      "expected_output": "[Troubleshooting steps and solutions]",
      "assertions": [
        "Detects potential issues",
        "Provides diagnostic commands",
        "Offers multiple solutions",
        "Explains risks of each approach"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Casual request with vague requirements]",
      "expected_output": "[Tool commands with reasonable defaults]",
      "assertions": [
        "Fills in reasonable defaults",
        "Provides complete solution",
        "Explains assumptions made"
      ]
    }
  ]
}
```

---

## Documentation Skills

For skills that help create or review documentation.

### Example: README Generator

```json
{
  "skill_name": "readme-generator",
  "description": "Creates comprehensive README files for projects",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "Create a README for my Python project. It's a CLI tool called 'file-organizer' that sorts files by type and date. The repo is at ~/projects/file-organizer. It supports Python 3.8+, uses click for CLI, and has features for: organizing by extension, organizing by date, dry-run mode, and config file support.",
      "expected_output": "Complete README with: title, description, installation, usage examples, features list, configuration, and contributing sections. Formatted in Markdown.",
      "assertions": [
        "README is well-structured with clear headings",
        "All requested sections are included",
        "Installation instructions are clear",
        "Usage examples show actual commands",
        "Features are listed comprehensively"
      ]
    },
    {
      "id": 2,
      "name": "edge-case-minimal-info",
      "type": "edge",
      "prompt": "Write a README for my project. It's called 'utils' and it does some stuff with files.",
      "expected_output": "README template with placeholder sections and prompts for missing information. Asks clarifying questions or provides generic placeholders.",
      "assertions": [
        "Creates template structure despite minimal info",
        "Uses placeholders for missing details",
        "Suggests what information to add",
        "Doesn't invent features or functionality"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing-casual",
      "type": "variation",
      "prompt": "Hey can you write a readme for my new js library? It's on npm as 'async-queue'. It helps manage async tasks with a queue so you don't overwhelm APIs. Pretty simple but useful.",
      "expected_output": "README with npm installation, basic usage example, and API overview",
      "assertions": [
        "Understands from package name and casual description",
        "Includes npm install instructions",
        "Provides JavaScript usage examples",
        "Explains the problem it solves"
      ]
    }
  ]
}
```

### Template: Documentation Skill

```json
{
  "skill_name": "YOUR_SKILL_NAME",
  "description": "[Describe what documentation this skill helps with]",
  "evals": [
    {
      "id": 1,
      "name": "common-case",
      "type": "common",
      "prompt": "[Detailed request with project information and requirements]",
      "expected_output": "[Complete, well-structured documentation]",
      "assertions": [
        "Documentation is well-organized",
        "All required sections are included",
        "Examples are clear and relevant",
        "Formatting is consistent",
        "Language is clear and concise"
      ]
    },
    {
      "id": 2,
      "name": "edge-case",
      "type": "edge",
      "prompt": "[Vague request with minimal information]",
      "expected_output": "[Template with placeholders or clarifying questions]",
      "assertions": [
        "Creates structure despite limited info",
        "Uses appropriate placeholders",
        "Identifies missing information",
        "Doesn't invent false details"
      ]
    },
    {
      "id": 3,
      "name": "varied-phrasing",
      "type": "variation",
      "prompt": "[Casual request with implied requirements]",
      "expected_output": "[Documentation meeting implicit needs]",
      "assertions": [
        "Understands implicit requirements",
        "Provides complete documentation",
        "Matches tone of request"
      ]
    }
  ]
}
```

---

## Tips for Customizing Templates

### Making Tests Realistic

**Bad:**
```
"Convert CSV to JSON"
```

**Good:**
```
"Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format."
```

### Assertions Should Be Checkable

**Vague:**
```json
"assertions": [
  "Output looks good"
]
```

**Specific:**
```json
"assertions": [
  "JSON file exists at ./output/sales.json",
  "All dates are in ISO 8601 format",
  "Numeric fields are numbers, not strings"
]
```
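
One way to tell whether an assertion is checkable is to try writing it as code. The sketch below is illustrative only and is not part of any eval harness: the function name `check_sales_output` is hypothetical, the field names (`date`, `quantity`, `price`) are taken from the example prompt, and "ISO 8601" is assumed to mean the plain `YYYY-MM-DD` calendar form.

```python
import json
import re
from pathlib import Path

# Assumption: "ISO 8601" here means the plain calendar-date form YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_sales_output(path):
    """Return a list of failed-assertion messages; an empty list means all passed."""
    file = Path(path)
    if not file.is_file():
        return [f"JSON file does not exist at {path}"]

    try:
        rows = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"File is not valid JSON: {exc}"]
    if not isinstance(rows, list):
        return ["Top-level JSON value is not an array"]

    failures = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            failures.append(f"row {i}: not an object")
            continue
        date = row.get("date")
        if not isinstance(date, str) or not ISO_DATE.fullmatch(date):
            failures.append(f"row {i}: date {date!r} is not YYYY-MM-DD")
        for field in ("quantity", "price"):
            value = row.get(field)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                failures.append(f"row {i}: {field} {value!r} is not a number")
    return failures
```

An assertion such as "Output looks good" has no equivalent of this function, which is a quick sign that it needs to be made more specific.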

### Edge Cases to Consider

1. **Large data** - Files with 10,000+ rows
2. **Special characters** - Emojis, Unicode, unusual symbols
3. **Missing data** - Empty cells, null values
4. **Wrong format** - CSV with inconsistent columns
5. **Permissions** - Read/write errors
6. **Conflicts** - Port conflicts, file already exists
7. **Empty input** - Zero rows, blank files
8. **Unexpected state** - Dirty git worktree, missing dependencies
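
Several of these cases can be combined into one generated input file rather than maintained by hand. The following is a rough sketch under stated assumptions: the helper name `write_edge_case_csv` and the `id`, `email`, `notes` columns are hypothetical, loosely modeled on the CSV edge-case example earlier in this document.

```python
import csv


def write_edge_case_csv(path, rows=10_000, missing_every=10):
    """Write a CSV fixture covering large data, special characters, and missing data."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "email", "notes"])
        for i in range(1, rows + 1):
            # Leave the email cell empty on every `missing_every`-th row.
            email = "" if i % missing_every == 0 else f"user{i}@example.com"
            # Mix an emoji, a comma, and quotes into the notes to exercise escaping.
            notes = f'Order {i} 🎉, said "thanks"'
            writer.writerow([i, email, notes])
```

Calling it with `rows=0` produces a header-only file, which covers the empty-input case as well.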

### Variation Ideas

1. **Casual language** - "Hey, can you...", "I need to..."
2. **Abbreviated** - Minimal words, assumes context
3. **Urgent** - "ASAP", "quickly", "fastest way"
4. **Uncertain** - "I think", "maybe", "probably"
5. **Brief** - Single sentence, minimal details
6. **Verbose** - Extra context, backstory

---

## Validation Checklist

Before using your evals.json:

- [ ] Contains exactly 3 test cases (common, edge, variation)
- [ ] Each test has a unique id (1, 2, 3)
- [ ] Each test has a descriptive name
- [ ] Prompts are realistic with specific details
- [ ] Expected outputs are clear
- [ ] Assertions are objectively checkable
- [ ] JSON is valid (run it through jq or a JSON linter)
- [ ] File is saved as evals/evals.json in the skill directory

Run this to validate the JSON syntax:
```bash
jq . evals/evals.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
```
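
The `jq` command only confirms that the file parses. The structural items on the checklist (three cases, unique ids, one case of each type) can also be checked mechanically. This is a minimal sketch, assuming the field layout used by the templates in this document; `validate_evals`, `EXPECTED_TYPES`, and `REQUIRED_KEYS` are hypothetical names, not part of any existing tool.

```python
import json

# Assumptions taken from the templates in this document.
EXPECTED_TYPES = ["common", "edge", "variation"]
REQUIRED_KEYS = ("id", "name", "type", "prompt", "expected_output", "assertions")


def validate_evals(data):
    """Return a list of checklist violations for a parsed evals.json document."""
    if not isinstance(data, dict) or not isinstance(data.get("evals"), list):
        return ["top-level object must contain an 'evals' list"]

    evals = data["evals"]
    problems = []
    if len(evals) != 3:
        problems.append(f"expected exactly 3 test cases, found {len(evals)}")

    cases = [c for c in evals if isinstance(c, dict)]
    if len(cases) != len(evals):
        problems.append("every test case must be a JSON object")

    # Compare ids and types as strings so mixed or missing values cannot raise.
    ids = [str(c.get("id")) for c in cases]
    if len(set(ids)) != len(ids):
        problems.append("test case ids must be unique")
    types = sorted(str(c.get("type")) for c in cases)
    if types != sorted(EXPECTED_TYPES):
        problems.append(f"expected one case each of {EXPECTED_TYPES}, found {types}")

    for c in cases:
        label = c.get("name") or f"id={c.get('id')}"
        for key in REQUIRED_KEYS:
            if key not in c:
                problems.append(f"{label}: missing '{key}'")
        assertions = c.get("assertions")
        if "assertions" in c and (not isinstance(assertions, list) or not assertions):
            problems.append(f"{label}: 'assertions' must be a non-empty list")
    return problems
```

To use it, load the file with `json.load` and pass the result to `validate_evals`; an empty list means the structural checks passed. Whether prompts are realistic and assertions are objectively checkable still needs a human read.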