Add skills

This commit is contained in:
2026-03-22 23:21:49 +02:00
parent 4cbbbae1ef
commit c09d9151ca
104 changed files with 23879 additions and 0 deletions

View File

@@ -0,0 +1,615 @@
# Test Case Templates
Copy-paste templates for creating test cases based on skill type.
## Table of Contents
1. [File Transform Skills](#file-transform-skills)
2. [Code Generation Skills](#code-generation-skills)
3. [Workflow Skills](#workflow-skills)
4. [Tool Integration Skills](#tool-integration-skills)
5. [Documentation Skills](#documentation-skills)
---
## File Transform Skills
For skills that convert, parse, or reformat files.
### Example: CSV to JSON Converter
```json
{
"skill_name": "csv-to-json",
"description": "Converts CSV files to JSON format with data type handling",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format.",
"expected_output": "A JSON file at ./output/sales.json containing an array of objects, each representing a row from the CSV with properly formatted dates.",
"assertions": [
"JSON file exists at ./output/sales.json",
"File contains valid JSON array",
"All dates are in ISO 8601 format (YYYY-MM-DD)",
"Numeric fields (quantity, price) are numbers, not strings"
]
},
{
"id": 2,
"name": "edge-case-large-file",
"type": "edge",
"prompt": "Convert a CSV file with 50,000 rows at ./data/export.csv to JSON. The file contains some rows with missing values in the 'email' column and special characters (emojis) in the 'notes' column.",
"expected_output": "JSON file is created successfully, handling missing values as null or empty strings, and preserving special characters correctly.",
"assertions": [
"Large file is processed without memory errors",
"Missing email values are handled (null or empty string)",
"Special characters and emojis are preserved in output",
"All 50,000 rows are converted"
]
},
{
"id": 3,
"name": "varied-phrasing-casual",
"type": "variation",
"prompt": "Hey, I've got this spreadsheet data/export.csv that I need to turn into JSON. Can you do that for me? Also, the dates are in MM/DD/YYYY format right now - can you make them proper ISO format?",
"expected_output": "Same as common case: JSON file with ISO 8601 dates",
"assertions": [
"Skill triggers with casual language ('turn into', 'proper format')",
"Implicit date formatting requirement is handled",
"Output matches common case results"
]
}
]
}
```
### Template: File Transform Skill
```json
{
"skill_name": "YOUR_SKILL_NAME",
"description": "[Describe what file transformations this skill does]",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "[Realistic request with specific file paths and formats]",
"expected_output": "[What the transformed file should contain]",
"assertions": [
"Output file exists at expected location",
"File is valid [format: JSON, XML, etc.]",
"Data is correctly transformed",
"Specific requirements met (encoding, formatting, etc.)"
]
},
{
"id": 2,
"name": "edge-case",
"type": "edge",
"prompt": "[Large files, special characters, missing data, malformed input]",
"expected_output": "[Graceful handling or error with helpful message]",
"assertions": [
"Edge case is handled appropriately",
"No data loss or corruption",
"Error messages are helpful if transformation fails"
]
},
{
"id": 3,
"name": "varied-phrasing",
"type": "variation",
"prompt": "[Same request with casual language, different wording]",
"expected_output": "[Same as common case]",
"assertions": [
"Skill triggers with varied phrasing",
"Output matches common case",
"Implicit requirements are understood"
]
}
]
}
```
---
## Code Generation Skills
For skills that generate code, scripts, or templates.
### Example: Python Script Generator
```json
{
"skill_name": "python-script-gen",
"description": "Generates Python scripts for file operations and data processing",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "Create a Python script that recursively finds all .log files in /var/log, compresses them with gzip if they're older than 30 days, and moves them to /archive. Handle permission errors gracefully and log all actions to cleanup.log.",
"expected_output": "A Python script that implements the log cleanup functionality with proper error handling, logging, and follows Python best practices.",
"assertions": [
"Script is syntactically valid Python",
"Implements recursive file search",
"Compresses files older than 30 days",
"Handles permission errors gracefully",
"Logs actions to cleanup.log"
]
},
{
"id": 2,
"name": "edge-case-empty-directory",
"type": "edge",
"prompt": "Create a Python script that processes all CSV files in ./data and generates a summary report. What should it do if the directory is empty or doesn't exist?",
"expected_output": "Script handles empty/non-existent directories gracefully with informative error messages and doesn't crash.",
"assertions": [
"Script checks if directory exists before processing",
"Handles empty directory case gracefully",
"Provides informative error message",
"Returns appropriate exit code"
]
},
{
"id": 3,
"name": "varied-phrasing-brief",
"type": "variation",
"prompt": "Write me a python script to backup my photos. It should copy everything from ~/Pictures to ~/Backups/photos with today's date in the folder name. Skip duplicates if possible.",
"expected_output": "Python backup script with timestamped folder and duplicate detection",
"assertions": [
"Skill works with brief, casual description",
"Generates complete script without requiring clarification",
"Handles date formatting for folder name",
"Implements duplicate detection"
]
}
]
}
```
### Template: Code Generation Skill
```json
{
"skill_name": "YOUR_SKILL_NAME",
"description": "[Describe what code this skill generates]",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "[Detailed request with specific requirements and constraints]",
"expected_output": "[Description of the generated code]",
"assertions": [
"Code is syntactically valid",
"Implements all requested features",
"Follows language best practices",
"Includes error handling where appropriate",
"Is well-structured and readable"
]
},
{
"id": 2,
"name": "edge-case",
"type": "edge",
"prompt": "[Ambiguous requirements, missing data, error conditions]",
"expected_output": "[Code handles edge cases or asks for clarification]",
"assertions": [
"Edge cases are handled appropriately",
"Error conditions are managed",
"Code doesn't crash on unexpected input"
]
},
{
"id": 3,
"name": "varied-phrasing",
"type": "variation",
"prompt": "[Same request with minimal details or casual language]",
"expected_output": "[Same quality as common case]",
"assertions": [
"Skill fills in reasonable defaults",
"Generates complete solution",
"Output quality matches detailed request"
]
}
]
}
```
---
## Workflow Skills
For skills that guide multi-step processes.
### Example: Release Workflow
```json
{
"skill_name": "release-workflow",
"description": "Guides the complete software release process",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "I need to create a new release for my Node.js project. The repo is at ~/projects/myapp. We're currently at version 1.2.3 and this is a minor feature release (1.3.0). I need to update the version, create a changelog entry, commit, tag, and push to GitHub.",
"expected_output": "Step-by-step guide covering: version bump in package.json, CHANGELOG.md update, commit creation, annotated tag, and push commands. Should provide copy-pasteable commands.",
"assertions": [
"All release steps are covered",
"Commands are copy-pasteable",
"Version numbers are consistent",
"Validation steps are included",
"Provides rollback guidance"
]
},
{
"id": 2,
"name": "edge-case-dirty-worktree",
"type": "edge",
"prompt": "Help me release version 2.0.0 of my project. By the way, I have some uncommitted changes in my working directory that I'm not sure about.",
"expected_output": "Workflow detects dirty worktree, suggests stashing or committing changes before proceeding with release. Provides commands to handle the situation.",
"assertions": [
"Detects uncommitted changes",
"Warns about dirty worktree",
"Provides options: stash, commit, or abort",
"Doesn't proceed without addressing the issue"
]
},
{
"id": 3,
"name": "varied-phrasing-urgent",
"type": "variation",
"prompt": "Need to push out v2.1.0 ASAP. Hotfix for critical bug. What's the fastest way to get this released?",
"expected_output": "Accelerated workflow prioritizing speed while maintaining essential safety checks",
"assertions": [
"Recognizes urgency from language ('ASAP', 'fastest')",
"Still includes critical safety checks",
"Prioritizes speed without skipping validation",
"Provides streamlined command sequence"
]
}
]
}
```
### Template: Workflow Skill
```json
{
"skill_name": "YOUR_SKILL_NAME",
"description": "[Describe what workflow this skill guides]",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "[Standard workflow request with clear requirements]",
"expected_output": "[Complete step-by-step guide with all necessary steps]",
"assertions": [
"All workflow steps are included",
"Steps are in logical order",
"Validation points are provided",
"Commands are copy-pasteable where applicable",
"Clear success criteria defined"
]
},
{
"id": 2,
"name": "edge-case",
"type": "edge",
"prompt": "[Workflow with complications, errors, or unusual state]",
"expected_output": "[Workflow detects issues and provides guidance]",
"assertions": [
"Detects unusual states or errors",
"Provides recovery options",
"Doesn't proceed blindly",
"Offers rollback or alternative paths"
]
},
{
"id": 3,
"name": "varied-phrasing",
"type": "variation",
"prompt": "[Workflow request with urgency, casual language, or minimal details]",
"expected_output": "[Same workflow adapted to context]",
"assertions": [
"Adapts to urgency level",
"Works with minimal context",
"Still provides complete guidance"
]
}
]
}
```
---
## Tool Integration Skills
For skills that wrap command-line tools.
### Example: Docker Helper
```json
{
"skill_name": "docker-helper",
"description": "Streamlines Docker container and image management",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "I need to deploy my Node.js app using Docker. The Dockerfile is in ~/projects/myapp. Build an image tagged as myapp:v1.0, then run a container named 'myapp-prod' that maps port 3000 to the host. Make sure it restarts automatically if it crashes.",
"expected_output": "Docker commands to build the image and run the container with specified configuration, including restart policy.",
"assertions": [
"Provides correct build command with tag",
"Provides correct run command with port mapping",
"Includes restart policy (--restart unless-stopped or always)",
"Sets container name correctly",
"Commands are copy-pasteable"
]
},
{
"id": 2,
"name": "edge-case-port-conflict",
"type": "edge",
"prompt": "Run my Docker container on port 3000, but I think something might already be using that port on my machine. How do I check and handle this?",
"expected_output": "Commands to check port usage, offer solutions (kill process, use different port, or map to different host port), and proceed accordingly.",
"assertions": [
"Detects potential port conflict",
"Provides command to check port usage",
"Offers multiple solutions",
"Explains trade-offs of each option"
]
},
{
"id": 3,
"name": "varied-phrasing-cleanup",
"type": "variation",
"prompt": "Docker is taking up too much space. Clean up old stuff for me?",
"expected_output": "Commands to clean up stopped containers, unused images, and build cache",
"assertions": [
"Understands implicit request from context",
"Provides safe cleanup commands",
"Warns about data loss where applicable",
"Shows space savings after cleanup"
]
}
]
}
```
### Template: Tool Integration Skill
```json
{
"skill_name": "YOUR_SKILL_NAME",
"description": "[Describe what tool this skill wraps]",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "[Standard tool usage with specific options and requirements]",
"expected_output": "[Correct command(s) with proper flags]",
"assertions": [
"Command syntax is correct",
"All required flags are included",
"Best practices are followed",
"Commands are copy-pasteable",
"Explains what each part does"
]
},
{
"id": 2,
"name": "edge-case",
"type": "edge",
"prompt": "[Tool usage with errors, conflicts, or unusual requirements]",
"expected_output": "[Troubleshooting steps and solutions]",
"assertions": [
"Detects potential issues",
"Provides diagnostic commands",
"Offers multiple solutions",
"Explains risks of each approach"
]
},
{
"id": 3,
"name": "varied-phrasing",
"type": "variation",
"prompt": "[Casual request with vague requirements]",
"expected_output": "[Tool commands with reasonable defaults]",
"assertions": [
"Fills in reasonable defaults",
"Provides complete solution",
"Explains assumptions made"
]
}
]
}
```
---
## Documentation Skills
For skills that help create or review documentation.
### Example: README Generator
```json
{
"skill_name": "readme-generator",
"description": "Creates comprehensive README files for projects",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "Create a README for my Python project. It's a CLI tool called 'file-organizer' that sorts files by type and date. The repo is at ~/projects/file-organizer. It supports Python 3.8+, uses click for CLI, and has features for: organizing by extension, organizing by date, dry-run mode, and config file support.",
"expected_output": "Complete README with: title, description, installation, usage examples, features list, configuration, and contributing sections. Formatted in Markdown.",
"assertions": [
"README is well-structured with clear headings",
"All requested sections are included",
"Installation instructions are clear",
"Usage examples show actual commands",
"Features are listed comprehensively"
]
},
{
"id": 2,
"name": "edge-case-minimal-info",
"type": "edge",
"prompt": "Write a README for my project. It's called 'utils' and it does some stuff with files.",
"expected_output": "README template with placeholder sections and prompts for missing information. Asks clarifying questions or provides generic placeholders.",
"assertions": [
"Creates template structure despite minimal info",
"Uses placeholders for missing details",
"Suggests what information to add",
"Doesn't invent features or functionality"
]
},
{
"id": 3,
"name": "varied-phrasing-casual",
"type": "variation",
"prompt": "Hey can you write a readme for my new js library? It's on npm as 'async-queue'. It helps manage async tasks with a queue so you don't overwhelm APIs. Pretty simple but useful.",
"expected_output": "README with npm installation, basic usage example, and API overview",
"assertions": [
"Understands from package name and casual description",
"Includes npm install instructions",
"Provides JavaScript usage examples",
"Explains the problem it solves"
]
}
]
}
```
### Template: Documentation Skill
```json
{
"skill_name": "YOUR_SKILL_NAME",
"description": "[Describe what documentation this skill helps with]",
"evals": [
{
"id": 1,
"name": "common-case",
"type": "common",
"prompt": "[Detailed request with project information and requirements]",
"expected_output": "[Complete, well-structured documentation]",
"assertions": [
"Documentation is well-organized",
"All required sections are included",
"Examples are clear and relevant",
"Formatting is consistent",
"Language is clear and concise"
]
},
{
"id": 2,
"name": "edge-case",
"type": "edge",
"prompt": "[Vague request with minimal information]",
"expected_output": "[Template with placeholders or clarifying questions]",
"assertions": [
"Creates structure despite limited info",
"Uses appropriate placeholders",
"Identifies missing information",
"Doesn't invent false details"
]
},
{
"id": 3,
"name": "varied-phrasing",
"type": "variation",
"prompt": "[Casual request with implied requirements]",
"expected_output": "[Documentation meeting implicit needs]",
"assertions": [
"Understands implicit requirements",
"Provides complete documentation",
"Matches tone of request"
]
}
]
}
```
---
## Tips for Customizing Templates
### Making Tests Realistic
**Bad:**
```
"Convert CSV to JSON"
```
**Good:**
```
"Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json. The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format."
```
### Assertions Should Be Checkable
**Vague:**
```json
"assertions": [
"Output looks good"
]
```
**Specific:**
```json
"assertions": [
"JSON file exists at ./output/sales.json",
"All dates are in ISO 8601 format",
"Numeric fields are numbers, not strings"
]
```
### Edge Cases to Consider
1. **Large data** - Files with 10,000+ rows
2. **Special characters** - Emojis, Unicode, unusual symbols
3. **Missing data** - Empty cells, null values
4. **Wrong format** - CSV with inconsistent columns
5. **Permissions** - Read/write errors
6. **Conflicts** - Port conflicts, file already exists
7. **Empty input** - Zero rows, blank files
8. **Unexpected state** - Dirty git worktree, missing dependencies
### Variation Ideas
1. **Casual language** - "Hey, can you...", "I need to..."
2. **Abbreviated** - Minimal words, assumes context
3. **Urgent** - "ASAP", "quickly", "fastest way"
4. **Uncertain** - "I think", "maybe", "probably"
5. **Brief** - Single sentence, minimal details
6. **Verbose** - Extra context, backstory
---
## Validation Checklist
Before using your evals.json:
- [ ] Contains exactly 3 test cases (common, edge, variation)
- [ ] Each test has unique id (1, 2, 3)
- [ ] Each test has descriptive name
- [ ] Prompts are realistic with specific details
- [ ] Expected outputs are clear
- [ ] Assertions are objectively checkable
- [ ] JSON is valid (run through jq or json linter)
- [ ] File is saved as evals/evals.json in skill directory
Run this to validate:
```bash
jq . evals/evals.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
```

View File

@@ -0,0 +1,607 @@
# Grading Guide
Comprehensive guide for evaluating skill outputs and identifying improvements.
## Table of Contents
1. [The Evaluation Framework](#the-evaluation-framework)
2. [Correctness](#correctness)
3. [Completeness](#completeness)
4. [Format](#format)
5. [Triggering](#triggering)
6. [Efficiency](#efficiency)
7. [Identifying Patterns](#identifying-patterns)
8. [Decision Framework](#decision-framework)
9. [Common Issues and Solutions](#common-issues-and-solutions)
10. [When to Stop Iterating](#when-to-stop-iterating)
---
## The Evaluation Framework
Evaluate skill outputs across five dimensions:
1. **Correctness** - Is the output accurate and correct?
2. **Completeness** - Are all requirements met?
3. **Format** - Does it follow expected structure?
4. **Triggering** - Does it activate appropriately?
5. **Efficiency** - Is it concise yet complete?
Each dimension has specific criteria to check. Use the grade-output.sh script for systematic evaluation.
---
## Correctness
### What to Check
**Output matches expected result:**
- For file transforms: Output file contains correct data
- For code generation: Code works as described
- For workflows: All steps lead to desired outcome
- For documentation: Information is accurate
**No factual errors:**
- Commands use correct syntax
- File paths are accurate
- Versions are correct
- Technical details are right
**Logic is sound:**
- Reasoning makes sense
- Steps follow logically
- No contradictions
- Edge cases considered
**Edge cases handled appropriately:**
- Empty inputs don't crash
- Large inputs work
- Special characters preserved
- Errors handled gracefully
### Examples
**Good (Correct):**
```bash
# Command actually works
docker run -p 3000:3000 --name myapp myimage:v1.0
```
**Bad (Incorrect):**
```bash
# Wrong flag syntax
docker run --port 3000:3000 myimage # Should be -p or --publish
```
**Good (Handles Edge Cases):**
```python
import os
if os.path.exists(filename):
process_file(filename)
else:
print(f"Error: {filename} not found")
```
**Bad (No Edge Case Handling):**
```python
process_file(filename) # Will crash if file doesn't exist
```
---
## Completeness
### What to Check
**All requested tasks completed:**
- Every requirement from prompt addressed
- No steps forgotten
- All files generated
- All transformations applied
**No steps skipped:**
- Validation steps included
- Cleanup steps not forgotten
- Confirmation steps present
- Rollback guidance provided
**Appropriate level of detail:**
- Not too brief (missing context)
- Not too verbose (information overload)
- Explains "why" not just "what"
- Provides examples where helpful
**Relevant context included:**
- Prerequisites mentioned
- Dependencies noted
- Error scenarios covered
- Alternative approaches offered
### Examples
**Good (Complete):**
```
To release version 1.3.0:
1. Update version in package.json: "version": "1.3.0"
2. Add entry to CHANGELOG.md with today's date
3. Commit: git commit -am "chore(release): bump version to 1.3.0"
4. Create tag: git tag -a v1.3.0 -m "Release 1.3.0"
5. Push: git push origin main && git push origin v1.3.0
6. Create GitHub release: gh release create v1.3.0 --notes-from-tag
Prerequisites:
- All tests passing
- CHANGELOG.md updated
- You have push access to repository
```
**Bad (Incomplete):**
```
To release:
1. Update version
2. Create tag
3. Push
```
---
## Format
### What to Check
**Output follows specified format:**
- If skill defines a template, output matches it
- Markdown formatting is correct
- Code blocks use proper syntax highlighting
- Tables are properly formatted
**Consistent with examples in skill:**
- Format matches examples in SKILL.md
- Style is consistent
- Terminology is consistent
- Structure follows patterns
**Easy to read and understand:**
- Clear headings and sections
- Good use of whitespace
- Logical organization
- No wall of text
### Examples
**Good (Well-Formatted):**
```markdown
## Deployment Steps
### 1. Build Docker Image
```bash
docker build -t myapp:v1.0 .
```
### 2. Run Container
```bash
docker run -d -p 3000:3000 --name myapp myapp:v1.0
```
### 3. Verify Deployment
```bash
curl http://localhost:3000/health
```
```
**Bad (Poor Formatting):**
```
step 1: build the image with docker build -t myapp:v1.0 . then step 2 run the container with docker run -d -p 3000:3000 --name myapp myapp:v1.0 and step 3 verify with curl
```
---
## Triggering
### What to Check
**Skill activates when appropriate:**
- Relevant triggers work
- Similar phrasings activate it
- Context clues are recognized
- Doesn't require exact keywords
**Doesn't activate when inappropriate:**
- Adjacent domains don't trigger it
- Ambiguous queries don't wrongly trigger
- Different tools aren't confused
- Scope is respected
### Testing Triggering
Test with 8 queries (4 should-trigger, 4 should-not-trigger):
**Example for Docker skill:**
Should trigger:
1. "Create a docker container for my Node.js app"
2. "Build an image from this Dockerfile"
3. "How do I compose up my services?"
4. "I need to deploy this using containers"
Should not trigger:
1. "Install Docker on my machine" (installation vs usage)
2. "What is containerization?" (education vs hands-on)
3. "Show me Kubernetes commands" (different tool)
4. "Write a fibonacci function" (completely unrelated)
### Examples
**Good (Appropriate Triggering):**
- User: "containerize my app"
- Skill: Docker helper activates ✓
**Bad (Missed Trigger):**
- User: "I need to run my app in containers"
- Skill: Doesn't activate (too narrow description) ✗
**Bad (False Trigger):**
- User: "Install Docker on Ubuntu"
- Skill: Activates but only handles usage, not installation ✗
---
## Efficiency
### What to Check
**No unnecessary steps:**
- Doesn't do redundant work
- Doesn't suggest unnecessary commands
- Shortcuts used where safe
- Process is streamlined
**Reasonable response length:**
- Not excessively verbose
- Doesn't repeat information
- No filler content
- Every sentence adds value
**Not overly verbose:**
- Commands not over-explained
- Doesn't explain obvious steps
- Concise but complete
- Respects user's expertise
### Examples
**Good (Efficient):**
```bash
# Clean up Docker resources
docker system prune -f
```
**Bad (Verbose):**
```bash
# Clean up Docker resources
# First, we need to run the docker command
# Then we use the system subcommand
# Then we use the prune subcommand
# The -f flag means force
docker system prune -f
# This will remove stopped containers, unused networks, and dangling images
```
---
## Identifying Patterns
### Single Test Failure
**Characteristics:**
- Only one test case fails
- Issue is unique to that scenario
- Other tests pass
**Action:**
- Add targeted instruction
- Include example for that edge case
- Fix the specific issue
- Don't generalize prematurely
**Example:**
```
Issue: Large file processing fails
Solution: Add instruction about memory-efficient processing
for files over 10,000 rows
```
### Multiple Test Failures (Same Issue)
**Characteristics:**
- Same problem across 2+ tests
- Pattern suggests root cause
- Symptom appears in different forms
**Action:**
- Fix the root cause
- Add general principle, not specific fix
- Consider extracting to script
- Explain the "why"
**Example:**
```
Issue: Tests 1 and 3 both have incorrect date formatting
Solution: Add general instruction about ISO 8601 date format
rather than fixing each case individually
```
### Multiple Test Failures (Different Issues)
**Characteristics:**
- Different problems in each test
- No clear pattern
- Skill may be too broad
**Action:**
- Clarify scope in description
- Add more specific instructions
- Consider splitting into multiple skills
- Focus on core functionality first
**Example:**
```
Issue: Test 1 fails on format, Test 2 fails on logic, Test 3
never triggers skill
Solution: Clarify what the skill does/doesn't do in description
Add validation steps
```
---
## Decision Framework
### Fix Specific Case When:
- Unique edge case not covered
- One-time issue unlikely to recur
- Fix is simple and doesn't add complexity
- Doesn't indicate systemic problem
**Example:**
```
Test: CSV with Unicode characters fails
Fix: Add note about UTF-8 encoding for international characters
(Don't rewrite entire CSV handling)
```
### Generalize Solution When:
- Same issue in 2+ tests
- Pattern suggests broader applicability
- Fix would benefit similar requests
- Explains underlying principle
**Example:**
```
Tests: Both test 1 and 2 have incorrect error handling
Fix: Add general principle about validating inputs before processing
with examples of common validation checks
```
### Extract to Script When:
- Same multi-step process repeated
- Deterministic operation (not creative)
- Would save time on every invocation
- Logic is complex or error-prone
**Example:**
```
Pattern: All three tests require converting dates to ISO format
Action: Create scripts/convert-dates.py
Include in SKILL.md: "Use scripts/convert-dates.py for date formatting"
```
---
## Common Issues and Solutions
### Issue: Skill Doesn't Trigger
**Symptoms:**
- User says relevant phrase
- Skill doesn't appear in available skills
- Model handles request without skill
**Solutions:**
1. Add specific trigger phrases to description
2. Use "pushy" language: "Make sure to use this skill whenever..."
3. Include synonyms and variations
4. Test with 8 trigger queries
### Issue: Output Format Inconsistent
**Symptoms:**
- Sometimes follows template, sometimes doesn't
- Format varies between similar requests
- Missing sections or elements
**Solutions:**
1. Add explicit format template in SKILL.md
2. Provide multiple examples
3. Explain why format matters
4. Use imperative: "ALWAYS use this template"
### Issue: Edge Cases Not Handled
**Symptoms:**
- Large files cause errors
- Empty inputs crash
- Special characters corrupted
- Missing data causes failures
**Solutions:**
1. Add validation step instructions
2. Include error handling examples
3. Create helper script for complex validation
4. Document known limitations
### Issue: Too Verbose
**Symptoms:**
- Responses are walls of text
- Over-explains obvious steps
- Repeats information
- Takes too long to get to point
**Solutions:**
1. Remove redundant explanations
2. Trust user's expertise
3. Move details to references/
4. Use progressive disclosure
### Issue: Incomplete Responses
**Symptoms:**
- Misses steps in workflow
- Forgets validation
- Doesn't mention prerequisites
- No error handling
**Solutions:**
1. Add comprehensive checklist in SKILL.md
2. Include validation at each step
3. Document prerequisites upfront
4. Add error handling examples
### Issue: Incorrect Commands
**Symptoms:**
- Commands don't work when copied
- Syntax errors
- Wrong flags or options
- Outdated versions
**Solutions:**
1. Test all commands before including
2. Specify version requirements
3. Add validation steps
4. Include error messages to expect
---
## When to Stop Iterating
### Stop When:
1. **User is satisfied**
- User says "this works for me"
- User stops requesting changes
- User starts using skill regularly
2. **Outputs meet expectations**
- All test cases pass
- Common scenarios work
- Edge cases handled reasonably
- No critical issues remain
3. **No meaningful progress**
- Last 2-3 iterations didn't improve results
- Changes are cosmetic only
- Diminishing returns on effort
- 90% solution is good enough
### Don't Stop When:
- **Perfect is the enemy of good** - 90% working is better than endlessly tweaking
- **Edge cases are theoretical** - Don't over-engineer for unlikely scenarios
- **User wants to keep iterating** - Follow user's lead
- **New issues emerge** - Fix real problems as they're found
### The 90% Rule
A skill that works well for 90% of cases is sufficient. Users can handle:
- Rare edge cases manually
- Unique situations with custom guidance
- Complex scenarios with multiple steps
**Focus on:**
- Common use cases (80% of value)
- Clear failure messages (10% of value)
- Good documentation (10% of value)
**Don't focus on:**
- Every possible edge case (diminishing returns)
- Perfect formatting (good enough is fine)
- Handling every possible error (major errors are enough)
---
## Grading Checklist
Use this checklist for each test case:
### Correctness
- [ ] Output matches expected result
- [ ] No factual errors
- [ ] Logic is sound
- [ ] Edge cases handled appropriately
### Completeness
- [ ] All requested tasks completed
- [ ] No steps skipped
- [ ] Appropriate level of detail
- [ ] Relevant context included
### Format
- [ ] Output follows specified format
- [ ] Consistent with examples
- [ ] Easy to read and understand
### Triggering
- [ ] Skill activated when appropriate
- [ ] Did not activate when inappropriate
### Efficiency
- [ ] No unnecessary steps
- [ ] Reasonable response length
- [ ] Not overly verbose
### Overall
- [ ] Would use this skill again
- [ ] Would recommend to others
- [ ] Saves time vs manual approach
- [ ] Output quality meets needs
---
## Recording Results
After grading, record:
```json
{
"test_id": 1,
"result": "pass|fail|partial",
"issues": [
"Description of issue 1",
"Description of issue 2"
],
"suggested_fix": "Brief description of improvement",
"extract_script": false,
"priority": "high|medium|low"
}
```
Use the grade-output.sh script to generate this structure interactively.
---
## Next Steps After Grading
1. **Review all results** - Look for patterns
2. **Prioritize fixes** - High priority first
3. **Update SKILL.md** - Based on issues found
4. **Create scripts** - Extract repeated work
5. **Re-run tests** - Verify improvements
6. **Repeat** - Until satisfied or good enough

View File

@@ -0,0 +1,888 @@
# Skill Templates
Copy-paste templates for common skill patterns. Use these as starting points when creating new skills.
## Table of Contents
1. [Simple Skill (Knowledge Only)](#simple-skill)
2. [Tool Integration Skill](#tool-integration-skill)
3. [Workflow Skill](#workflow-skill)
4. [Documentation Skill](#documentation-skill)
5. [Testing Your Skill](#testing-your-skill)
---
## Simple Skill
For skills that provide knowledge without scripts or complex resources.
**Use when:** The skill provides guidance, best practices, or reference information that can be explained in text.
### Directory Structure
```
simple-skill/
└── SKILL.md
```
### SKILL.md Template
```markdown
---
name: simple-skill
description: This skill should be used when the user asks to "TRIGGER_PHRASE_1", "TRIGGER_PHRASE_2", or needs help with SPECIFIC_TASK
license: MIT
compatibility: opencode
metadata:
category: CATEGORY
version: "1.0.0"
---
# Simple Skill Name
Brief description of what this skill covers.
## Overview
High-level explanation of the domain or topic.
## Common Tasks
### Task 1: Description
Step-by-step instructions:
1. First step with code example
2. Second step with code example
3. Third step
### Task 2: Description
Step-by-step instructions:
1. First step
2. Second step
## Best Practices
- Bullet point 1
- Bullet point 2
- Bullet point 3
## Important Notes
- Critical caveat 1
- Critical caveat 2
```
### Example: Git Commit Guidelines
```markdown
---
name: git-commit-guide
description: This skill should be used when the user asks to "write a commit message", "format a commit", "conventional commits", or needs help with Git commit standards
license: MIT
compatibility: opencode
---
# Git Commit Guidelines
Follow conventional commits specification for clear, structured commit messages.
## Format
```
<type>(<scope>): <description>
[optional body]
[optional footer]
```
## Types
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation only
- `style`: Code style (formatting, semicolons, etc)
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks
## Examples
```bash
# Feature
feat(auth): add OAuth2 login support
# Fix with scope
fix(api): resolve null pointer in user endpoint
# Breaking change
feat(db)!: migrate to PostgreSQL
```
## Rules
- Use imperative mood: "add" not "added"
- Don't capitalize first letter
- No period at end
- Keep first line under 72 characters
```
---
## Tool Integration Skill
For skills that interact with command-line tools and include automation scripts.
**Use when:** The skill wraps a CLI tool, provides automation, or requires executable scripts.
### Directory Structure
```
tool-skill/
├── SKILL.md
├── scripts/
│ └── helper.sh
├── references/
│ └── advanced-usage.md
└── evals/
└── evals.json
```
### SKILL.md Template
```markdown
---
name: tool-skill
description: This skill should be used when the user asks to "TRIGGER_PHRASE_1", "run TOOL_NAME", "TOOL_NAME commands", or work with TOOL_NAME
license: MIT
compatibility: opencode
metadata:
category: tools
version: "1.0.0"
---
# Tool Name Integration
Interact with TOOL_NAME for SPECIFIC_PURPOSE.
## Quick Start
Basic usage:
```bash
tool-name --option value
```
## Common Operations
### Operation 1: Description
```bash
# Example command
tool-name command --flag
```
Explanation of what this does.
### Operation 2: Description
```bash
# Example command
tool-name another-command
```
## Scripts
Use helper scripts in `scripts/`:
- `scripts/helper.sh` - What this script does
## Advanced Usage
For detailed options and edge cases, see `references/advanced-usage.md`.
## Important Notes
- Requirement 1
- Requirement 2
- Common pitfall
```
### scripts/helper.sh Template
```bash
#!/bin/bash
#
# Helper script for TOOL_NAME operations
#
set -euo pipefail
# Configuration
DEFAULT_OPTION="value"
# Functions
show_help() {
cat << EOF
Usage: $0 [OPTIONS]
Helper script for TOOL_NAME
Options:
-h, --help Show this help message
-v, --verbose Enable verbose output
-o, --option Specify option value
Examples:
$0 --verbose
$0 --option custom-value
EOF
}
# Parse arguments
VERBOSE=false
OPTION="$DEFAULT_OPTION"
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help)
show_help
exit 0
;;
-v|--verbose)
VERBOSE=true
shift
;;
-o|--option)
OPTION="$2"
shift 2
;;
*)
echo "Unknown option: $1"
show_help
exit 1
;;
esac
done
# Main logic
main() {
[[ "$VERBOSE" == true ]] && echo "Running with option: $OPTION"
# Your tool integration logic here
echo "Helper script executed successfully"
}
main "$@"
```
### Example: Docker Helper
```markdown
---
name: docker-helper
description: This skill should be used when the user asks to "manage docker containers", "build docker images", "docker compose operations", or work with Docker workflows
license: MIT
compatibility: opencode
---
# Docker Helper
Streamline Docker container and image management.
## Quick Commands
### List Running Containers
```bash
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```
### Clean Up System
```bash
# Remove stopped containers, unused networks, dangling images
docker system prune -f
```
## Scripts
Use helper scripts:
- `scripts/container-logs.sh` - View and follow container logs
- `scripts/cleanup.sh` - Safe cleanup of unused resources
## Docker Compose
### Start Services
```bash
docker compose up -d SERVICE_NAME
```
### View Logs
```bash
docker compose logs -f SERVICE_NAME
```
## References
- `references/compose-patterns.md` - Advanced compose configurations
- `references/troubleshooting.md` - Common issues and solutions
```
---
## Workflow Skill
For skills that guide multi-step processes or complex procedures.
**Use when:** The skill orchestrates a sequence of steps, decisions, or collaborative tasks.
### Directory Structure
```
workflow-skill/
├── SKILL.md
├── scripts/
│ ├── step-validator.sh
│ └── automation.sh
├── references/
│ ├── decision-matrix.md
│ └── examples/
│ └── sample-workflow.md
└── evals/
└── evals.json
```
### SKILL.md Template
```markdown
---
name: workflow-skill
description: This skill should be used when the user asks to "WORKFLOW_NAME", "perform WORKFLOW_TASK", "WORKFLOW_VERB process", or needs help with WORKFLOW_DOMAIN
license: MIT
compatibility: opencode
metadata:
category: workflow
version: "1.0.0"
---
# Workflow Name
Guide the WORKFLOW_PROCESS from start to finish with validation at each step.
## Prerequisites
Before starting, ensure:
1. Requirement 1
2. Requirement 2
3. Requirement 3
## Workflow Overview
```
Step 1 → Step 2 → Step 3 → Completion
↓ ↓ ↓
Validate Validate Validate
```
## Step-by-Step Process
### Step 1: Preparation
**Objective:** What to accomplish in this step
**Actions:**
1. Action item 1
2. Action item 2
**Validation:**
- [ ] Check item 1
- [ ] Check item 2
**Script:** `scripts/step-validator.sh --step 1`
### Step 2: Execution
**Objective:** What to accomplish in this step
**Actions:**
1. Action item 1
2. Action item 2
**Validation:**
- [ ] Check item 1
- [ ] Check item 2
**Decision Point:**
If CONDITION_A:
- Follow path A (see `references/path-a.md`)
If CONDITION_B:
- Follow path B (see `references/path-b.md`)
### Step 3: Verification
**Objective:** Final checks and completion
**Actions:**
1. Run verification command
2. Review output
**Validation:**
- [ ] All checks pass
- [ ] No errors in logs
## Automation
For automated execution:
```bash
scripts/automation.sh --full
```
## Troubleshooting
See `references/troubleshooting.md` for common issues.
## Examples
Complete workflow examples in `references/examples/`.
```
### Example: Release Workflow
```markdown
---
name: release-workflow
description: This skill should be used when the user asks to "create a release", "publish a new version", "tag a release", or prepare software for deployment
license: MIT
compatibility: opencode
---
# Release Workflow
Guide the complete release process from version bump to deployment.
## Prerequisites
- [ ] All tests passing
- [ ] CHANGELOG.md updated
- [ ] Version number decided
## Workflow
### Step 1: Pre-Release Checks
**Actions:**
1. Run full test suite: `npm test`
2. Verify CHANGELOG.md is current
3. Check for uncommitted changes
**Validation:**
```bash
scripts/pre-release-check.sh
```
### Step 2: Version Bump
**Actions:**
1. Update version in package.json
2. Update CHANGELOG.md with release date
3. Commit with message: `chore(release): bump version to X.Y.Z`
**Validation:**
- [ ] Version updated in all files
- [ ] CHANGELOG.md has date
- [ ] Commit created
### Step 3: Create Tag
**Actions:**
1. Create annotated tag: `git tag -a vX.Y.Z -m "Release X.Y.Z"`
2. Push tag: `git push origin vX.Y.Z`
**Validation:**
```bash
scripts/verify-tag.sh vX.Y.Z
```
### Step 4: GitHub Release
**Actions:**
1. Draft release notes from CHANGELOG
2. Create GitHub release
3. Attach artifacts if needed
**Command:**
```bash
gh release create vX.Y.Z --notes-from-tag
```
## Decision Matrix
See `references/decision-matrix.md` for:
- Version numbering (semver vs calver)
- Pre-release vs stable
- Hotfix procedures
```
---
## Documentation Skill
For skills focused on writing, reviewing, or managing documentation.
**Use when:** The skill helps create documentation, enforces standards, or provides writing guidelines.
### Directory Structure
```
docs-skill/
├── SKILL.md
├── references/
│ ├── style-guide.md
│ ├── templates/
│ │ ├── README-template.md
│ │ └── API-doc-template.md
│ └── examples/
│ └── sample-docs/
├── assets/
│ └── diagram-template.png
└── evals/
└── evals.json
```
### SKILL.md Template
```markdown
---
name: docs-skill
description: This skill should be used when the user asks to "write documentation", "create a README", "document this code", "review docs", or needs help with technical writing
license: MIT
compatibility: opencode
metadata:
category: documentation
version: "1.0.0"
---
# Documentation Helper
Create clear, consistent, and effective technical documentation.
## Quick Start
### Create README
Start with the template:
```bash
cp references/templates/README-template.md ./README.md
```
Then fill in:
1. Project name and description
2. Installation steps
3. Usage examples
4. API reference (if applicable)
## Documentation Types
### README.md
Essential sections:
- Title and description
- Installation
- Quick start
- Usage examples
- API reference (if applicable)
- Contributing
- License
Use template: `references/templates/README-template.md`
### API Documentation
Structure:
- Endpoint description
- Request format
- Response format
- Error codes
- Examples
Use template: `references/templates/API-doc-template.md`
### Code Comments
Follow style guide in `references/style-guide.md`:
- JSDoc for JavaScript
- Docstrings for Python
- Rustdoc for Rust
## Style Guide
See `references/style-guide.md` for:
- Tone and voice
- Formatting rules
- Terminology
- Code example standards
## Review Checklist
Before finalizing documentation:
- [ ] Clear and concise
- [ ] No typos or grammar errors
- [ ] Code examples work
- [ ] All links functional
- [ ] Proper heading hierarchy
- [ ] Consistent formatting
## Examples
Review well-documented projects in `references/examples/`.
## Assets
- `assets/diagram-template.png` - Template for architecture diagrams
```
### Example: API Documentation Helper
```markdown
---
name: api-docs-helper
description: This skill should be used when the user asks to "document an API", "create API documentation", "write endpoint docs", or needs help with REST API documentation
license: MIT
compatibility: opencode
---
# API Documentation Helper
Create comprehensive REST API documentation following OpenAPI standards.
## Endpoint Documentation Template
For each endpoint, document:
### 1. Overview
```markdown
## POST /api/v1/resource
Brief description of what this endpoint does.
**Authorization:** Required (Bearer token)
**Rate Limit:** 100 requests/minute
```
### 2. Request
```markdown
### Headers
| Header | Value | Required |
|--------|-------|----------|
| Content-Type | application/json | Yes |
| Authorization | Bearer {token} | Yes |
### Body
```json
{
"field1": "string (required) - Description",
"field2": "number (optional) - Description"
}
```
```
### 3. Response
```markdown
### Success (200 OK)
```json
{
"id": "string",
"created_at": "ISO 8601 datetime",
"status": "active"
}
```
### Error Responses
- `400 Bad Request` - Invalid input
- `401 Unauthorized` - Missing/invalid token
- `404 Not Found` - Resource doesn't exist
```
### 4. Examples
Provide curl and client library examples.
## Standards
Follow conventions in `references/api-standards.md`:
- Use JSON for request/response bodies
- ISO 8601 for dates
- snake_case for field names
- Include pagination metadata for lists
## Tools
Validate documentation:
- `scripts/validate-openapi.sh` - Check OpenAPI spec
- `scripts/check-examples.sh` - Verify code examples work
```
---
## Testing Your Skill
After creating a skill from these templates, follow this testing workflow:
### Step 1: Create Test Cases
Generate test cases for your skill:
```bash
~/.config/opencode/skills/skill-builder/scripts/create-tests.sh ~/.config/opencode/skills/your-skill
```
This creates `evals/evals.json` with 3 template test cases:
1. **Common Case** - Typical usage scenario
2. **Edge Case** - Tricky or unusual situation
3. **Varied Phrasing** - Same intent, different words
### Step 2: Customize Test Prompts
Edit `evals/evals.json` and replace placeholder text with realistic test prompts:
**Good test prompt:**
```
Convert the CSV file at ./data/sales.csv to JSON and save it as ./output/sales.json.
The CSV has headers: date, product, quantity, price. Make sure dates are in ISO 8601 format.
```
**Bad test prompt:**
```
Convert CSV to JSON
```
See `eval-templates.md` for copy-paste templates for different skill types.
### Step 3: Run Tests
Execute the test workflow:
```bash
~/.config/opencode/skills/skill-builder/scripts/run-tests.sh ~/.config/opencode/skills/your-skill
```
This will:
- Display each test prompt
- Guide you through manual testing in opencode
- Record your evaluations (pass/fail/skip)
- Save results to `evals/test-results.json`
### Step 4: Grade Results
Use the grading checklist for systematic evaluation:
```bash
~/.config/opencode/skills/skill-builder/scripts/grade-output.sh ~/.config/opencode/skills/your-skill
```
This evaluates:
- Correctness
- Completeness
- Format
- Triggering
- Efficiency
And generates `evals/grading-report.json` with structured feedback.
### Step 5: Iterate
Based on test results, improve your skill:
1. Review issues in grading report
2. Update SKILL.md with fixes
3. Re-run tests
4. Repeat until satisfied
See `grading-guide.md` for detailed evaluation criteria and decision frameworks.
### Testing by Skill Type
**Simple Skills:**
- May not need formal test cases
- Test manually with 2-3 realistic prompts
- Verify outputs are helpful and accurate
**Tool Integration Skills:**
- Must test all commands work when copied
- Verify edge cases (errors, conflicts)
- Test with different tool versions if possible
**Workflow Skills:**
- Test complete end-to-end flow
- Verify each step produces correct result
- Test decision points and branches
- Check error recovery paths
**Code Generation Skills:**
- Verify generated code is syntactically valid
- Test that code actually runs
- Check edge cases (empty input, large data)
- Validate error handling
### When to Stop Testing
Stop iterating when:
- All critical test cases pass
- User is satisfied with quality
- Common scenarios work reliably
- No longer making meaningful improvements
Remember: A skill that works well for 90% of cases is sufficient. Perfection is the enemy of good.
---
## Tips for All Templates
1. **Replace placeholders immediately** - Search for `TRIGGER_PHRASE`, `TOOL_NAME`, etc.
2. **Write description first** - Include 3-5 specific trigger phrases in quotes
3. **Keep examples working** - Test all code examples before finalizing
4. **Validate early and often** - Run validation after every significant change
5. **Progressive disclosure** - Move details to references/ as the skill grows
6. **Be specific** - Generic skills don't trigger well. Make descriptions concrete.
7. **Test triggers** - After creating, try phrases from your description to verify activation
## Validation Checklist for New Skills
Before considering a skill complete:
- [ ] SKILL.md created with proper frontmatter
- [ ] `name` matches directory name
- [ ] `description` is 20-1024 characters with trigger phrases
- [ ] No second-person writing in body
- [ ] All scripts are executable
- [ ] Referenced files exist and are mentioned in SKILL.md
- [ ] Validation passes: `validate-skill.sh /path/to/skill`
- [ ] Test cases created in `evals/evals.json`
- [ ] Tests pass or issues documented
- [ ] Tested with actual trigger phrases