Add skills

2026-03-22 23:21:49 +02:00
parent 4cbbbae1ef
commit c09d9151ca
104 changed files with 23879 additions and 0 deletions
--- a/.config/opencode/skills/.skill-builder.disabled/references/grading-guide.md
+++ b/.config/opencode/skills/.skill-builder.disabled/references/grading-guide.md
@@ -0,0 +1,607 @@
+# Grading Guide
+
+Comprehensive guide for evaluating skill outputs and identifying improvements.
+
+## Table of Contents
+
+1. [The Evaluation Framework](#the-evaluation-framework)
+2. [Correctness](#correctness)
+3. [Completeness](#completeness)
+4. [Format](#format)
+5. [Triggering](#triggering)
+6. [Efficiency](#efficiency)
+7. [Identifying Patterns](#identifying-patterns)
+8. [Decision Framework](#decision-framework)
+9. [Common Issues and Solutions](#common-issues-and-solutions)
+10. [When to Stop Iterating](#when-to-stop-iterating)
+
+---
+
+## The Evaluation Framework
+
+Evaluate skill outputs across five dimensions:
+
+1. **Correctness** - Is the output accurate and correct?
+2. **Completeness** - Are all requirements met?
+3. **Format** - Does it follow expected structure?
+4. **Triggering** - Does it activate appropriately?
+5. **Efficiency** - Is it concise yet complete?
+
+Each dimension has specific criteria to check. Use the grade-output.sh script for systematic evaluation.
+
+---
+
+## Correctness
+
+### What to Check
+
+**Output matches expected result:**
+- For file transforms: Output file contains correct data
+- For code generation: Code works as described
+- For workflows: All steps lead to desired outcome
+- For documentation: Information is accurate
+
+**No factual errors:**
+- Commands use correct syntax
+- File paths are accurate
+- Versions are correct
+- Technical details are right
+
+**Logic is sound:**
+- Reasoning makes sense
+- Steps follow logically
+- No contradictions
+- Edge cases considered
+
+**Edge cases handled appropriately:**
+- Empty inputs don't crash
+- Large inputs work
+- Special characters preserved
+- Errors handled gracefully
+
+### Examples
+
+**Good (Correct):**
+```bash
+# Command actually works
+docker run -p 3000:3000 --name myapp myimage:v1.0
+```
+
+**Bad (Incorrect):**
+```bash
+# Wrong flag syntax
+docker run --port 3000:3000 myimage  # Should be -p or --publish
+```
+
+**Good (Handles Edge Cases):**
+```python
+import os
+if os.path.exists(filename):
+    process_file(filename)
+else:
+    print(f"Error: {filename} not found")
+```
+
+**Bad (No Edge Case Handling):**
+```python
+process_file(filename)  # Will crash if file doesn't exist
+```
+
+---
+
+## Completeness
+
+### What to Check
+
+**All requested tasks completed:**
+- Every requirement from prompt addressed
+- No steps forgotten
+- All files generated
+- All transformations applied
+
+**No steps skipped:**
+- Validation steps included
+- Cleanup steps not forgotten
+- Confirmation steps present
+- Rollback guidance provided
+
+**Appropriate level of detail:**
+- Not too brief (missing context)
+- Not too verbose (information overload)
+- Explains "why" not just "what"
+- Provides examples where helpful
+
+**Relevant context included:**
+- Prerequisites mentioned
+- Dependencies noted
+- Error scenarios covered
+- Alternative approaches offered
+
+### Examples
+
+**Good (Complete):**
+```
+To release version 1.3.0:
+
+1. Update version in package.json: "version": "1.3.0"
+2. Add entry to CHANGELOG.md with today's date
+3. Commit: git commit -am "chore(release): bump version to 1.3.0"
+4. Create tag: git tag -a v1.3.0 -m "Release 1.3.0"
+5. Push: git push origin main && git push origin v1.3.0
+6. Create GitHub release: gh release create v1.3.0 --notes-from-tag
+
+Prerequisites:
+- All tests passing
+- CHANGELOG.md updated
+- You have push access to repository
+```
+
+**Bad (Incomplete):**
+```
+To release:
+1. Update version
+2. Create tag
+3. Push
+```
+
+---
+
+## Format
+
+### What to Check
+
+**Output follows specified format:**
+- If skill defines a template, output matches it
+- Markdown formatting is correct
+- Code blocks use proper syntax highlighting
+- Tables are properly formatted
+
+**Consistent with examples in skill:**
+- Format matches examples in SKILL.md
+- Style is consistent
+- Terminology is consistent
+- Structure follows patterns
+
+**Easy to read and understand:**
+- Clear headings and sections
+- Good use of whitespace
+- Logical organization
+- No wall of text
+
+### Examples
+
+**Good (Well-Formatted):**
+```markdown
+## Deployment Steps
+
+### 1. Build Docker Image
+
+```bash
+docker build -t myapp:v1.0 .
+```
+
+### 2. Run Container
+
+```bash
+docker run -d -p 3000:3000 --name myapp myapp:v1.0
+```
+
+### 3. Verify Deployment
+
+```bash
+curl http://localhost:3000/health
+```
+```
+
+**Bad (Poor Formatting):**
+```
+step 1: build the image with docker build -t myapp:v1.0 . then step 2 run the container with docker run -d -p 3000:3000 --name myapp myapp:v1.0 and step 3 verify with curl
+```
+
+---
+
+## Triggering
+
+### What to Check
+
+**Skill activates when appropriate:**
+- Relevant triggers work
+- Similar phrasings activate it
+- Context clues are recognized
+- Doesn't require exact keywords
+
+**Doesn't activate when inappropriate:**
+- Adjacent domains don't trigger it
+- Ambiguous queries don't wrongly trigger
+- Different tools aren't confused
+- Scope is respected
+
+### Testing Triggering
+
+Test with 8 queries (4 should-trigger, 4 should-not-trigger):
+
+**Example for Docker skill:**
+
+Should trigger:
+1. "Create a docker container for my Node.js app"
+2. "Build an image from this Dockerfile"
+3. "How do I compose up my services?"
+4. "I need to deploy this using containers"
+
+Should not trigger:
+1. "Install Docker on my machine" (installation vs usage)
+2. "What is containerization?" (education vs hands-on)
+3. "Show me Kubernetes commands" (different tool)
+4. "Write a fibonacci function" (completely unrelated)
+
+### Examples
+
+**Good (Appropriate Triggering):**
+- User: "containerize my app"
+- Skill: Docker helper activates ✓
+
+**Bad (Missed Trigger):**
+- User: "I need to run my app in containers"
+- Skill: Doesn't activate (too narrow description) ✗
+
+**Bad (False Trigger):**
+- User: "Install Docker on Ubuntu"
+- Skill: Activates but only handles usage, not installation ✗
+
+---
+
+## Efficiency
+
+### What to Check
+
+**No unnecessary steps:**
+- Doesn't do redundant work
+- Doesn't suggest unnecessary commands
+- Shortcuts used where safe
+- Process is streamlined
+
+**Reasonable response length:**
+- Not excessively verbose
+- Doesn't repeat information
+- No filler content
+- Every sentence adds value
+
+**Not overly verbose:**
+- Commands not over-explained
+- Doesn't explain obvious steps
+- Concise but complete
+- Respects user's expertise
+
+### Examples
+
+**Good (Efficient):**
+```bash
+# Clean up Docker resources
+docker system prune -f
+```
+
+**Bad (Verbose):**
+```bash
+# Clean up Docker resources
+# First, we need to run the docker command
+# Then we use the system subcommand
+# Then we use the prune subcommand
+# The -f flag means force
+docker system prune -f
+# This will remove stopped containers, unused networks, and dangling images
+```
+
+---
+
+## Identifying Patterns
+
+### Single Test Failure
+
+**Characteristics:**
+- Only one test case fails
+- Issue is unique to that scenario
+- Other tests pass
+
+**Action:**
+- Add targeted instruction
+- Include example for that edge case
+- Fix the specific issue
+- Don't generalize prematurely
+
+**Example:**
+```
+Issue: Large file processing fails
+Solution: Add instruction about memory-efficient processing
+for files over 10,000 rows
+```
+
+### Multiple Test Failures (Same Issue)
+
+**Characteristics:**
+- Same problem across 2+ tests
+- Pattern suggests root cause
+- Symptom appears in different forms
+
+**Action:**
+- Fix the root cause
+- Add general principle, not specific fix
+- Consider extracting to script
+- Explain the "why"
+
+**Example:**
+```
+Issue: Tests 1 and 3 both have incorrect date formatting
+Solution: Add general instruction about ISO 8601 date format
+rather than fixing each case individually
+```
+
+### Multiple Test Failures (Different Issues)
+
+**Characteristics:**
+- Different problems in each test
+- No clear pattern
+- Skill may be too broad
+
+**Action:**
+- Clarify scope in description
+- Add more specific instructions
+- Consider splitting into multiple skills
+- Focus on core functionality first
+
+**Example:**
+```
+Issue: Test 1 fails on format, Test 2 fails on logic, Test 3 
+never triggers skill
+Solution: Clarify what the skill does/doesn't do in description
+Add validation steps
+```
+
+---
+
+## Decision Framework
+
+### Fix Specific Case When:
+
+- Unique edge case not covered
+- One-time issue unlikely to recur
+- Fix is simple and doesn't add complexity
+- Doesn't indicate systemic problem
+
+**Example:**
+```
+Test: CSV with Unicode characters fails
+Fix: Add note about UTF-8 encoding for international characters
+(Don't rewrite entire CSV handling)
+```
+
+### Generalize Solution When:
+
+- Same issue in 2+ tests
+- Pattern suggests broader applicability
+- Fix would benefit similar requests
+- Explains underlying principle
+
+**Example:**
+```
+Tests: Both test 1 and 2 have incorrect error handling
+Fix: Add general principle about validating inputs before processing
+with examples of common validation checks
+```
+
+### Extract to Script When:
+
+- Same multi-step process repeated
+- Deterministic operation (not creative)
+- Would save time on every invocation
+- Logic is complex or error-prone
+
+**Example:**
+```
+Pattern: All three tests require converting dates to ISO format
+Action: Create scripts/convert-dates.py
+Include in SKILL.md: "Use scripts/convert-dates.py for date formatting"
+```
+
+---
+
+## Common Issues and Solutions
+
+### Issue: Skill Doesn't Trigger
+
+**Symptoms:**
+- User says relevant phrase
+- Skill doesn't appear in available skills
+- Model handles request without skill
+
+**Solutions:**
+1. Add specific trigger phrases to description
+2. Use "pushy" language: "Make sure to use this skill whenever..."
+3. Include synonyms and variations
+4. Test with 8 trigger queries
+
+### Issue: Output Format Inconsistent
+
+**Symptoms:**
+- Sometimes follows template, sometimes doesn't
+- Format varies between similar requests
+- Missing sections or elements
+
+**Solutions:**
+1. Add explicit format template in SKILL.md
+2. Provide multiple examples
+3. Explain why format matters
+4. Use imperative: "ALWAYS use this template"
+
+### Issue: Edge Cases Not Handled
+
+**Symptoms:**
+- Large files cause errors
+- Empty inputs crash
+- Special characters corrupted
+- Missing data causes failures
+
+**Solutions:**
+1. Add validation step instructions
+2. Include error handling examples
+3. Create helper script for complex validation
+4. Document known limitations
+
+### Issue: Too Verbose
+
+**Symptoms:**
+- Responses are walls of text
+- Over-explains obvious steps
+- Repeats information
+- Takes too long to get to point
+
+**Solutions:**
+1. Remove redundant explanations
+2. Trust user's expertise
+3. Move details to references/
+4. Use progressive disclosure
+
+### Issue: Incomplete Responses
+
+**Symptoms:**
+- Misses steps in workflow
+- Forgets validation
+- Doesn't mention prerequisites
+- No error handling
+
+**Solutions:**
+1. Add comprehensive checklist in SKILL.md
+2. Include validation at each step
+3. Document prerequisites upfront
+4. Add error handling examples
+
+### Issue: Incorrect Commands
+
+**Symptoms:**
+- Commands don't work when copied
+- Syntax errors
+- Wrong flags or options
+- Outdated versions
+
+**Solutions:**
+1. Test all commands before including
+2. Specify version requirements
+3. Add validation steps
+4. Include error messages to expect
+
+---
+
+## When to Stop Iterating
+
+### Stop When:
+
+1. **User is satisfied**
+   - User says "this works for me"
+   - User stops requesting changes
+   - User starts using skill regularly
+
+2. **Outputs meet expectations**
+   - All test cases pass
+   - Common scenarios work
+   - Edge cases handled reasonably
+   - No critical issues remain
+
+3. **No meaningful progress**
+   - Last 2-3 iterations didn't improve results
+   - Changes are cosmetic only
+   - Diminishing returns on effort
+   - 90% solution is good enough
+
+### Don't Stop When:
+
+- **Perfect is the enemy of good** - 90% working is better than endlessly tweaking
+- **Edge cases are theoretical** - Don't over-engineer for unlikely scenarios
+- **User wants to keep iterating** - Follow user's lead
+- **New issues emerge** - Fix real problems as they're found
+
+### The 90% Rule
+
+A skill that works well for 90% of cases is sufficient. Users can handle:
+- Rare edge cases manually
+- Unique situations with custom guidance
+- Complex scenarios with multiple steps
+
+**Focus on:**
+- Common use cases (80% of value)
+- Clear failure messages (10% of value)
+- Good documentation (10% of value)
+
+**Don't focus on:**
+- Every possible edge case (diminishing returns)
+- Perfect formatting (good enough is fine)
+- Handling every possible error (major errors are enough)
+
+---
+
+## Grading Checklist
+
+Use this checklist for each test case:
+
+### Correctness
+- [ ] Output matches expected result
+- [ ] No factual errors
+- [ ] Logic is sound
+- [ ] Edge cases handled appropriately
+
+### Completeness
+- [ ] All requested tasks completed
+- [ ] No steps skipped
+- [ ] Appropriate level of detail
+- [ ] Relevant context included
+
+### Format
+- [ ] Output follows specified format
+- [ ] Consistent with examples
+- [ ] Easy to read and understand
+
+### Triggering
+- [ ] Skill activated when appropriate
+- [ ] Did not activate when inappropriate
+
+### Efficiency
+- [ ] No unnecessary steps
+- [ ] Reasonable response length
+- [ ] Not overly verbose
+
+### Overall
+- [ ] Would use this skill again
+- [ ] Would recommend to others
+- [ ] Saves time vs manual approach
+- [ ] Output quality meets needs
+
+---
+
+## Recording Results
+
+After grading, record:
+
+```json
+{
+  "test_id": 1,
+  "result": "pass|fail|partial",
+  "issues": [
+    "Description of issue 1",
+    "Description of issue 2"
+  ],
+  "suggested_fix": "Brief description of improvement",
+  "extract_script": false,
+  "priority": "high|medium|low"
+}
+```
+
+Use the grade-output.sh script to generate this structure interactively.
+
+---
+
+## Next Steps After Grading
+
+1. **Review all results** - Look for patterns
+2. **Prioritize fixes** - High priority first
+3. **Update SKILL.md** - Based on issues found
+4. **Create scripts** - Extract repeated work
+5. **Re-run tests** - Verify improvements
+6. **Repeat** - Until satisfied or good enough