.dotfiles/.config/opencode/skills/tavily-best-practices/references/search.md

# Search API Reference

## Table of Contents

- [Query Optimization](#query-optimization)
- [Search Depth](#search-depth)
- [Key Parameters](#key-parameters)
- [Basic Usage](#basic-usage)
- [Filtering Results](#filtering-results)
- [Async Patterns](#async-patterns)
- [Response Fields](#response-fields)
- [Post-Filtering Strategies](#post-filtering-strategies)

---

## Query Optimization

**Keep queries under 400 characters.** Think search query, not long-form prompt.

**Break complex queries into sub-queries:**
```python
# Instead of one massive query, break it down:
queries = [
    "Competitors of company ABC",
    "Financial performance of company ABC",
    "Recent developments of company ABC"
]
responses = await asyncio.gather(*(client.search(q) for q in queries))
```

## Search Depth

Controls the latency vs. relevance tradeoff:

| Depth | Latency | Relevance | Content Type |
|-------|---------|-----------|--------------|
| `ultra-fast` | Lowest | Lower | Content (NLP summary) |
| `fast` | Low | Good | Chunks |
| `basic` | Medium | High | Content (NLP summary) |
| `advanced` | Higher | Highest | Chunks |

**Content types:**
- **Content**: NLP-based summary of the page, providing general context
- **Chunks**: Short snippets (max 500 chars) reranked by relevance to your query

**When to use each:**
- `ultra-fast`: Latency-critical (real-time chat, autocomplete)
- `fast`: Need chunks but latency matters
- `basic`: General-purpose, balanced relevance and latency
- `advanced`: Specific information queries, precision matters - default (Still fast and suitable for almost all use cases)

## Key Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | Required | Search query (keep under 400 chars) |
| `search_depth` | enum | `"basic"` | `"ultra-fast"`, `"fast"`, `"basic"`, `"advanced"` |
| `topic` | enum | `"general"` | `"general"`, `"news"`, `"finance"` |
| `chunks_per_source` | integer | 3 | Chunks per source (advanced/fast depth only) |
| `max_results` | integer | 5 | Maximum results (0-20) |
| `time_range` | enum | null | `"day"`, `"week"`, `"month"`, `"year"` |
| `start_date` | string | null | Results after date (YYYY-MM-DD) |
| `end_date` | string | null | Results before date (YYYY-MM-DD) |
| `include_domains` | array | [] | Domains to include (max 300, supports wildcards like `*.com`) |
| `exclude_domains` | array | [] | Domains to exclude (max 150) |
| `country` | enum | null | Boost results from country |
| `include_answer` | bool/enum | false | `true`/`"basic"` or `"advanced"` for LLM answer |
| `include_raw_content` | bool/enum | false | `true`/`"markdown"` or `"text"` for full page |
| `include_images` | boolean | false | Include image results |
| `include_image_descriptions` | boolean | false | AI descriptions for images |
| `include_favicon` | boolean | false | Favicon URL per result |
| `auto_parameters` | boolean | false | Auto-configure based on query intent |
| `include_usage` | boolean | false | Include credit usage info |

**Notes:**

- **`include_answer`**: Only use if you don't want to bring your own LLM. Most users bring their own model.

- **`auto_parameters`**: May set `search_depth="advanced"` (2 credits). Set `search_depth` manually to control cost.


## Basic Usage

```python
from tavily import TavilyClient

client = TavilyClient()

response = client.search(
    query="latest developments in quantum computing",
    max_results=10,
    search_depth="advanced",
    chunks_per_source=5
)

for result in response["results"]:
    print(f"{result['title']}: {result['url']}")
    print(f"Score: {result['score']}")
```


## Filtering Results

### By domain

```python
# Only search trusted sources
response = client.search(
    query="machine learning best practices",
    include_domains=["arxiv.org", "github.com", "pytorch.org"],
)

# Exclude specific domains
response = client.search(
    query="openai product reviews",
    exclude_domains=["reddit.com", "quora.com"]
)

# Restrict to LinkedIn profiles
response = client.search(
    query="CEO background at Google",
    include_domains=["linkedin.com/in"]
)
```

### By date

```python
# Relative time range
response = client.search(query="latest ML trends", time_range="month")

# Specific date range
response = client.search(
    query="AI news",
    start_date="2025-01-01",
    end_date="2025-02-01"
)
```

### By country

```python
# Boost results from specific country
response = client.search(query="tech startup funding", country="united states")
```

## Async Patterns

Leveraging the async client enables scaled search with higher breadth and reach by running multiple queries in parallel. This is the best practice for agentic systems where you need to gather comprehensive information quickly before passing it to a model for analysis.

```python
import asyncio
from tavily import AsyncTavilyClient

# Initialize Tavily client
tavily_client = AsyncTavilyClient("tvly-YOUR_API_KEY")

async def fetch_and_gather():
    queries = ["latest AI trends", "future of quantum computing"]

    # Perform search and continue even if one query fails (using return_exceptions=True)
    try:
        responses = await asyncio.gather(*(tavily_client.search(q) for q in queries), return_exceptions=True)

        # Handle responses and print
        for response in responses:
            if isinstance(response, Exception):
                print(f"Search query failed: {response}")
            else:
                print(response)

    except Exception as e:
        print(f"Error during search queries: {e}")

# Run the function
asyncio.run(fetch_and_gather())
```


## Response Fields

**Top-level response:**

| Field | Description |
|-------|-------------|
| `query` | The original search query |
| `answer` | AI-generated answer (if `include_answer` enabled) |
| `results` | Array of search result objects |
| `images` | Array of image results (if `include_images=True`) |

**Each result object:**

| Field | Description |
|-------|-------------|
| `title` | Page title |
| `url` | Source URL |
| `content` | Extracted text snippet(s) |
| `score` | Semantic relevance score (0-1) |
| `raw_content` | Full page content (if `include_raw_content` enabled) |
| `favicon` | Favicon URL (if `include_favicon=True`) |

**Top-level response also includes:**

| Field | Description |
|-------|-------------|
| `request_id` | Unique identifier for support reference |
| `response_time` | Response time in seconds |

**Each image object (if `include_images=True`):**

| Field | Description |
|-------|-------------|
| `url` | Image URL |
| `description` | AI-generated description (if `include_image_descriptions=True`) |

---

## Post-Filtering Strategies

Since Tavily provides raw web data, you have full configurability to implement filtering and post-processing to meet your specific requirements.

The `score` field measures query relevance, but doesn't guarantee the result matches specific criteria (e.g., correct person, exact product, specific company). Use post-filtering to validate results against strict requirements.

### Score-Based Filtering

Simple threshold filtering based on relevance score:

```python
results = response["results"]

# Filter by score threshold
high_quality = [r for r in results if r["score"] > 0.7]

# Sort by score
sorted_results = sorted(results, key=lambda x: x["score"], reverse=True)

# Top N above threshold
top_relevant = sorted(
    [r for r in results if r["score"] > 0.5],
    key=lambda x: x["score"],
    reverse=True
)[:3]
```

**Limitation:** Score indicates relevance to query, not accuracy of match to specific criteria.

### Regex Filtering

Fast, deterministic filtering using pattern matching. Use for:
- URL pattern validation
- Required keywords/phrases
- Structural requirements

```python
import re

def regex_filter(result, criteria: dict) -> dict:
    """
    Filter a search result using regex checks.

    Args:
        result: Search result dict with url, content, title, raw_content
        criteria: Dict with patterns to match:
            - url_pattern: Regex for URL validation
            - required_terms: List of terms that must appear in content
            - excluded_terms: List of terms that must NOT appear

    Returns:
        dict with check results and validity
    """
    url = result.get("url", "")
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    raw_content = result.get("raw_content", "") or ""

    full_text = f"{content} {title} {raw_content}".lower()

    checks = {}

    # URL pattern check
    if "url_pattern" in criteria:
        checks["url_valid"] = bool(re.search(criteria["url_pattern"], url.lower()))

    # Required terms check
    if "required_terms" in criteria:
        checks["required_found"] = all(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["required_terms"]
        )

    # Excluded terms check
    if "excluded_terms" in criteria:
        checks["excluded_absent"] = not any(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["excluded_terms"]
        )

    # Valid if all checks pass
    is_valid = all(checks.values()) if checks else True

    return {"checks": checks, "is_valid": is_valid, "url": url}
```

**Example: LinkedIn Profile Search**

```python
criteria = {
    "url_pattern": r"linkedin\.com/in/",  # Profile URL, not company page
    "required_terms": ["Jane Smith", "Acme Corp"],
    "excluded_terms": ["job posting", "careers"]
}

for result in response["results"]:
    validation = regex_filter(result, criteria)
    if validation["is_valid"]:
        print(f"Valid: {validation['url']}")
```

**Example: GitHub Repository Search**

```python
criteria = {
    "url_pattern": r"github\.com/[\w-]+/[\w-]+$",  # Repo URL, not file
    "required_terms": ["MIT License"],
    "excluded_terms": ["archived", "deprecated"]
}
```

### LLM Verification

Semantic validation using an LLM. Use for:
- Synonym/abbreviation matching ("FDE" = "Forward Deployed Engineer")
- Context-aware validation
- Confidence scoring with reasoning

```python
from openai import OpenAI
import json

def llm_verify(result, target_description: str, validation_criteria: list[str]) -> dict:
    """
    Use LLM to verify if a search result matches target criteria.

    Args:
        result: Search result dict
        target_description: What you're looking for
        validation_criteria: List of criteria to check

    Returns:
        dict with is_match, confidence (high/medium/low), reasoning
    """
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    url = result.get("url", "")

    criteria_text = "\n".join(f"- {c}" for c in validation_criteria)

    prompt = f"""Verify if this search result matches the target.

Target: {target_description}

Validation Criteria:
{criteria_text}

Search Result:
URL: {url}
Title: {title}
Content: {content}

Does this result match ALL criteria?

Respond with JSON only:
{{"is_match": true/false, "confidence": "high/medium/low", "reasoning": "brief explanation"}}"""

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)
```

**Example: Profile Verification**

```python
result = llm_verify(
    result=search_result,
    target_description="Jane Smith, Software Engineer at Acme Corp",
    validation_criteria=[
        "Name matches Jane Smith",
        "Currently works at Acme Corp (or recently)",
        "Role is software engineering related",
        "Professional customer-facing experience"
    ]
)

if result["is_match"] and result["confidence"] in ["high", "medium"]:
    print(f"Verified: {result['reasoning']}")
```

For more details, please read the [full API reference](https://docs.tavily.com/documentation/api-reference/endpoint/search)