giteadmin/.dotfiles

Fork 0

Files

Jonathan Agmon c09d9151ca Add skills

2026-03-22 23:21:49 +02:00

12 KiB

Raw Blame History

Search API Reference

Query Optimization
Search Depth
Key Parameters
Basic Usage
Filtering Results
Async Patterns
Response Fields
Post-Filtering Strategies

Query Optimization

Keep queries under 400 characters. Think search query, not long-form prompt.

Break complex queries into sub-queries:

# Instead of one massive query, break it down:
queries = [
    "Competitors of company ABC",
    "Financial performance of company ABC",
    "Recent developments of company ABC"
]
responses = await asyncio.gather(*(client.search(q) for q in queries))

Search Depth

Controls the latency vs. relevance tradeoff:

Depth	Latency	Relevance	Content Type
`ultra-fast`	Lowest	Lower	Content (NLP summary)
`fast`	Low	Good	Chunks
`basic`	Medium	High	Content (NLP summary)
`advanced`	Higher	Highest	Chunks

Content types:

Content: NLP-based summary of the page, providing general context
Chunks: Short snippets (max 500 chars) reranked by relevance to your query

When to use each:

ultra-fast: Latency-critical (real-time chat, autocomplete)
fast: Need chunks but latency matters
basic: General-purpose, balanced relevance and latency
advanced: Specific information queries, precision matters - default (Still fast and suitable for almost all use cases)

Key Parameters

Parameter	Type	Default	Description
`query`	string	Required	Search query (keep under 400 chars)
`search_depth`	enum	`"basic"`	`"ultra-fast"`, `"fast"`, `"basic"`, `"advanced"`
`topic`	enum	`"general"`	`"general"`, `"news"`, `"finance"`
`chunks_per_source`	integer	3	Chunks per source (advanced/fast depth only)
`max_results`	integer	5	Maximum results (0-20)
`time_range`	enum	null	`"day"`, `"week"`, `"month"`, `"year"`
`start_date`	string	null	Results after date (YYYY-MM-DD)
`end_date`	string	null	Results before date (YYYY-MM-DD)
`include_domains`	array	[]	Domains to include (max 300, supports wildcards like `*.com`)
`exclude_domains`	array	[]	Domains to exclude (max 150)
`country`	enum	null	Boost results from country
`include_answer`	bool/enum	false	`true`/`"basic"` or `"advanced"` for LLM answer
`include_raw_content`	bool/enum	false	`true`/`"markdown"` or `"text"` for full page
`include_images`	boolean	false	Include image results
`include_image_descriptions`	boolean	false	AI descriptions for images
`include_favicon`	boolean	false	Favicon URL per result
`auto_parameters`	boolean	false	Auto-configure based on query intent
`include_usage`	boolean	false	Include credit usage info

Notes:

include_answer: Only use if you don't want to bring your own LLM. Most users bring their own model.
auto_parameters: May set search_depth="advanced" (2 credits). Set search_depth manually to control cost.

Basic Usage

from tavily import TavilyClient

client = TavilyClient()

response = client.search(
    query="latest developments in quantum computing",
    max_results=10,
    search_depth="advanced",
    chunks_per_source=5
)

for result in response["results"]:
    print(f"{result['title']}: {result['url']}")
    print(f"Score: {result['score']}")

Filtering Results

By domain

# Only search trusted sources
response = client.search(
    query="machine learning best practices",
    include_domains=["arxiv.org", "github.com", "pytorch.org"],
)

# Exclude specific domains
response = client.search(
    query="openai product reviews",
    exclude_domains=["reddit.com", "quora.com"]
)

# Restrict to LinkedIn profiles
response = client.search(
    query="CEO background at Google",
    include_domains=["linkedin.com/in"]
)

By date

# Relative time range
response = client.search(query="latest ML trends", time_range="month")

# Specific date range
response = client.search(
    query="AI news",
    start_date="2025-01-01",
    end_date="2025-02-01"
)

By country

# Boost results from specific country
response = client.search(query="tech startup funding", country="united states")

Async Patterns

Leveraging the async client enables scaled search with higher breadth and reach by running multiple queries in parallel. This is the best practice for agentic systems where you need to gather comprehensive information quickly before passing it to a model for analysis.

import asyncio
from tavily import AsyncTavilyClient

# Initialize Tavily client
tavily_client = AsyncTavilyClient("tvly-YOUR_API_KEY")

async def fetch_and_gather():
    queries = ["latest AI trends", "future of quantum computing"]

    # Perform search and continue even if one query fails (using return_exceptions=True)
    try:
        responses = await asyncio.gather(*(tavily_client.search(q) for q in queries), return_exceptions=True)

        # Handle responses and print
        for response in responses:
            if isinstance(response, Exception):
                print(f"Search query failed: {response}")
            else:
                print(response)

    except Exception as e:
        print(f"Error during search queries: {e}")

# Run the function
asyncio.run(fetch_and_gather())

Response Fields

Top-level response:

Field	Description
`query`	The original search query
`answer`	AI-generated answer (if `include_answer` enabled)
`results`	Array of search result objects
`images`	Array of image results (if `include_images=True`)

Each result object:

Field	Description
`title`	Page title
`url`	Source URL
`content`	Extracted text snippet(s)
`score`	Semantic relevance score (0-1)
`raw_content`	Full page content (if `include_raw_content` enabled)
`favicon`	Favicon URL (if `include_favicon=True`)

Top-level response also includes:

Field	Description
`request_id`	Unique identifier for support reference
`response_time`	Response time in seconds

Each image object (if include_images=True):

Field	Description
`url`	Image URL
`description`	AI-generated description (if `include_image_descriptions=True`)

Post-Filtering Strategies

Since Tavily provides raw web data, you have full configurability to implement filtering and post-processing to meet your specific requirements.

The score field measures query relevance, but doesn't guarantee the result matches specific criteria (e.g., correct person, exact product, specific company). Use post-filtering to validate results against strict requirements.

Score-Based Filtering

Simple threshold filtering based on relevance score:

results = response["results"]

# Filter by score threshold
high_quality = [r for r in results if r["score"] > 0.7]

# Sort by score
sorted_results = sorted(results, key=lambda x: x["score"], reverse=True)

# Top N above threshold
top_relevant = sorted(
    [r for r in results if r["score"] > 0.5],
    key=lambda x: x["score"],
    reverse=True
)[:3]

Limitation: Score indicates relevance to query, not accuracy of match to specific criteria.

Regex Filtering

Fast, deterministic filtering using pattern matching. Use for:

URL pattern validation
Required keywords/phrases
Structural requirements

import re

def regex_filter(result, criteria: dict) -> dict:
    """
    Filter a search result using regex checks.

    Args:
        result: Search result dict with url, content, title, raw_content
        criteria: Dict with patterns to match:
            - url_pattern: Regex for URL validation
            - required_terms: List of terms that must appear in content
            - excluded_terms: List of terms that must NOT appear

    Returns:
        dict with check results and validity
    """
    url = result.get("url", "")
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    raw_content = result.get("raw_content", "") or ""

    full_text = f"{content} {title} {raw_content}".lower()

    checks = {}

    # URL pattern check
    if "url_pattern" in criteria:
        checks["url_valid"] = bool(re.search(criteria["url_pattern"], url.lower()))

    # Required terms check
    if "required_terms" in criteria:
        checks["required_found"] = all(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["required_terms"]
        )

    # Excluded terms check
    if "excluded_terms" in criteria:
        checks["excluded_absent"] = not any(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["excluded_terms"]
        )

    # Valid if all checks pass
    is_valid = all(checks.values()) if checks else True

    return {"checks": checks, "is_valid": is_valid, "url": url}

Example: LinkedIn Profile Search

criteria = {
    "url_pattern": r"linkedin\.com/in/",  # Profile URL, not company page
    "required_terms": ["Jane Smith", "Acme Corp"],
    "excluded_terms": ["job posting", "careers"]
}

for result in response["results"]:
    validation = regex_filter(result, criteria)
    if validation["is_valid"]:
        print(f"Valid: {validation['url']}")

Example: GitHub Repository Search

criteria = {
    "url_pattern": r"github\.com/[\w-]+/[\w-]+$",  # Repo URL, not file
    "required_terms": ["MIT License"],
    "excluded_terms": ["archived", "deprecated"]
}

LLM Verification

Semantic validation using an LLM. Use for:

Synonym/abbreviation matching ("FDE" = "Forward Deployed Engineer")
Context-aware validation
Confidence scoring with reasoning

from openai import OpenAI
import json

def llm_verify(result, target_description: str, validation_criteria: list[str]) -> dict:
    """
    Use LLM to verify if a search result matches target criteria.

    Args:
        result: Search result dict
        target_description: What you're looking for
        validation_criteria: List of criteria to check

    Returns:
        dict with is_match, confidence (high/medium/low), reasoning
    """
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    url = result.get("url", "")

    criteria_text = "\n".join(f"- {c}" for c in validation_criteria)

    prompt = f"""Verify if this search result matches the target.

Target: {target_description}

Validation Criteria:
{criteria_text}

Search Result:
URL: {url}
Title: {title}
Content: {content}

Does this result match ALL criteria?

Respond with JSON only:
{{"is_match": true/false, "confidence": "high/medium/low", "reasoning": "brief explanation"}}"""

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)

Example: Profile Verification

result = llm_verify(
    result=search_result,
    target_description="Jane Smith, Software Engineer at Acme Corp",
    validation_criteria=[
        "Name matches Jane Smith",
        "Currently works at Acme Corp (or recently)",
        "Role is software engineering related",
        "Professional customer-facing experience"
    ]
)

if result["is_match"] and result["confidence"] in ["high", "medium"]:
    print(f"Verified: {result['reasoning']}")

For more details, please read the full API reference

12 KiB Raw Blame History