Add skills

.config/opencode/skills/tavily-best-practices/SKILL.md

---
name: tavily-best-practices
description: "Build production-ready Tavily integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code, Cursor, etc.) to implement web search, content extraction, crawling, and research in agentic workflows, RAG systems, or autonomous agents."
---

# Tavily

Tavily is a search API designed for LLMs, enabling AI applications to access real-time web data.

## Installation

**Python:**
```bash
pip install tavily-python
```

**JavaScript:**
```bash
npm install @tavily/core
```

See **[references/sdk.md](references/sdk.md)** for complete SDK reference.

## Client Initialization

```python
from tavily import AsyncTavilyClient, TavilyClient

# Uses TAVILY_API_KEY env var (recommended)
client = TavilyClient()

# With project tracking (for usage organization)
client = TavilyClient(project_id="your-project-id")

# Async client for parallel queries
async_client = AsyncTavilyClient()
```

## Choosing the Right Method

**For custom agents/workflows:**

| Need | Method |
|------|--------|
| Web search results | `search()` |
| Content from specific URLs | `extract()` |
| Content from entire site | `crawl()` |
| URL discovery from site | `map()` |

**For out-of-the-box research:**

| Need | Method |
|------|--------|
| End-to-end research with AI synthesis | `research()` |

## Quick Reference

### search() - Web Search

```python
response = client.search(
    query="quantum computing breakthroughs",  # Keep under 400 chars
    max_results=10,
    search_depth="advanced"
)
print(response)
```

Key parameters: `query`, `max_results`, `search_depth` (ultra-fast/fast/basic/advanced), `include_domains`, `exclude_domains`, `time_range`

See **[references/search.md](references/search.md)** for complete search reference.
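Each search result carries a relevance `score` (the extract reference filters on `r["score"] > 0.5`), so a common next step is to keep only the strongest, de-duplicated URLs before extracting. A minimal sketch, assuming the response shape used elsewhere in these references (`results` items with `url` and `score`); `top_urls` is a hypothetical helper, and the 0.5 threshold and cap of 20 (the `extract()` per-call limit) are example values.

```python
from typing import Any


def top_urls(response: dict[str, Any], min_score: float = 0.5, limit: int = 20) -> list[str]:
    """Unique result URLs scoring above min_score, best first, at most `limit` of them."""
    ranked = sorted(response.get("results", []), key=lambda r: r.get("score", 0.0), reverse=True)
    urls: list[str] = []
    for result in ranked:
        url = result.get("url")
        # Skip missing URLs, duplicates, and low-relevance hits
        if url and url not in urls and result.get("score", 0.0) > min_score:
            urls.append(url)
        if len(urls) == limit:
            break
    return urls


# Offline usage with a hand-written response of the same shape
sample = {
    "results": [
        {"url": "https://a.example/1", "score": 0.42},
        {"url": "https://b.example/2", "score": 0.91},
        {"url": "https://b.example/2", "score": 0.91},
        {"url": "https://c.example/3", "score": 0.77},
    ]
}
print(top_urls(sample))  # ['https://b.example/2', 'https://c.example/3']
```

With a live client this would be `top_urls(client.search(query=..., max_results=10))`.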

### extract() - URL Content Extraction

```python
# Simple one-step extraction
response = client.extract(
    urls=["https://docs.example.com"],
    extract_depth="advanced"
)
print(response)
```

Key parameters: `urls` (max 20), `extract_depth`, `query`, `chunks_per_source` (1-5, only with `query`)

See **[references/extract.md](references/extract.md)** for complete extract reference.
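Because `urls` is capped at 20 per call, a longer URL list has to be split across several `extract()` calls. A minimal sketch of that batching, assuming duplicates should be dropped first; `batched_urls` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterator, Sequence


def batched_urls(urls: Sequence[str], size: int = 20) -> Iterator[list[str]]:
    """Yield de-duplicated URLs, in first-seen order, in batches of at most `size`."""
    unique = list(dict.fromkeys(urls))  # drop duplicates while keeping order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


urls = [f"https://docs.example.com/page-{i}" for i in range(45)]
print([len(batch) for batch in batched_urls(urls)])  # [20, 20, 5]

# With a real client, each batch becomes one call:
# for batch in batched_urls(urls):
#     response = client.extract(urls=batch, extract_depth="advanced")
```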

### crawl() - Site-Wide Extraction

```python
response = client.crawl(
    url="https://docs.example.com",
    instructions="Find API documentation pages",  # Semantic focus
    extract_depth="advanced"
)
print(response)
```

Key parameters: `url`, `max_depth`, `max_breadth`, `limit`, `instructions`, `chunks_per_source`, `select_paths`, `exclude_paths`

See **[references/crawl.md](references/crawl.md)** for complete crawl reference.

### map() - URL Discovery

```python
response = client.map(
    url="https://docs.example.com"
)
print(response)
```

Returns discovered URLs only, without page content. See **[references/crawl.md](references/crawl.md)** for map parameters and the Map-then-Extract pattern.

### research() - AI-Powered Research

```python
import time

# For comprehensive multi-topic research
result = client.research(
    input="Analyze competitive landscape for X in SMB market",
    model="pro"  # or "mini" for focused queries, "auto" when unsure
)
request_id = result["request_id"]

# Poll until the task completes or fails, with a cap on total wait time
deadline = time.monotonic() + 15 * 60  # 15 minutes; adjust for your workload
response = client.get_research(request_id)
while response["status"] not in ("completed", "failed"):
    if time.monotonic() > deadline:
        raise TimeoutError(f"Research {request_id} did not finish in time")
    time.sleep(10)
    response = client.get_research(request_id)

if response["status"] == "failed":
    raise RuntimeError(f"Research {request_id} failed")
print(response["content"])  # The research report
```

Key parameters: `input`, `model` ("mini"/"pro"/"auto"), `stream`, `output_schema`, `citation_format`

See **[references/research.md](references/research.md)** for complete research reference.

## Detailed Guides

For complete parameters, response fields, patterns, and examples:

- **[references/sdk.md](references/sdk.md)** - Python & JavaScript SDK reference, async patterns, Hybrid RAG
- **[references/search.md](references/search.md)** - Query optimization, search depth selection, domain filtering, async patterns, post-filtering
- **[references/extract.md](references/extract.md)** - One-step vs two-step extraction, query/chunks for targeting, advanced mode
- **[references/crawl.md](references/crawl.md)** - Crawl vs Map, instructions for semantic focus, use cases, Map-then-Extract pattern
- **[references/research.md](references/research.md)** - Prompting best practices, model selection, streaming, structured output schemas
- **[references/integrations.md](references/integrations.md)** - LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and other framework integrations

# Crawl & Map API Reference

## Table of Contents

- [Crawl vs Map](#crawl-vs-map)
- [Key Parameters](#key-parameters)
- [Instructions and Chunks](#instructions-and-chunks)
- [Path and Domain Filtering](#path-and-domain-filtering)
- [Use Cases](#use-cases)
- [Map then Extract Pattern](#map-then-extract-pattern)
- [Performance Optimization](#performance-optimization)
- [Common Pitfalls](#common-pitfalls)
- [Response Fields](#response-fields)
- [Summary](#summary)

---

## Crawl vs Map

| Feature | Crawl | Map |
|---------|-------|-----|
| **Returns** | Full content | URLs only |
| **Speed** | Slower | Faster |
| **Best for** | RAG, deep analysis, documentation | Site structure discovery, URL collection |

**Use Crawl when:**
- Full content extraction needed
- Building RAG systems
- Processing paginated/nested content
- Integration with knowledge bases

**Use Map when:**
- Quick site structure discovery
- URL collection without content
- Planning before crawling
- Sitemap generation

---

## Key Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | Required | Root URL to start from |
| `max_depth` | integer | 1 | Link levels to follow from the root (1-5). **Start with 1-2** |
| `max_breadth` | integer | 20 | Max links followed per page. 50-100 for focused crawls |
| `limit` | integer | 50 | Total pages cap |
| `instructions` | string | null | Natural-language guidance (2 credits/10 pages) |
| `chunks_per_source` | integer | 3 | Chunks per page (1-5). Only with `instructions` |
| `extract_depth` | enum | `"basic"` | `"basic"` (1 credit/5 URLs) or `"advanced"` (2 credits/5 URLs) |
| `format` | enum | `"markdown"` | `"markdown"` or `"text"` |
| `select_paths` | array | null | Regex patterns for paths to include |
| `exclude_paths` | array | null | Regex patterns for paths to exclude |
| `select_domains` | array | null | Regex patterns for domains to include |
| `exclude_domains` | array | null | Regex patterns for domains to exclude |
| `allow_external` | boolean | true (crawl) / false (map) | Include external domain links |
| `include_images` | boolean | false | Include images (crawl only) |
| `include_favicon` | boolean | false | Include favicon URL (crawl only) |
| `include_usage` | boolean | false | Include credit usage info |
| `timeout` | float | 150 | Max wait (10-150 seconds) |
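The credit figures in the table can be turned into a rough pre-flight estimate. A minimal sketch, assuming the extraction and `instructions` charges simply add up and that a partial block of 5 or 10 pages rounds up to a whole block; Tavily's actual billing may differ, so compare against the `include_usage` output. `estimate_crawl_credits` is a hypothetical helper; pass your `limit` as `pages` for a worst-case figure.

```python
import math


def estimate_crawl_credits(pages: int, extract_depth: str = "basic", with_instructions: bool = False) -> int:
    """Rough credit estimate for crawling `pages` pages (assumed additive billing model)."""
    per_five_urls = {"basic": 1, "advanced": 2}[extract_depth]  # credits per 5 URLs extracted
    credits = math.ceil(pages / 5) * per_five_urls
    if with_instructions:
        credits += math.ceil(pages / 10) * 2  # 2 credits per 10 pages when instructions are set
    return credits


print(estimate_crawl_credits(50))  # 10
print(estimate_crawl_credits(50, "advanced", with_instructions=True))  # 30
```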

---

## Instructions and Chunks

Use `instructions` and `chunks_per_source` for semantic focus and token optimization:

```python
response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find all documentation about authentication and security",
    chunks_per_source=3  # Only the top 3 relevant chunks per page
)
```

**Key benefits:**
- `instructions` steers the crawler toward semantically relevant content
- `chunks_per_source` returns only relevant snippets (max 500 chars each)
- Keeps agent context windows from filling up with full-page content
- Chunks appear in `raw_content` as: `<chunk 1> [...] <chunk 2> [...] <chunk 3>`

**Note:** `chunks_per_source` only works when `instructions` is provided.
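With chunking enabled, the chunks arrive concatenated in a single `raw_content` string. A minimal sketch for splitting them back into a list, assuming the literal `[...]` shown above is the only delimiter (surrounding whitespace may vary); a page whose own text contains `[...]` would also be split there. `split_chunks` is a hypothetical helper.

```python
import re

# Assumed delimiter between chunks: the literal "[...]" with optional whitespace around it
_CHUNK_SEPARATOR = re.compile(r"\s*\[\.\.\.\]\s*")


def split_chunks(raw_content: str) -> list[str]:
    """Split a chunked raw_content string into non-empty, stripped chunks."""
    return [part.strip() for part in _CHUNK_SEPARATOR.split(raw_content) if part.strip()]


sample = "OAuth 2.0 setup steps [...] Rotating API keys [...] Session timeout settings"
print(split_chunks(sample))
# ['OAuth 2.0 setup steps', 'Rotating API keys', 'Session timeout settings']
```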

---

## Path and Domain Filtering

### Path patterns (regex)

```python
# Target specific sections
response = client.crawl(
    url="https://example.com",
    select_paths=["/docs/.*", "/api/.*", "/guides/.*"],
    exclude_paths=["/blog/.*", "/changelog/.*", "/private/.*"]
)

# Paginated content
response = client.crawl(
    url="https://example.com/blog",
    max_depth=2,
    select_paths=["/blog/.*", "/blog/page/.*"],
    exclude_paths=["/blog/tag/.*"]
)
```

### Domain control (regex)

```python
# Stay within subdomain (escape the dots: an unescaped "." matches any character)
response = client.crawl(
    url="https://docs.example.com",
    select_domains=[r"^docs\.example\.com$"],
    max_depth=2
)

# Exclude specific domains
response = client.crawl(
    url="https://example.com",
    exclude_domains=[r"^ads\.example\.com$", r"^tracking\.example\.com$"]
)
```
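Path regexes are easy to get subtly wrong, so it can help to preview them locally, for example against URLs returned by `map()`, before spending credits on a crawl. A minimal sketch, assuming patterns are matched from the start of the URL path (`re.match`) and that an exclude match overrides a select match; Tavily's server-side matching rules may differ. `preview_paths` is a hypothetical helper.

```python
import re
from typing import Iterable, Sequence
from urllib.parse import urlparse


def preview_paths(
    urls: Iterable[str],
    select_paths: Sequence[str] = (),
    exclude_paths: Sequence[str] = (),
) -> list[str]:
    """Keep URLs whose path matches a select pattern (if any) and no exclude pattern."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        selected = not select_paths or any(re.match(p, path) for p in select_paths)
        excluded = any(re.match(p, path) for p in exclude_paths)
        if selected and not excluded:
            kept.append(url)
    return kept


urls = [
    "https://example.com/docs/auth",
    "https://example.com/api/v1/users",
    "https://example.com/blog/launch",
    "https://example.com/docs/private/keys",
]
print(preview_paths(urls, select_paths=["/docs/.*", "/api/.*"], exclude_paths=["/docs/private/.*"]))
# ['https://example.com/docs/auth', 'https://example.com/api/v1/users']
```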

---

## Use Cases

### 1. Deep/Unlinked Content
Deeply nested pages, paginated archives, internal search-only content.

```python
response = client.crawl(
    url="https://example.com",
    max_depth=3,
    max_breadth=50,
    limit=200,
    select_paths=["/blog/.*", "/changelog/.*"],
    exclude_paths=["/private/.*", "/admin/.*"]
)
```

### 2. Documentation/Structured Content
Documentation, changelogs, FAQs with nonstandard markup.

```python
response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    extract_depth="advanced",
    select_paths=["/docs/.*"]
)
```

### 3. Multi-modal/Cross-referencing
Combining information from multiple sections.

```python
response = client.crawl(
    url="https://example.com",
    max_depth=2,
    instructions="Find all documentation pages that link to API reference docs",
    extract_depth="advanced"
)
```

### 4. Rapidly Changing Content
API docs, product announcements, news sections.

```python
response = client.crawl(
    url="https://api.example.com",
    max_depth=1,
    max_breadth=100
)
```

### 5. RAG/Knowledge Base Integration

```python
response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    extract_depth="advanced",
    include_images=True,
    instructions="Extract all technical documentation and code examples"
)
```
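Before loading crawl output into a vector store, each page usually becomes one or more records tagged with its source URL. A minimal sketch using the crawl response fields listed under Response Fields (`results[].url`, `results[].raw_content`); the fixed-size character splitting, the 1,000-character default, and the record layout are illustrative assumptions, not a Tavily or vector-store API. `to_records` is a hypothetical helper.

```python
from typing import Any


def to_records(crawl_response: dict[str, Any], max_chars: int = 1000) -> list[dict[str, Any]]:
    """Turn crawl results into text records of at most max_chars, tagged with their source URL."""
    records = []
    for page in crawl_response.get("results", []):
        text = (page.get("raw_content") or "").strip()  # tolerate missing/None content
        for i in range(0, len(text), max_chars):
            records.append({
                "id": f"{page['url']}#{i // max_chars}",
                "source": page["url"],
                "text": text[i:i + max_chars],
            })
    return records


sample = {
    "base_url": "https://docs.example.com",
    "results": [
        {"url": "https://docs.example.com/a", "raw_content": "x" * 2500},
        {"url": "https://docs.example.com/b", "raw_content": None},
    ],
}
print([r["id"] for r in to_records(sample)])
# ['https://docs.example.com/a#0', 'https://docs.example.com/a#1', 'https://docs.example.com/a#2']
```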

### 6. Compliance/Auditing
Comprehensive content analysis for legal checks.

```python
response = client.crawl(
    url="https://example.com",
    max_depth=3,
    max_breadth=100,
    limit=1000,
    extract_depth="advanced",
    instructions="Find all mentions of GDPR and data protection policies"
)
```

### 7. Known URL Patterns
Sitemap-based crawling, section-specific extraction.

```python
response = client.crawl(
    url="https://example.com",
    max_depth=1,
    select_paths=["/docs/.*", "/api/.*", "/guides/.*"],
    exclude_paths=["/private/.*", "/admin/.*"]
)
```

---

## Map then Extract Pattern

Consider using Map before Crawl/Extract to plan your strategy:

1. **Use Map** to get site structure
2. **Analyze** paths and patterns
3. **Configure** Crawl or Extract with discovered paths
4. **Execute** focused extraction

```python
# Step 1: Map to discover structure
map_result = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find all API docs and guides"
)

# Step 2: Filter discovered URLs
api_docs = [url for url in map_result["results"] if "/api/" in url]
guides = [url for url in map_result["results"] if "/guides/" in url]
print(f"Found {len(api_docs)} API docs, {len(guides)} guides")

# Step 3: Extract from filtered URLs
target_urls = api_docs + guides
response = client.extract(
    urls=target_urls[:20],  # Max 20 per extract call; URLs beyond the first 20 are skipped here
    extract_depth="advanced",
    query="API endpoints and usage examples",
    chunks_per_source=3
)
```

**Benefits:**
- Discover site structure before committing to a full crawl
- Identify relevant path patterns
- Avoid unnecessary extraction
- More control over what gets extracted

---

## Performance Optimization

### Depth vs Performance

Each additional depth level can multiply the number of pages reached, so crawl time grows roughly exponentially with depth:

| Depth | Typical Pages | Time |
|-------|---------------|------|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |

**Best practices:**
- Start with `max_depth=1` and increase only if needed
- Use `max_breadth` to control horizontal expansion
- Set an appropriate `limit` to prevent excessive crawling
- Process results incrementally rather than waiting for a full crawl
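The exponential growth comes from fan-out: if every page contributed `max_breadth` new links, depth `d` could add up to `max_breadth**d` pages, with `limit` as the hard cap. A minimal worst-case sketch under that assumption (real sites share links heavily, so actual counts are usually far lower, as the typical-pages table suggests); `worst_case_pages` is a hypothetical helper, and counting the root page is an assumption.

```python
def worst_case_pages(max_depth: int, max_breadth: int, limit: int) -> int:
    """Upper bound on pages visited: root plus max_breadth**k new pages per level, capped at limit."""
    total = sum(max_breadth ** k for k in range(max_depth + 1))  # k = 0 is the root page
    return min(total, limit)


for depth in (1, 2, 3):
    print(depth, worst_case_pages(depth, max_breadth=20, limit=10_000))
# 1 21
# 2 421
# 3 8421
```

This is why `limit` matters more as depth rises: at `max_depth=3` the cap, not the site, usually decides how much gets crawled.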

### Rate Limiting

- Respect each site's robots.txt
- Monitor API usage and limits
- Add error handling for rate-limit responses
- Consider delays between large crawl operations
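For the error-handling point above, a common approach is retrying with exponential backoff and jitter. A minimal, SDK-agnostic sketch: it retries any zero-argument callable on the exception types you pass in. Which exceptions the Tavily SDK raises on rate limiting is not covered here, so the `retry_on` tuple is an assumption to fill in from the SDK; `with_backoff` is a hypothetical helper.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on `retry_on` with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # base_delay * 1, 2, 4, ... plus up to base_delay of random jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


# Offline usage: a fake call that fails twice, then succeeds
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


print(with_backoff(flaky, retry_on=(ConnectionError,), sleep=lambda s: None))  # ok

# With the SDK (verify which exception types to retry on):
# response = with_backoff(lambda: client.crawl(url="https://example.com", limit=20))
```

Injecting `sleep` keeps the helper testable without real waits.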

### Conservative vs Comprehensive

```python
# Conservative (start here)
response = client.crawl(
    url="https://example.com",
    max_depth=1,
    max_breadth=20,
    limit=20
)

# Comprehensive (use carefully)
response = client.crawl(
    url="https://example.com",
    max_depth=3,
    max_breadth=100,
    limit=500
)
```

---

## Common Pitfalls

| Problem | Impact | Solution |
|---------|--------|----------|
| Excessive depth (`max_depth=4+`) | Exponential time, unnecessary pages | Start with 1-2, increase if needed |
| Unfocused crawling | Wasted resources, irrelevant content, context explosion | Use `instructions` to focus semantically |
| Missing limits | Runaway crawls, unexpected costs | Always set a reasonable `limit` |
| Ignoring `failed_results` | Incomplete data, missed content | Monitor and adjust parameters |
| Full content without chunks | Context window explosion | Use `instructions` + `chunks_per_source` |

---

## Response Fields

### Crawl Response

| Field | Description |
|-------|-------------|
| `base_url` | The URL you started the crawl from |
| `results` | List of crawled pages |
| `results[].url` | Page URL |
| `results[].raw_content` | Extracted content (only the top chunks when `instructions` is provided) |
| `results[].images` | Image URLs extracted from the page (if `include_images=True`) |
| `results[].favicon` | Favicon URL (if `include_favicon=True`) |
| `response_time` | Time in seconds |
| `request_id` | Unique identifier for support reference |

### Map Response

| Field | Description |
|-------|-------------|
| `base_url` | The URL you started the mapping from |
| `results` | List of discovered URLs |
| `response_time` | Time in seconds |
| `request_id` | Unique identifier for support reference |

---

## Summary

1. **Use `instructions` and `chunks_per_source`** for focused, relevant results in agentic use cases
2. **Start conservative** (`max_depth=1`, `max_breadth=20`) and scale up as needed
3. **Use path patterns** to focus crawling on relevant content
4. **Choose an appropriate `extract_depth`** based on content complexity
5. **Always set a limit** to prevent runaway crawls and unexpected costs
6. **Monitor `failed_results`** and adjust patterns accordingly
7. **Use Map first** to understand site structure before committing to a full crawl
8. **Implement error handling** for rate limits and failures
9. **Respect robots.txt** and site policies

> Crawling is powerful but resource-intensive. Focus your crawls, start small, monitor results, and scale gradually based on actual needs.

For more details, see the [full API reference](https://docs.tavily.com/documentation/api-reference/endpoint/crawl)

# Extract API Reference

## Table of Contents

- [Extraction Approaches](#extraction-approaches)
- [Key Parameters](#key-parameters)
- [Query and Chunks](#query-and-chunks)
- [Extract Depth](#extract-depth)
- [Advanced Filtering Strategies](#advanced-filtering-strategies)
- [Response Fields](#response-fields)
- [Summary](#summary)

---

## Extraction Approaches

### Search with include_raw_content

Get search results and content in one call:

```python
response = client.search(
    query="AI healthcare applications",
    include_raw_content=True,
    max_results=5
)
```

**When to use:**
- Quick prototyping
- Simple queries where search results are likely relevant
- Single API call convenience

### Direct Extract API (Recommended)

Search first, then extract only the URLs you choose, for more control:

```python
# Step 1: Search
search_results = client.search(
    query="Python async best practices",
    max_results=10
)

# Step 2: Filter by relevance score
relevant_urls = [
    r["url"] for r in search_results["results"]
    if r["score"] > 0.5
]

# Step 3: Extract with targeting
extracted = client.extract(
    urls=relevant_urls[:20],  # Max 20 URLs per call
    query="async patterns and concurrency",  # Reranks chunks
    chunks_per_source=3  # Prevents context explosion
)

for item in extracted["results"]:
    print(f"URL: {item['url']}")
    print(f"Content: {item['raw_content'][:500]}...")
```

**When to use:**
- You want control over which URLs to extract
- You need to filter/curate URLs before extraction
- You want targeted extraction with `query` and `chunks_per_source`

---

## Key Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `urls` | string/array | Required | Single URL or list (max 20) |
| `extract_depth` | enum | `"basic"` | `"basic"` or `"advanced"` (for complex/JS pages) |
| `query` | string | null | Reranks chunks by relevance to this query |
| `chunks_per_source` | integer | 3 | Chunks per source (1-5, max 500 chars each). Only with `query` |
| `format` | enum | `"markdown"` | Output: `"markdown"` or `"text"` |
| `include_images` | boolean | false | Include image URLs |
| `include_favicon` | boolean | false | Include favicon URL |
| `include_usage` | boolean | false | Include credit consumption data in response |
| `timeout` | float | varies | Max wait time (1.0-60.0 seconds) |

---

## Query and Chunks

Use `query` and `chunks_per_source` to get only relevant content and prevent context window explosion:

```python
extracted = client.extract(
    urls=[
        "https://example.com/ml-healthcare",
        "https://example.com/ai-diagnostics",
        "https://example.com/medical-ai"
    ],
    query="AI diagnostic tools accuracy",
    chunks_per_source=2  # 2 most relevant chunks per URL
)
```

**When to use query:**
- To extract only relevant portions of long documents
- When you need focused content instead of full page extraction
- For targeted information retrieval from specific URLs

**Key benefits of chunks_per_source:**
- Returns only relevant snippets (max 500 chars each) instead of the full page
- Chunks appear in `raw_content` as: `<chunk 1> [...] <chunk 2> [...] <chunk 3>`
- Prevents the context window from exploding in agentic use cases

**Note:** `chunks_per_source` only works when `query` is provided.

---

## Extract Depth

| Depth | When to use |
|-------|-------------|
| `basic` (default) | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data, embedded media |

```python
# For complex pages
extracted = client.extract(
    urls=["https://example.com/complex-page"],
    extract_depth="advanced"
)
```

**Fallback strategy:** If `basic` fails, retry with `advanced`:

```python
url = "https://example.com/complex-page"
result = client.extract(urls=[url], extract_depth="basic")
# A URL listed in failed_results was not extracted; retry it in advanced mode
if url in [f["url"] for f in result.get("failed_results", [])]:
    result = client.extract(urls=[url], extract_depth="advanced")
```
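The fallback above handles a single URL; for a batch, the same idea means collecting every URL that did not come back successfully and retrying only those. A minimal sketch, assuming returned URLs match the requested strings exactly (redirects or normalization would break that) and treating a URL absent from both lists as failed; `urls_to_retry` is a hypothetical helper.

```python
from typing import Any


def urls_to_retry(result: dict[str, Any], requested: list[str]) -> list[str]:
    """Requested URLs that appear in failed_results or are missing from results."""
    ok = {item["url"] for item in result.get("results", [])}
    failed = {item["url"] for item in result.get("failed_results", [])}
    return [u for u in requested if u in failed or u not in ok]


requested = ["https://a.example", "https://b.example", "https://c.example"]
basic_result = {
    "results": [{"url": "https://a.example", "raw_content": "..."}],
    "failed_results": [{"url": "https://b.example", "error": "timeout"}],
}
print(urls_to_retry(basic_result, requested))  # ['https://b.example', 'https://c.example']

# With a real client:
# retry = urls_to_retry(result, urls)
# if retry:
#     advanced = client.extract(urls=retry, extract_depth="advanced")
```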

**Timeout tuning:** If latency isn't critical, set `timeout=60.0` (the maximum) for a better success rate on slow pages.

---

## Advanced Filtering Strategies

Beyond query-based filtering, consider these approaches before extraction:

| Strategy | When to use |
|----------|-------------|
| Score-based | Filter search results by relevance score |
| Domain-based | Filter by trusted domains |
| Re-ranking | Use dedicated re-ranking models for precision |
| LLM-based | Let an LLM assess relevance before extraction |
| Clustering | Group similar documents, extract from clusters |
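For the domain-based strategy, a minimal sketch that keeps only search results whose host is an allow-listed domain or one of its subdomains. At search time, `include_domains` does this server-side; this helper is for post-filtering results you already have. The `trusted` set is a hypothetical example, and matching on a `"." + domain` suffix keeps look-alike hosts such as `notpython.org` out.

```python
from urllib.parse import urlparse


def filter_trusted(results: list[dict], trusted: set[str]) -> list[dict]:
    """Keep results whose hostname equals a trusted domain or is a subdomain of one."""
    kept = []
    for r in results:
        host = (urlparse(r["url"]).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in trusted):
            kept.append(r)
    return kept


results = [
    {"url": "https://docs.python.org/3/library/asyncio.html", "score": 0.9},
    {"url": "https://python.org.evil.example/asyncio", "score": 0.8},
    {"url": "https://blog.example.net/async", "score": 0.7},
]
print([r["url"] for r in filter_trusted(results, {"python.org"})])
# ['https://docs.python.org/3/library/asyncio.html']
```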

### Optimal Workflow

1. **Search** to discover relevant URLs
2. **Filter** by relevance score, domain, or content snippet
3. **Re-rank** if needed using specialized models
4. **Extract** from top-ranked sources with `query` and `chunks_per_source`
5. **Validate** extracted content quality
6. **Process** for your AI application

### Example: Complete Pipeline

```python
import asyncio
from tavily import AsyncTavilyClient

client = AsyncTavilyClient()

async def content_pipeline(topic):
    # 1. Search with sub-queries for breadth
    queries = [
        f"{topic} overview",
        f"{topic} best practices",
        f"{topic} recent developments"
    ]
    responses = await asyncio.gather(
        *(client.search(q, search_depth="advanced", max_results=10) for q in queries)
    )

    # 2. Filter and aggregate by score
    urls = []
    for response in responses:
        urls.extend([
            r['url'] for r in response['results']
            if r['score'] > 0.5
        ])

    # 3. Deduplicate (dict.fromkeys keeps first-seen order, unlike set()) and cap at 20
    urls = list(dict.fromkeys(urls))[:20]

    # 4. Extract with error handling
    extracted = await asyncio.gather(
        *(client.extract(urls=[url], query=topic, extract_depth="advanced")
          for url in urls),
        return_exceptions=True
    )

    # 5. Filter successful extractions
    return [e for e in extracted if not isinstance(e, Exception)]

results = asyncio.run(content_pipeline("machine learning in healthcare"))
print(f"{len(results)} successful extractions")
```

---

## Response Fields

**Top-level response:**

| Field | Description |
|-------|-------------|
| `results` | Array of successfully extracted content |
| `failed_results` | Array of entries for URLs that failed extraction |
| `response_time` | Time in seconds |
| `request_id` | Unique identifier for support reference |
| `usage` | Credit usage info (if `include_usage=True`) |

**Each result object:**

| Field | Description |
|-------|-------------|
| `url` | The URL extracted from |
| `raw_content` | Full content, or top-ranked chunks joined by `[...]` when `query` is provided |
| `images` | Array of image URLs (if `include_images=True`) |
| `favicon` | Favicon URL (if `include_favicon=True`) |

**Each failed_results object:**

| Field | Description |
|-------|-------------|
| `url` | The URL that failed |
| `error` | Error message |

---

## Summary

1. **Use `query` and `chunks_per_source`** for targeted, focused extraction
2. **Choose the Extract API** when you need control over which URLs to extract from
3. **Filter URLs** before extraction using scores, re-ranking, or domain trust
4. **Choose an appropriate `extract_depth`** based on content complexity
5. **Process URLs concurrently** with async operations for better performance
6. **Implement error handling** to manage failed extractions gracefully
7. **Validate extracted content** before downstream processing

For more details, see the [full API reference](https://docs.tavily.com/documentation/api-reference/endpoint/extract)

# Framework Integrations

## Table of Contents

- [LangChain](#langchain)
- [Pydantic AI](#pydantic-ai)
- [LlamaIndex](#llamaindex)
- [Agno](#agno)
- [OpenAI Function Calling](#openai-function-calling)
- [Anthropic Tool Calling](#anthropic-tool-calling)
- [Google ADK](#google-adk)
- [Vercel AI SDK](#vercel-ai-sdk)
- [CrewAI](#crewai)
- [No-Code Platforms](#no-code-platforms)

---

## LangChain

We recommend the official `langchain-tavily` package for LangChain integrations.

> Warning: `langchain_community.tools.tavily_search.tool` is deprecated. Migrate to `langchain-tavily` for actively maintained Search, Extract, Map, Crawl, and Research tools.

### Installation

```bash
pip install -U langchain-tavily
```

### Credentials

```python
import getpass
import os

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("Tavily API key:\n")
```

### Tavily Search

**Available parameters**
- `max_results` (default: `5`)
- `topic` (`"general"`, `"news"`, `"finance"`)
- `include_answer`
- `include_raw_content`
- `include_images`
- `include_image_descriptions`
- `search_depth` (`"basic"` or `"advanced"`)
- `time_range` (`"day"`, `"week"`, `"month"`, `"year"`)
- `start_date` (`YYYY-MM-DD`)
- `end_date` (`YYYY-MM-DD`)
- `include_domains`
- `exclude_domains`
- `include_usage`

**Instantiation**

```python
from langchain_tavily import TavilySearch

tavily_search = TavilySearch(
    max_results=5,
    topic="general"
)
```

**Invoke directly with args**
- Required: `query`
- Can also be overridden at invocation: `include_images`, `search_depth`, `time_range`, `include_domains`, `exclude_domains`, `start_date`, `end_date`
- `include_answer` and `include_raw_content` should be set at instantiation time for predictable response sizes

```python
result = tavily_search.invoke({"query": "What happened at the last Wimbledon?"})
```

**Use with agent**

```python
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

agent = create_agent(
    model=ChatOpenAI(model="gpt-5"),
    tools=[tavily_search],
    system_prompt="You are a helpful research assistant. Use web search to find accurate, up-to-date information.",
)

response = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "What is the most popular sport in the world? Include only Wikipedia sources.",
    }]
})
```

Tip: include today's date in the system prompt so the model can interpret time-relative queries such as "latest" or "this week".
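A minimal sketch of that tip, using only the standard library to build the prompt string that is then passed as `system_prompt`; the wording is an example, and `build_system_prompt` is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def build_system_prompt(today: Optional[date] = None) -> str:
    """Research-assistant prompt that states the current date for time-aware searches."""
    today = today or date.today()
    return (
        "You are a helpful research assistant. Use web search to find accurate, "
        f"up-to-date information. Today's date is {today.isoformat()}."
    )


print(build_system_prompt(date(2025, 1, 15)))
# You are a helpful research assistant. Use web search to find accurate, up-to-date information. Today's date is 2025-01-15.

# e.g. create_agent(model=..., tools=[tavily_search], system_prompt=build_system_prompt())
```

Building the prompt at agent-creation time means a long-running process should rebuild it periodically, or the stated date goes stale.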

### Tavily Extract

**Available parameters**
- `extract_depth` (`"basic"` or `"advanced"`)
- `include_images`

```python
from langchain_tavily import TavilyExtract

tavily_extract = TavilyExtract(
    extract_depth="basic",  # or "advanced"
    # include_images=False,
)

result = tavily_extract.invoke({
    "urls": ["https://en.wikipedia.org/wiki/Lionel_Messi"]
})
```

### Tavily Map/Crawl

```python
from langchain_tavily import TavilyMap

tavily_map = TavilyMap()

result = tavily_map.invoke({
    "url": "https://docs.example.com",
    "instructions": "Find all documentation and tutorial pages"
})
# Returns: {"base_url": ..., "results": [urls...], "response_time": ...}
```

```python
from langchain_tavily import TavilyCrawl

tavily_crawl = TavilyCrawl()

result = tavily_crawl.invoke({
    "url": "https://docs.example.com",
    "instructions": "Extract API documentation and code examples"
})
# Returns: {"base_url": ..., "results": [{url, raw_content}...], "response_time": ...}
```

### Tavily Research

**Available parameters**
- `input` (required)
- `model` (`"mini"`, `"pro"`, `"auto"`)
- `output_schema`
- `stream`
- `citation_format` (`"numbered"`, `"mla"`, `"apa"`, `"chicago"`)

```python
from langchain_tavily import TavilyResearch

tavily_research = TavilyResearch()

result = tavily_research.invoke({
    "input": "Research the latest developments in AI and summarize key trends.",
    "model": "mini",
    "citation_format": "apa"
})
```

### Tavily Get Research

Retrieve a research task's output using the `request_id` returned by `TavilyResearch`:

```python
from langchain_tavily import TavilyGetResearch

tavily_get_research = TavilyGetResearch()
final = tavily_get_research.invoke({"request_id": result["request_id"]})
```

---

## Pydantic AI

Pydantic AI includes a ready-made Tavily search tool that can be attached to an agent.

### Introduction

Pydantic AI is a framework for building tool-using AI agents. Adding the Tavily search tool lets an agent query the web in real time and ground its answers in current results.

### Step-by-Step Integration Guide

#### Step 1: Install Required Packages

Install the necessary Python packages:

```bash
pip install "pydantic-ai-slim[tavily]"
```

#### Step 2: Set Up API Keys

- Tavily API Key: [Get your Tavily API key](https://app.tavily.com/home)
- OpenAI API Key: needed for the `openai:o3-mini` model used below

Set these as environment variables:

```bash
export TAVILY_API_KEY=your_tavily_api_key
export OPENAI_API_KEY=your_openai_api_key
```

#### Step 3: Initialize Pydantic AI Agent with Tavily Tools

```python
import os

from pydantic_ai.agent import Agent
from pydantic_ai.common_tools.tavily import tavily_search_tool

# Get API key from environment
api_key = os.getenv("TAVILY_API_KEY")
if api_key is None:
    raise RuntimeError("Set the TAVILY_API_KEY environment variable first")

# Initialize the agent with the Tavily search tool
agent = Agent(
    "openai:o3-mini",
    tools=[tavily_search_tool(api_key)],
    system_prompt="Search Tavily for the given query and return the results.",
)
```

#### Step 4: Example Use Cases

```python
# Example 1: Basic search for news
result = agent.run_sync("Tell me the top news in the GenAI world, give me links.")
print(result.output)
```
|
||||
|
||||
---

## LlamaIndex

```python
from llama_index.tools.tavily_research import TavilyToolSpec

# Initialize tools
tavily_tool = TavilyToolSpec(api_key="tvly-YOUR_API_KEY")
tools = tavily_tool.to_tool_list()

# Use with agent
from llama_index.agent.openai import OpenAIAgent

agent = OpenAIAgent.from_tools(tools)
response = agent.chat("What are the latest AI developments?")
```

---

## Agno

Tavily is available for integration through Agno, a lightweight framework for building agents with tools, memory, and reasoning.

### Introduction

Integrate Tavily with Agno to give your AI agents web search capabilities. Agno makes it easy to incorporate real-time web search and data extraction into your AI applications.

### Step-by-Step Integration Guide

#### Step 1: Install Required Packages

```bash
pip install agno tavily-python openai
```

#### Step 2: Set Up API Keys

- Tavily API Key: [Get your Tavily API key](https://app.tavily.com/home)
- OpenAI API Key: [Get your OpenAI API key](https://platform.openai.com/api-keys)

Set these as environment variables:

```bash
export TAVILY_API_KEY=your_tavily_api_key
export OPENAI_API_KEY=your_openai_api_key
```

#### Step 3: Initialize Agno Agent with Tavily Tools

```python
from agno.agent import Agent
from agno.tools.tavily import TavilyTools

# Initialize the agent with Tavily tools
agent = Agent(
    tools=[
        TavilyTools(
            search=True,              # Enable search functionality
            max_tokens=8000,          # Increase max tokens for detailed results
            search_depth="advanced",  # Use advanced search for comprehensive results
            format="markdown",        # Format results as markdown
        )
    ],
    show_tool_calls=True,
)
```

#### Step 4: Example Use Cases

```python
# Example 1: Basic search with default parameters
agent.print_response("Latest developments in quantum computing", markdown=True)

# Example 2: Market research
agent.print_response(
    "Analyze the competitive landscape of AI-powered customer service solutions in 2026, "
    "focusing on market leaders and emerging trends",
    markdown=True,
)

# Example 3: Technical documentation search
agent.print_response(
    "Find the latest documentation and tutorials about Python async programming, "
    "focusing on asyncio and FastAPI",
    markdown=True,
)

# Example 4: News aggregation
agent.print_response(
    "Gather the latest news about artificial intelligence from tech news websites "
    "published in the last week",
    markdown=True,
)
```

### Additional Use Cases

- Content curation: Gather and organize information from multiple sources
- Real-time data integration: Keep your AI agents up to date with the latest information
- Technical documentation: Search and analyze technical documentation
- Market analysis: Conduct comprehensive market research and analysis

---

## OpenAI Function Calling

Define Tavily as an OpenAI function, then answer every tool call the model makes before asking for the final response:

```python
import json

from openai import OpenAI
from tavily import TavilyClient

openai_client = OpenAI()
tavily_client = TavilyClient()

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    }
}]


def handle_tool_call(tool_call):
    if tool_call.function.name == "web_search":
        args = json.loads(tool_call.function.arguments)
        return tavily_client.search(args["query"])
    raise ValueError(f"Unknown tool: {tool_call.function.name}")


# Chat completion with tools
messages = [{"role": "user", "content": "What are the latest AI trends?"}]
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tools
)

message = response.choices[0].message
if message.tool_calls:
    # Continue the conversation: the assistant turn, then one tool message per call
    # (the API rejects the follow-up request if any tool_call_id is left unanswered)
    messages.append(message)
    for tool_call in message.tool_calls:
        search_results = handle_tool_call(tool_call)
        messages.append(
            {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(search_results)}
        )

    final = openai_client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    print(final.choices[0].message.content)
```

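Serializing the whole `search()` response into the `tool` message can spend tokens on fields the model does not need. The sketch below shows one way to trim the results. It assumes that the response dict has a `results` list whose items carry `title`, `url`, and `content` keys. `compact_results` is a hypothetical helper, not part of the Tavily SDK. You could pass `compact_results(search_results)` as the tool message `content` in place of `json.dumps(search_results)`.

```python
import json
from typing import Any


def compact_results(response: dict[str, Any], max_chars: int = 500) -> str:
    """Keep only title, URL, and a truncated snippet for each search result.

    Hypothetical helper: assumes response["results"] is a list of dicts
    with "title", "url", and "content" keys, as returned by search().
    """
    compact = []
    for item in response.get("results", []):
        content = item.get("content") or ""
        compact.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            # Truncate long snippets to bound the tokens sent back to the model
            "content": content[:max_chars],
        })
    return json.dumps(compact)


# Usage with a stand-in response (no network call)
sample = {
    "query": "latest AI trends",
    "results": [
        {"title": "A", "url": "https://a.example", "content": "x" * 1000, "score": 0.9},
    ],
}
print(compact_results(sample, max_chars=10))
```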
---

## Anthropic Tool Calling

Integrate Tavily with Anthropic Claude to add real-time web search in tool-calling workflows.

### Installation

```bash
pip install anthropic tavily-python
```

### Setup

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export TAVILY_API_KEY="your-tavily-api-key"
```

### Using Tavily With Anthropic Tool Calling

```python
import json
import os

from anthropic import Anthropic
from tavily import TavilyClient

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
MODEL_NAME = "claude-sonnet-4-5"  # any current Claude model ID; a bare "claude-sonnet" is not valid
```

### Implementation

#### System prompt

```python
SYSTEM_PROMPT = (
    "You are a research assistant. Use the tavily_search tool when needed. "
    "After tools run and tool results are provided back to you, produce a concise, "
    "well-structured summary with key bullets and a Sources section listing URLs."
)
```

#### Tool schema

```python
tools = [
    {
        "name": "tavily_search",
        "description": "Search the web using Tavily and return relevant links and summaries.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query string."},
                "max_results": {"type": "integer", "default": 5},
                "search_depth": {
                    "type": "string",
                    "enum": ["basic", "advanced"],
                    "default": "basic",
                },
            },
            "required": ["query"],
        },
    }
]
```

#### Tool execution

```python
def tavily_search(**kwargs):
    return tavily_client.search(**kwargs)


def process_tool_call(name, args):
    if name == "tavily_search":
        return tavily_search(**args)
    raise ValueError(f"Unknown tool: {name}")
```

#### Main chat function

```python
def chat_with_claude(user_message: str):
    # Call 1: allow tool use
    initial_response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": [{"type": "text", "text": user_message}]}],
        tools=tools,
    )

    # If Claude answers without tools, return text directly
    if initial_response.stop_reason != "tool_use":
        return "".join(
            block.text for block in initial_response.content
            if getattr(block, "type", None) == "text"
        )

    # Execute all requested tools
    tool_result_blocks = []
    for block in initial_response.content:
        if getattr(block, "type", None) == "tool_use":
            result = process_tool_call(block.name, block.input)
            tool_result_blocks.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                }
            )

    # Call 2: send tool results and ask Claude for final synthesis.
    # `tools` is passed again because the history now contains tool_use blocks,
    # and the tool_result blocks come first in the user turn that answers them.
    final_response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=[
            {"role": "user", "content": [{"type": "text", "text": user_message}]},
            {"role": "assistant", "content": initial_response.content},
            {
                "role": "user",
                "content": tool_result_blocks + [{
                    "type": "text",
                    "text": "Please synthesize the final answer now based on the tool results above. Include 3-7 bullets and a Sources section with URLs.",
                }],
            },
        ],
    )

    return "".join(
        block.text for block in final_response.content
        if getattr(block, "type", None) == "text"
    )
```

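In `chat_with_claude`, an exception raised by `process_tool_call` aborts the whole exchange. If you would rather report the failure to Claude and let it respond, you can rely on the optional `is_error` flag that the Messages API accepts on `tool_result` blocks. The sketch below wraps the dispatcher so that exceptions become error results. `run_tool_safely` is a hypothetical helper. To use it, call `tool_result_blocks.append(run_tool_safely(block.id, block.name, block.input, process_tool_call))` inside the loop.

```python
import json
from typing import Any, Callable


def run_tool_safely(
    tool_use_id: str,
    name: str,
    args: dict[str, Any],
    process: Callable[[str, dict[str, Any]], Any],
) -> dict[str, Any]:
    """Build a tool_result block, flagging it with is_error if the tool raises.

    Hypothetical helper; `process` is a dispatcher such as process_tool_call.
    """
    try:
        result = process(name, args)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": json.dumps(result)}
    except Exception as exc:  # report any tool failure back to the model
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


# Usage with a stand-in dispatcher (no API calls)
def fake_process(name, args):
    if name == "tavily_search":
        return {"results": [{"url": "https://example.com"}]}
    raise ValueError(f"Unknown tool: {name}")


ok = run_tool_safely("toolu_1", "tavily_search", {"query": "x"}, fake_process)
bad = run_tool_safely("toolu_2", "other_tool", {}, fake_process)
print(ok["content"])
print(bad["is_error"], bad["content"])
```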
### Usage example

```python
print(chat_with_claude("What is trending now in the agents space in 2026?"))
```

Reference: https://docs.tavily.com/documentation/integrations/anthropic

---

## Google ADK

Google ADK can connect to Tavily through Tavily's remote MCP server, giving your Gemini-based agent live search, extraction, and site exploration capabilities.

### Prerequisites

- Python 3.9+
- Tavily API key: https://app.tavily.com/home
- Gemini API key: https://aistudio.google.com/app/apikey

### Installation

```bash
pip install google-adk mcp
```

### Agent Setup

```python
import os

from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset

tavily_api_key = os.getenv("TAVILY_API_KEY")

root_agent = Agent(
    model="gemini-2.5-pro",
    name="tavily_agent",
    instruction=(
        "You are a helpful assistant that uses Tavily to search the web, "
        "extract content, and explore websites. Use Tavily tools to provide "
        "up-to-date information."
    ),
    tools=[
        MCPToolset(
            connection_params=StreamableHTTPServerParams(
                url="https://mcp.tavily.com/mcp/",
                headers={"Authorization": f"Bearer {tavily_api_key}"},
            )
        )
    ],
)
```

### Environment Variables

```bash
export GOOGLE_API_KEY="your_gemini_api_key_here"
export TAVILY_API_KEY="your_tavily_api_key_here"
```

### Run

Scaffold the project with `adk create`, place the agent setup code in the generated `my_agent/agent.py` (ADK loads the `root_agent` defined there), then run it:

```bash
adk create my_agent
adk run my_agent
# Optional web UI:
adk web --port 8000
```

### Available Tavily MCP tools

- `tavily-search`
- `tavily-extract`
- `tavily-map`
- `tavily-crawl`

Reference: https://docs.tavily.com/documentation/integrations/google-adk

---

## Vercel AI SDK

The `@tavily/ai-sdk` package provides pre-built tools for Vercel AI SDK v5.

### Installation

```bash
npm install ai @ai-sdk/openai @tavily/ai-sdk
```

### Usage

```typescript
import { tavilySearch, tavilyCrawl } from "@tavily/ai-sdk";
import { generateText, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";

// Search
const result = await generateText({
  model: openai("gpt-4"),
  prompt: "What are the latest AI developments?",
  tools: {
    tavilySearch: tavilySearch({
      maxResults: 5,
      searchDepth: "advanced",
    }),
  },
  // Allow follow-up steps so the model can answer using the tool results
  stopWhen: stepCountIs(5),
});

// Crawl
const crawlResult = await generateText({
  model: openai("gpt-4"),
  prompt: "Crawl tavily.com and summarize their features",
  tools: {
    tavilyCrawl: tavilyCrawl({
      maxDepth: 2,
      limit: 50,
    }),
  },
  stopWhen: stepCountIs(5),
});
```

**Available tools:** `tavilySearch`, `tavilyExtract`, `tavilyCrawl`, `tavilyMap`

---

## CrewAI

CrewAI provides built-in Tavily tools for multi-agent workflows.

### Installation

```bash
pip install 'crewai[tools]'
```

### Usage

```python
import os

from crewai import Agent, Task, Crew
from crewai_tools import TavilySearchTool, TavilyExtractTool

os.environ["TAVILY_API_KEY"] = "your-api-key"

# Search and extraction tools
search_tool = TavilySearchTool()
extract_tool = TavilyExtractTool()

# Create agent with Tavily
researcher = Agent(
    role="Research Analyst",
    goal="Find and analyze information on given topics",
    tools=[search_tool, extract_tool],
    backstory="Expert at finding relevant information online"
)

task = Task(
    description="Research the latest developments in quantum computing",
    expected_output="A comprehensive summary with sources",
    agent=researcher
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```

---

## No-Code Platforms

Tavily integrates with popular no-code automation platforms:

| Platform | Features | Best For |
|----------|----------|----------|
| **Zapier** | Search, Extract | CRM enrichment, automated research |
| **Make** | Search, Extract | Complex workflows, multi-step automations |
| **n8n** | Search, Extract, AI Agent tool | Self-hosted, AI agent workflows |
| **Dify** | Search, Extract | No-code AI apps, chatflows |
| **FlowiseAI** | Search | Visual LLM builders, RAG systems |
| **Langflow** | Search, Extract | Visual agent building |

---

## Additional Integrations

See the [full integrations documentation](https://docs.tavily.com/documentation/integrations) for complete guides.
@@ -0,0 +1,315 @@
# Research API Reference

## Table of Contents

- [Overview](#overview)
- [Prompting Best Practices](#prompting-best-practices)
- [Model Selection](#model-selection)
- [Key Parameters](#key-parameters)
- [Basic Usage](#basic-usage)
- [Streaming vs Polling](#streaming-vs-polling)
- [Structured Output vs Report](#structured-output-vs-report)
- [Response Fields](#response-fields)
- [Summary](#summary)

---

## Overview

The Research API conducts comprehensive research on any topic with automatic source gathering, analysis, and response generation with citations. It's an end-to-end solution when you need AI-powered research without building your own pipeline.

---

## Prompting Best Practices

Define a **clear goal** with all **details** and **direction**.

**Guidelines:**

- **Be specific when you can.** Include known details: target market, competitors, geography, constraints.
- **Stay open-ended only for discovery.** Make it explicit: "tell me about the most impactful AI innovations in healthcare in 2025".
- **Avoid contradictions.** Don't include conflicting constraints or goals.
- **Share what's already known.** Include prior assumptions so research doesn't repeat existing knowledge.
- **Keep prompts clean and directed.** Clear task + essential context + desired output format.

### Example Queries

**Company research:**

```
Research the company ____ and its 2026 outlook. Provide a brief overview
of the company, its products, services, and market position.
```

**Competitive analysis:**

```
Conduct a competitive analysis of ____ in 2026. Identify their main
competitors, compare market positioning, and analyze key differentiators.
```

**With prior context:**

```
We're evaluating Notion as a potential partner. We already know they
primarily serve SMB and mid-market teams, expanded their AI features
significantly in 2025, and most often compete with Confluence and ClickUp.
Research Notion's 2026 outlook, including market position, growth risks,
and where a partnership could be most valuable. Include citations.
```

---

## Model Selection

| Model | Best For |
|-------|----------|
| `pro` | Comprehensive, multi-agent research for complex, multi-domain topics |
| `mini` | Targeted, efficient research for narrow or well-scoped questions |
| `auto` | When unsure how complex research will be (default) |

### Pro Model

Multi-agent research suited for complex topics spanning multiple subtopics or domains. Use for deeper analysis, thorough reports, or maximum accuracy.

```python
result = client.research(
    input="Analyze the competitive landscape for ____ in the SMB market, "
          "including key competitors, positioning, pricing models, customer "
          "segments, recent product moves, and defensible advantages or risks "
          "over the next 2-3 years.",
    model="pro"
)
```

### Mini Model

Optimized for targeted, efficient research. Best for narrow or well-scoped questions where you still benefit from agentic searching and synthesis.

```python
result = client.research(
    input="What are the top 5 competitors to ____ in the SMB market, and how do they differentiate?",
    model="mini"
)
```

---

## Key Parameters

### research()

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input` | string | Required | The research topic or question |
| `model` | enum | `"auto"` | `"mini"`, `"pro"`, or `"auto"` |
| `stream` | boolean | false | Enable streaming responses |
| `output_schema` | object | null | JSON Schema for structured output |
| `citation_format` | enum | `"numbered"` | `"numbered"`, `"mla"`, `"apa"`, `"chicago"` |

### get_research()

| Parameter | Type | Description |
|-----------|------|-------------|
| `request_id` | string | Task ID from `research()` response |

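A typo in `model` or `citation_format` is only caught when the request is sent. The sketch below checks both enums locally before `research()` is called. `research_kwargs` is a hypothetical helper, and its allowed values are copied from the `research()` table. Treat them as a snapshot and update them if the API adds options.

```python
from typing import Any, Optional

# Allowed values as listed in the research() parameter table
MODELS = {"mini", "pro", "auto"}
CITATION_FORMATS = {"numbered", "mla", "apa", "chicago"}


def research_kwargs(
    input: str,
    model: str = "auto",
    stream: bool = False,
    output_schema: Optional[dict[str, Any]] = None,
    citation_format: str = "numbered",
) -> dict[str, Any]:
    """Validate research() arguments against the documented enums (hypothetical helper)."""
    if not input.strip():
        raise ValueError("input must be a non-empty research topic or question")
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}, got {model!r}")
    if citation_format not in CITATION_FORMATS:
        raise ValueError(
            f"citation_format must be one of {sorted(CITATION_FORMATS)}, got {citation_format!r}"
        )
    kwargs: dict[str, Any] = {
        "input": input,
        "model": model,
        "stream": stream,
        "citation_format": citation_format,
    }
    if output_schema is not None:  # only send a schema when one is given
        kwargs["output_schema"] = output_schema
    return kwargs


# Usage (requires an API key):
# result = client.research(**research_kwargs("EV market outlook", model="mini", citation_format="apa"))
```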
---

## Basic Usage

Research tasks are two-step: initiate with `research()`, retrieve with `get_research()`.

```python
import time

from tavily import TavilyClient

client = TavilyClient()

# Step 1: Start research task
result = client.research(
    input="Latest developments in quantum computing and their practical applications",
    model="pro"
)
request_id = result["request_id"]

# Step 2: Poll until completed
response = client.get_research(request_id)
while response["status"] not in ["completed", "failed"]:
    print(f"Status: {response['status']}... polling again in 10 seconds")
    time.sleep(10)
    response = client.get_research(request_id)

# Step 3: Handle result
if response["status"] == "failed":
    raise RuntimeError(f"Research failed: {response.get('error', 'Unknown error')}")

report = response["content"]
sources = response["sources"]
```

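The polling loop waits indefinitely if a task never reaches `completed` or `failed`. The sketch below is a bounded variant. The hypothetical `wait_for_research` helper takes any callable that returns a `get_research()`-style dict, which lets you test it without the API. It doubles the sleep interval up to a cap and raises `TimeoutError` after a deadline. The timeout and backoff values are illustrative assumptions, not Tavily recommendations.

```python
import time
from typing import Any, Callable


def wait_for_research(
    fetch: Callable[[], dict[str, Any]],
    timeout: float = 600.0,
    initial_interval: float = 5.0,
    max_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll fetch() until the task is "completed" or "failed", or raise TimeoutError.

    Hypothetical helper. With the SDK, pass
    `lambda: client.get_research(request_id)` as `fetch`.
    """
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        response = fetch()
        if response["status"] in ("completed", "failed"):
            return response
        # Give up rather than sleeping past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"research still {response['status']!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # capped exponential backoff


# Usage (requires an API key):
# response = wait_for_research(lambda: client.get_research(request_id), timeout=900)
```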
---

## Streaming vs Polling

**Streaming:** Best for user interfaces where you want real-time updates.

**Polling:** Best for background processes where you check status periodically.

### Streaming

Enable real-time progress monitoring with `stream=True`.

```python
stream = client.research(
    input="Latest developments in quantum computing",
    model="pro",
    stream=True
)

for chunk in stream:
    print(chunk.decode('utf-8'))
```

### Event Types

| Event Type | Description |
|------------|-------------|
| **Tool Call** | Agent initiates action (Planning, WebSearch, etc.) |
| **Tool Response** | Results after tool execution with sources |
| **Content** | Research report streamed as markdown (or JSON with `output_schema`) |
| **Sources** | Complete list of sources, emitted after content |
| **Done** | Signals completion |

### Tool Types

| Tool | Description | Models |
|------|-------------|--------|
| `Planning` | Initializes research strategy | mini, pro |
| `WebSearch` | Executes web searches | mini, pro |
| `Generating` | Creates final report | mini, pro |
| `ResearchSubtopic` | Deep research on subtopics | pro only |

### Typical Flow

1. `Planning` tool_call → tool_response
2. `WebSearch` tool_call → tool_response (with sources)
3. `ResearchSubtopic` cycles (Pro mode only)
4. `Generating` tool_call → tool_response
5. `Content` chunks (markdown or structured JSON)
6. `Sources` event
7. `Done` event

See [streaming cookbook](https://github.com/tavily-ai/tavily-cookbook/blob/main/cookbooks/research/streaming.ipynb) and [polling cookbook](https://github.com/tavily-ai/tavily-cookbook/blob/main/cookbooks/research/polling.ipynb) for complete examples.

---

## Structured Output vs. Report

| Format | Best For |
|--------|----------|
| **Report** (default) | Reading, sharing, or displaying verbatim (chat interfaces, briefs, newsletters) |
| **Structured Output** | Data enrichment, pipelines, or powering UIs with specific fields |

## Structured Output

Use `output_schema` to receive research in a predefined JSON structure.

```python
schema = {
    "properties": {
        "summary": {
            "type": "string",
            "description": "Executive summary of findings"
        },
        "key_points": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Main takeaways from the research"
        },
        "metrics": {
            "type": "object",
            "properties": {
                "market_size": {"type": "string", "description": "Total market size"},
                "growth_rate": {"type": "number", "description": "Annual growth percentage"}
            }
        }
    },
    "required": ["summary", "key_points"]
}

result = client.research(
    input="Electric vehicle market analysis 2024",
    output_schema=schema
)
```

### Schema Best Practices

- **Write clear field descriptions.** 1-3 sentences explaining what the field should contain.
- **Match the structure you need.** Use arrays, objects, and enums appropriately (e.g., `competitors: string[]`, not `"A, B, C"`).
- **Avoid duplicate fields.** Keep each field unique and specific.
- **Use `required` arrays** to enforce mandatory fields at any nesting level.

**Supported types:** `object`, `string`, `integer`, `number`, `array`

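Because `required` can appear at any nesting level, you may want to check a structured result before feeding it into a pipeline. The sketch below assumes that the completed task's `content` is the JSON object described by your schema. If it arrives as a string, parse it with `json.loads` first. `missing_required` is a hypothetical helper that only checks whether required keys are present. It does not perform full JSON Schema validation, which a library such as `jsonschema` provides. The example schema is a variant of the one in the Structured Output example, with a nested `required` added to `metrics`.

```python
from typing import Any


def missing_required(schema: dict[str, Any], data: Any, path: str = "") -> list[str]:
    """Return dotted paths of `required` fields absent from `data`.

    Hypothetical helper: recurses into nested object properties only.
    """
    if not isinstance(data, dict):
        return [path.rstrip(".") or "<root>"]
    missing = [f"{path}{key}" for key in schema.get("required", []) if key not in data]
    for key, subschema in schema.get("properties", {}).items():
        is_object = subschema.get("type") == "object" or "properties" in subschema
        if is_object and key in data:
            missing += missing_required(subschema, data[key], f"{path}{key}.")
    return missing


schema = {
    "properties": {
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
        "metrics": {
            "type": "object",
            "properties": {"market_size": {"type": "string"}},
            "required": ["market_size"],
        },
    },
    "required": ["summary", "key_points"],
}
print(missing_required(schema, {"summary": "...", "metrics": {}}))
```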
### Streaming with Structured Output

When `output_schema` is provided, content arrives as structured JSON:

```python
stream = client.research(
    input="AI agent frameworks comparison",
    model="mini",
    stream=True,
    output_schema={
        "properties": {
            "summary": {"type": "string", "description": "Executive summary"},
            "key_points": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["summary", "key_points"]
    }
)

for chunk in stream:
    data = chunk.decode('utf-8')
    print(data)  # Content chunks will be structured JSON
```

---

## Response Fields

### research() Response

| Field | Description |
|-------|-------------|
| `request_id` | Unique identifier for tracking |
| `created_at` | Timestamp when task was created |
| `status` | Initial status |
| `input` | The research topic submitted |
| `model` | Model used by research agent |

### get_research() Response

| Field | Description |
|-------|-------------|
| `status` | `"pending"`, `"processing"`, `"completed"`, `"failed"` |
| `content` | Generated research report (when completed) |
| `sources` | Array of source citations |
| `response_time` | Time in seconds |

### Source Object

| Field | Description |
|-------|-------------|
| `url` | Source URL |
| `title` | Source title |
| `citation` | Formatted citation string |

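To display sources under a report, you can turn the `sources` array into a Markdown reference list. The sketch below assumes that each entry is a dict with the `url`, `title`, and `citation` fields from the Source Object table. `format_sources` is a hypothetical helper. Falling back to the title or URL when `citation` is missing, and dropping repeated URLs, are illustrative choices, not documented API behavior.

```python
from typing import Any


def format_sources(sources: list[dict[str, Any]]) -> str:
    """Render research sources as a numbered Markdown list (hypothetical helper)."""
    lines = []
    seen = set()
    for source in sources:
        url = source.get("url", "")
        if url in seen:  # skip repeated URLs
            continue
        seen.add(url)
        # Prefer the formatted citation, then the title, then the bare URL
        label = source.get("citation") or source.get("title") or url
        lines.append(f"{len(lines) + 1}. [{label}]({url})")
    return "\n".join(lines)


sources = [
    {"url": "https://a.example", "title": "A", "citation": "A. (2025). Title."},
    {"url": "https://b.example", "title": "B"},
    {"url": "https://a.example", "title": "A again"},
]
print(format_sources(sources))
```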
---

## Summary

1. **Be specific in prompts.** Include known details such as target market, competitors, geography, and constraints.
2. **Share prior context.** Include what you already know to avoid repetition.
3. **Choose the right model.** Use `mini` for focused queries and `pro` for comprehensive multi-domain analysis.
4. **Use streaming for UX.** Display real-time progress during long research tasks.
5. **Use structured output for pipelines.** Define schemas for consistent, parseable responses.
6. **Use reports for reading.** The default format is best for chat interfaces and sharing.

For more examples, see the [Tavily Cookbook](https://github.com/tavily-ai/tavily-cookbook/tree/main/research) and [live demo](https://chat-research.tavily.com/).
397
.config/opencode/skills/tavily-best-practices/references/sdk.md
Normal file
@@ -0,0 +1,397 @@
# SDK Reference

## Table of Contents

- [Python SDK](#python-sdk)
- [JavaScript SDK](#javascript-sdk)
- [Async Patterns](#async-patterns)
- [Hybrid RAG](#hybrid-rag)

---

## Python SDK

### Installation

```bash
pip install tavily-python
```

### Client Initialization

```python
from tavily import TavilyClient

# Uses TAVILY_API_KEY env var (recommended)
client = TavilyClient()

# Explicit API key
client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# With project tracking
client = TavilyClient(api_key="tvly-YOUR_API_KEY", project_id="your-project-id")

# With proxies
proxies = {"http": "<proxy>", "https": "<proxy>"}
client = TavilyClient(api_key="tvly-YOUR_API_KEY", proxies=proxies)
```

### Async Client

```python
import asyncio

from tavily import AsyncTavilyClient

async_client = AsyncTavilyClient()


# Top-level `await` only works in notebooks; in a script, wrap it in a coroutine
async def main():
    # Parallel queries
    return await asyncio.gather(
        async_client.search("query 1"),
        async_client.search("query 2"),
        async_client.search("query 3")
    )


responses = asyncio.run(main())
```

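`asyncio.gather` starts every request at once. For long query lists, the sketch below caps the number of requests in flight with `asyncio.Semaphore`. `search_all` is a hypothetical helper. Its `search` argument stands in for `async_client.search`, so any coroutine function that takes a query will work, and you can test it offline. The default limit of 5 is an arbitrary example, not a Tavily rate limit.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def search_all(
    queries: list[str],
    search: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 5,
) -> list[Any]:
    """Run search(query) for every query with at most max_concurrency in flight.

    Hypothetical helper; results come back in the same order as `queries`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query: str) -> Any:
        async with semaphore:  # wait for a free slot before calling the API
            return await search(query)

    return await asyncio.gather(*(run_one(q) for q in queries))


# Usage (requires an API key):
# results = asyncio.run(search_all(["query 1", "query 2", "query 3"], async_client.search))
```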
### Methods

#### search()

```python
response = client.search(
    query="quantum computing breakthroughs",
    search_depth="advanced",    # "basic" | "advanced"
    topic="general",            # "general" | "news" | "finance"
    max_results=10,             # 0-20
    include_answer=False,       # bool | "basic" | "advanced"
    include_raw_content=False,  # bool | "markdown" | "text"
    include_images=False,
    time_range="week",          # "day" | "week" | "month" | "year"
    include_domains=["arxiv.org"],
    exclude_domains=["reddit.com"],
    country="united states"
)
```

#### extract()

```python
response = client.extract(
    urls=["https://example.com/page1", "https://example.com/page2"],
    extract_depth="basic",  # "basic" | "advanced"
    format="markdown",      # "markdown" | "text"
    include_images=False,
    query="focus query",    # Reranks chunks by relevance
    chunks_per_source=3     # 1-5, requires query
)
```

#### crawl()

```python
response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,  # 1-5
    max_breadth=20,
    limit=50,
    instructions="Find API documentation",
    chunks_per_source=3,  # 1-5, requires instructions
    select_paths=["/docs/.*"],
    exclude_paths=["/blog/.*"],
    extract_depth="basic",
    format="markdown",
    allow_external=True
)
```

#### map()

```python
response = client.map(
    url="https://docs.example.com",
    max_depth=2,
    max_breadth=20,
    limit=50,
    instructions="Find all API pages",
    select_paths=["/api/.*"],
    allow_external=False
)
```

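A `map()` call can show you a site's structure before you crawl it. The sketch below assumes that `response["results"]` is a list of URL strings. The hypothetical `group_by_section` helper counts the discovered URLs by their first path segment. Those counts can help you choose `select_paths` and `exclude_paths` regexes for a follow-up `crawl()`.

```python
from collections import Counter
from urllib.parse import urlparse


def group_by_section(urls: list[str]) -> Counter:
    """Count URLs by first path segment, e.g. "/docs/a" -> "/docs" (hypothetical helper)."""
    counts: Counter = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        key = f"/{segments[0]}" if segments else "/"  # site root counts as "/"
        counts[key] += 1
    return counts


# Usage with stand-in URLs; with the SDK, pass response["results"] from map()
urls = [
    "https://docs.example.com/api/search",
    "https://docs.example.com/api/extract",
    "https://docs.example.com/blog/launch",
    "https://docs.example.com/",
]
for section, count in group_by_section(urls).most_common():
    print(section, count)
```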
#### research()

```python
import time

# Start research task
result = client.research(
    input="Analyze competitive landscape for X",
    model="pro",                # "mini" | "pro" | "auto"
    stream=False,
    output_schema=None,         # JSON schema for structured output
    citation_format="numbered"  # "numbered" | "mla" | "apa" | "chicago"
)

# Poll for results
response = client.get_research(result["request_id"])
while response["status"] not in ["completed", "failed"]:
    time.sleep(10)
    response = client.get_research(result["request_id"])
```

---
|
||||
|
||||
## JavaScript SDK
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
npm install @tavily/core
|
||||
```
|
||||
|
||||
### Client Initialization
|
||||
|
||||
```javascript
|
||||
const { tavily } = require("@tavily/core");
|
||||
|
||||
// Basic initialization
|
||||
const client = tavily({ apiKey: "tvly-YOUR_API_KEY" });
|
||||
|
||||
// With project tracking
|
||||
const client = tavily({
|
||||
apiKey: "tvly-YOUR_API_KEY",
|
||||
projectId: "your-project-id"
|
||||
});
|
||||
|
||||
// With proxies
|
||||
const client = tavily({
|
||||
apiKey: "tvly-YOUR_API_KEY",
|
||||
proxies: {
|
||||
http: "<proxy>",
|
||||
https: "<proxy>"
|
||||
}
|
||||
});
|
||||
```

### Methods

#### search()

```javascript
const response = await client.search("quantum computing", {
  searchDepth: "advanced",   // "basic" | "advanced"
  topic: "general",          // "general" | "news" | "finance"
  maxResults: 10,            // 0-20
  includeAnswer: false,      // boolean | "basic" | "advanced"
  includeRawContent: false,  // boolean | "markdown" | "text"
  includeImages: false,
  timeRange: "week",         // "day" | "week" | "month" | "year"
  includeDomains: ["arxiv.org"],
  excludeDomains: ["reddit.com"],
  country: "united states"
});
```

#### extract()

```javascript
const response = await client.extract([
  "https://example.com/page1",
  "https://example.com/page2"
], {
  extractDepth: "basic",  // "basic" | "advanced"
  format: "markdown",     // "markdown" | "text"
  includeImages: false,
  query: "focus query"    // Reranks chunks
});
```

#### crawl()

```javascript
const response = await client.crawl("https://docs.example.com", {
  maxDepth: 2,
  maxBreadth: 20,
  limit: 50,
  instructions: "Find API documentation",
  selectPaths: ["/docs/.*"],
  excludePaths: ["/blog/.*"],
  extractDepth: "basic",
  format: "markdown"
});
```

#### map()

```javascript
const response = await client.map("https://docs.example.com", {
  maxDepth: 2,
  maxBreadth: 20,
  limit: 50,
  instructions: "Find all API pages"
});
```

---

## Async Patterns

### Python Parallel Queries

```python
import asyncio
from tavily import AsyncTavilyClient

client = AsyncTavilyClient()

async def parallel_search():
    queries = [
        "AI trends 2025",
        "machine learning best practices",
        "LLM deployment strategies"
    ]

    responses = await asyncio.gather(
        *(client.search(q, search_depth="advanced") for q in queries),
        return_exceptions=True
    )

    for query, response in zip(queries, responses):
        if isinstance(response, Exception):
            print(f"Failed: {query}")
        else:
            print(f"{query}: {len(response['results'])} results")

asyncio.run(parallel_search())
```
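
`asyncio.gather` starts every query at once. For long query lists you may want to cap the number of in-flight requests, for example to stay within whatever rate limits apply to your account. Here is a minimal sketch using `asyncio.Semaphore`. It assumes `search` is any coroutine function with the same call shape as `AsyncTavilyClient.search`. `bounded_gather` and `max_concurrency` are hypothetical names, and a suitable limit is something you would have to determine for your own workload.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def bounded_gather(
    search: Callable[..., Awaitable[Any]],
    queries: list[str],
    max_concurrency: int = 5,
    **search_kwargs: Any,
) -> list[Any]:
    """Run search(q, **search_kwargs) for every query, at most
    max_concurrency at a time (hypothetical helper).

    Results come back in query order; failures are returned as exception
    objects, matching asyncio.gather(..., return_exceptions=True).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query: str) -> Any:
        # Wait for a free slot before issuing the request
        async with semaphore:
            return await search(query, **search_kwargs)

    return await asyncio.gather(
        *(run_one(q) for q in queries), return_exceptions=True
    )
```

Inside an async function this would be called as `responses = await bounded_gather(client.search, queries, max_concurrency=3, search_depth="advanced")`, and the result list can be handled exactly like the `gather` output above.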

### JavaScript Parallel Queries

```javascript
const queries = ["AI trends", "ML practices", "LLM strategies"];

// Promise.all rejects as soon as any single search fails;
// use Promise.allSettled to keep partial results
const responses = await Promise.all(
  queries.map(q => client.search(q, { searchDepth: "advanced" }))
);

responses.forEach((response, i) => {
  console.log(`${queries[i]}: ${response.results.length} results`);
});
```

---

## Hybrid RAG

Combine web search with local database retrieval.

### Python

```python
from tavily import TavilyHybridClient
from pymongo import MongoClient

# Connect to MongoDB
db = MongoClient("mongodb+srv://URI")["DB_NAME"]

# Initialize hybrid client
hybrid_client = TavilyHybridClient(
    api_key="tvly-YOUR_API_KEY",
    db_provider="mongodb",
    collection=db.get_collection("documents"),
    embeddings_field="embeddings",
    content_field="content"
)

# Search across web + local DB
results = hybrid_client.search(
    query="quantum computing advances",
    max_results=10,
    max_local=5,       # Results from local DB
    max_foreign=5,     # Results from web
    save_foreign=True  # Store web results in DB
)
```

**Environment Variables:**
- `TAVILY_PROJECT`: Default project ID
- `TAVILY_HTTP_PROXY` / `TAVILY_HTTPS_PROXY`: Proxy configuration
- `CO_API_KEY`: Cohere API key for embeddings

---

## Response Structures

### Search Response

```python
{
    "query": str,
    "results": [
        {
            "title": str,
            "url": str,
            "content": str,
            "score": float,
            "raw_content": str,  # if include_raw_content
            "favicon": str       # if include_favicon
        }
    ],
    "response_time": float,
    "request_id": str,
    "answer": str,  # if include_answer
    "images": list  # if include_images
}
```
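
Search results are usually handed to your own model. Here is a minimal sketch that turns a response with the structure above into a numbered context block for a prompt. `build_context` and the `max_chars_per_result` cap are hypothetical choices, not SDK features. The helper reads only the documented `title`, `url`, and `content` fields.

```python
from typing import Any


def build_context(response: dict[str, Any], max_chars_per_result: int = 1000) -> str:
    """Format search results as numbered sources for an LLM prompt.

    Each result becomes "[n] title / URL / content", with the content
    truncated to max_chars_per_result characters. Sources are separated
    by a blank line.
    """
    blocks = []
    for i, result in enumerate(response.get("results", []), start=1):
        content = (result.get("content") or "")[:max_chars_per_result]
        blocks.append(f"[{i}] {result.get('title', '')}\nURL: {result['url']}\n{content}")
    return "\n\n".join(blocks)
```

Numbering the sources lets you ask the model to cite them as `[1]`, `[2]`, and so on. The per-result cap is a crude way to bound prompt size; a token-based budget would be more precise.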

### Extract Response

```python
{
    "results": [
        {
            "url": str,
            "raw_content": str,
            "images": list,
            "favicon": str
        }
    ],
    "failed_results": [
        {"url": str, "error": str}
    ],
    "response_time": float,
    "request_id": str
}
```
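
Because URLs that could not be extracted are listed in `failed_results` alongside the successes, it is worth checking that list explicitly. Here is a minimal sketch that splits an extract response with the structure above into two dictionaries keyed by URL. `split_extract_response` is a hypothetical helper, and what to do with the failures (log, retry, skip) is left to the caller.

```python
from typing import Any


def split_extract_response(
    response: dict[str, Any],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return ({url: raw_content}, {url: error}) from an extract() response."""
    contents = {
        r["url"]: r.get("raw_content") or "" for r in response.get("results", [])
    }
    errors = {
        f["url"]: f.get("error", "") for f in response.get("failed_results", [])
    }
    return contents, errors
```

A typical call is `contents, errors = split_extract_response(response)`, followed by something like `if errors: print(f"{len(errors)} URL(s) failed: {list(errors)}")`.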

### Crawl Response

```python
{
    "base_url": str,
    "results": [
        {
            "url": str,
            "raw_content": str,
            "images": list,
            "favicon": str
        }
    ],
    "response_time": float,
    "request_id": str
}
```

### Map Response

```python
{
    "base_url": str,
    "results": [str],  # List of URLs
    "response_time": float,
    "request_id": str
}
```

---

For full API documentation, see:
- [Python SDK Reference](https://docs.tavily.com/sdk/python/reference)
- [JavaScript SDK Reference](https://docs.tavily.com/sdk/javascript/reference)
@@ -0,0 +1,403 @@
# Search API Reference

## Table of Contents

- [Query Optimization](#query-optimization)
- [Search Depth](#search-depth)
- [Key Parameters](#key-parameters)
- [Basic Usage](#basic-usage)
- [Filtering Results](#filtering-results)
- [Async Patterns](#async-patterns)
- [Response Fields](#response-fields)
- [Post-Filtering Strategies](#post-filtering-strategies)

---

## Query Optimization

**Keep queries under 400 characters.** Write them like search queries, not long-form prompts.

**Break complex queries into sub-queries:**
```python
# Instead of one massive query, break it down.
# Assumes `client` is an AsyncTavilyClient and this runs inside an async function.
queries = [
    "Competitors of company ABC",
    "Financial performance of company ABC",
    "Recent developments of company ABC"
]
responses = await asyncio.gather(*(client.search(q) for q in queries))
```
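
Sub-queries on the same topic often return overlapping pages. Here is a minimal sketch that merges several search responses into one list and keeps the highest-scoring copy of each URL. `merge_results` is a hypothetical helper. Deduplicating on the exact URL string (no normalization of trailing slashes or query strings) is a simplifying assumption.

```python
from typing import Any


def merge_results(responses: list[Any]) -> list[dict[str, Any]]:
    """Merge results from several search responses, deduplicated by URL.

    Exception objects (e.g. from asyncio.gather(..., return_exceptions=True))
    are skipped. For duplicate URLs the result with the higher score wins.
    Output is sorted by score, highest first.
    """
    best: dict[str, dict[str, Any]] = {}
    for response in responses:
        if isinstance(response, BaseException):
            continue  # skip failed sub-queries
        for result in response.get("results", []):
            url = result["url"]
            if url not in best or result["score"] > best[url]["score"]:
                best[url] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```

Applied to the `responses` list above, `merge_results(responses)` yields a single ranked list that can be filtered or passed to a model.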

## Search Depth

Controls the latency vs. relevance tradeoff:

| Depth | Latency | Relevance | Content Type |
|-------|---------|-----------|--------------|
| `ultra-fast` | Lowest | Lower | Content (NLP summary) |
| `fast` | Low | Good | Chunks |
| `basic` | Medium | High | Content (NLP summary) |
| `advanced` | Higher | Highest | Chunks |

**Content types:**
- **Content**: NLP-based summary of the page, providing general context
- **Chunks**: Short snippets (max 500 chars each) reranked by relevance to your query

**When to use each:**
- `ultra-fast`: Latency-critical use cases (real-time chat, autocomplete)
- `fast`: You need chunks, but latency matters
- `basic`: General-purpose, balancing relevance and latency
- `advanced`: Queries for specific information where precision matters. This is the recommended default: it is still fast and suits almost all use cases (note that the API's own default is `basic`)

## Key Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | Required | Search query (keep under 400 chars) |
| `search_depth` | enum | `"basic"` | `"ultra-fast"`, `"fast"`, `"basic"`, `"advanced"` |
| `topic` | enum | `"general"` | `"general"`, `"news"`, `"finance"` |
| `chunks_per_source` | integer | 3 | Chunks per source (`advanced`/`fast` depth only) |
| `max_results` | integer | 5 | Maximum results (0-20) |
| `time_range` | enum | null | `"day"`, `"week"`, `"month"`, `"year"` |
| `start_date` | string | null | Results after date (YYYY-MM-DD) |
| `end_date` | string | null | Results before date (YYYY-MM-DD) |
| `include_domains` | array | [] | Domains to include (max 300, supports wildcards like `*.com`) |
| `exclude_domains` | array | [] | Domains to exclude (max 150) |
| `country` | enum | null | Boost results from a specific country |
| `include_answer` | bool/enum | false | LLM-generated answer: `true` (same as `"basic"`) or `"advanced"` |
| `include_raw_content` | bool/enum | false | Full page content: `true` (same as `"markdown"`) or `"text"` |
| `include_images` | boolean | false | Include image results |
| `include_image_descriptions` | boolean | false | AI descriptions for images |
| `include_favicon` | boolean | false | Favicon URL per result |
| `auto_parameters` | boolean | false | Auto-configure based on query intent |
| `include_usage` | boolean | false | Include credit usage info |

**Notes:**

- **`include_answer`**: Only use this if you don't want to bring your own LLM; most users bring their own model.

- **`auto_parameters`**: May set `search_depth="advanced"` (2 credits). Set `search_depth` explicitly to control cost.
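
The limits in the table can be checked client-side before a request is sent, which gives clearer errors than a failed API call. Here is a minimal sketch. `build_search_params` is a hypothetical helper, and it covers only the constraints listed above. It also treats the 400-character query recommendation as a hard check, which is a choice made here rather than a documented API rule. The API itself remains the authority on what it accepts.

```python
from typing import Any, Optional

DEPTHS = {"ultra-fast", "fast", "basic", "advanced"}
CHUNKED_DEPTHS = {"fast", "advanced"}  # depths that return chunks


def build_search_params(
    query: str,
    search_depth: str = "basic",
    max_results: int = 5,
    chunks_per_source: Optional[int] = None,
    include_domains: Optional[list[str]] = None,
    exclude_domains: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Validate documented limits and return kwargs for client.search()."""
    if len(query) >= 400:
        raise ValueError("query should be under 400 characters")
    if search_depth not in DEPTHS:
        raise ValueError(f"unknown search_depth: {search_depth!r}")
    if not 0 <= max_results <= 20:
        raise ValueError("max_results must be between 0 and 20")
    if chunks_per_source is not None and search_depth not in CHUNKED_DEPTHS:
        raise ValueError("chunks_per_source only applies to fast/advanced depth")
    if include_domains and len(include_domains) > 300:
        raise ValueError("include_domains allows at most 300 domains")
    if exclude_domains and len(exclude_domains) > 150:
        raise ValueError("exclude_domains allows at most 150 domains")

    params: dict[str, Any] = {
        "query": query,
        "search_depth": search_depth,
        "max_results": max_results,
    }
    # Only send optional fields that were actually set
    if chunks_per_source is not None:
        params["chunks_per_source"] = chunks_per_source
    if include_domains:
        params["include_domains"] = include_domains
    if exclude_domains:
        params["exclude_domains"] = exclude_domains
    return params
```

The result is meant to be unpacked into the call, e.g. `client.search(**build_search_params("quantum error correction", search_depth="advanced", chunks_per_source=5))`.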

## Basic Usage

```python
from tavily import TavilyClient

client = TavilyClient()

response = client.search(
    query="latest developments in quantum computing",
    max_results=10,
    search_depth="advanced",
    chunks_per_source=5
)

for result in response["results"]:
    print(f"{result['title']}: {result['url']}")
    print(f"Score: {result['score']}")
```

## Filtering Results

### By domain

```python
# Only search trusted sources
response = client.search(
    query="machine learning best practices",
    include_domains=["arxiv.org", "github.com", "pytorch.org"],
)

# Exclude specific domains
response = client.search(
    query="openai product reviews",
    exclude_domains=["reddit.com", "quora.com"]
)

# Restrict to LinkedIn profiles
response = client.search(
    query="CEO background at Google",
    include_domains=["linkedin.com/in"]
)
```

### By date

```python
# Relative time range
response = client.search(query="latest ML trends", time_range="month")

# Specific date range
response = client.search(
    query="AI news",
    start_date="2025-01-01",
    end_date="2025-02-01"
)
```
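
`start_date` and `end_date` take `YYYY-MM-DD` strings. When you need a window that `time_range` does not offer (say, the last 10 days), the strings can be computed with the standard library. Here is a minimal sketch. `last_n_days` is a hypothetical helper, and the optional `today` argument exists only to make it testable.

```python
from datetime import date, timedelta
from typing import Optional


def last_n_days(n: int, today: Optional[date] = None) -> dict[str, str]:
    """Return start_date/end_date kwargs covering the last n days."""
    end = today or date.today()
    start = end - timedelta(days=n)
    # date.isoformat() produces the YYYY-MM-DD format used above
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}
```

For example, `client.search(query="AI news", **last_n_days(10))` searches the ten days up to today.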

### By country

```python
# Boost results from a specific country
response = client.search(query="tech startup funding", country="united states")
```

## Async Patterns

The async client lets you run many queries in parallel, so you can search more broadly without waiting for each query in turn. This is the recommended pattern for agentic systems that need to gather comprehensive information quickly before passing it to a model for analysis.

```python
import asyncio
from tavily import AsyncTavilyClient

# Initialize Tavily client
tavily_client = AsyncTavilyClient("tvly-YOUR_API_KEY")

async def fetch_and_gather():
    queries = ["latest AI trends", "future of quantum computing"]

    # Perform search and continue even if one query fails (using return_exceptions=True)
    try:
        responses = await asyncio.gather(*(tavily_client.search(q) for q in queries), return_exceptions=True)

        # Handle responses and print
        for response in responses:
            if isinstance(response, Exception):
                print(f"Search query failed: {response}")
            else:
                print(response)

    except Exception as e:
        print(f"Error during search queries: {e}")

# Run the function
asyncio.run(fetch_and_gather())
```

## Response Fields

**Top-level response:**

| Field | Description |
|-------|-------------|
| `query` | The original search query |
| `answer` | AI-generated answer (if `include_answer` enabled) |
| `results` | Array of search result objects |
| `images` | Array of image results (if `include_images=True`) |
| `request_id` | Unique identifier for support reference |
| `response_time` | Response time in seconds |

**Each result object:**

| Field | Description |
|-------|-------------|
| `title` | Page title |
| `url` | Source URL |
| `content` | Extracted text snippet(s) |
| `score` | Semantic relevance score (0-1) |
| `raw_content` | Full page content (if `include_raw_content` enabled) |
| `favicon` | Favicon URL (if `include_favicon=True`) |

**Each image object (if `include_images=True`):**

| Field | Description |
|-------|-------------|
| `url` | Image URL |
| `description` | AI-generated description (if `include_image_descriptions=True`) |

---

## Post-Filtering Strategies

Because Tavily returns raw web data, you can implement whatever filtering and post-processing your requirements call for.

The `score` field measures relevance to the query, but it doesn't guarantee that a result matches specific criteria (e.g., the correct person, exact product, or specific company). Use post-filtering to validate results against strict requirements.

### Score-Based Filtering

Simple threshold filtering based on relevance score:

```python
results = response["results"]

# Filter by score threshold
high_quality = [r for r in results if r["score"] > 0.7]

# Sort by score
sorted_results = sorted(results, key=lambda x: x["score"], reverse=True)

# Top N above threshold
top_relevant = sorted(
    [r for r in results if r["score"] > 0.5],
    key=lambda x: x["score"],
    reverse=True
)[:3]
```

**Limitation:** The score indicates relevance to the query, not how accurately a result matches specific criteria.

### Regex Filtering

Fast, deterministic filtering using pattern matching. Use it for:
- URL pattern validation
- Required keywords/phrases
- Structural requirements

```python
import re

def regex_filter(result: dict, criteria: dict) -> dict:
    """
    Filter a search result using regex checks.

    Args:
        result: Search result dict with url, content, title, raw_content
        criteria: Dict with patterns to match:
            - url_pattern: Regex matched against the lowercased URL
            - required_terms: Terms that must appear in the title or content (case-insensitive)
            - excluded_terms: Terms that must NOT appear (case-insensitive)

    Returns:
        dict with check results and validity
    """
    url = result.get("url", "")
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    raw_content = result.get("raw_content", "") or ""

    full_text = f"{content} {title} {raw_content}".lower()

    checks = {}

    # URL pattern check
    if "url_pattern" in criteria:
        checks["url_valid"] = bool(re.search(criteria["url_pattern"], url.lower()))

    # Required terms check
    if "required_terms" in criteria:
        checks["required_found"] = all(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["required_terms"]
        )

    # Excluded terms check
    if "excluded_terms" in criteria:
        checks["excluded_absent"] = not any(
            re.search(re.escape(term.lower()), full_text)
            for term in criteria["excluded_terms"]
        )

    # Valid if all checks pass
    is_valid = all(checks.values()) if checks else True

    return {"checks": checks, "is_valid": is_valid, "url": url}
```

**Example: LinkedIn Profile Search**

```python
criteria = {
    "url_pattern": r"linkedin\.com/in/",  # Profile URL, not company page
    "required_terms": ["Jane Smith", "Acme Corp"],
    "excluded_terms": ["job posting", "careers"]
}

for result in response["results"]:
    validation = regex_filter(result, criteria)
    if validation["is_valid"]:
        print(f"Valid: {validation['url']}")
```

**Example: GitHub Repository Search**

```python
criteria = {
    # Repo root URL (optional trailing slash), not a file or subpage;
    # repo names may contain dots, e.g. "socket.io"
    "url_pattern": r"github\.com/[\w.-]+/[\w.-]+/?$",
    "required_terms": ["MIT License"],
    "excluded_terms": ["archived", "deprecated"]
}
```

### LLM Verification

Semantic validation using an LLM. Use it for:
- Synonym/abbreviation matching ("FDE" = "Forward Deployed Engineer")
- Context-aware validation
- Confidence scoring with reasoning

```python
from openai import OpenAI
import json

def llm_verify(result: dict, target_description: str, validation_criteria: list[str]) -> dict:
    """
    Use LLM to verify if a search result matches target criteria.

    Args:
        result: Search result dict
        target_description: What you're looking for
        validation_criteria: List of criteria to check

    Returns:
        dict with is_match, confidence (high/medium/low), reasoning
    """
    content = result.get("content", "") or ""
    title = result.get("title", "") or ""
    url = result.get("url", "")

    criteria_text = "\n".join(f"- {c}" for c in validation_criteria)

    prompt = f"""Verify if this search result matches the target.

Target: {target_description}

Validation Criteria:
{criteria_text}

Search Result:
URL: {url}
Title: {title}
Content: {content}

Does this result match ALL criteria?

Respond with JSON only:
{{"is_match": true/false, "confidence": "high/medium/low", "reasoning": "brief explanation"}}"""

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)
```

**Example: Profile Verification**

```python
verification = llm_verify(
    result=search_result,  # one item from response["results"]
    target_description="Jane Smith, Software Engineer at Acme Corp",
    validation_criteria=[
        "Name matches Jane Smith",
        "Currently works at Acme Corp (or recently)",
        "Role is software engineering related",
        "Professional customer-facing experience"
    ]
)

if verification["is_match"] and verification["confidence"] in ["high", "medium"]:
    print(f"Verified: {verification['reasoning']}")
```
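
The two strategies combine naturally. Run the cheap, deterministic regex checks first and send only the survivors to the LLM, so the number of model calls tracks the number of plausible candidates. Here is a minimal sketch, assuming you pass in functions shaped like `regex_filter` and `llm_verify` above. `two_stage_filter` is a hypothetical helper. Accepting "high" and "medium" confidence mirrors the example above rather than any fixed rule.

```python
from typing import Any, Callable


def two_stage_filter(
    results: list[dict[str, Any]],
    regex_check: Callable[[dict[str, Any]], bool],
    llm_check: Callable[[dict[str, Any]], dict[str, Any]],
    accepted_confidence: tuple[str, ...] = ("high", "medium"),
) -> list[dict[str, Any]]:
    """Keep results that pass the regex stage and then the LLM stage.

    regex_check returns True/False; llm_check returns a dict with
    "is_match" and "confidence" keys (the shape llm_verify returns).
    """
    verified = []
    for result in results:
        if not regex_check(result):
            continue  # cheap rejection, no LLM call made
        verdict = llm_check(result)
        if verdict.get("is_match") and verdict.get("confidence") in accepted_confidence:
            verified.append(result)
    return verified
```

With the helpers above, a call might look like `two_stage_filter(response["results"], lambda r: regex_filter(r, criteria)["is_valid"], lambda r: llm_verify(r, target, checks))`, where `target` and `checks` are your own description and criteria list.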

For more details, see the [full API reference](https://docs.tavily.com/documentation/api-reference/endpoint/search).