Bright Data
Search, Crawl and Scrape any site, at scale, without getting blocked
Version 0.5.0
Bright Data provides a developer toolkit for large-scale web search, crawling, and scraping, enabling reliable extraction of pages and structured data without getting blocked. It supports search queries, content-to-Markdown conversion, and configurable data feeds across many site types.
It is designed for integration into data pipelines and analytics workflows, with parameterized feeds and output formats.
Capabilities
- Block-resistant crawling and scraping at scale, for sustained collection.
- Flexible search engine queries with advanced parameters across major engines.
- Transform pages into clean Markdown and emit structured JSON feeds for profiles, products, reviews, listings, and media.
- Configurable extraction parameters for batching, pagination, and media handling.
Secrets
- API key (BRIGHTDATA_API_KEY) and zone token (BRIGHTDATA_ZONE). Example values: BRIGHTDATA_API_KEY=sk_..., BRIGHTDATA_ZONE=zone123.
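As a minimal sketch of how a pipeline might check these two secrets before calling any tool, the snippet below reads them from environment variables and fails early if either is missing. The helper name `load_secrets` is hypothetical and not part of the toolkit; only the two variable names come from this page.

```python
import os
from typing import Mapping

# Secret names taken from this page; everything else here is illustrative.
REQUIRED_SECRETS = ("BRIGHTDATA_API_KEY", "BRIGHTDATA_ZONE")


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return both secrets, or raise if any is missing or blank."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing secrets: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_SECRETS}
```

Passing the environment as a parameter keeps the check easy to exercise in tests without touching the real process environment.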
Available tools (3)
| Tool name | Description | Secrets |
|---|---|---|
| Brightdata.ScrapeAsMarkdown | Scrape a webpage and return content in Markdown format using Bright Data. | 2 |
| Brightdata.SearchEngine | Search using Google, Bing, or Yandex with advanced parameters using Bright Data. | 2 |
| Brightdata.WebDataFeed | Extract structured data from various websites like LinkedIn, Amazon, Instagram, etc. | 1 |
Brightdata.ScrapeAsMarkdown
Execution hints
Signals for MCP clients and agents about how this tool behaves.
Does not modify remote state.
May delete or overwrite remote data.
Safe to retry without extra side effects.
Can call out to external systems.
Scrape a webpage and return content in Markdown format using Bright Data.
Examples:
scrape_as_markdown("https://example.com") -> "# Example Page\nContent..."
scrape_as_markdown("https://news.ycombinator.com") -> "# Hacker News\n..."
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| url | string | Required | URL to scrape |
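Because `url` is the only input, a caller may want to reject obviously malformed values before spending a request. The sketch below assumes that only absolute `http` or `https` URLs are useful here, which this page does not state; `scrape_args` is a hypothetical helper that just builds the argument dictionary.

```python
from urllib.parse import urlparse


def scrape_args(url: str) -> dict[str, str]:
    """Build the argument dict for the scrape tool after a basic URL check."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Assumption: the tool is only meaningful for absolute http(s) URLs.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    return {"url": cleaned}
```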
Output
string: Scraped webpage content as Markdown
Brightdata.SearchEngine
Execution hints
Signals for MCP clients and agents about how this tool behaves.
Does not modify remote state.
May delete or overwrite remote data.
Safe to retry without extra side effects.
Can call out to external systems.
Search using Google, Bing, or Yandex with advanced parameters using Bright Data.
Examples:
search_engine("climate change") -> "# Search Results\n## Climate Change - Wikipedia\n..."
search_engine("Python tutorials", engine="bing", num_results=5) -> "# Bing Results\n..."
search_engine("cats", search_type="images", country_code="us") -> "# Image Results\n..."
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| query | string | Required | Search query |
| engine | string | Optional | Search engine to use: google, bing, or yandex |
| language | string | Optional | Two-letter language code |
| country_code | string | Optional | Two-letter country code |
| search_type | string | Optional | Type of search: images, shopping, news, or jobs |
| start | integer | Optional | Results pagination offset |
| num_results | integer | Optional | Number of results to return. The default is 10 |
| location | string | Optional | Location for search results |
| device | string | Optional | Device type: mobile, ios, iphone, ipad, android, or android_tablet |
| return_json | boolean | Optional | Return JSON instead of Markdown |
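The enumerated values and the `start` / `num_results` pair lend themselves to a small argument builder. The sketch below is illustrative only: `search_args` and its `page` parameter are hypothetical, and it assumes that `start` is a zero-based count of results to skip, which this page does not confirm.

```python
from typing import Any, Optional

# Allowed values as listed in the parameter table.
ENGINES = {"google", "bing", "yandex"}
SEARCH_TYPES = {"images", "shopping", "news", "jobs"}
DEVICES = {"mobile", "ios", "iphone", "ipad", "android", "android_tablet"}


def search_args(
    query: str,
    *,
    engine: Optional[str] = None,
    search_type: Optional[str] = None,
    device: Optional[str] = None,
    country_code: Optional[str] = None,
    page: int = 0,
    num_results: int = 10,
) -> dict[str, Any]:
    """Validate enumerated options and turn a page number into a start offset."""
    if not query.strip():
        raise ValueError("query must not be empty")
    for name, value, allowed in (
        ("engine", engine, ENGINES),
        ("search_type", search_type, SEARCH_TYPES),
        ("device", device, DEVICES),
    ):
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    if country_code is not None and len(country_code) != 2:
        raise ValueError("country_code must be a two-letter code")
    if page < 0 or num_results < 1:
        raise ValueError("page must be >= 0 and num_results must be >= 1")
    # Assumption: start is a zero-based result offset, so page N skips N pages.
    args: dict[str, Any] = {
        "query": query,
        "num_results": num_results,
        "start": page * num_results,
    }
    optional = {
        "engine": engine,
        "search_type": search_type,
        "device": device,
        "country_code": country_code,
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

Omitting unset options from the dictionary leaves the tool's own defaults in effect instead of overriding them with `None`.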
Output
string: Search results as Markdown or JSON
Brightdata.WebDataFeed
Execution hints
Signals for MCP clients and agents about how this tool behaves.
Does not modify remote state.
May delete or overwrite remote data.
Safe to retry without extra side effects.
Can call out to external systems.
Extract structured data from various websites like LinkedIn, Amazon, Instagram, etc.
Never make up links. If links are needed, run search_engine first.
Supported source types:
- amazon_product, amazon_product_reviews
- linkedin_person_profile, linkedin_company_profile
- zoominfo_company_profile
- instagram_profiles, instagram_posts, instagram_reels, instagram_comments
- facebook_posts, facebook_marketplace_listings, facebook_company_reviews
- x_posts
- zillow_properties_listing
- booking_hotel_listings
- youtube_videos
Examples:
web_data_feed("amazon_product", "https://amazon.com/dp/B08N5WRWNW") -> "{"title": "Product Name", ...}"
web_data_feed("linkedin_person_profile", "https://linkedin.com/in/johndoe") -> "{"name": "John Doe", ...}"
web_data_feed("facebook_company_reviews", "https://facebook.com/company", num_of_reviews=50) -> "[{"review": "...", ...}]"
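Since the set of source types is closed, a caller can check arguments locally before invoking the tool. The following is a sketch under stated assumptions: `feed_args` is a hypothetical helper, and rejecting `num_of_reviews` for other source types is a design choice made here, based on that parameter being documented as applicable only to `facebook_company_reviews`.

```python
from typing import Any, Optional

# Source types copied from the list on this page.
SOURCE_TYPES = frozenset({
    "amazon_product", "amazon_product_reviews",
    "linkedin_person_profile", "linkedin_company_profile",
    "zoominfo_company_profile",
    "instagram_profiles", "instagram_posts", "instagram_reels", "instagram_comments",
    "facebook_posts", "facebook_marketplace_listings", "facebook_company_reviews",
    "x_posts", "zillow_properties_listing", "booking_hotel_listings", "youtube_videos",
})


def feed_args(source_type: str, url: str, num_of_reviews: Optional[int] = None) -> dict[str, Any]:
    """Build web_data_feed arguments, rejecting unsupported combinations."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"Unsupported source_type: {source_type!r}")
    if num_of_reviews is not None and source_type != "facebook_company_reviews":
        # Local choice: fail loudly rather than send an option that does not apply.
        raise ValueError("num_of_reviews only applies to facebook_company_reviews")
    args: dict[str, Any] = {"source_type": source_type, "url": url}
    if num_of_reviews is not None:
        args["num_of_reviews"] = num_of_reviews
    return args
```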
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
| source_type | string | Required | Type of data source: amazon_product, amazon_product_reviews, linkedin_person_profile, linkedin_company_profile, zoominfo_company_profile, instagram_profiles, instagram_posts, instagram_reels, instagram_comments, facebook_posts, facebook_marketplace_listings, facebook_company_reviews, x_posts, zillow_properties_listing, booking_hotel_listings, youtube_videos |
| url | string | Required | URL of the web resource to extract data from |
| num_of_reviews | integer | Optional | Number of reviews to retrieve. Only applicable for facebook_company_reviews. Default is None |
| timeout | integer | Optional | Maximum time in seconds to wait for data retrieval |
| polling_interval | integer | Optional | Time in seconds between polling attempts |
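The `timeout` and `polling_interval` parameters suggest a poll-until-ready pattern. How the tool implements this internally is not documented here, so the loop below is only a generic sketch of that pattern: `poll_until_ready` and its `fetch` callback are hypothetical, with `fetch` assumed to return `None` until the data is available.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[str]],
    timeout: float,
    polling_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch() every polling_interval seconds until it returns data or timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if the next attempt would start after the deadline.
        if clock() + polling_interval > deadline:
            raise TimeoutError(f"No data after {timeout} seconds")
        sleep(polling_interval)
```

A shorter `polling_interval` returns results sooner at the cost of more requests, and the total number of attempts is bounded by roughly `timeout / polling_interval`.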
Output
string: Structured data from the requested source as JSON
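The examples on this page show the JSON string holding either a single object (a product or profile) or an array of objects (reviews). A downstream step might normalize both shapes into a list of records, as in this sketch; `parse_feed_output` is a hypothetical helper, and other output shapes are treated as errors by assumption.

```python
import json
from typing import Any


def parse_feed_output(raw: str) -> list[dict[str, Any]]:
    """Parse the tool's JSON string and always return a list of records."""
    data = json.loads(raw)
    if isinstance(data, dict):
        return [data]  # single object, e.g. one product or profile
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data  # array of objects, e.g. reviews
    raise ValueError("Expected a JSON object or an array of objects")
```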