Bluesky hit 40 million users earlier this year. Unlike Twitter, it runs on an open protocol, the AT Protocol, where public data is genuinely public and machine-readable by design. No $5,000/month enterprise API tier. No rate limits you need a lawyer to understand. Just a clean HTTP API (the protocol calls it XRPC) that anyone can query.

I wanted to scrape it. Here's how I built a production-ready actor and what I learned along the way.

Why Bluesky is easy to scrape (legitimately)

Most social media scrapers are a fight against Cloudflare, rotating proxies, and terms-of-service grey areas. Bluesky is different. The AT Protocol was explicitly designed for third-party clients and data access. The public API at public.api.bsky.app serves unauthenticated read requests and returns plain JSON. There's no fingerprinting, no CAPTCHA, and no DOM parsing.
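To make that concrete, here is a minimal sketch of an unauthenticated read in Python using only the standard library. The helper names `xrpc_url` and `xrpc_get` are my own (hypothetical), and `bsky.app` is just an example handle. The method names `app.bsky.actor.getProfile` and `app.bsky.feed.getAuthorFeed` are the XRPC methods for profiles and author feeds.

```python
import json
import urllib.parse
import urllib.request

# Public AppView host that accepts unauthenticated reads.
PUBLIC_API = "https://public.api.bsky.app"


def xrpc_url(method: str, params: dict, host: str = PUBLIC_API) -> str:
    """Build an XRPC query URL: <host>/xrpc/<method NSID>?<query string>.

    Hypothetical helper for illustration, not part of any SDK.
    """
    query = urllib.parse.urlencode(params)
    return f"{host}/xrpc/{method}?{query}"


def xrpc_get(method: str, params: dict, host: str = PUBLIC_API) -> dict:
    """Send a GET to an XRPC query method and decode the JSON body.

    No Authorization header is sent; public reads don't need one.
    """
    req = urllib.request.Request(
        xrpc_url(method, params, host),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def print_recent_posts(handle: str, limit: int = 5) -> None:
    """Example usage (hits the network): print a handle's latest post texts."""
    feed = xrpc_get("app.bsky.feed.getAuthorFeed", {"actor": handle, "limit": limit})
    for item in feed["feed"]:
        # Each feed item wraps a post view; the post text lives in its record.
        print(item["post"]["record"].get("text", "")[:80])
```

Calling `xrpc_get("app.bsky.actor.getProfile", {"actor": "bsky.app"})` should return the profile as a dict with fields such as `handle` and `did`. Every public read follows the same shape: `/xrpc/` plus the method name, with parameters in the query string.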

The only wrinkle: the search endpoint (app.bsky.feed.searchPosts) now requires an authenticated session, which you can create with a free App Password. Everything else (author feeds, threads, profiles) works without a token.
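The authenticated flow has two steps. First, exchange your handle and App Password for an access token via `com.atproto.server.createSession`. Then call `searchPosts` with that token as a Bearer header. The sketch below assumes the account is hosted on Bluesky's own PDS at `bsky.social`, which forwards `app.bsky.*` calls to the AppView. A self-hosted account would point at its own PDS instead. The function names are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the account lives on Bluesky's PDS; change for self-hosted accounts.
PDS_HOST = "https://bsky.social"


def create_session(handle: str, app_password: str, host: str = PDS_HOST) -> str:
    """Log in with an App Password and return the access JWT (hypothetical helper)."""
    body = json.dumps({"identifier": handle, "password": app_password}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/xrpc/com.atproto.server.createSession",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["accessJwt"]


def search_request(query: str, token: str, limit: int = 25,
                   cursor: Optional[str] = None,
                   host: str = PDS_HOST) -> urllib.request.Request:
    """Build (but don't send) an authenticated searchPosts request."""
    params = {"q": query, "limit": limit}
    if cursor:
        # Pass the cursor from the previous response to fetch the next page.
        params["cursor"] = cursor
    url = f"{host}/xrpc/app.bsky.feed.searchPosts?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def search_posts(query: str, token: str, limit: int = 25,
                 cursor: Optional[str] = None, host: str = PDS_HOST) -> dict:
    """Send the search request; the response carries `posts` and, if more remain, `cursor`."""
    with urllib.request.urlopen(search_request(query, token, limit, cursor, host),
                                timeout=30) as resp:
        return json.load(resp)
```

Access tokens are short-lived, so a long-running scraper needs to refresh or re-create the session. Keep the App Password in a secret store rather than in code. It can be revoked from your account settings without touching your main password.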