Proxy servers aren’t exactly a high-profile tool – and yet, they power much of today’s AI infrastructure. A proxy server is an intermediary device with its own IP address through which you access the web. Used in aggregate, proxies let you load many webpages automatically without running into CAPTCHAs or other roadblocks. Without proxies, companies wouldn’t be able to collect as much training data for large language models, and AI agents would stumble halfway through every third task.

However, all this power brings tremendous responsibility. Sourced carelessly, proxies turn people’s computers into unwitting parts of a botnet. Used maliciously, they can overwhelm websites, create fake social media profiles, or even help steal yours. Like any powerful tool, they can either make or maim, which is why proper governance is so important.

Proxyway, a website covering web data collection infrastructure, makes it its job to follow the proxy server market closely; its findings are presented in an annual, publicly available proxy server market report. This article, which draws on the report, examines the risks of choosing an unethical provider and offers advice on how to avoid doing so.

Proxy servers in the age of AI