Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation, and analysis of open web data accessible to researchers.
We are pleased to announce that the crawl archive for March 2025 is now available. The data was crawled between March 15th and March 28th, and contains 2.74 billion web pages (or 4.55 TiB of uncompressed content).
Thom Vaughan
Thom is Principal Technologist at the Common Crawl Foundation.