Malware research group vx-underground, which says it has the largest collection of malware source code, said in a post on X that its archive of data amounts to about 30 terabytes.
In a reply, Bernardo Quintero, founder of VirusTotal, an online service that scans files for malware across multiple antivirus engines at once, said his service holds about 31 petabytes of malware samples that users have contributed to date. (A petabyte is roughly 1,000 times the size of a terabyte.)
In both cases, that’s a lot of data. For context, cybersecurity companies, AI researchers, and threat intelligence firms treat repositories like these as critical for training detection models and understanding how attacks evolve. But this had us wondering: What would these enormous datasets actually look like as hard drives stacked one on top of another and lined up side by side? And how would they compare to, say, the Eiffel Tower?
Someone in our newsroom asked an AI chatbot this question, and it got it incredibly wrong.
So instead, we did some rough back-of-the-napkin math to figure out how tall these data banks would be. Since vx-underground and VirusTotal each say they have “about” that much data, “about” is good enough for us in this case.
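For readers who want to redo the napkin math themselves, here is a minimal sketch in Python. The archive sizes come from the two posts above. Everything else is our assumption for illustration: a hypothetical 1-terabyte capacity per drive, a height of 26.1 millimeters for a typical 3.5-inch desktop hard drive, and roughly 330 meters for the Eiffel Tower. The function names are ours, too.

```python
import math

# Decimal units, matching the "roughly 1,000 times" rule of thumb.
TB_PER_PB = 1_000

# Assumptions for illustration only; none of these come from the posts.
DRIVE_CAPACITY_TB = 1.0   # hypothetical capacity of each hard drive
DRIVE_HEIGHT_MM = 26.1    # typical height of a 3.5-inch desktop drive
EIFFEL_TOWER_M = 330.0    # approximate height, antennas included


def drives_needed(data_tb, capacity_tb=DRIVE_CAPACITY_TB):
    """Number of whole drives required to hold data_tb terabytes."""
    return math.ceil(data_tb / capacity_tb)


def stack_height_m(data_tb, capacity_tb=DRIVE_CAPACITY_TB,
                   drive_height_mm=DRIVE_HEIGHT_MM):
    """Height in meters of a single vertical stack of those drives."""
    return drives_needed(data_tb, capacity_tb) * drive_height_mm / 1000


# Archive sizes as stated in the posts, converted to terabytes.
archives_tb = {
    "vx-underground": 30,
    "VirusTotal": 31 * TB_PER_PB,
}

for name, size_tb in archives_tb.items():
    count = drives_needed(size_tb)
    height = stack_height_m(size_tb)
    towers = height / EIFFEL_TOWER_M
    print(f"{name}: {count:,} drives, about {height:,.1f} m tall "
          f"({towers:.2f} Eiffel Towers)")
```

Swap in a different drive capacity or drive height and the stack grows or shrinks in proportion, which is why the assumptions matter more than the arithmetic.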














