TL;DR

The Multimodal Universe (MMU) pools more than 80TB¹ of data from over 30 astronomical surveys into one place. Crossmatching (linking observations of the same object across surveys) is its killer feature, but until now it required downloading hefty chunks of data to local disk. We got tired of needing a cluster just to run a crossmatch, so we gathered in the UniverseTBD and Hugging Science Discord servers to fix that. We've converted the MMU to the Parquet-based HATS format so that you can use the LSDB and Hugging Face ecosystems to crossmatch from a laptop.
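If "crossmatching" is new to you, here is a toy sketch of the idea in plain Python: for each source in one catalog, find the nearest source in another catalog and keep the pair if they sit within a small angular radius. Everything here is an assumption made for illustration (the two tiny catalogs, the `crossmatch` helper, the 1 arcsecond radius), and it is not how LSDB does it. This version compares every pair, which is exactly what stops working at survey scale; HATS partitions the sky so that only nearby sources ever need comparing.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in, arcseconds out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: stays accurate for the tiny separations we care about
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600

def crossmatch(left, right, radius_arcsec=1.0):
    """Pair each left source with its nearest right source within the radius.

    Brute force over all pairs: fine for a toy, hopeless for real surveys.
    """
    matches = []
    for left_id, left_ra, left_dec in left:
        # Nearest neighbour in the right-hand catalog
        sep, right_id = min(
            (angular_sep_arcsec(left_ra, left_dec, ra, dec), rid)
            for rid, ra, dec in right
        )
        if sep <= radius_arcsec:
            matches.append((left_id, right_id, sep))
    return matches

# Made-up catalogs: (id, ra_deg, dec_deg)
survey_a = [("a1", 10.0000, -5.0000), ("a2", 150.2500, 2.1000)]
survey_b = [("b1", 10.0001, -5.0001), ("b2", 200.0000, 45.0000)]

matches = crossmatch(survey_a, survey_b, radius_arcsec=1.0)
print(matches)
```

Only `a1` and `b1` are close enough to count as the same object; `a2` has no counterpart within the radius, so it drops out of the result.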

The datasets are in this Hugging Face collection. No bulk downloads are necessary, and 4GB of RAM is enough even at Gaia scale. Here it is in action: