When AlphaFold2 revolutionized drug discovery in 2020, its success relied entirely on the roughly 170,000 protein structures collected by scientists since 1971 and preserved in the Protein Data Bank. Measured data is the backbone of all AI models and workflows that process data as it’s created, act on what matters in real time, and analyze data for deep insights. With the rise of modern sensors and detectors, nobody needs to wait 50 years to collect enough data for groundbreaking AI models.

From large scientific facilities such as the Linac Coherent Light Source II (LCLS-II), which generates photon pulses at a 1 MHz repetition rate, to industrial CT scanners and high-bandwidth software-defined radios, output rates continue to increase. The bottleneck is shifting from a lack of data to the current “collect, store, analyze” architecture, which was never designed to handle high data rates on short time scales.

Moving to an adaptable data acquisition pipeline that pre-processes data at the source opens up opportunities to mitigate information loss from the data deluge while accelerating the path from data collection to discovery.
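To make the idea of pre-processing at the source concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not DAQIRI code: the simulated detector, the frame size, the 10% hit fraction, and the threshold of 50 are all hypothetical values chosen for the example. The sketch discards frames that contain only noise before they would reach storage, so only frames with a signal move downstream.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Frame = List[int]  # one detector readout, as a flat list of pixel values


def simulated_detector(n_frames: int, pixels: int = 64,
                       hit_fraction: float = 0.1, seed: int = 0) -> Iterator[Frame]:
    """Hypothetical stand-in for a streaming detector (not a real device API)."""
    rng = random.Random(seed)
    for _ in range(n_frames):
        # Every frame carries low-level noise in the range 0..3.
        frame = [rng.randint(0, 3) for _ in range(pixels)]
        # A small fraction of frames also contain one bright "hit" pixel.
        if rng.random() < hit_fraction:
            frame[rng.randrange(pixels)] = 100
        yield frame


def keep_hits(frames: Iterable[Frame], threshold: int = 50) -> Iterator[Frame]:
    """Source-side filter: forward only frames whose brightest pixel reaches the threshold."""
    for frame in frames:
        if max(frame) >= threshold:
            yield frame


def reduction_summary(n_frames: int = 1000) -> Tuple[int, int]:
    """Return (frames produced, frames kept after filtering at the source)."""
    kept = sum(1 for _ in keep_hits(simulated_detector(n_frames)))
    return n_frames, kept


if __name__ == "__main__":
    total, kept = reduction_summary()
    print(f"produced {total} frames, forwarded {kept} to storage")
```

Because both stages are generators, each frame is inspected as it arrives and nothing is buffered, which mirrors the contrast with a "collect, store, analyze" flow. A real pipeline would replace the threshold test with instrument-specific logic such as calibration, compression, or inference.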

NVIDIA DAQIRI (Data Acquisition for Integrated Real-time Instruments) shifts data acquisition from an inflexible, hardware-centric design to an adaptable, software-centric architecture. A high-performance networking library that is part of the NVIDIA Holoscan Platform, DAQIRI directly connects existing high-bandwidth streaming detectors and sensors to the NVIDIA software ecosystem. Examples include Holoscan for real-time multi-modal, multi-rate processing; NVIDIA TensorRT for real-time inference; and NVIDIA nvCOMP for streaming compression. Beyond the NVIDIA ecosystem, DAQIRI can also stream directly into custom instrument-specific software platforms.