Large language models (LLMs) are changing financial trading by analyzing large volumes of unstructured data to generate actionable trading insights. These models can process financial news, social media sentiment, earnings reports, and market data to help predict stock price movements and automate investment strategies.

The Strategic Technology Analysis Center (STAC) has spent more than 15 years developing benchmarks for workloads central to the financial industry. Its STAC-AI benchmark helps companies assess the end-to-end retrieval-augmented generation (RAG) and LLM inference pipeline.

This post presents the results achieved on the STAC-AI LANG6 benchmark across multiple NVIDIA platforms. It also shares recommendations for benchmarking NVIDIA TensorRT LLM in a way that matches the characteristics of your own dataset.

STAC-AI LANG6 (Inference-Only) Benchmark

In the broader context of a RAG pipeline, STAC-AI LANG6 is the part of the benchmark focusing on LLM inference performance. The benchmark tests the hardware and software stack on the Llama 3.1 8B Instruct and Llama 3.1 70B Instruct models in combination with the following custom datasets: