Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API

We’re excited to release Llama-2-7B-32K-Instruct, a long-context instruction model fine-tuned using Together API! Llama-2-7B-32K-Instruct achieves state-of-the-art performance on long-context tasks such as summarization and multi-document question answering (QA), while maintaining performance similar to Llama-2-7B at shorter context lengths. We are also releasing the full recipe we used to distill, train, test, and deploy the model.

We built Llama-2-7B-32K-Instruct using the Together API, and today we’re making fine-tuning for Llama-2 models publicly available with Together API!

Last month, we released Llama-2-7B-32K, which for the first time extended the context length of Llama-2 from 4K to 32K tokens, giving developers the ability to use open-source AI for long-context tasks such as document understanding, summarization, and QA. To illustrate this fine-tuning capability, we’re introducing Llama-2-7B-32K-Instruct, a long-context instruction-tuned model that we built with fewer than 200 lines of Python code using Together API. We fine-tuned this model on a mixture of three data sources:

1) a set of single- and multi-round conversations generated from human instructions and Llama-2-70B-Chat outputs, collected using the Together Inference API;
2) summarization data from the BookSum dataset; and
3) a multi-document QA dataset.

The Llama-2-7B-32K-Instruct recipe: Four main steps to build custom models

The code used to implement this recipe with Together API, including the data preparation, is available on Github. The four main steps are outlined below: Distill, Train, Test, and Deploy.

Step 1: Distill

As mentioned, we fine-tuned our new model on a mixture of three data sources. The details of our pre-processing steps for BookSum and multi-document QA can be found in our previous blog post and our complete data recipe. In this blog post, we share our detailed steps for building the single- and multi-round conversation datasets.
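Combining the three data sources into one training set can be as simple as tagging each example with its source, concatenating, and shuffling with a fixed seed before writing a JSON Lines file. The sketch below is illustrative only: it assumes each example is already a formatted prompt/response string stored under a hypothetical `text` key, and the helper names `mix_sources` and `to_jsonl` are our own. It is not the exact data recipe from the Github repository, which may weight or order the sources differently.

```python
import json
import random
from typing import Dict, List


def mix_sources(sources: Dict[str, List[str]], seed: int = 42) -> List[dict]:
    """Combine examples from several named sources into one shuffled list.

    Hypothetical helper: each example is assumed to be an already formatted
    training string; the source name is kept for later per-source analysis.
    """
    records = [
        {"text": text, "source": name}
        for name, texts in sources.items()
        for text in texts
    ]
    # A dedicated Random instance with a fixed seed makes the shuffle
    # reproducible without touching the global random state.
    random.Random(seed).shuffle(records)
    return records


def to_jsonl(records: List[dict]) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


if __name__ == "__main__":
    # Toy placeholder examples standing in for the three real data sources.
    mixed = mix_sources({
        "conversations": ["[INST] Say hello. [/INST] Hello!"],
        "booksum": ["[INST] Summarize this chapter: <chapter> [/INST] <summary>"],
        "multi_doc_qa": ["[INST] Answer using the documents: <docs> [/INST] <answer>"],
    })
    print(to_jsonl(mixed), end="")
```

Keeping a `source` field alongside the text is a design choice: it costs little and makes it easy to check the final mixture proportions before fine-tuning.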
We follow the distillation paradigm used by Alpaca, Vicuna, WizardLM, and Orca: generating instruction-following data by querying a powerful LLM, which in our case is Llama-2-70B-Chat. We do this by leveraging the Together Inference API. For example, given an instruction such as:

instruction = "Create a table about national parks in the US"

we can formulate a prompt in the form "[INST] {instruction} [/INST]", which is the standard prompt format of Llama-2-70B-Chat, and query the Together Inference API using the following code:

res = requests.post("https://api.together.xyz/inference", json={
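The distillation step for multi-round conversations can be sketched as a loop that wraps each user turn in the "[INST] {instruction} [/INST]" format, sends the running conversation to the teacher model, and appends the teacher's reply before the next turn. To keep the sketch runnable offline, the teacher is passed in as a plain Python callable; in practice it would wrap a request to the Together Inference API, whose request fields we do not reproduce here. The helper names `format_llama2_prompt` and `distill_conversation`, and joining turns with single spaces, are assumptions for illustration; the full Llama-2 chat template also uses `<s>`/`</s>` tokens and an optional system prompt.

```python
from typing import Callable, List, Tuple

# A teacher maps a fully formatted prompt string to a completion string.
# In practice this would call a hosted model such as Llama-2-70B-Chat.
Teacher = Callable[[str], str]


def format_llama2_prompt(history: List[Tuple[str, str]], instruction: str) -> str:
    """Render earlier (instruction, response) turns plus a new instruction.

    Simplified sketch of the "[INST] {instruction} [/INST]" format; it omits
    the <s>/</s> tokens and system prompt of the full Llama-2 chat template.
    """
    parts = [f"[INST] {inst} [/INST] {resp}" for inst, resp in history]
    parts.append(f"[INST] {instruction} [/INST]")
    return " ".join(parts)


def distill_conversation(instructions: List[str], teacher: Teacher) -> List[Tuple[str, str]]:
    """Build a single- or multi-round conversation by querying the teacher
    once per user instruction, conditioning on all earlier turns."""
    history: List[Tuple[str, str]] = []
    for instruction in instructions:
        prompt = format_llama2_prompt(history, instruction)
        response = teacher(prompt).strip()  # drop stray leading/trailing whitespace
        history.append((instruction, response))
    return history


if __name__ == "__main__":
    # Stub teacher for offline testing: reports which turn it is answering.
    def echo_teacher(prompt: str) -> str:
        return f"(reply to turn {prompt.count('[INST]')})"

    convo = distill_conversation(
        [
            "Create a table about national parks in the US",
            "Add a column for the year each park was established",
        ],
        echo_teacher,
    )
    for inst, resp in convo:
        print(f"USER: {inst}\nASSISTANT: {resp}")
```

Injecting the teacher as a callable keeps the conversation-building logic independent of any particular API client, so the same loop can be pointed at a stub during testing and at the hosted Llama-2-70B-Chat model when generating real data. A single-round example is just the special case of a one-element instruction list.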
