Import AI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Can LLMs autonomously refine other LLMs for new tasks? Somewhat.

…PostTrainBench shows startling growth in AI capabilities at post-training…

AI-driven R&D might be the most important thing in all of AI, because it helps us understand whether AI systems might eventually build their own successors. So far, much of the focus on AI R&D has been on components that support AI development (e.g., autonomous creation of AI kernels) or on training base models (e.g., the NanoGPT speedrun benchmark). But there’s been less attention paid to fine-tuning, the task of adapting an existing LLM to a new dataset or behavior. Researchers from the University of Tübingen, the Max Planck Institute for Intelligent Systems, and AI research organization Thoughtful Lab want to change that with PostTrainBench, a benchmark that targets one specific aspect of post-training: improving a model’s performance on a given dataset. “Post-training is how raw language models become useful”, the authors write. “Given a clear objective and limited compute, can today’s agents do the technical work?” The answer appears to be ‘yes, but not as well as humans’.
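To make the shape of this kind of task concrete, here is a toy sketch of a post-training episode: an agent receives a base model, training data, and a budget, and is scored by how much it improves the model on held-out data. This is not PostTrainBench’s actual harness, which isn’t described here. Every name (`Task`, `run_episode`, `toy_agent`), the string-reversal task, the held-out split, and the “no credit if over budget” rule are hypothetical assumptions, and the “models” are plain Python functions rather than LLMs.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM: any function from prompt to answer.
Model = Callable[[str], str]


@dataclass
class Task:
    name: str
    train_set: list[tuple[str, str]]  # data the agent may use for post-training
    eval_set: list[tuple[str, str]]   # held-out data used only for scoring (assumption)


def accuracy(model: Model, data: list[tuple[str, str]]) -> float:
    """Fraction of prompts the model answers exactly right."""
    return sum(model(p) == a for p, a in data) / len(data)


# An agent takes a base model plus a task and returns a (hopefully) improved model.
Agent = Callable[[Model, Task], Model]


def run_episode(base: Model, task: Task, agent: Agent, budget_s: float) -> dict:
    """Score an agent by how far it moves the base model on held-out data."""
    before = accuracy(base, task.eval_set)
    start = time.monotonic()
    tuned = agent(base, task)  # the agent does the post-training work here
    elapsed = time.monotonic() - start
    # Hypothetical rule: runs that overshoot the compute budget get no credit.
    after = accuracy(tuned, task.eval_set) if elapsed <= budget_s else before
    return {"task": task.name, "before": before, "after": after, "gain": after - before}


def toy_agent(base: Model, task: Task) -> Model:
    """A trivial 'post-trainer': adopt a candidate behavior if it fits the training data."""
    candidates: list[Model] = [str.upper, lambda s: s[::-1]]
    for cand in candidates:
        if accuracy(cand, task.train_set) == 1.0:
            return cand
    return base  # nothing fit; hand back the base model unchanged


if __name__ == "__main__":
    task = Task(
        name="reverse",
        train_set=[("abc", "cba"), ("hello", "olleh")],
        eval_set=[("xyz", "zyx"), ("ai", "ia"), ("aa", "aa")],
    )
    base = lambda s: s  # base model just echoes its input
    print(run_episode(base, task, toy_agent, budget_s=1.0))
```

Run as a script, the toy agent infers the reversal rule from its two training pairs and lifts held-out accuracy from 1/3 to 1.0. A real setup would replace the functions with model checkpoints and actual fine-tuning runs; the point of the sketch is only that the score is the improvement achieved under a fixed budget, not the agent’s own claims about what it did.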
