On July 1 we announced four things at once. The press release version is here. This is the version for people who actually build things.
The short story: while the industry spends hundreds of billions of dollars on new hardware, we took the opposite bet: get more out of the GPUs that already exist, and keep everything inside the customer's own environment.
Here's what came out of it.
BackboardQuant: compression that doesn't lobotomize the model
Everyone who has quantized a model knows the trade: smaller and faster, but dumber. The interesting engineering problem was making that trade disappear.
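If you have not watched that trade happen, here is a minimal sketch of it. To be clear, this is not BackboardQuant. It is plain symmetric round-to-nearest quantization, the textbook baseline, applied to made-up Gaussian "weights". The function names, the 0.02 standard deviation, and the 4096-element size are all assumptions chosen for illustration.

```python
import random

def quantize(weights, bits):
    """Symmetric round-to-nearest quantization to signed `bits`-bit integers.

    Generic baseline for illustration only, not BackboardQuant.
    """
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer level, clamped to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integer levels."""
    return [v * scale for v in q]

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical weights: small zero-mean Gaussian values.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {}
for bits in (8, 4):
    q, scale = quantize(weights, bits)
    errors[bits] = mean_abs_error(weights, dequantize(q, scale))
    # Storage ratio counts weight bits only and ignores the single scale value.
    print(f"{bits}-bit: {32 // bits}x smaller than fp32, "
          f"mean abs error {errors[bits]:.6f}")
```

Going from 8 bits to 4 halves the storage again, but the rounding error per weight grows because there are far fewer levels to round to. That growing error is the "dumber" half of the trade.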
