The Week in Short

Baseten and Fireworks are reportedly entering the decacorn club as inference demand gets red-hot. Newcomer editor-at-large Jonathan Weber catalogs the rise of the tech industry in San Francisco on the podcast. Anthropic’s latest $65 billion Series H pushes its valuation to $965 billion, ahead of its arch-rival’s. Cognition boasts a billion-dollar fundraise. Apple makes a big push for AI models that can work locally on its devices ahead of a new Siri. A Google employee gets arrested for insider trading on Polymarket. Big law firm Kirkland & Ellis allocates hundreds of millions to make its own AI legal tech tools. Robinhood debuts agentic stock trading. A new market map of VC fund-of-funds highlights those willing to back emerging managers.

The Main Item

Inference startups once drew a little side-eye from investors worried about whether they had a defensible position between the foundation models and the application companies that buy their services.

Not anymore.

Inference computing, or the process of running a trained machine learning model on new data to generate predictions or outputs, is suddenly blowing up alongside the heightened demand for AI tools from enterprises.

Inference provider Baseten is raising up to $1 billion in new funding just four months after closing its previous round, and is looking for an $11 billion valuation, according to The Information.

Fireworks AI, a competitor to Baseten that also provides model customization and evaluation tools, is in talks for fresh funding at a $15 billion valuation, Bloomberg reported.
Modal, which straddles inference and AI agent infrastructure, just closed on $355 million in Series C funding co-led by Redpoint and General Catalyst.

Together AI, which includes inference as part of its AI-native cloud infrastructure, was reportedly in talks to raise around $1 billion at a $7.5 billion valuation.

Fal, which offers API access to its library of over 1,000 image, video, audio, 3D, and world models as well as an inference engine for businesses to use them, was also reported in March to be raising $300 million to $350 million in new funding.

VCs say the fresh enthusiasm is all about the cash these companies are suddenly bringing in. “The revenue momentum for all of these companies is hard to deny,” said Deedy Das, partner at Menlo Ventures, noting that many were growing at multiples “on a $100 million-plus baseline” in the first half of 2026.

It’s not a given that they can sustain the momentum. Some investors have long had concerns about the businesses’ margins, given that they have to run GPUs for long hours to be available to service requests. Those costs add up, especially since competitors like hyperscalers or big labs don’t have to pay to lease compute in the same way. It’s unclear if these companies can keep up this level of revenue without owning the compute layer to back it up. “It seems like VCs are just doing a revenue multiple and are assuming the margin doesn’t matter,” said one skeptical investor.

Baseten, Fireworks AI, and Modal all just lease capacity, unlike neoclouds such as Lambda and Crusoe, which provide inference while also owning the chip stack. They’re also competing with the labs themselves for compute allocations, as OpenAI and Anthropic are gobbling up more chip capacity for their own training and development. They also have fairly similar product offerings, bringing the risk of customers switching back and forth depending on who offers the best price at the moment.
Fireworks works more with custom model APIs, while Baseten is more focused on custom model deployment. But as both expand into other parts of the stack besides inference, like fine-tuning, the risk of commoditization is high.

Still, the revenue growth for these inference-focused companies is remarkable, even by the new normal of the AI boom. Fireworks AI CEO Lin Qiao said on X Wednesday that the company surpassed $800 million in annualized revenue, up from $250 million in late October last year. Modal shared in its funding announcement that it had crossed $300 million in ARR, and Baseten’s ARR reportedly jumped to $600 million from $200 million at the start of the quarter after a strong month for growth.

Unlike training runs, which take place as labs are developing their new models, inference computing is the processing that happens when the models make predictions and generate content based on new data. Simply put, inference is the model running after it’s been trained, getting inputs from users and generating responses based on information that could be from outside its original training data set.

Coding assistants have been one area where enterprise adoption has generated much more need for better inference, and Fireworks AI in particular has depended on Cursor as a major customer. But even just running LLM queries on internal company data requires inference capacity, so the market is theoretically set to grow much larger as more businesses adopt AI tools.

We compiled a chart of recent inference startup fundraising hauls by year for some of the top providers, according to data from Harmonic.

CVAI

Newcomer is hosting our mid-year Cerebral Valley AI Summit in London on June 24.
We’re returning with another world-class lineup of AI leaders, founders, and investors for what’s shaping up to be one of our best summits.

Newcomer Podcast

This week on the Newcomer podcast, our very own editor at large, Jonathan Weber, spoke with Eric about Jonathan’s forthcoming book City on the Edge: Technology, Politics, and the Fight for the Soul of San Francisco. A note from Jonathan ahead of the episode:

For the past couple of years I’ve enjoyed a rich, two-track professional life: working as editor at large for Newcomer, and writing a book about San Francisco in the internet era. It’s a great pleasure to have these two endeavors come together on the Newcomer podcast!

The book tells the story of the rise of the internet industry in San Francisco, as opposed to suburban Silicon Valley, and how it transformed politics and culture in one of the world’s most iconic cities. It features a rich cast of characters, including well-known political leaders like Gavin Newsom and Willie Brown, tech kingpins such as Chris Larsen and Mark Pincus, and numerous lesser-known figures who played major roles in shaping events over the course of 30 years.

The heroes of the story include the bold and idealistic pioneers of the early Web, who in the 1990s built the foundations of the commercial internet along with an inspiring and inclusive culture, exemplified by the annual reverie that is Burning Man. As to the villains, well, when it comes to San Francisco’s much-discussed dysfunctions, I found there was plenty of blame to go around.

I hope you enjoy my conversation with Eric and are inspired to read the book, which will be published June 9th by Simon & Schuster’s Atria Books. Pre-orders are always greatly appreciated! I have various events scheduled as well; details are at cityontheedgebook.com. Thank you for listening, and for reading!

Six Notable Deals