
Free Daily Podcast Summary
by Nathan Lambert
Staring out the window on a new, high-speed train from Hangzhou to Shanghai, I’m gifted with views of dramatic ridgelines speckled with wind turbines that are silhouetted against the setting sun. The mountains cast a backdrop to a mix of spanning fields and clustered skyscrapers. I’m returning from China with great humility. It’s a very warming, human experience to go somewhere so foreign and be so welcomed. I had the honor of meeting so many people in the AI ecosystem who I knew from afar, and they greeted me with big smiles and cheer, reminding me how global my work and the AI ecosystem is.

Interconnects AI is a reader-supported publication. Consider becoming a subscriber.

The mentality of Chinese researchers

The Chinese companies building language models are set up as the perfect fast-followers for the technology, building on long-standing cultural traditions in education and work, along with subtly different approaches to building technology companies. When you look at the outputs (the latest, biggest models enabling agentic workflows) and the ingredients (excellent scientists, large-scale data, and accelerated computing), the Chinese and American labs look largely similar. The lasting differences emerge in how these are organized and conditioned.

I’ve long thought that a reason that the Chinese labs are so good at catching up and keeping up with the frontier is that they’re culturally aligned for this task, but without talking to people directly I felt like it wasn’t my place to attribute substantial influence to this hunch. Speaking with many wonderful, humble, and open scientists at the leading Chinese labs has crystallized a lot of my beliefs.

So much of building the best LLMs today comes down to meticulous work across the entire stack, from data to architecture details and RL algorithm implementations.
Every part of the stack can yield some improvement, and fitting them together is a complex process where the work of some brilliant individuals needs to get shelved in favor of the overall model maximizing a multi-objective optimization.

Where American researchers are obviously also brilliant at solving the individual components, there’s more of a culture of speaking up for yourself in the U.S. As a scientist, you’re more successful when you speak up for your work, and modern culture is pushing the new path to fame of “leading AI scientists”. This results in direct conflict. The Llama organization is heavily rumored to have collapsed under the political weight of these interests embedding themselves in a hierarchical organization. I’ve heard of other labs saying that it can be necessary to pay off a top researcher to get them to stop complaining about their idea not making it into the final model. Whether or not that’s exactly true, the idea is clear. Ego and desires for career advancement do get in the way of making the best models. A small, directional shift in this sort of culture between the U.S. and China can have a meaningful impact on the final outputs.

Some of this has to do with who is building the models in China. There’s an immediate reality at all of the labs that a large proportion of the core contributors are active students. The labs are quite young, and it reminds me of our setup at Ai2, where students are seen as peers and directly integrated in the LLM team. This is incredibly different from the top labs in the U.S., where the likes of OpenAI, Anthropic, Cursor, etc. simply don’t offer internships.
Other companies like Google nominally have internships related to Gemini, but there’s a lot of concern about whether your internship will be siloed and away from anything real.

To summarize how the slight change in culture can improve the ability to build models:
* More willingness to do non-flashy work in order to improve the final model,
* People new to building AI can be free of prior phases of AI hype cycles, allowing them to adapt to the new modern techniques faster (in fact, one of the Chinese scientists I talked to really actively attached to this strength),
* Less ego, enabling org charts to scale slightly better, as there’s less gaming of the system, and
* Abundant talent well-suited to solving problems with a proof of concept elsewhere, etc.

This slight inclination towards skills that complement building today’s language models stands in contrast to a known stereotype that Chinese researchers tend to produce less creative, field-spawning, 0-to-1 academic style research. Among the more academic lab visits on our trip, many leaders talked about cultivating this more ambitious research culture. At the same time, some technical leaders we talked to were skeptical about whether such a rewiring in the approach to science is likely in the near term, because it’ll take a redesign of the education and incentive systems that is too big to happen within the current economic equilibrium. This culture seems to be training
‘Distillation attacks’ is a horrible term for what is happening right now. Yes, some Chinese labs are hacking or jailbreaking model APIs to try to extract more signal from them – stopping this is important to maintain the U.S.’s lead in AI capabilities. Referring to this as a distillation attack is going to irrevocably associate all distillation with this behavior, and distillation generally is a core technique needed to diffuse AI capabilities broadly through academic and economic activities.

We went through this sort of language transition with the open source vs open weight debate. All the terms just reduced to open models – very few people in the large AI community know exactly how open-source differs from open-weights. And terminology matters, as the less informed people who still care about – and influence – the technology are bound by the terms they use. If we’re not careful with the discourse around distillation, many people could associate this broad technique used for research and development of new models with an act at the boundary of corporate manipulation and crime.

I’ve recently written a more technical piece on estimating how impactful state-of-the-art distillation methods are on leading Chinese models, and this piece follows to push for caution in any hasty actions to target the methods with policy. To set the stage, recall Anthropic’s recent blog post where they detailed “distillation attacks” made by 3 Chinese labs.

These labs used a technique called “distillation,” which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers.
But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.

This is a clever paragraph, where they normalize distillation generally and explain how a few people can use it illicitly, without detailing how illicit use often involves other more explicit behavior like jailbreaking, hacking, or identity spoofing of the API.

Distillation itself is an industry standard. It’s used extensively, primarily in post-training, by smaller players to create specialized or smaller models. In my book coming this summer, I describe it as follows:

The term distillation has been the most powerful form of discussion around the role of synthetic data in language models. Distillation as a term comes from a technical definition of teacher-student knowledge distillation from the deep learning literature.

Distillation colloquially refers to using the outputs from a stronger model to train a smaller model.

In post-training, this general notion of distillation takes two common forms:
* As a data engine to use across wide swaths of the post-training process: Completions for instructions, preference data (or Constitutional AI), or verification for RL.
* To transfer specific skills from a stronger model to a weaker model, which is often done for specific skills such as mathematical reasoning or coding.

With this definition, it’s easy to see how distillation takes many forms. Of course, if you just take the outputs from GPT-5.5 and train a recent open-weight base model with them to host a competitive product, that’s one thing.
But a lot of the things that fall under the bucket of distillation are complex, multi-stage processes that muddle the exact impact of the model you distilled from.

Modern LLM processes could look like using a GPT API to build an initial batch of synthetic data to build a specialized small data-processing model. A good example is a model like olmOCR (or many other models in this category) that is trained to convert PDFs to clean text. This specialized model would be used to create large amounts of data. Finally, you train another model (often from scratch) with the new data you created. Is this final model distilled from GPT?

When done via a closed, API-based model, distillation sits in the grey area of the terms of service that you agree to when signing up to the Claude or GPT platform. They generally forbid the use of the API to create competing language model products, but this term has largely gone unenforced. The open-source community used to worry deeply about being cut off from these cutting-edge APIs for doing research or creating public datasets, but to date only <a target="_blank" href="https://www.theverge.com/2023/12/15/24003151/bytedance-chi
We’re living through the period of time when we’ll learn if open models can keep up with closed labs. The obvious answer is that no, they won’t. This answer is a form of saying they won’t keep up in every area. This framing closes off a popular prediction where the open models completely catch up, as in all models saturate and open and closed models only become increasingly similar. In living through this, it’s evidently very unclear when the longer-term stable balance of capabilities will solidify. This is a very complex dynamic, where the core point we monitor is a capability gap between models. At the same time, this gap is intertwined with evolving dynamics in the funding of open models, who builds open models, how techniques like distillation that enable fast-following translate through new application domains, potential regulation hampering the open-source AI ecosystem, and of course who actually uses open models. The capabilities gap is one signal in a complex sea of forces, pushing supply and demand into different shapes. In many cases the demand – where obviously tons of individuals, organizations, and sovereigns want, or need, open models – is largely separated from supply. Supply is fully dictated by economics. The question of “which business strategies support releasing open models” is still at stake.

With this complexity, I wanted to distill my key beliefs down into a clear list. These are downstream of 10+ pieces I’ve written or recorded on open models this spring (which are linked throughout).
* It’s surprising that the top closed models did not show a growing capability margin over open models, given the compute differences for training and research, especially in the second half of 2025 and through today.
* Open model labs are technically very strong at keeping pace on well-established benchmarks.
This will continue and reflects a balance of abundant talent and sufficient computing power.
* Chinese open-weight labs focus slightly more on benchmark scores than comparable closed labs in the U.S. Distillation helps the Chinese LLM companies do so, but it’s not a panacea. Changes in the distillation dynamic (e.g. regulation) will not be a determining factor on the balance of capabilities. This increase in focus is a natural consequence of their incentive to keep alive the narrative that they are keeping up with the frontier, which is crucial to fundraising and adoption.
* To date, closed models tend to be more robust and generally useful than similarly scoring open models. Closed models have certain hard-to-measure qualities that are not well captured in current or past benchmarks. This will be key to enabling closed models to dominate in markets where an individual user constantly presents new challenges, i.e. supporting knowledge workers as a direct assistant.
* The open vs. closed model race, as monitored through benchmarks, will largely be a game of economic staying power and fast-following, until the market structure constricts. I expect Chinese open-weight labs to face funding difficulties first, as soon as later this year. Funding difficulties will be seen in different capability trajectories 3-9 months later.
* The RL-dominated training era has increased the relevance of distribution to real-world use-cases as a key factor in continued capabilities improvements. These are tasks where users directly use tools like Claude Code or Codex to solve problems in their job with agents. This is the first clear technical area where closed labs can dominate open-weight models on capabilities, potentially leveraging online RL directly based on user feedback.
* Open models will be increasingly adopted for repetitive automation tasks across the ecosystem, as measured by their relative share of the API market.
This takes the form of many new AI-native applications, business backend automation, etc. The success of this will drive more investment in domain-specific, efficient open models.

This is a complex picture, where the long-term trajectory is more of an economics question than an ability one. Many other outlets can paint a far more simplistic narrative that “<a target="_blank" href="https://www.nytimes.c
Recently, I was talking with Percy Liang, Stanford professor and lead of the Marin project (another fully-open model lab), and it sank in that there will eventually be a consortium of companies funding a foundational set of open models used across industry. It’s not clear when this’ll emerge, and Nemotron (Coalition) is Nvidia’s attempt to bankroll and bootstrap this approach within a single wealthy company, but a consortium is the only long-term stable path to well-funded, near-frontier open models.

In recent months, we’ve seen a lot of turnover in open model labs, with high-profile departures at Qwen and Ai2 (my comment). This shouldn’t be super surprising to followers of the ecosystem – it’s happened before with Meta shifting its focus away from Llama, and it’ll only happen more as the cost of trying to keep pace at the frontier of AI keeps increasing. The other leading labs with models available today include Chinese startups such as Moonshot AI, MiniMax, and Z.ai – all of which look precarious in their ability to fund continued growth in the cost of training or R&D. Releasing one’s strongest models openly today is in active tension with the option of spending focus and resources on AI products that can currently generate meaningful revenue (and profits).

We’re going to see business models emerge around releasing some, or even many, models openly, but these will largely be smaller models that enable a long tail of functionality, rather than models at the absolute frontier. This class of companies that’ll release many strong, fine-tunable models will include the likes of Arcee AI, Thinking Machines, OpenAI, Google with Gemma, and more in that class. The cost of releasing the best models, and the relative advantage of keeping them closed, are too high in a business environment with many opportunities for revenue.
To summarize – there will be an ever increasing number of companies releasing models that are good for creating a lively niche of smaller, custom models, but an ever decreasing number of companies willing to release fully open, near-frontier models. This is the core thesis of why I’m pushing hard for more people to do more research on how these smaller models can complement the best closed agents, the science of finetunability, etc. See my post on creating a sustainable open model ecosystem, whether or not the frontier of open keeps pace with closed.

It’ll take years for this equilibrium to become more obvious, seen through the lens of more open model families coming and going. This year, it seems likely we’ll see Nvidia’s Nemotron reach new heights, Reflection AI challenge some of the Chinese models with a strong, large MoE, maybe Meta releases a new open-weight model, and so on. True pressure to change strategy will only come when the capital environment punishes the less efficient spend on resources (e.g. giving away your competitive advantage, in having an in-house model). This pressure will likely hit Chinese startups training these models first. All of Moonshot AI, MiniMax, and Zhipu AI will show signs of financial challenge in the coming years if they retain their strategy, on top of their models falling further behind the best open models in terms of generality. This is inevitable pressure to evolve open models to areas that are profitable and complementary to the frontier of AI.

Nvidia, which is best positioned to support the open ecosystem in the near term as a way to support its core GPU business, could face many pressures to pull back its open model efforts. It could:
* Realize it’s competing too directly with its biggest customers as Nemotron succeeds too much,
* Fall to competition on its core business and lose the free cash flow buffer needed to fund this (e.g.
it’s 2031 and OpenAI, Anthropic, Google, and the other frontier labs are worth so much they build their own chips).
* Start succeeding beyond its initial goals and keep the chips to build ASI itself, as a closed-weight model.

The pressures for new funding mechanisms for open models are bas
With the announcement of the Claude Mythos model this week and its admittedly very strong stated abilities, especially in cybersecurity, a new wave of anti open-weight AI model narratives surged. The TL;DR of the argument is that our digital infrastructure will not be ready in time for an open-weight version of this model, which will allow attacks to be conducted by numerous parties.

The backlash against open models in the wake of the Mythos news conflates too many general unknowns into a simple, broad policy recommendation that could actually further weaken cybersecurity readiness.

We’ve been here before – open-weight models were discussed as being extremely dangerous when OpenAI withheld GPT-2 weights in 2019, and when OpenAI released GPT-4 in 2023. Both of these waves came and went. The core mistake that is being made is the composition of two issues: 1) the acceptance of the open-closed model gap being static in time and 2) linking open-weight viability generally to specific issues.

I’ve written at length recently on how I think that the best, frontier-level open weight models are going to fall behind the best closed models in overall capabilities in the near future. I’ve also written about how the open-weight ecosystem needs to adapt to accept this reality. This is one of the times for the AI industry where I will repeat that it’s a total blessing to have the 6-18 month delay from when a certain capability is available within a closed lab to it being reproduced in the open. It’s a good balance of safety and monitoring the frontier of AI systems while allowing a useful open-source ecosystem to exist and thrive.

The core argument I’ve focused on in the open-closed model time gap has been in general capabilities – i.e. for general purpose, frontier models such as Claude Opus 4.X or GPT Thinking 5.X. The abilities of these closed models to robustly solve and work in diverse situations as agents remain out of reach of the best open-weight models.
What the open-weight models have tended to be better at is quickly keeping pace on key benchmarks (which admittedly is helped to some extent, but not necessarily substantially, by distillation). This discussion is entirely different: it has to do with whether open-weight models can keep pace on the specific skills related to cybersecurity, and when we could expect an open version of this model to be available to the world.

The case of a Claude Mythos level open weight model is admittedly more nuanced to me than the previous few anti-open weight narratives the community has experienced. Where GPT-4 was about a more hypothetical risk, especially in areas like bio-risk, the clear and present reality of cyber infrastructure being prone to attack is far more tangible. Still, much of this nuance in the moment comes down to not knowing the full details of what the system can actually do (i.e. Mythos), and the state of the environment it would act in (i.e. our digital infrastructure).

To properly assess this risk, we need to know what it takes to build and deploy a Claude Mythos scale model. This entails three pieces: 1) training and releasing the weights, 2) the harness that gives the model effective tools it knows how to use, and 3) the inference compute and software.

(Below I make some model size & price estimates to show my thinking; these should not be taken as ground truth.)

Current estimates put the size ranges of leading models like Claude Opus 4.6 or GPT 5.4 as being around 3-5T parameters. Currently, the largest open-source models, which have been coming from Chinese labs, are around 1T parameters. Claude Mythos’s preview pricing is 5X Opus, which could come from a simple multiplicative increase in active parameters (with the same serving system design), far higher inference-time scaling, more complex harnesses that make inference less efficient, lower utilization expectations, and so on.
The simplest guess is that it’s a mix of all of the above, something like 2X bigger in parameters and much less efficient to serve. That’s a huge model, likely something similar to <a target="_blank" href="https://www.interc
Having written a lot of model release blog posts, I find there’s something much harder about reviewing open models when they drop relative to closed models, especially in 2026. In recent years, there were so few open models that when Llama 3 was released, most people were still doing research on Llama 2 and super happy to get an update. When Qwen 3 was released, the Llama 4 fiasco had just gone down, and a whole research community was emerging to study RL on Qwen 2.5 – it was a no brainer to upgrade. Today, when an open model releases, it’s competing with Qwen 3.5, Kimi K2.5, GLM 5, MiniMax M2.5, GPT-OSS, Arcee Large, Nemotron 3, Olmo 3, and others. The space is populated, but still feels full of hidden opportunity. The potential of open models feels like dark matter: a potential we know is huge, but with few clear recipes and examples for how to unlock it out there. Agentic AI, OpenClaw, and everything brewing in that space is going to spur mass experimentation in open models to complement the likes of Claude and Codex, not replace them.

Especially with open models, the benchmarks at release are an extremely incomplete story. In some ways this is exciting, as new open models have a much higher variance and ability to surprise, but it also points at some structural reasons that make building businesses and great AI experiences around open models harder than the closed alternatives. When a new Claude Opus or GPT drops, spending a few hours with them in my agentic workflows is genuinely a good vibe test. For open models, putting them through this test is a category error.

Something else to be said about open models in the era of agents is that they get out of the debate of integration, harnesses, and tools and let us see, close to the ground, exactly what the model alone can do.
Of course, we can’t test some things like search abilities without some tool, but being able to measure exactly the pace of progress of the model alone is a welcome simplification to a systematically opaque AI space.

The list of factors I’d use to assess a new open-weight model I’m considering investing in includes:
* Model performance (and size) – how this model performs on benchmarks I care about and how it compares to other models of a similar size.
* Country of origin – some businesses care deeply about provenance, and if a model was built in China or not.
* Model license – if a model needs legal approval for use, uptake will be slower at mid-sized and large companies.
* Tooling at release – many models release with half-broken, or at least substantially slower, implementations in popular software like vLLM, Transformers, SGLang, etc. due to pushing the envelope of architectures or tools.
* Model fine-tunability – how easy or hard it is to modify the given model to your use-case when you actually try and use it.

The core problem is that some of these are immediately available at release, e.g. general performance, license, origin, etc., but others such as tooling take day(s) to week(s) to stabilize, and others are open research questions – with no group systematically monitoring fine-tunability. In the early era of open models, the days of Llama 2 or 3 and Qwen pre v3.5, the architectures were fairly simple and the models tended to work out of the box. Some of this was due to the extremely hard work of the Llama, Qwen, Mistral, etc. developer teams. Some is due to the new models being genuinely harder to work with. When it comes to something like Qwen 3.5 or Nemotron 3, with hybrid models (either gated delta net or mamba layers), the tooling is very rough at release.
Things you would expect to “just work” often don’t.

I’ve been following this area closely since we released Olmo Hybrid with a similar architecture, and Qwen 3.5 is just starting to work well in the various open-source tools that need to all play nice together for RL research. That’s 1.5 months after the release date! And that is just the point at which we can start really investing in understanding the behavior of the models. Of course, others started working on these models sooner by investing more engineering resources or relying on partially closed software. The fully open and distributed ecosystem takes a long time to get going on some new models.

All of this is lead-in for the most imp
Fast takeoff, the singularity, and recursive self-improvement (RSI) are all top of mind in AI circles these days. There are elements of truth to them in what’s happening in the AI industry. Two, maybe three, labs are consolidating as an oligopoly with access to the best AI models (and the resources to build the next ones). The AI tools of today are abruptly transforming engineering and research jobs.

AI research is becoming much easier in many ways. The technical problems that need to be solved to scale training large language models even further are formidable. Super-human coding assistants making these approachable is breaking a lot of former claims of what building these things entailed. Together this is setting us up for a year (or more) of rapid progress at the cutting edge of AI.

We’re also at a time where language models are already extremely good. They’re in fact good enough for plenty of extremely valuable knowledge-work tasks. Language models taking another big step is hard to imagine – it’s unclear which tasks they’re going to master this year outside of code and CLI-based computer-use. There will be some new ones! These capabilities unlock new styles of working that’ll send more ripples through the economy.

These dramatic changes almost make it seem like a foregone conclusion that language models can then just keep accelerating progress on their own. The popular language for this is a recursive self-improvement loop. Early writing on the topic dates back to the 2000s, such as the blog post entirely on the topic from 2008:

Recursion is the sort of thing that happens when you hand the AI the object-level problem of “redesign your own cognitive algorithms”.

And slightly earlier, in 2007, Yudkowsky also defined the related idea of a Seed AI in Levels of Organization in General Intelligence:

A seed AI is an AI designed for self-understanding, self-modification, and recursive self-improvement.
This has implications both for the functional architectures needed to achieve primitive intelligence, and for the later development of the AI if and when its holonic self-understanding begins to improve. Seed AI is not a workaround that avoids the challenge of general intelligence by bootstrapping from an unintelligent core; seed AI only begins to yield benefits once there is some degree of available intelligence to be utilized. The later consequences of seed AI (such as true recursive self-improvement) only show up after the AI has achieved significant holonic understanding and general intelligence.

It’s reasonable to think we’re at the start here, with how general and useful today’s models are.

Generally, RSI can be summarized as follows: when AI can improve itself, the improved version can improve even more efficiently, creating a closed amplification loop that leads to an intelligence explosion, often referred to as the singularity. There are a few assumptions in this. For RSI to occur, it needs to be that:
* The loop is closed. Models can keep improving on themselves and beget more models.
* The loop is self-amplifying. The next models will yield even bigger improvements than the current ones.
* The loop continues to run without losing efficiency. There are no added pieces of friction that kneecap the exponential into an early sigmoid.

While I agree that momentous, socially destabilizing changes are coming in the next few years from sustained AI improvements, I expect the trend line of progress to be more linear than exponential when we reflect back. Instead of recursive self-improvement, it will be lossy self-improvement (LSI) – the models become core to the development loop but friction breaks down all the core assumptions of RSI. The more compute and agents you throw at a problem, the more loss and repetition shows up.

I’m still a believer that the complexity brake on advanced systems will be a strong counterbalance to the reality that AI models are getting substantially better at every narrow task we need to compose together in making a leading AI model. I quoted this previously in April of 2025 in response to AI 2027.

Microsoft co-founder Paul Allen argued the opposite of accelerating returns, the complexity brake: the more progress science makes towards understanding intelligence, the more difficult it becomes to make additional progress. A study of the number of patents shows that human creativity does not show accelerating returns, but i
I’m a little late to this model review, but that has given me more time to think about the axes that matter for agents. Traditional benchmarks reduce model performance to a single score of correctness – they always have because that was simple, easy to quickly use to gauge performance, and so on. This is also advice that I give to people trying to build great benchmarks – it needs to reduce to one number that is interpretable. This is likely still going to be true in a year or two, and benchmarks for agents will be better, but for the time being it doesn’t really map to what we feel because agentic tasks are all about a mix of correctness, ease of use, speed, and cost. Eventually benchmarks will individually address these.

Where GPT 5.4 feels like another incremental model on some on-paper benchmarks, in practice it feels like a meaningful step in all four of those traits. GPT 5.4 in Codex, always on fast mode and high or extra-high effort, is the first OpenAI agent that feels like it can do a lot of random things you can throw at it.

I haven’t been particularly deep in software engineering over the last few months, so most of my working with agents has been smaller projects (not totally one-off, but small enough where I’ve built the entire thing and manage the design over weeks), data analysis, and research tasks. When you embrace being agent-native, this style of work entails a lot of regular APIs, background packages (like installing and managing LaTeX binaries, ffmpeg, multimedia conversion tools, etc.), git operations, file management, search, etc. Prior to GPT 5.4, I always churned off of OpenAI’s agents due to a death by a thousand cuts. It felt like rage quits. I’d feel like I was getting into GPT 5.2 Codex, but it would fail on a git operation and have me (or Claude) need to reset it.
Those hard edges are no longer there.The other subtle change in GPT 5.4’s approachability – the biggest reason I think OpenAI is much more back in the agent wars – is that it just feels a bit more “right.” I classify this differently to the routine tasks I discussed above, and it has to do with how the product (i.e. the model harness) presents the model outputs, requests, and all that to you the user. It has to do with how easy it is to dive in. This has always been Claude’s biggest strength in its astronomical growth. Not only has Claude been immensely useful, but it has a charm and entertainment value to it that’ll make new people stick around. GPT 5.4 has a bit of that, but the underlying model strengths of Claude still leave it feeling warmer.Where Claude is a super smart model, with character, a turn of phrase in a debate, and sometimes forgetting something, OpenAI’s models in Codex feel meticulous, slightly cold, but deeply mechanical. I’d use Claude for things I need more of an opinion on and GPT 5.4 to churn through an overwhelmingly specific TODO list. The instruction following of GPT 5.4 is so precise that I need to learn to interact with the models differently after spending so much time with Claude. Claude, in some domains, you come to see has an excellent model for your intent. GPT 5.4 just does what you say to do. These are very different philosophies of “what will make the best model for an agent”, Claude will likely appeal to the newcomers, but GPT 5.4 will likely appeal to the master agent coordinator that wants to unleash their AI army on distributed tasks.Outside of charm, and dare I say taste, a lot of the usability factors are actually better on OpenAI’s half of the world. The Codex app is compelling – I don’t always use it, but sometimes I totally love it. I suspect substantial innovation is coming in what these apps look like. 
Personally, I expect them to eventually look like Slack (when multiple agents need to talk to eachother, under my watch).OpenAI also natively offers fast mode for their models with a subscription and very large rate limits. I’ve been on the $100/month Claude plan and $200/month ChatGPT plan for quite some time. I’ve never been remotely close to my Codex limits with fast mode and xhigh reasoning effort, where I hit my Claude limits from time to time. There’s definitely a modeling reason to this – most of OpenAI’s release blogs showcase each iterative model being substantially more concise in the number of tokens it takes to get peak benchmark performance. This is a measure of reasoning efficiency. This 2D (or more) benchmark picture is exactly where the world is going.Here’s a plot from Cursor, which sadly doesn’t have all the GPT 5.4 reasoning efforts, but it confirms this point in a third party evaluation. What is missing across model families is the speed and price (a proxy for total compute used) to get there.The final benefit of GPT 5.4, and Open
Free forever for up to 3 podcasts. No credit card required.
Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai
It's a free AI-powered email that summarizes new episodes of Interconnects as soon as they're published. You get the key takeaways, notable quotes, and links & mentions — all in a quick read.
When a new episode drops, our AI transcribes and analyzes it, then generates a personalized summary tailored to your interests and profession. It's delivered to your inbox every morning.
No. Podzilla is an independent service that summarizes publicly available podcast content. We're not affiliated with or endorsed by Nathan Lambert.
Absolutely! The free plan covers up to 3 podcasts. Upgrade to Pro for 15, or Premium for 50. Browse our full catalog at /podcasts.
Interconnects publishes weekly. Our AI generates a summary within hours of each new episode.
Interconnects covers topics including Science, Technology. Our AI identifies the specific themes in each episode and highlights what matters most to you.