NIO's new intelligent driving system, known as World Model 2.0 (NWM 2.0), has begun rolling out in phases. According to official disclosures, full deployment across NT 2.0 and NT 3.0 models is expected to be completed within about 30 days. For NIO, this is a major update. In the AI-driven era, intelligent driving has become a core battlefield where premium brands must compete head-on.

A brief review of this version will be released soon in video form. Today's article is different. Ahead of the New Year, ChinaEV Home was invited as an evaluation media outlet to participate in a non-public test of NWM 2.0, followed by an in-depth communication session with Ren Shaoqing, Vice President of Intelligent Driving at NIO, and She Xiaoli, Head of Intelligent Driving R&D Product Systems.

During the nearly three-hour discussion, Ren and She, in a rare and candid exchange, offered deep explanations from technical, strategic and theoretical perspectives. They covered topics including the choice of high-level intelligent driving paths, differences between VLA and world models, distinctions between expert datasets and mass-production data, and what reinforcement learning can truly solve.

Taken together, the discussion resembled a profound self-reflection on technology routes, engineering costs and patience over time, a defining contest over the "main road" of intelligent driving. Ren did not evade questions, nor did he rush to provide definitive answers. In the end, experience will deliver the answer, and users will cast the final vote.

What follows is ChinaEV Home's attempt to present a less technical record of this discussion.
From rules to imitation

In many intelligent driving evaluation reports, a "rule-heavy" approach has increasingly become a point of criticism. Rule-heavy systems often feel mechanical, less fluid and prone to hesitation, while also struggling with low-probability edge cases. In engineers' eyes, rules are not AI, but rather a logical world stitched together from human experience. As a result, over the past three years, especially following the release of Tesla FSD V12 and V13, intelligent driving R&D in China has repeatedly shifted technical routes, from BEV to end-to-end approaches, in an effort to overcome these non-AI, non-humanlike shortcomings.

That said, rules remain the starting point. "Without rules, the system cannot even run," Ren said during the session. In NIO's early models, especially the first generation, rules largely served as safety fallbacks: explicit lane-change distances, fixed deceleration logic, and clearly defined safety boundaries.

However, as urban driving scenarios grew more complex, the limitations of rule-based systems quickly surfaced. "You can write rules for 99% of scenarios, but the remaining 1% can never be fully covered," Ren said.

This led to the introduction of imitation learning in the second phase. Industry focus shifted from city coverage and "nationwide availability" to "end-to-end" and "data-driven" approaches. Despite significant progress, particularly through massive parameter scaling that enabled large-scale learning of human driving behavior, fundamental limitations remained. As She explained, end-to-end models essentially memorize observed data through parameters and then match patterns in real-world scenarios.
But real-world complexity far exceeds what parameters can cover, inevitably leading imitation learning into a "probability averaging trap." Internally, NIO summarized this with a single word: "lazy," or more bluntly, "lying flat." Such models appear smooth and fluid because they learn from humans, but human behavior itself is uneven. Some drivers change lanes, others do not; some slow down cautiously, others cut in aggressively. The system cannot judge which behavior is better or worse, only averaging probabilities. The result is a series of familiar issues: hesitant starts at intersections, indecisive lane changes, abnormally low speeds on open roads, the notorious failure to accelerate.

The conclusion, She said, is that past optimization focused on improving how efficiently systems memorize and imitate humans, without addressing the core problem: a lack of goal awareness and explicit evaluation of action quality.

Code 3.0

Ren explained the evolution of intelligent driving routes by referencing Tesla's concept of "Code 2.0."

Code 1.0 is rules. For example, crossing three lanes at an intersection requires rules such as changing lanes at 300 meters, then 200 meters, then 100 meters. If something goes wrong, engineers collect data, analyze causes and add more rules. The problem is that code grows endlessly. In systems with millions of lines, engineers struggle to fully understand the logic, and new rules easily conflict with existing ones.

Code 2.0 is data. Rules are compressed into parameters, with models learning lane changes and path selection from massive real-world driving data. But data-driven approaches face another bottleneck: inconsistent human behavior leads models toward compromise averages. In left-turn merges, some drivers change lanes earlier, others later. The model's "middle choice" often fails. On narrow two-way roads with cyclists, some drivers overtake while others follow.
Models tend to conservatively follow for safety, conflicting with users' expectations for efficiency and assertiveness. Common industry fixes include SD+ (map-guided long-horizon planning), expert data, and additional rules. Ren was pragmatic: "The first and third cost money, the second costs manpower." Expert data is also regionally dependent; data collected in Beijing or Shanghai does not generalize to Chongqing.

Is there a Code 3.0? Ren said yes: reinforcement learning. The version NIO is rolling out represents a step toward a complete reinforcement learning system.

Scoring problems instead of patching them

What does reinforcement learning solve? First, it improves logical consistency. Training incorporates tasks with clear right and wrong answers, such as code and math problems. The paradigm shifts from "adding rules when problems arise" (Code 1.0) or "adding data" (Code 2.0) to "scoring outcomes." After the model outputs a result, the system provides positive or negative feedback, allowing the model to learn what constitutes a better outcome. This involves self-supervision, using reward scorers or feedback derived from human behavior.

Ren gave a left-turn intersection example. NIO builds a simulation environment with a defined target line. Successfully crossing it earns rewards, with faster completion earning higher rewards. Additional constraints, such as penalties for crossing solid lines, are applied. Beyond that, there are few complex rules. Where to change lanes and how to safely cross three lanes are left to the model to explore in simulation. Ren emphasized that this approach does not rely on SD+ or expert data.

The benefits are twofold. First, no incremental data is required. As long as similar intersections can be constructed in simulation, generalization is achieved without collecting data from countless specific intersections. Second, fewer rules reduce conflicts. Simpler systems generalize better.
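The reward shaping in the left-turn example can be sketched in a few lines. The function below is a minimal illustration under stated assumptions: the name, weights and time bonus are invented for this article, not NIO's actual implementation.

```python
# Minimal sketch of the left-turn reward described above. All names and
# weights are illustrative assumptions, not NIO's actual implementation.

def left_turn_reward(crossed_target: bool,
                     seconds_taken: float,
                     crossed_solid_line: bool) -> float:
    """Score one simulated left-turn attempt."""
    reward = 0.0
    if crossed_target:
        # Base reward for reaching the target line, plus a bonus
        # that grows as the maneuver completes faster.
        reward += 10.0 + max(0.0, 30.0 - seconds_taken)
    if crossed_solid_line:
        # Hard penalty for an illegal maneuver.
        reward -= 20.0
    return reward
```

The point of the approach is visible even in a toy like this: a sparse goal plus a few constraints, with everything else, such as where to change lanes, left for the policy to discover in simulation.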
Ren said NIO has completely changed its iteration logic over the past six months. While previous versions mixed old and new approaches, the current version fully adopts the new paradigm, with safety fallbacks retained but model iteration driven by the new system. In response to a question about how many domestic systems have truly entered Code 3.0, Ren said that in China, NIO is currently the only one to have implemented a complete reinforcement learning system.

A new paradigm

While Ren focused on why the paradigm must change, She emphasized why the experience improves after the change. She summarized NIO's approach in three steps:

1. Imitation learning to absorb large-scale human behavior distributions.
2. Long-horizon reasoning within the new world model.
3. High-frequency closed-loop reinforcement learning to continuously inject feedback.

This corresponds to a concrete engineering shift. Under NWM 1.0, models would output trajectories too close to vehicles or pedestrians, requiring rules at inference to filter them. Previously, models generated multiple candidate trajectories, which were scored by rule-based rewards, with lateral control handled by models and longitudinal control shared with rules. In the new version, the vehicle outputs a single trajectory directly, with both lateral and longitudinal control handled entirely by the model.

Ren added that corrections previously done on the vehicle are now pushed upstream into training and distribution alignment. The biggest gains, he said, are in lane deviation correction and intersection handling, including cut-ins. She explained that lane deviation benefits from long-horizon reasoning. If the system predicts that failing to change lanes one kilometer ahead will cause deviation later, it proactively adjusts based on early penalty signals. She sees this as evidence that the new architecture enables long-cycle data closed-loop iteration, driven by long-distance, long-duration reasoning rather than last-second reactions.
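The inference-time shift She describes can be caricatured in code. In the sketch below, all names and types are invented for illustration: the 1.0-style planner picks the best of several model-proposed trajectories using a rule-based score, while the 2.0-style planner returns the model's single trajectory as-is.

```python
# Toy contrast between the two inference styles described above.
# Trajectories are just labels here; a real system uses geometric paths.

from typing import Callable, List

def plan_v1(candidates: List[str], rule_score: Callable[[str], float]) -> str:
    """NWM 1.0 style: the model proposes candidates, rules pick the winner."""
    return max(candidates, key=rule_score)

def plan_v2(model: Callable[[], str]) -> str:
    """NWM 2.0 style: the model emits one trajectory, with no rule filter."""
    return model()

# Stand-in rule scorer penalizing the failure modes mentioned above.
scores = {"too_close_to_pedestrian": -5.0,
          "smooth": 10.0,
          "crosses_solid_line": -20.0}
best = plan_v1(list(scores), scores.__getitem__)  # rules select "smooth"
single = plan_v2(lambda: "smooth")                # model output used directly
```

In the 2.0 style, the filtering work does not disappear; as Ren notes, it moves upstream into training and distribution alignment.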
Data throughput

Data throughput is another key issue. Over the past two years, avoiding the pitfalls of imitation learning has been central to intelligent driving debates. Two-stage end-to-end systems, SD+, rule patches, small-plus-large models and expert datasets all serve purposes similar to reinforcement learning. While some approaches have been abandoned, expert datasets remain controversial.

Are expert datasets useful? How should they be used? Ren compared them to "delicate but expensive ingredients." Collected from professional drivers and test fleets, they are highly curated and can quickly establish baseline capabilities. But "expert data is clean, the world is not." Ren identified three problems. First, high cost and limited scale. Second, constrained scenario coverage. Expert data must be specially collected for corner cases. Data from Beijing or Shanghai does not cover Chongqing's unique road conditions, requiring localized expert data collection. Third, expert data fails to reflect real user behavior distributions. It filters behavior to match expectations, similar to preference alignment in large language models, but cannot capture the full spectrum of real-world behavior.

These factors mean expert data cannot serve as the long-term staple of intelligent driving development. That staple, Ren said, is mass-production data. Mass-production data is nearly limitless, but extremely noisy. It includes inconsistent lane changes, solid-line crossings, distracted driving and other irregular behaviors. The challenge is building systems capable of digesting this noise.

Ren explained that in imitation learning, noisy data creates distributions where different behaviors rank by frequency. Engineers often adjust distributions to force desired outcomes. NIO now adjusts priorities through rewards, using reinforcement learning to shift model preferences and suppress noisy behaviors.
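The frequency-versus-reward point can be made concrete with a toy example. Below, behaviors in a noisy fleet log rank by raw frequency, so the most common but mediocre behavior would dominate pure imitation; weighting each behavior's frequency by an assumed reward flips the preference. The behavior labels, counts and reward values are invented for illustration.

```python
from collections import Counter

# Noisy "mass-production" log: the most frequent behavior is not the best.
logged = (["hesitant_merge"] * 50
          + ["smooth_merge"] * 30
          + ["solid_line_cross"] * 20)
freq = Counter(logged)

# Imitation learning effectively ranks behaviors by frequency.
by_frequency = freq.most_common(1)[0][0]  # the hesitant merge dominates

# Reward-weighting demotes common-but-poor and illegal behaviors.
reward = {"smooth_merge": 1.0, "hesitant_merge": 0.2, "solid_line_cross": -1.0}
by_reward = max(freq, key=lambda b: freq[b] * reward[b])
```

Here `by_frequency` is the hesitant merge while `by_reward` is the smooth one, which is the direction of the shift Ren describes: adjusting priorities through rewards rather than hand-editing the data distribution.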
She likened reinforcement learning to a coach, giving feedback on every action and enabling the system to distinguish between "60-point" and "100-point" behaviors. Using cut-ins as an example, she contrasted hesitant, reckless moves with smooth, anticipatory ones. The behavioral difference is subtle, but the user experience is vastly different.

Even so, reinforcement learning alone cannot yet absorb all mass-production data. Real-world data is far more complex than internet text, and no intelligent driving system yet processes data at that scale. Additional techniques are needed, Ren said, and NIO continues to research them.

The main road debate

Public discussions often reduce intelligent driving routes to labels such as VLA, world models or end-to-end reinforcement learning. In China, a central debate is whether VLA or world models represent the correct path.

Ren said VLA's strengths lie in short-term effectiveness, synergy with large language models and strong semantic understanding. But it is essentially an extension of language models, adding vision-to-language translation layers trained on relatively limited real-world data. As a result, language models and their variants struggle with real-world understanding, especially quantitative reasoning about speed, distance and physical laws. This stems from training data dominated by text, with limited images and little video. Without video-based learning of object motion, physical understanding remains weak.

Therefore, world models are essential for intelligent driving. NIO is training models directly on massive video datasets for autonomous driving and robotics, exploring a different path within the automotive industry. Recent breakthroughs include robotics models trained on 270,000 hours of data. Using real-world data rather than language models, and scaling to millions of hours, is proving viable. Ren said NIO is committed to advancing world models to perform better in real-world applications.
Returning to leadership

Before the Q&A, Ren addressed what NIO had been doing since the launch of NWM 1.0, responding to earlier skepticism. Rather than framing it as feature upgrades, he emphasized paradigm shifts and engineering integration. He said industry changes accelerated in 2025, with rapid FSD progress overseas. NIO spent six months moving from NWM 1.0 to 2.0.

First was the paradigm shift, which caused months of internal difficulty but ultimately improved product capability. Second was the deployment of NIO's self-developed intelligent driving chip, Shenji NX9031. From mid-2025 to year-end, second-generation platform models completed EOP and fully adopted the in-house chip. Iteration cycles initially took three to six months, but have now been compressed to two weeks, matching the mainline four-Orin platform. Code and model sharing between platforms has exceeded 95 percent, enabling synchronized updates. NWM 2.0 will be pushed simultaneously to both platforms.

Ahead of the release, NIO founder William Li repeatedly outlined plans for three major intelligent driving updates this year and a return to the top tier. In an internal speech leaked on January 15, Li acknowledged that NIO chose a difficult path of self-developed chips, operating systems and world models, requiring deep foundational work. In 2026, NIO will further increase investment in computing power and R&D efficiency, even under tight resources. On NWM 2.0, Li said early testing feedback has been very positive. NIO aims to return to industry leadership through three major releases this year.

Time will tell.