Xiaomi's two core technological advancements in embodied AI are the Xiaomi-Robotics-0 VLA large model and the TacRefineNet tactile-based grasping fine-tuning model.

Xiaomi Auto announced today that its humanoid robot has completed its first round of real-world testing at the Xiaomi Auto Factory. According to official data, during a continuous three-hour autonomous operation test, the robot was required to complete the entire workflow from grasping to installing self-tapping nuts at a workpiece station.

Figure: Examples of successful self-tapping nut installation under different conditions

The difficulty of the task lies in the spline structure inside the self-tapping nut, which leaves the nut in a different orientation in the robot's gripper after each grasp. In addition, the magnetic force of the positioning pin introduces extra pulling interference. The robot must nevertheless ensure precise alignment and reliable seating between the nut and the positioning pin.

Test results show that the success rate for simultaneous dual-side installation reached 90.2%, while also meeting the production line's fastest cycle-time requirement of 76 seconds.

Figure: Simultaneous dual-side installation success rate reaches 90.2%

Supporting this performance are two core technological advancements from Xiaomi in the field of embodied AI: the Xiaomi-Robotics-0 VLA (Vision-Language-Action) large model and the TacRefineNet tactile-based grasping fine-tuning model. During this deployment, Xiaomi implemented three key technological solutions:

1. End-to-End Data-Driven Control: Building on the VLA foundation model and incorporating reinforcement learning, this enables the robot to adapt quickly to different downstream tasks and to keep learning from interactive experience in real physical environments. The framework reduces reliance on teleoperation data and improves the model's generalization across different embodiments and scenarios.
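The adaptation idea behind this design can be illustrated with a toy sketch: a frozen pretrained base policy supplies a nominal action, and a small learnable residual is updated from interaction reward. All function names and numbers below are hypothetical, standing in for the article's description rather than reproducing Xiaomi's actual training code.

```python
# Hedged toy sketch (not Xiaomi's code): residual RL-style adaptation on top
# of a frozen pretrained policy, showing how a base model can keep improving
# from reward signals gathered in the real environment.

def base_policy(obs):
    # Frozen pretrained policy: maps an observation to a nominal action.
    return 0.5 * obs

def environment_reward(action, target=0.7):
    # Toy reward: higher when the action is closer to the task optimum.
    return -(action - target) ** 2

def adapt(episodes=200, lr=0.5, eps=1e-3):
    residual = 0.0  # learnable correction on top of the frozen base policy
    obs = 1.0
    for _ in range(episodes):
        action = base_policy(obs) + residual
        # Finite-difference estimate of the reward gradient w.r.t. the
        # residual, standing in for a policy-gradient update from data.
        grad = (environment_reward(action + eps)
                - environment_reward(action - eps)) / (2 * eps)
        residual += lr * grad
    return base_policy(obs) + residual
```

Because only the small residual is updated, the pretrained behavior is preserved while the downstream task is learned from very little interaction, which is the sample-efficiency argument the article makes.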
Figure: Model framework and training process

2. Multi-Modal Perception Fusion: The robot integrates multi-modal information, including vision, touch, and joint proprioception, for coordinated perception and comprehensive judgment during operation. Relying on vision alone can lead to uncertainty under changing lighting or partial occlusion; introducing tactile feedback significantly improves the stability and robustness of task execution.

Figure: Head camera, wrist camera, and fingertip tactile information

3. Whole-Body Motion Control Hybrid Architecture: This architecture combines optimization-based control with reinforcement learning. The optimization controller, based on quadratic programming, achieves four levels of strictly prioritized control with a solve time of under 1 ms. The reinforcement learning controller, trained on a large-scale parallel simulation platform, lets the robot learn balance strategies under extreme disturbances and deploy zero-shot in the real world.

Figure: Whole-body motion control block diagram

The self-tapping nut workpiece station is the first step in the large-scale application of Xiaomi's humanoid robot in automotive manufacturing. To this end, Xiaomi is also conducting deployment and validation work at other typical stations, including a tote-handling station and a front-emblem installation station.

Notably, alongside advancing its robotics technology, Xiaomi is also building an open-source ecosystem. Details of the preliminary technical solutions and experimental videos have been made public, and the relevant code and models can be accessed through the following channels:

Project Page: https://sites.google.com/view/hil-daft/
arXiv: https://arxiv.org/abs/2509.13774
TacRefineNet: https://sites.google.com/view/tacrefinenet
Xiaomi-Robotics-0: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0
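The multi-modal fusion argument above can be sketched numerically: when two sensors estimate the same quantity, weighting each by its inverse variance lets the fused estimate lean on whichever channel is currently more reliable. The sensor values and variances below are invented for illustration; this is not Xiaomi's perception model.

```python
# Hedged toy sketch (not Xiaomi's model): inverse-variance late fusion of a
# visual and a tactile estimate of the nut's in-gripper angle. If lighting
# degrades the camera, its variance grows and the fused estimate shifts
# toward touch, which is the robustness argument made in the text.

def fuse(estimates):
    """estimates: list of (value, variance) pairs from different sensors."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for (v, _), w in zip(estimates, weights)) / total
    return value, 1.0 / total

vision = (0.30, 0.04)   # radians; noisier under occlusion (hypothetical)
tactile = (0.20, 0.01)  # fingertip tactile array, tighter on contact
fused_angle, fused_var = fuse([vision, tactile])
```

Note that the fused variance is always smaller than either input variance, so adding the tactile channel can only sharpen the estimate, never degrade it.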
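The strict prioritization described for the whole-body optimization controller can be illustrated with the classic nullspace-projection scheme: each lower-priority task is solved only in the null space of the tasks above it, so it can never disturb them. This is a minimal NumPy sketch of the lexicographic idea, with made-up two-level tasks, not Xiaomi's four-level QP solver.

```python
import numpy as np

# Hedged sketch (not Xiaomi's controller): strictly prioritized task
# resolution via nullspace projection. Tasks are solved from highest to
# lowest priority; each level optimizes only within the null space left
# over by the levels above, so priorities are enforced exactly.

def prioritized_solve(tasks, n):
    """tasks: list of (J, xd) pairs, highest priority first.
    J is a task Jacobian, xd the desired task-space value; returns q with
    each task satisfied as well as the higher-priority tasks allow."""
    q = np.zeros(n)
    N = np.eye(n)  # projector onto the null space of all tasks solved so far
    for J, xd in tasks:
        JN = J @ N
        # Least-squares correction restricted to the remaining null space.
        q = q + N @ np.linalg.pinv(JN) @ (xd - J @ q)
        # Shrink the null space by this task's constraints.
        N = N @ (np.eye(n) - np.linalg.pinv(JN) @ JN)
    return q
```

With small, dense matrices each level reduces to a pseudoinverse solve, which is consistent with the sub-millisecond solve times the article reports for a compact QP hierarchy.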