ACT Imitation Learning for Box Manipulation with ROS/Gazebo Simulation
Introduction
This experiment is carried out on the Gazebo simulation platform and focuses on dual-arm robot handover and block-placement tasks based on action-chunking (ACT) imitation learning. The main steps are as follows:
Data Collection: A human operator manually controls the dual-arm robot to perform handover and block-placement tasks in the simulation environment, and a series of motion data, including joint angles and trajectory information, is collected (see the recording sketch at the end of this section).
Model Training: An imitation learning model is trained on the collected data. The model learns the action patterns of the human operator, so that it can accurately perform dual-arm handovers and place blocks at the designated positions.
Simulation Experiment: The trained model is deployed in the Gazebo simulation environment, where the dual-arm robot executes the handover and block-placement tasks using the learned policy. The robot's success rate and accuracy are evaluated over multiple trials, and the data recorded during the experiments are kept for further analysis and model optimization.
Optimization and Improvement: The model is adjusted and optimized according to the experimental results to improve the robot's performance in actual operation, enabling it to complete the tasks more stably and efficiently.

This experiment aims to explore effective imitation-learning control of dual-arm robots in complex manipulation tasks, providing a theoretical foundation and technical support for the future application of dual-arm robots in industrial scenarios.
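To make the data-collection step concrete, the sketch below records synchronized joint states and camera frames from ROS topics into an HDF5 episode file. It is a minimal sketch under stated assumptions: the topic names (/joint_states, /camera/color/image_raw), the 50 Hz rate, and the file layout are illustrative, not this project's exact recorder.

```python
#!/usr/bin/env python
# Minimal demonstration recorder (sketch). Topic names, rate, and
# file layout are assumptions, not the exact setup of this project.
import h5py
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image, JointState


class EpisodeRecorder:
    def __init__(self):
        self.bridge = CvBridge()
        self.joints = None          # latest joint positions
        self.frame = None           # latest RGB frame
        self.qpos_log, self.img_log = [], []
        rospy.Subscriber('/joint_states', JointState, self.on_joints)
        rospy.Subscriber('/camera/color/image_raw', Image, self.on_image)

    def on_joints(self, msg):
        self.joints = np.array(msg.position, dtype=np.float32)

    def on_image(self, msg):
        self.frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='rgb8')

    def record(self, rate_hz=50, max_steps=500):
        # Sample both streams at a fixed rate while the teleoperated
        # demonstration is running.
        rate = rospy.Rate(rate_hz)
        while not rospy.is_shutdown() and len(self.qpos_log) < max_steps:
            if self.joints is not None and self.frame is not None:
                self.qpos_log.append(self.joints.copy())
                self.img_log.append(self.frame.copy())
            rate.sleep()

    def save(self, path):
        # One episode per file: joint trajectory plus camera frames.
        with h5py.File(path, 'w') as f:
            f.create_dataset('observations/qpos', data=np.stack(self.qpos_log))
            f.create_dataset('observations/images/color', data=np.stack(self.img_log))


if __name__ == '__main__':
    rospy.init_node('episode_recorder')
    rec = EpisodeRecorder()
    rec.record()
    rec.save('episode_0.hdf5')
```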
Experiment
The video shows a robot imitation learning experiment based on ROS/Gazebo. In the simulation environment, a robotic arm manipulates red and blue cubes on the workbench. The "color_image" window in the top-left corner displays the camera view from the robotic arm's perspective. The experiment aims to train the robot to complete the cube-manipulation task through imitation learning.
Method
The figure illustrates the Transformer-based inference process for predicting action sequences. A ResNet18 backbone extracts features from the images of four cameras; these features, combined with sinusoidal position embeddings, are fed into a Transformer encoder. The encoder output is passed to a Transformer decoder, which generates the predicted action sequence with the aid of position embeddings.
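A minimal PyTorch sketch of this encoder-decoder structure is shown below. The hidden size (512), chunk length (100), action dimension (14), and layer counts are illustrative assumptions, not the exact configuration used here.

```python
import math
import torch
import torch.nn as nn
from torchvision.models import resnet18


def sinusoidal_embedding(n_positions, dim):
    """Standard fixed sinusoidal position embeddings."""
    pos = torch.arange(n_positions, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))
    emb = torch.zeros(n_positions, dim)
    emb[:, 0::2] = torch.sin(pos * div)
    emb[:, 1::2] = torch.cos(pos * div)
    return emb


class ActionTransformer(nn.Module):
    """Sketch of the test-time pipeline: ResNet18 image tokens plus
    sinusoidal positions -> encoder; position-embedded queries ->
    decoder -> predicted action chunk. Sizes are assumptions."""

    def __init__(self, n_cameras=4, hidden=512, chunk=100, action_dim=14):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional trunk only; it outputs 512-channel maps.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(512, hidden, kernel_size=1)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=4)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=4)
        # Fixed sinusoidal embeddings serve as the decoder's chunk queries.
        self.register_buffer('query_pos', sinusoidal_embedding(chunk, hidden))
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, images):
        # images: (batch, n_cameras, 3, H, W)
        b = images.shape[0]
        feat = self.proj(self.backbone(images.flatten(0, 1)))  # (b*n, hid, h', w')
        tokens = feat.flatten(2).transpose(1, 2)               # (b*n, h'*w', hid)
        tokens = tokens.reshape(b, -1, tokens.shape[-1])       # concat camera tokens
        tokens = tokens + sinusoidal_embedding(
            tokens.shape[1], tokens.shape[2]).to(tokens)
        memory = self.encoder(tokens)
        queries = self.query_pos.unsqueeze(0).expand(b, -1, -1)
        return self.head(self.decoder(queries, memory))        # (b, chunk, action_dim)


# Usage: predict a 100-step action chunk from four camera images.
model = ActionTransformer()
actions = model(torch.randn(1, 4, 3, 224, 224))
print(actions.shape)  # torch.Size([1, 100, 14])
```

Predicting a whole chunk of future actions per forward pass, rather than a single step, is what lets an ACT-style policy track smooth trajectories without running inference at every control tick.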
Others
This picture shows a fruit-picking perception pipeline: OWL-ViT first identifies the fruit type and outputs a detection box, SAM then extracts the precise fruit contour from that box, XMem tracks the fruit across video frames, and finally the fruit's 3D position is recovered from the depth image's point cloud so the fruit can be picked.
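A minimal sketch of the detection, segmentation, and 3D-localization stages is shown below, using the Hugging Face OWL-ViT API and the segment_anything package; the XMem tracking stage is omitted for brevity. The text prompt, model checkpoints, file names, and camera intrinsics (fx, fy, cx, cy) are all illustrative assumptions.

```python
# Sketch of the detection -> segmentation -> 3D localization steps.
# Prompt, checkpoints, file names, and intrinsics are assumptions;
# the XMem video-tracking stage is omitted for brevity.
import numpy as np
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection
from segment_anything import SamPredictor, sam_model_registry

rgb = Image.open('frame.png').convert('RGB')
depth = np.load('depth.npy')                     # depth map in meters (assumed)
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0      # assumed camera intrinsics

# 1) OWL-ViT: open-vocabulary detection of the fruit type.
processor = OwlViTProcessor.from_pretrained('google/owlvit-base-patch32')
detector = OwlViTForObjectDetection.from_pretrained('google/owlvit-base-patch32')
inputs = processor(text=[['a photo of an apple']], images=rgb, return_tensors='pt')
with torch.no_grad():
    outputs = detector(**inputs)
target_sizes = torch.tensor([rgb.size[::-1]])    # (height, width)
result = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes)[0]
box = result['boxes'][result['scores'].argmax()].numpy()  # best box, xyxy

# 2) SAM: refine the detection box into a pixel-accurate fruit mask.
sam = sam_model_registry['vit_b'](checkpoint='sam_vit_b.pth')
predictor = SamPredictor(sam)
predictor.set_image(np.array(rgb))
masks, _, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]

# 3) Back-project masked depth pixels to 3D; average them as the grasp target.
v, u = np.nonzero(mask)
z = depth[v, u]
x = (u - cx) * z / fx
y = (v - cy) * z / fy
grasp_point = np.stack([x, y, z], axis=1).mean(axis=0)
print('fruit position (camera frame):', grasp_point)
```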

