Research explorations

Brief descriptions of the papers I explore, along with questions I am trying to answer and related ideas that could be explored further. Mainly a way to track what I learn.
Feel free to drop me an email if you have discussion points on the topics below!


  • Self-Improving Autonomous Underwater Manipulation

    Authors: Ruoshi Liu, Huy Ha, Mengxue Hou, Shuran Song, Carl Vondrick

    Brief Description:

    • The paper introduces AquaBot, a fully autonomous underwater manipulation system that combines behaviour cloning with self-learning optimization.
    • Learns from human demonstrations and self-improves through trial and error
    • Handles underwater challenges such as random water currents, light distortion, and poor visibility
    • Uses a visuomotor policy to link vision with precise arm movements
    • Uses self-supervised learning to refine grasping and manipulation

    Questions:

    • Given challenges such as complex fluid dynamics and optical distortions in real-world settings, can sim-to-real transfer, combined with domain adaptation or adversarial training, accelerate training and improve generalization to unseen objects and conditions?

  • RoboCrowd: Scaling Robot Data Collection through Crowdsourcing

    Authors: Suvir Mirchandani, David D. Yuan, Kaylee Burns, Md Sazzad Islam, Tony Z. Zhao, Chelsea Finn, Dorsa Sadigh

    Brief Description:

    • RoboCrowd gathers large-scale human demonstrations through public participation to train robotic policies using the ALOHA bimanual platform
    • The system encourages participation by offering material rewards like candies, making tasks fun, and adding competitive elements such as a leaderboard.
    • When tested at a university cafe for 2 weeks, it gathered more than 800 demonstrations from over 200 participants.
    • Combining this gathered data with expert fine-tuning improved the robot's performance by about 20%.

    Suggestions:

    The following changes might further increase user participation and hence data collection:
    • Teleoperators at multiple locations: Instead of having a single teleoperator near the ALOHA system, could we install multiple teleoperators with consoles connected to the single ALOHA system across campus similar to teleoperated surgical robotic systems?
    • Alternate user authentication: Users can scan a QR code with their own devices to land on the sign-in page, removing the need for physical card readers at each location
    • Tablet-free interface: The same webpage could host the tutorial, leaderboard, and other interactions, removing the need for a tablet at each location
    • Digital incentives: Online monetary rewards could replace candies/sweets as suggested in the paper itself, making it easier to manage incentives across multiple locations
    • Enhanced gamification: Results could be posted on social media through an official RoboCrowd handle to further incentivize participation. Posts could be automated to go out every few hours or days

  • FAST: Efficient Action Tokenization for Vision-Language-Action Models

    Authors: Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine

    Brief Description:

    • FAST: Frequency-space Action Sequence Tokenization
    • Existing Vision-Language-Action (VLA) models struggle with continuous action spaces, which makes learning slow and complex
    • Instead of learning continuous movements, FAST proposes breaking actions into discrete tokens, like words in a sentence, so that they can be processed easily
    • Improves sample efficiency (fewer training steps) and helps robots learn faster
    • Presents an efficient tokenization of high-frequency actions for dexterous tasks and shows good zero-shot generalization on the DROID dataset.
    • This approach makes it easier to integrate AI models that understand vision, language and movement together
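
    To make the idea of turning continuous actions into discrete tokens concrete, here is a minimal sketch, assuming the tokenizer starts from a discrete cosine transform (DCT) of an action chunk followed by rounding. The function names, the 16-step sine "action chunk", and the scale of 10.0 are my own illustrative assumptions, not values from the paper, and the byte-pair encoding (BPE) compression step is left out entirely.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def tokenize(chunk, scale=10.0):
    """Map a continuous action chunk to integer tokens.

    `scale` is a hypothetical quantization factor: larger values keep
    more detail but produce more distinct tokens.
    """
    return [round(scale * c) for c in dct_ortho(chunk)]


def detokenize(tokens, scale=10.0):
    """Approximately recover the continuous action chunk from tokens."""
    return idct_ortho([t / scale for t in tokens])


# Toy action chunk: one joint following a smooth sine over 16 timesteps.
chunk = [math.sin(2 * math.pi * t / 16) for t in range(16)]
tokens = tokenize(chunk)
recon = detokenize(tokens)

max_err = max(abs(a - b) for a, b in zip(chunk, recon))
nonzero = sum(1 for t in tokens if t != 0)
print("tokens:", tokens)
print("nonzero tokens:", nonzero, "of", len(tokens))
print("max reconstruction error:", max_err)
```

    For smooth trajectories most of the signal sits in the low-frequency coefficients, so many of the rounded tokens come out as zero. That sparsity is what a later compression step such as BPE can exploit.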

    Questions/Suggestions:

    • Since mobile robots need to conserve energy, could adding an energy-cost penalty to the DCT step and retraining the BPE tokenizer extend FAST's approach to make it more energy-efficient for them?

  • Tube-Certified Trajectory Tracking for Nonlinear Systems with Robust Control Contraction Metrics

    Authors: Pan Zhao, Arun Lakshmanan, Kasey Ackerman, Aditya Gahlawat, Marco Pavone, Naira Hovakimyan

    Brief Description:

    • The paper presents a method to ensure reliable trajectory tracking for nonlinear control-affine systems
    • Based on Robust Control Contraction Metrics (RCCM), which give tighter tubes than existing methods based on standard control contraction metrics
    • Incorporates the RCCM-based tracking controller, along with the computed tubes, into a feedback motion planning framework to plan safe trajectories for robotic systems
    • Invariant tubes: A region around a nominal trajectory in which the system's actual state is guaranteed to remain, despite disturbances and uncertainties.
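
    The invariant-tube idea can be illustrated with a toy scalar example. This is not the RCCM construction from the paper, only a sketch under simple assumptions: the tracking error e obeys e' = -k*e + w with a bounded disturbance |w| <= w_bar, in which case the interval |e| <= w_bar / k is a tube the error never leaves once inside. The gain k, the bound w_bar, the step size, and the function name are all hypothetical choices of mine.

```python
import random


def simulate_tracking_error(k=2.0, w_bar=0.5, e0=0.0, dt=0.01, steps=2000, seed=0):
    """Euler-simulate e' = -k*e + w with |w| <= w_bar; return the error history."""
    rng = random.Random(seed)
    e = e0
    history = [e]
    for _ in range(steps):
        w = rng.uniform(-w_bar, w_bar)  # bounded random disturbance
        e = e + dt * (-k * e + w)       # one explicit Euler step
        history.append(e)
    return history


k, w_bar = 2.0, 0.5
tube_radius = w_bar / k  # radius of the invariant tube for this toy system

errors = simulate_tracking_error(k=k, w_bar=w_bar)
max_abs_error = max(abs(e) for e in errors)
inside = all(abs(e) <= tube_radius + 1e-12 for e in errors)

print("tube radius:", tube_radius)
print("max |e| observed:", max_abs_error)
print("error stayed inside tube:", inside)
```

    A planner can then inflate obstacles by the tube radius and plan only the nominal trajectory. A tighter tube, which is what the paper claims for RCCM, means less inflation and therefore less conservative plans.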

    Questions:

    • Barrier Lyapunov Functions (BLFs) naturally prevent constraint violations. Can they be combined with the proposed approach to obtain less conservative trajectory tracking with stronger safety guarantees?