OpenTouch: Bringing Full-Hand Touch to Real-World Interaction

Yuxin Ray Song1,*   Jinzhou Li2,*   Rao Fu3,*   Devin Murphy4   Kaichen Zhou1,5
Rishi Shiv1   Yaqi Li1   Haoyu Xiong1   Crystal E. Owens1   Yilun Du5
Yiyue Luo4   Xianyi Cheng2   Antonio Torralba1   Wojciech Matusik1   Paul Pu Liang1

1MIT    2Duke University    3Brown University    4University of Washington    5Harvard University
*Equal contribution

OpenTouch: Egocentric Video • Full-hand Tactile • Hand Poses

OPENTOUCH is the first in-the-wild, full-hand tactile dataset with synchronized egocentric video, force-aware full-hand touch, and hand-pose trajectories. It contains 5 hours of recordings, including 3 hours of densely annotated, contact-rich interactions.

Example Vision-Touch-Pose Data

OPENTOUCH demonstrates that hardware-based tactile sensing and pose tracking reveal critical force, contact, and motion cues that vision alone cannot capture.

Example 1 figure: assets/examples/ex1.png · video: assets/examples/ex1.mp4

(a) Although the first three frames show nearly identical hand poses, the tactile signals reveal that in the third frame the hand applies sufficient force to move the chair.

Example 2 figure: assets/examples/ex2.png · video: assets/examples/ex2.mp4

(b) In the first frame, tactile readings clearly indicate contact with the table, which is ambiguous from RGB alone. In the next two frames, the hand moves out of view, making vision-based pose estimation unreliable; OPENTOUCH provides accurate hardware-tracked poses throughout.

Example 3 figure: assets/examples/ex3.png · video: assets/examples/ex3.mp4

(c) Tactile sensing exposes clear interaction patterns with a transparent object that remain difficult to infer from visual tracking alone.

Example 4 figure: assets/examples/ex4.png · video: assets/examples/ex4.mp4

(d) The tactile map captures a subtle middle-finger double-click on a button, a fine-grained motion that even pose tracking may miss. See the supplementary video for the high-fidelity tactile signals and subtle dynamic patterns.

Hardware + Annotation Setup

Meta Aria glasses, Rokoko Smartgloves, and the FPC-based tactile sensor are synchronized at 30 Hz with an average latency of 2 ms. High-level descriptions and detailed annotations are automatically generated from the egocentric video and the rendered tactile maps using a large language model.
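
As a rough illustration of the alignment step, the sketch below resamples independently timestamped streams onto a shared 30 Hz clock by nearest-timestamp matching. The field names and per-device formats are assumptions for illustration, not the released data layout.

```python
import numpy as np

def align_to_30hz(streams, t_start, t_end, fps=30.0):
    """Resample timestamped streams onto a shared 30 Hz clock by
    nearest-timestamp matching (hypothetical field layout)."""
    clock = np.arange(t_start, t_end, 1.0 / fps)  # common 30 Hz timeline
    aligned = {"t": clock}
    for name, (ts, vals) in streams.items():
        # For every clock tick, pick the nearest sample of this stream.
        idx = np.clip(np.searchsorted(ts, clock), 1, len(ts) - 1)
        left_closer = (clock - ts[idx - 1]) < (ts[idx] - clock)
        aligned[name] = vals[np.where(left_closer, idx - 1, idx)]
    return aligned

# Hypothetical usage with each device recording at its own native rate:
# aligned = align_to_30hz({
#     "rgb":     (rgb_ts,     rgb_frames),    # Aria egocentric video
#     "tactile": (tactile_ts, tactile_maps),  # full-hand pressure maps
#     "pose":    (pose_ts,    hand_poses),    # Rokoko glove hand poses
# }, t_start=rgb_ts[0], t_end=rgb_ts[-1])
```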

Full-hand touch spatial layout (interactive figure)
Annotation pipeline figure

Tactile Map & Grasp Taxonomy

We visualize accumulated tactile maps across the dataset for different grasp types (defined by the grasp taxonomy). The spatial pressure patterns correlate strongly with the underlying grasp configuration, demonstrating the accuracy and quality of our tactile data and grasp-type annotations.

Tactile map and taxonomy figure
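
For reference, accumulated maps of this kind can be produced by averaging per-frame pressure maps grouped by their grasp-type label. The sketch below assumes a simple [N, H, W] array of tactile frames with one grasp label per frame; the actual array shapes and label format of the release may differ.

```python
import numpy as np
from collections import defaultdict

def accumulate_tactile_by_grasp(tactile_frames, grasp_labels):
    """Average full-hand pressure maps per grasp type.

    tactile_frames: float array [N, H, W] of per-frame pressure maps (assumed layout).
    grasp_labels:   list of N grasp-type names from the taxonomy.
    Returns a dict mapping grasp type -> mean pressure map [H, W].
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for frame, label in zip(tactile_frames, grasp_labels):
        sums[label] = sums[label] + frame  # elementwise accumulation
        counts[label] += 1
    return {g: sums[g] / counts[g] for g in sums}
```

Each averaged map can additionally be normalized to [0, 1] before rendering so that different grasp types are visually comparable.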

Annotation Statistics

Sankey diagram visualizing the distribution of dataset labels, including environment, action, grasp type, and object category. In total, OPENTOUCH spans 14 everyday environments and covers over 8,000 objects from 14 categories.

Sankey diagram + stats figure

Tactile retrieval in-the-wild (Ego4D)

OPENTOUCH can act as a tactile database for in-the-wild egocentric video datasets: we demonstrate that in-the-wild video (e.g., Ego4D) can retrieve plausible tactile sequences, enabling large-scale egocentric video to be augmented with contact and force cues. The source videos paired with the retrieved tactile sequences exhibit human behaviors and manipulation primitives strikingly similar to the query.
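
A minimal sketch of this retrieval step, assuming video and tactile encoders that map into a shared embedding space learned on OPENTOUCH (the encoder interface and variable names below are hypothetical):

```python
import numpy as np

def retrieve_tactile(query_video_emb, tactile_embs, k=5):
    """Return indices of the k tactile sequences whose embeddings are most
    similar (cosine) to the query video embedding."""
    q = query_video_emb / np.linalg.norm(query_video_emb)
    db = tactile_embs / np.linalg.norm(tactile_embs, axis=1, keepdims=True)
    sims = db @ q                  # cosine similarity to every database item
    return np.argsort(-sims)[:k]   # top-k most similar tactile sequences

# Hypothetical usage: embed an Ego4D clip, then look up OPENTOUCH tactile data.
# top_idx = retrieve_tactile(video_encoder(ego4d_clip), opentouch_tactile_embs)
```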

OpenTouch benchmark

Retrieval + classification

Retrieval benchmark

tactile → video: assets/videos/retrieval_tactile_to_video.mp4
video → tactile: assets/videos/retrieval_video_to_tactile.mp4

Bi-modal (Ours only)

| Direction | R@1 | R@5 | R@10 | mAP |
| --- | --- | --- | --- | --- |
| video → tactile | 7.15 | 26.73 | 39.74 | 15.47 |
| tactile → video | 7.15 | 26.30 | 39.03 | 15.28 |
| pose → tactile | 6.93 | 21.02 | 30.45 | 13.13 |
| tactile → pose | 7.15 | 21.87 | 30.88 | 13.43 |

Tri-modal (Ours only)

| Direction | R@1 | R@5 | R@10 | mAP |
| --- | --- | --- | --- | --- |
| video + pose → tactile | 14.08 | 42.96 | 62.26 | 26.86 |
| tactile + pose → video | 12.72 | 38.53 | 53.18 | 23.46 |
| video + tactile → pose | 15.44 | 43.39 | 57.61 | 26.86 |

  • Finding 1: Bi-modal retrieval is symmetric. Nearly identical performance in both directions suggests that the learned representation is genuinely multimodal rather than biased toward a single modality.
  • Finding 2: Multimodal inputs outperform unimodal ones. They provide complementary information that reduces retrieval ambiguity: video supplies global scene context, pose encodes kinematics, and tactile captures local contact and force.
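
For reference, the sketch below shows one standard way to compute R@K and mAP from a query-database similarity matrix, assuming each query has exactly one ground-truth match (paired query i ↔ item i); whether the benchmark uses exactly this protocol is an assumption here. Scale the outputs by 100 to match the percentage-style numbers in the tables above.

```python
import numpy as np

def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Recall@K and mAP for cross-modal retrieval.

    sim: [Q, D] similarity matrix; assumes query i's correct match is item i.
    """
    ranking = np.argsort(-sim, axis=1)  # database indices, best match first
    # 1-based rank of the ground-truth item for each query.
    gt_rank = np.argmax(ranking == np.arange(len(sim))[:, None], axis=1) + 1
    metrics = {f"R@{k}": float(np.mean(gt_rank <= k)) for k in ks}
    metrics["mAP"] = float(np.mean(1.0 / gt_rank))  # AP = 1/rank with one positive
    return metrics
```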

Classification benchmark

| Modality | Action Acc. (RN18) | Action Acc. (Lite-CNN) | Grasp Acc. (RN18) | Grasp Acc. (Lite-CNN) |
| --- | --- | --- | --- | --- |
| V | 40.26 | – | 57.45 | – |
| P | 33.22 | – | 46.32 | – |
| T | 29.95 | 31.59 | 60.23 | 57.12 |
| T + P | 28.31 | 27.00 | 60.72 | 62.19 |
| T + V | 30.11 | 32.73 | 51.72 | 65.47 |
| T + P + V | 35.02 | 37.32 | 55.65 | 68.09 |

  • Finding 1: Tactile is highly informative for grasp type, reflecting that grasp recognition relies on local contact geometry.
  • Finding 2: Action recognition depends on the higher-level global context provided by the video modality.
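
To make the grasp-classification setup concrete, here is a minimal sketch of a tactile-only classifier in the spirit of the RN18 column: a ResNet-18 adapted to full-hand tactile maps. The input channel count, map resolution, and number of grasp classes are illustrative assumptions, not the exact benchmark configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_tactile_grasp_classifier(num_grasp_types: int, tactile_channels: int = 1):
    """ResNet-18 adapted to single-channel full-hand tactile maps.
    Channel and class counts are illustrative assumptions."""
    model = resnet18(weights=None)
    # Accept tactile maps instead of 3-channel RGB images.
    model.conv1 = nn.Conv2d(tactile_channels, 64, kernel_size=7, stride=2,
                            padding=3, bias=False)
    # Predict one of the annotated grasp-taxonomy classes.
    model.fc = nn.Linear(model.fc.in_features, num_grasp_types)
    return model

# Hypothetical usage on a batch of 32x32 tactile maps with 16 grasp classes:
# logits = make_tactile_grasp_classifier(16)(torch.randn(8, 1, 32, 32))
```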

Citation

BibTeX

@misc{song2025opentouchbringingfullhandtouch,
      title={OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction},
      author={Yuxin Ray Song and Jinzhou Li and Rao Fu and Devin Murphy and Kaichen Zhou and Rishi Shiv and Yaqi Li and Haoyu Xiong and Crystal Elaine Owens and Yilun Du and Yiyue Luo and Xianyi Cheng and Antonio Torralba and Wojciech Matusik and Paul Pu Liang},
      year={2025},
      eprint={2512.16842},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.16842},
}

Acknowledgment

We thank the MIT Office of Research Computing and Data (ORCD) for support through ORCD Seed Fund Grants, which provided access to 8×H200 GPUs and additional funding. We also thank the NVIDIA Academic Grant Program for GPU support, and Murata and Analog Devices for supporting this work through the MIT Gen AI Impact Consortium. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NVIDIA, Murata, or Analog Devices.

Like & Contact

Say hi / request access / collaborations

Corresponding author: Ray Song
Email: rayxsong@mit.edu
GitHub: opentouch-tactile