Title: Vision-Based Human-Robot Interaction Interface
Author: 周竑宇
Keywords: Human-Robot Interaction Interface
Vision-Based Recognition
Human-Robot Interaction
YOLOv8-pose
DeepSORT
MediaPipe
Gesture Recognition
Emotion Recognition
Object Tracking
Skeleton Tracking
Keypoint Trajectory
Real-time Interaction
Department/Unit: Department of Information Engineering, College of Information and Electrical Engineering
Abstract: This project develops a "Vision-Based Human-Robot Interaction Interface" aimed at making human-robot collaboration more intuitive and practical. The core objective is a fully operational pipeline that perceives and analyzes user behavior in real time and triggers interactive functions on a humanoid robot. The system integrates several technologies to capture user behavior and perceive emotion precisely:

• Multi-target human detection and tracking: YOLOv8-pose performs efficient human detection, and the detected bounding boxes are fed into the DeepSORT tracking algorithm. This combination maintains stable tracking and identity assignment (track IDs) for multiple users across consecutive video frames, reduces false positives, and mitigates temporary occlusions, improving tracking robustness (see the tracking sketch after this abstract).

• High-precision gesture recognition: for interactive gestures such as waving, a customized wave-detection algorithm operates on the keypoint coordinate sequences provided by MediaPipe Hand. The algorithm analyzes the periodic horizontal motion of the hand-keypoint trajectory, applies a pixel-displacement threshold (e.g., 50-100 pixels) within a 1-2 second sliding window, and adds multi-layer filtering to keep recognition precise (see the wave-detection sketch below).

• Facial emotion recognition: to address the limitations of existing models (e.g., DeepFace) in the target scenarios, the project uses MediaPipe Face Mesh. A customized facial dataset was built and used for training, significantly improving emotion-recognition accuracy (happy, neutral, surprised, etc.) in complex environments; performance was rigorously evaluated with precision, recall, and F1-score (see the emotion-classification sketch below).

• High-efficiency data flow and robot collaboration: to meet real-time interaction requirements, the analyzed ROI (region of interest) data is precisely cropped after image processing and converted into text messages for transmission, significantly reducing latency. The optimized data drives the robot, enabling interactive functions such as "skeleton following user movement" and ensuring natural synchronization between robot actions and user behavior.
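To make the detection-and-tracking step concrete, the following is a minimal sketch of feeding YOLOv8-pose person detections into DeepSORT so that each user keeps a stable track ID across frames. It assumes the ultralytics and deep-sort-realtime Python packages and a webcam input; it is an illustration under those assumptions, not the project's actual code.

# Minimal sketch (assumes the ultralytics and deep-sort-realtime packages) of the
# YOLOv8-pose detection -> DeepSORT tracking pipeline described in the abstract.
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov8n-pose.pt")          # pose model also returns person boxes
tracker = DeepSort(max_age=30)           # keeps IDs alive through short occlusions

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]

    # Convert YOLO detections to DeepSORT's expected ([x, y, w, h], conf, class) tuples.
    detections = []
    for box, conf in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist()):
        x1, y1, x2, y2 = box
        detections.append(([x1, y1, x2 - x1, y2 - y1], conf, "person"))

    # DeepSORT assigns a stable track ID to each person across frames.
    for track in tracker.update_tracks(detections, frame=frame):
        if not track.is_confirmed():
            continue
        x1, y1, x2, y2 = map(int, track.to_ltrb())
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"ID {track.track_id}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()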
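The wave-detection step is described as a sliding-window check on the hand keypoint's horizontal trajectory with a pixel-displacement threshold and repeated direction reversals. The helper below is a hypothetical sketch of that idea for a single tracked hand: the 1.5 s window, 60-pixel threshold, and reversal count are illustrative values inside the 1-2 second and 50-100 pixel ranges quoted in the abstract, and the x-coordinate is assumed to come from a MediaPipe Hand landmark (e.g., the wrist) scaled to the frame width.

# Hypothetical sketch of the sliding-window wave check described in the abstract.
# A wave is flagged when, within a ~1.5 s window, the hand's x-coordinate both
# swings far enough (pixel-displacement threshold) and reverses direction often
# enough (periodic left-right motion). All constants are illustrative only.
from collections import deque
import time

WINDOW_SECONDS = 1.5          # sliding window in the 1-2 s range
DISPLACEMENT_PX = 60          # threshold in the 50-100 px range
MIN_DIRECTION_CHANGES = 3     # require several left-right reversals

history = deque()             # (timestamp, x_pixel) samples for one tracked hand

def update_and_check_wave(x_pixel, now=None):
    """Append the latest hand x-coordinate and report whether the trajectory
    inside the sliding window looks like a wave."""
    now = time.time() if now is None else now
    history.append((now, x_pixel))
    # Drop samples that have fallen out of the sliding window.
    while history and now - history[0][0] > WINDOW_SECONDS:
        history.popleft()

    xs = [x for _, x in history]
    if len(xs) < 5:
        return False

    # Filter 1: overall horizontal swing must exceed the displacement threshold.
    if max(xs) - min(xs) < DISPLACEMENT_PX:
        return False

    # Filter 2: the motion must be periodic, i.e. change direction repeatedly.
    changes = 0
    prev_sign = 0
    for a, b in zip(xs, xs[1:]):
        sign = (b > a) - (b < a)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            changes += 1
        if sign != 0:
            prev_sign = sign
    return changes >= MIN_DIRECTION_CHANGES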
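For the emotion-recognition step, the abstract describes extracting MediaPipe Face Mesh landmarks, training on a customized facial dataset, and evaluating with precision, recall, and F1-score. The sketch below assumes a scikit-learn RandomForest classifier over flattened landmark coordinates; the project's actual classifier and dataset layout are not specified here, so both landmarks_to_features() and the model choice are assumptions for illustration.

# Hedged sketch: MediaPipe Face Mesh landmarks as features for an emotion
# classifier, evaluated with precision / recall / F1 as in the abstract.
# The RandomForest model and the landmarks_to_features() layout are assumptions.
import cv2
import numpy as np
import mediapipe as mp
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1)

def landmarks_to_features(bgr_image):
    """Return a flat (x, y, z) vector of the 468 Face Mesh landmarks, or None."""
    result = face_mesh.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None
    pts = result.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in pts]).flatten()

def train_and_evaluate(X, y):
    """X: landmark feature matrix, y: labels such as 'happy', 'neutral', 'surprised'."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    # classification_report prints per-class precision, recall and F1-score.
    print(classification_report(y_te, clf.predict(X_te)))
    return clf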
Academic Year: Academic Year 113 (2024-2025), Second Semester
Instructor: 葉春秀
Course Name: Multimedia Systems
Department: Department of Information Engineering, College of Information and Electrical Engineering
Category: Information and Electrical Engineering, Academic Year 113

Files in This Item:
File          Description    Size      Format
1132-37.pdf                  2.9 MB    Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.