Analysis, Interpretation and Recognition of 2D (touch) and 3D Gestures for New Man-Machine Interactions (AIR)

(Analyse, Interprétation et Reconnaissance de Gestes 2D (tactile) et 3D pour de nouvelles interactions homme-machine)

Description (EN)

With the development of touch screens and motion capture technology, new human-computer interactions have gained popularity in recent years: human-machine interactions are evolving. Several artificial intelligence methods have been designed to exploit the interaction potential offered by 2D and 3D action gestures. These gestural commands let the user execute many actions simply by performing 2D or 3D gestures. Recognition of human actions (2D and 3D action gestures) has recently become an active research topic in Artificial Intelligence, Computer Vision, Pattern Recognition and Man-Machine Interaction. This course addresses this emerging scientific topic: the analysis, interpretation and recognition of 2D (touch) and 3D gestures for new man-machine interactions. Technically, an action is a sequence generated by a human subject during the performance of a task. Action recognition is the process of labelling such a motion sequence with respect to the depicted motions. The course will cover the specifics of motion capture and modelling, the recognition process for these two kinds of actions (2D and 3D action gestures), and the potential convergence of the scientific approaches used for each of them. The course also introduces notions of user-centred design, user needs, acceptability and user testing, to illustrate the importance of considering the user when developing such new human-computer interactions.

Description (FR)

Avec le développement des écrans tactiles et des technologies de capture de mouvement, de nouvelles interactions homme-machine sont apparues ces dernières années. Des approches d'intelligence artificielle sont utilisées pour tirer parti du potentiel d'interaction offert par la reconnaissance de gestes 2D et 3D. Ces commandes gestuelles permettent à l'utilisateur d'exécuter de nombreuses actions simplement en faisant des gestes. Aujourd'hui, la reconnaissance de commandes gestuelles 2D et 3D est devenue un sujet de recherche très actif dans les domaines suivants : intelligence artificielle, vision par ordinateur, reconnaissance de formes et interaction homme-machine. Dans ce cours, nous abordons ce thème scientifique émergent : l'analyse, l'interprétation et la reconnaissance des gestes 2D et 3D pour de nouvelles interactions homme-machine. Techniquement, une action est une séquence de gestes générée par un sujet humain pendant l'exécution d'une tâche. La reconnaissance d'action consiste à identifier automatiquement cette séquence de mouvement par rapport à un ensemble de commandes possibles. Ce cours exposera les spécificités des processus de capture et de modélisation des gestes, la reconnaissance de ces deux types d'actions (gestes 2D et 3D), mais aussi les convergences possibles des approches scientifiques utilisées. Nous aborderons également les notions de conception centrée utilisateur, de besoins utilisateurs, d'acceptabilité et de tests utilisateurs pour illustrer l'importance de considérer l'utilisateur lors du développement de telles interactions homme-machine.

Mots-clés

Geste 2D, Geste 3D, Classification, Reconnaissance, Analyse, Interaction Homme-Machine, Computer Vision, Pattern Recognition

Prérequis

Aucun

Contenu

Thèmes abordés dans le cours:
  • Acquisition de signaux, prétraitement et normalisation
    • Acquisition de signaux sur écran tactile, orienté stylet et sur surface tangible qui permettent la participation simultanée de plusieurs utilisateurs.
    • Systèmes de capture de mouvement (MoCap) pour extraire des postures corporelles basées sur des positions et des orientations articulaires 3D en utilisant des marqueurs et un ensemble de caméras haute précision.
    • Microsoft Kinect ou Capteur Leap Motion.
    • Prétraitement et normalisation morphologique.
    • Modélisation du squelette humain.
  • Extraction de caractéristiques et espace de représentation
    • Extraction de caractéristiques 2D et 3D ;
    • Modélisation des relations temporelles, spatiales et de mouvement.
  • Intelligence artificielle pour la reconnaissance d'actions en 2D et 3D
    • Reconnaissance à la volée et a posteriori
    • Reconnaissance de geste basée sur le squelette
    • Moteurs de reconnaissance et d'apprentissage automatique:
      • Graph modelling, matching and embedding algorithm
      • Dynamic Time Warping (DTW)
      • Hidden Markov Model (HMM)
      • Support Vector Machine (SVM)
      • Neural Network (NN)
      • Reject Option…
  • Segmentation et détection d'actions 2D et 3D
    • Commandes directes et indirectes
    • Détection précoce d'une action dans un flux de mouvement non segmenté.
    • Méthodes de segmentation temporelle, fenêtres glissantes...
  • Conception centrée utilisateur (CCU - ISO 9241-210) et protocole de tests
    • La conception centrée utilisateur vise à rendre les systèmes utilisables en mettant l'accent sur les utilisateurs, leurs besoins et leurs exigences, et en appliquant les facteurs humains, l'ergonomie et les connaissances et techniques d'utilisabilité.
    • Protocole de test
    • Méthodes d'évaluation et d'analyse des données de tests
  • Exemple et démonstration

Content

Main topics addressed in the course:
  • Signal acquisition, Preprocessing and Normalization
    • Motion capture (MoCap) systems to extract 3D joint positions by using markers and high precision camera array.
    • Microsoft Kinect or Leap Motion sensor: the Shotton algorithm greatly eases the extraction of 3D joint positions.
    • Pen-based and multi-touch capture on touch screens: smartphones, tablet PCs and tangible surfaces that support simultaneous participation by multiple users.
    • Pre-processing and morphology normalisation.
    • Joint trajectory modelling
  • Feature Extraction
    • 2D and 3D feature extraction
    • Sub-stroke representation
    • Temporal, shape and motion relations between sub-strokes
  • Artificial Intelligence for 2D and 3D Action recognition
    • Eager and lazy recognition
    • Skeleton-based human action recognition
    • Several Recognition and Machine Learning Approaches:
      • Graph modelling, matching and embedding algorithm
      • Dynamic Time Warping (DTW)
      • Hidden Markov Model (HMM)
      • Support Vector Machine (SVM)
      • Neural Network (NN)
      • Reject Option
  • Segmentation and detection of 2D and 3D actions
    • Direct manipulation and indirect commands
    • Early detection of an action in an unsegmented motion stream.
    • Temporal segmentation methods.
    • Sliding Window approach.
  • Human-centered design (ISO 9241-210) and test protocol
    • User-centred design aims to produce systems that are usable and satisfy the user, by focusing on the users, their needs and requirements, and by applying human factors, ergonomics, and usability knowledge and techniques.
    • Test protocols
    • Evaluation methods and analysis of test data
  • Examples and demo
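To make the pre-processing step above concrete, here is a minimal, illustrative sketch (not the course's implementation; function names are ours) of spatial resampling of a captured 2D trajectory, followed by translation and scale normalisation. It assumes a trajectory with at least two points.

```python
import math

def resample(points, n=32):
    """Resample a 2D polyline to n points equally spaced along its arc length."""
    # cumulative arc length at each captured point
    dists = [0.0]
    for p, q in zip(points, points[1:]):
        dists.append(dists[-1] + math.dist(p, q))
    total = dists[-1]
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        # locate the input segment containing the target arc length
        i = 1
        while i < len(dists) - 1 and dists[i] < target:
            i += 1
        seg = dists[i] - dists[i - 1] or 1.0  # guard against zero-length segments
        t = (target - dists[i - 1]) / seg
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalise(points):
    """Centre on the centroid and scale by the larger bounding-box side."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in points]
```

After this step, two executions of the same symbol drawn at different sizes, positions and speeds yield comparable fixed-length point sequences, which simplifies the feature extraction discussed above.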
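Among the recognition engines listed above, Dynamic Time Warping (DTW) is the simplest to sketch. The toy example below (ours, not the course's code) computes the classic DTW distance between two point sequences and uses it for nearest-template classification, illustrating why DTW absorbs differences in execution speed between two performances of the same gesture.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW with Euclidean local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def classify(gesture, templates):
    """1-NN classification: label of the closest template under DTW."""
    return min(templates, key=lambda t: dtw_distance(gesture, t[1]))[0]

# toy templates and a query gesture
line = [(x / 10, 0.0) for x in range(11)]
arc = [(x / 10, math.sin(x / 10 * math.pi)) for x in range(11)]
templates = [("line", line), ("arc", arc)]
slow_line = [(x / 20, 0.0) for x in range(21)]  # same shape, executed more slowly
label = classify(slow_line, templates)
```

Here `slow_line` has twice as many samples as the `line` template, yet DTW aligns them at low cost, so the query is still matched to `"line"`.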
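The sliding-window segmentation strategy listed above can also be sketched. In the hypothetical example below (the scorer and threshold are placeholders, not a published method), a fixed-length window slides over an unsegmented motion stream and a score function decides, frame by frame, whether a known action is present.

```python
from collections import deque

def detect(stream, window_size, score, threshold):
    """Yield (frame_index, label) each time a window scores above threshold."""
    window = deque(maxlen=window_size)  # keeps only the last window_size frames
    for i, frame in enumerate(stream):
        window.append(frame)
        if len(window) == window_size:
            label, s = score(list(window))
            if s >= threshold:
                yield i, label

# toy scorer: detect a "raise" if the vertical coordinate increased over the window
def toy_score(window):
    rise = window[-1][1] - window[0][1]
    return ("raise", rise)

# unsegmented stream of (x, y) frames: the hand rises between frames 2 and 4
stream = [(0, 0.0), (0, 0.1), (0, 0.1), (0, 0.6), (0, 1.2), (0, 1.2)]
hits = list(detect(stream, window_size=3, score=toy_score, threshold=0.5))
```

Because the detector fires as soon as one window crosses the threshold, the action is reported while it is still in progress, which is the point of the early-detection methods mentioned above.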

Compétences acquises

Interaction homme-machine gestuelle 2D et 3D, traitement et reconnaissance de formes, classification

References

  • A. Delaye and E. Anquetil, “HBF49 feature set: A first unified baseline for online symbol recognition,” Pattern Recognition, vol. 46, no. 1, pp. 117–130, 2013.
  • Z. Chen, E. Anquetil, H. Mouchère, and C. Viard-Gaudin, “Recognize multi-touch gestures by graph modeling and matching,” in 17th Biennial Conference of the International Graphonomics Society, Pointe-à-Pitre, France, Jun. 2015.
  • D. Rubine, “Specifying gestures by example,” in Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’91. New York, NY, USA: ACM, 1991, pp. 329–337.
  • S. Macé and E. Anquetil, “Eager interpretation of on-line hand-drawn structured documents: The DALI methodology,” Pattern Recognition, vol. 42, no. 12, pp. 3202–3214, Dec. 2009.
  • M. Müller, T. Röder, M. Clausen, B. Eberhardt, B. Krüger, and A. Weber, “Documentation Mocap Database HDM05,” 2007.
  • S. Y. Boulahia, E. Anquetil, R. Kulpa, and F. Multon, “HIF3D: Handwriting-inspired features for 3D skeleton-based action recognition,” in 23rd International Conference on Pattern Recognition (ICPR 2016), Cancún, Mexico, Dec. 2016.
  • Z. Chen, E. Anquetil, H. Mouchère, and C. Viard-Gaudin, “The MUMTDB dataset for evaluating simultaneous composition of structured documents in a multi-user and multi-touch environment,” in 15th International Conference on Frontiers in Handwriting Recognition, Shenzhen, China, Oct. 2016.
  • L. Xia, C.-C. Chen, and J. Aggarwal, “View invariant human action recognition using histograms of 3d joints,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 20–27, 2012.
  • M. A. Gowayyed, M. Torki, M. E. Hussein, and M. El-Saban, “Histogram of oriented displacements (hod): describing trajectories of human joints for action recognition,” in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1351–1357, 2013.
  • R. Vemulapalli, F. Arrate, and R. Chellappa, “Human action recognition by representing 3d skeletons as points in a lie group,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595, 2014.
  • R. Chaudhry, F. Ofli, G. Kurillo, R. Bajcsy, and R. Vidal, “Bio-inspired dynamic 3d discriminative skeletal features for human action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 471–478, 2013.
  • H. Zhang and L. E. Parker, “Bio-inspired predictive orientation decomposition of skeleton trajectories for real-time human activity prediction,” in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3053–3060, 2015.
  • R. Kulpa, F. Multon, and B. Arnaldi, “Morphology-independent representation of motions for interactive human-like animation,” in Computer Graphics Forum, vol. 24, pp. 343–351, 2005.
  • A. Sorel, R. Kulpa, E. Badier, and F. Multon, “Dealing with variability when recognizing user’s performance in natural 3d gesture interfaces,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 08, 2013.
  • M. E. Hussein, M. Torki, M. A. Gowayyed, and M. El-Saban, “Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations,” in Proceedings of the International Joint Conference on Artificial Intelligence, vol. 13, pp. 2466–2472, 2013.
  • G. Evangelidis, G. Singh, and R. Horaud, “Skeletal quads: Human action recognition using joint quadruples,” in Proceedings of the IEEE International Conference on Pattern Recognition, pp. 4513–4518, 2014.
  • V. Bloom, D. Makris, and V. Argyriou, “Clustered spatio-temporal manifolds for online action recognition,” in 22nd International Conference on Pattern Recognition (ICPR), pp. 3963–3968, 2014.
  • Y. Li, C. Lan, J. Xing, W. Zeng, C. Yuan, and J. Liu, “Online human action detection using joint classification-regression recurrent neural networks,” arXiv preprint arXiv:1604.05633, 2016.
  • G. Bailly, J. Müller, and E. Lecolinet, “Design and evaluation of finger-count interaction: Combining multitouch gestures and menus,” International Journal of Human-Computer Studies, vol. 70, no. 10, pp. 673–689, Oct. 2012.
  • Sriganesh Madhvanath, Dinesh Mandalapu, Tarun Madan, Naznin Rao, Ramesh Kozhissery, “GeCCo: Finger gesture-based command and control for touch interfaces”, IHCI 2012: 1-6.

Teachers / Enseignant•e•s

Eric Anquetil (responsable), Richard Kulpa, Nathalie Girard