3D Gesture Recognition and Tracking for Next Generation of Smart Devices: Theories, Concepts, and Implementations
KTH, Medieteknik och interaktionsdesign, MID.ORCID iD: 0000-0003-2203-5805
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rapid development of mobile devices during the recent decade has been greatly driven by interaction and visualization technologies. Although touchscreens have significantly enhanced interaction technology, it is predictable that with future mobile devices, e.g., augmented-reality glasses and smart watches, users will demand more intuitive inputs such as free-hand interaction in 3D space. Specifically, 3D hand/body gestures will be essential for manipulating digital content in augmented environments. Therefore, 3D gesture recognition and tracking are highly desired features for interaction design in future smart environments. Due to the complexity of hand/body motions, and the limited capacity of mobile devices for expensive computations, 3D gesture analysis remains an extremely difficult problem to solve.

This thesis aims to introduce new concepts, theories and technologies for natural and intuitive interaction in future augmented environments. The contributions of this thesis support the concept of bare-hand 3D gestural interaction and interactive visualization on future smart devices. The introduced technical solutions enable effective interaction in the 3D space around the smart device. Highly accurate and robust 3D motion analysis of hand/body gestures is performed to facilitate 3D interaction in various application scenarios. The proposed technologies enable users to control, manipulate, and organize digital content in 3D space.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014, p. xii, 101
Series
TRITA-CSC-A, ISSN 1653-5723 ; 14:02
Keywords [en]
3D gestural interaction, gesture recognition, gesture tracking, 3D visualization, 3D motion analysis, augmented environments
National Category
Media and Communication Technology
Research subject
Media Technology
Identifiers
URN: urn:nbn:se:lnu:diva-40974
ISBN: 978-91-7595-031-0 (print)
OAI: oai:DiVA.org:lnu-40974
DiVA, id: diva2:796232
Public defence
2014-03-17, F3, Lindstedtsvägen 26, KTH, 13:15 (English)
Opponent
Supervisors
Note

QC 20140226

Available from: 2014-02-26 Created: 2015-03-18 Last updated: 2018-01-11 Bibliographically approved
List of papers
1. Experiencing real 3D gestural interaction with mobile devices
2013 (English) In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 34, no. 8, p. 912-921. Article in journal (Refereed) Published
Abstract [en]

The number of mobile devices such as smartphones and tablet PCs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays, which make interaction with the device more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touchscreen displays, camera-based interaction opens a new way to manipulate objects in the 3D space behind the device, in the camera's field of view. In this paper, our gestural interaction relies heavily on particular patterns of local image orientation called rotational symmetries. The approach is based on finding the most suitable pattern, from a large set of rotational symmetries of different orders, that yields a reliable hand-gesture detector. Consequently, gesture detection and tracking can serve as an efficient tool for 3D manipulation in various computer vision and augmented reality applications. The final output is rendered into color anaglyphs for 3D visualization. Depending on the coding technology, different low-cost 3D glasses can be used by viewers. (C) 2013 Elsevier B.V. All rights reserved.
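
The rotational-symmetry idea above can be sketched in the double-angle orientation representation: gradient orientations are squared so that opposing directions reinforce, and the patch is correlated with an order-n complex basis filter. A minimal sketch follows; the function name, patch handling and normalization are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def symmetry_response(patch, n):
    """Score how well a grayscale patch matches an order-n rotational
    symmetry in the double-angle orientation representation (sketch).

    Returns a value in [0, 1]; values near 1 mean the local orientation
    field matches the order-n pattern (n = 2 responds to circular
    patterns, for example).
    """
    gy, gx = np.gradient(patch.astype(float))
    z = (gx + 1j * gy) ** 2                  # double-angle orientation image
    h, w = patch.shape
    y, x = np.mgrid[:h, :w]
    x = x - (w - 1) / 2.0                    # center the coordinate grid
    y = y - (h - 1) / 2.0
    r = np.hypot(x, y) + 1e-9
    basis = ((x + 1j * y) / r) ** n          # order-n complex basis filter
    # coherent correlation, normalized by total orientation energy
    return np.abs(np.vdot(basis, z)) / (np.abs(z).sum() + 1e-9)
```

For instance, a radially symmetric blob produces a strong n = 2 response and a weak n = 0 response, roughly the kind of order selection described above for choosing a reliable detector.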

Keywords
3D mobile interaction, Rotational symmetries, Gesture detection, SIFT, Gesture tracking, stereoscopic visualization
National Category
Interaction Technologies
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40988 (URN)
10.1016/j.patrec.2013.02.004 (DOI)
000318129200010 ()
Available from: 2013-06-05 Created: 2015-03-18 Last updated: 2017-12-04 Bibliographically approved
2. 3D photo browsing for future mobile devices
2012 (English) In: MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia, ACM Press, 2012, p. 1401-1404. Conference paper, Published paper (Refereed)
Abstract [en]

By introducing an interactive 3D photo/video browsing and exploration system, we propose novel approaches for handling the limitations of current 2D mobile technology from two aspects: interaction design and visualization. Our contributions feature an effective interaction that happens in the 3D space behind the mobile device's camera. 3D motion analysis of the user's gestures, captured by the device's camera, is performed to facilitate the interaction between users and multimedia collections in various applications. This approach addresses a wide range of problems with current input facilities such as miniature keyboards, tiny joysticks and 2D touchscreens. The suggested interactive technology enables users to control, manipulate, organize, and re-arrange their photo/video collections in 3D space using bare-hand, marker-less gestures. Moreover, with the proposed techniques we aim to visualize 2D photo collections, in 3D, on normal 2D displays. This is done automatically by retrieving the 3D structure from single images, finding stereo/multiple views of a scene, or using the geo-tagged metadata from huge photo collections. Through the design and implementation of the contributions of this work, we aim to achieve the following goals: solving the limitations of current 2D interaction facilities through 3D gestural interaction; increasing the usability of multimedia applications on mobile devices; and enhancing the quality of user experience with digital collections.

Place, publisher, year, edition, pages
ACM Press, 2012
Keywords
3D gestural interaction, 3D visualization, motion analysis, photo browsing, quality of experience
National Category
Human Computer Interaction
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40978 (URN)
10.1145/2393347.2396503 (DOI)
978-1-4503-1089-5 (ISBN)
Conference
20th ACM International Conference on Multimedia, MM 2012, 29 October 2012 through 2 November 2012, Nara
Available from: 2013-01-02 Created: 2015-03-18 Last updated: 2018-01-11 Bibliographically approved
3. Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval
2014 (English) Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
SciTePress, 2014
National Category
Signal Processing
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40983 (URN)
Conference
9th International Conference on Computer Vision Theory and Applications (VISAPP)
Note

NQC 2014

Available from: 2014-02-25 Created: 2015-03-18 Last updated: 2017-04-19 Bibliographically approved
4. Interactive 3D Visualization on a 4K Wall-Sized Display
2014 (English) In: Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA), 2014, p. 1-4. Conference paper, Published paper (Refereed)
Keywords
computer vision, data visualisation, human computer interaction, image capture, motion measurement, object tracking, screens (display), three-dimensional displays, video cameras, video signal processing, 2D screen, 3D motion parameter retrieval, 3D space, 4K wall-sized display, digital window, head-mounted camera, interactive 3D display, interactive 3D visualization, motion capture system, real-time 3D interaction, user head motion measurement, user head motion tracking, video frame capture, vision-based approach, cameras, head, tracking, transmission line matrix methods, visualization
National Category
Signal Processing
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40991 (URN)
10.1109/APSIPA.2014.7041653 (DOI)
Conference
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA 2014)
Note

NQC 2014

Available from: 2014-02-25 Created: 2015-03-18 Last updated: 2017-04-19 Bibliographically approved
5. 3D Visualization of Single Images using Patch Level Depth
2011 (English) In: Signal Processing and Multimedia Applications (SIGMAP), 2011 Proceedings of the International Conference on, IEEE Press, 2011, p. 61-66. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we consider the task of 3D photo visualization using a single monocular image. The main idea is to take single photos captured by devices such as ordinary cameras, mobile phones, and tablet PCs, and visualize them in 3D on normal displays. A supervised learning approach is used to retrieve depth information from single images. The algorithm is based on a hierarchical multi-scale Markov Random Field (MRF), which models depth from multi-scale global and local features, and the relations between them, in a monocular image. The estimated depth image is then used to assign the specified depth parameters to each pixel in the 3D map. Accordingly, multi-level depth adjustment and coding into color anaglyphs is performed. Our system receives a single 2D image as input and produces an anaglyph-coded 3D image as output. Depending on the coding technology, low-cost anaglyph glasses are used by viewers.
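
The final stage described above, turning an estimated depth map into a color anaglyph, can be sketched as follows. This is an illustrative simplification: the disparity mapping, the forward-warping scheme and the channel assignment are our assumptions, not the authors' exact parameters.

```python
import numpy as np

def depth_to_anaglyph(gray, depth, max_disp=8):
    """Pack a single grayscale image plus an estimated depth map into a
    red-cyan anaglyph (sketch).

    A synthetic right view is made by shifting each pixel horizontally by
    a disparity proportional to its estimated nearness; the left view
    feeds the red channel and the right view the cyan (G, B) channels.
    """
    h, w = gray.shape
    # nearer pixels (smaller depth) get larger horizontal disparity
    disp = np.rint(max_disp * (1.0 - depth / depth.max())).astype(int)
    right = np.zeros_like(gray)
    cols = np.arange(w)
    for row in range(h):
        shifted = np.clip(cols + disp[row], 0, w - 1)
        right[row, shifted] = gray[row, cols]   # forward-warp the row
    return np.stack([gray, right, right], axis=-1)
```

A real pipeline would also fill occlusion holes left by the forward warp; this sketch only shows the depth-to-disparity coding step.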

Place, publisher, year, edition, pages
IEEE Press, 2011
Keywords
Cameras, Glass, Image color analysis, Stereo image processing, Three-dimensional displays, Vectors, Visualization, 3D Visualization, Color Anaglyph, Depth Map, MRF, Monocular Image
National Category
Signal Processing
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40980 (URN)
Conference
International Conference on Signal Processing and Multimedia Applications, 18-21 July, 2011, Seville, Spain
Note

QC 20140226

Available from: 2014-02-25 Created: 2015-03-18 Last updated: 2017-04-19 Bibliographically approved
6. Stereoscopic visualization of monocular images in photo collections
2011 (English) In: Wireless Communications and Signal Processing (WCSP), 2011 International Conference on, IEEE Press, 2011, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we propose a novel approach to 3D video/photo visualization using an ordinary digital camera. The idea is to turn any 2D camera into a 3D one using data derived from a collection of captured photos or a recorded video. For a given monocular input, the information retrieved from overlapping photos provides what is needed to produce 3D output. Robust feature detection and matching between images is used to find the transformation between overlapping frames. The transformation matrix maps the images onto the same horizontal baseline. The projected images are then adjusted to the stereoscopic model, and finally the stereo views are coded into 3D channels for visualization. This approach enables us to create 3D output from randomly taken photos of a scene or from a recorded video. Our system receives 2D monocular input and produces double-layer-coded 3D output. Depending on the coding technology, different low-cost 3D glasses are used by viewers.
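
The alignment step above, estimating a transformation between overlapping frames from matched features, can be sketched for the similarity-transform case. The paper's robust detector/matcher is assumed to have supplied the correspondences already; the complex-number formulation and function names are our illustrative choices.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale + rotation + translation)
    from matched 2D points src -> dst, in complex form z' = a*z + b.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    zs = src[:, 0] + 1j * src[:, 1]          # points as complex numbers
    zd = dst[:, 0] + 1j * dst[:, 1]
    zs_c = zs - zs.mean()                    # center both point sets
    zd_c = zd - zd.mean()
    a = np.vdot(zs_c, zd_c) / np.vdot(zs_c, zs_c)  # optimal scale * e^{i*theta}
    b = zd.mean() - a * zs.mean()                  # optimal translation
    return a, b

def apply_similarity(a, b, pts):
    """Map an (N, 2) point array through z' = a*z + b."""
    z = pts[:, 0] + 1j * pts[:, 1]
    w = a * z + b
    return np.stack([w.real, w.imag], axis=-1)
```

With exact correspondences the fit recovers the transform exactly; with noisy matches it gives the least-squares estimate, which in practice would be wrapped in a robust scheme such as RANSAC before warping frames to a common baseline.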

Place, publisher, year, edition, pages
IEEE Press, 2011
Keywords
cameras, feature extraction, image matching, matrix algebra, stereo image processing, video coding, video retrieval, 3D channel, 3D glasses, 3D video-photo visualization, coding technology, digital camera, feature detection, information retrieval, monocular images, overlapping frames, overlapping photos, photo collections, stereoscopic visualization, transformation matrix, Image color analysis, Robustness, Three dimensional displays, Visualization
National Category
Signal Processing
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40995 (URN)
10.1109/WCSP.2011.6096688 (DOI)
2-s2.0-84555194972 (Scopus ID)
978-1-4577-1008-7 (ISBN)
Conference
WCSP 2011, 9-11 Nov 2011, Nanjing
Note

QC 20140226

Available from: 2014-02-25 Created: 2015-03-18 Last updated: 2017-04-19 Bibliographically approved
7. Robust correction of 3D geo-metadata in photo collections by forming a photo grid
2011 (English) In: WCSP2011: IEEE International Conference on Wireless Communications and Signal Processing, IEEE Press, 2011, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we present a technique for efficient and robust estimation of the exact location and orientation of a photo-capture device in a large data set. The data set includes a set of photos and the associated information from GPS and orientation sensors. This attached metadata is noisy and lacks precision. Our strategy for correcting this uncertain data is based on fusing a measurement model, derived from the sensor data, with a signal model given by computer vision algorithms. Based on information retrieved from multiple views of a scene, we form a grid of images. Robust feature detection and matching between images yields a reliable transformation, and the relative locations and orientations across the data set constitute the signal model. Information extracted from the single images, combined with the measurement data, forms the measurement model. Finally, a Kalman filter fuses these two models iteratively to enhance the estimate of the ground-truth (GT) location and orientation. In practice, this approach can help us design a photo browsing system for a huge collection of photos, enabling 3D navigation and exploration of the data set.
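
The core of the fusion described above can be sketched as a single scalar Kalman-style update: the noisy sensor reading (measurement model) and the vision-based estimate (signal model) are combined, weighted by their variances. This is an illustrative reduction, not the paper's full iterative filter over position and orientation states.

```python
def kalman_fuse(x_sensor, var_sensor, x_vision, var_vision):
    """One scalar Kalman-style fusion step (sketch).

    Combines a noisy sensor estimate with a vision-based estimate,
    trusting whichever has the smaller variance more; the fused
    variance is always smaller than either input variance.
    """
    gain = var_sensor / (var_sensor + var_vision)   # Kalman gain
    x = x_sensor + gain * (x_vision - x_sensor)     # fused estimate
    var = (1.0 - gain) * var_sensor                 # reduced uncertainty
    return x, var
```

For example, fusing a GPS reading at 0 with variance 4 and a vision estimate at 10 with variance 1 yields a fused position of 8 with variance 0.8, pulled toward the more certain vision estimate.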

Place, publisher, year, edition, pages
IEEE Press, 2011
National Category
Media Engineering
Research subject
Computer and Information Sciences Computer Science, Media Technology
Identifiers
urn:nbn:se:lnu:diva-40994 (URN)
10.1109/WCSP.2011.6096689 (DOI)
978-1-4577-1008-7 (ISBN)
Conference
IEEE International Conference on Wireless Communications and Signal Processing (WCSP2011), Nanjing, China, 9-11 November 2011
Available from: 2012-03-02 Created: 2015-03-18 Last updated: 2017-04-19 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Yousefi, Shahrouz
