Articulated body pose estimation
Articulated body pose estimation, in computer vision, is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts, from image-based observations. It is one of the longest-standing problems in computer vision because of the complexity of the models that relate observation to pose, and because of the variety of situations in which it would be useful.[1][2]
Description
There is a need for accurate, tetherless, vision-based articulated body pose estimation systems that recover the pose of bodies such as the human body, a hand, or non-human creatures. Such a system has several foreseeable applications, including
- Marker-less motion capture for human-computer interfaces,
- Physiotherapy,
- 3D animation,
- Ergonomics studies,
- Robot control, and
- Visual surveillance.
One of the major difficulties in recovering pose from images is the high number of degrees of freedom (DOF) in the body's movement that have to be recovered. Any rigid object requires six DOF to fully describe its pose, and each additional rigid object connected to it adds at least one DOF. A human body contains no fewer than 10 large body parts, equating to more than 20 DOF once joints with more than one DOF, such as the shoulders and hips, are counted. This difficulty is compounded by the problem of self-occlusion, where body parts occlude each other depending on the configuration of the parts. Other challenges include varying lighting, which affects appearance; varying subject attire or body type; the required camera configuration; and the required computation time.
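The DOF arithmetic above can be made concrete with a short sketch. The 10-part body model below, including its part names and the DOF assigned to each joint, is a hypothetical illustration and not a model taken from the literature. It shows that the lower bound for ten parts is 6 + 9 = 15 DOF, and that the count passes 20 once ball-like joints are assumed at the neck, shoulders, and hips.

```python
# Hypothetical 10-part body model (an assumption for illustration only).
# Each entry maps a rigid part to (parent part, DOF of the joint to the parent).
BODY_MODEL = {
    "torso": (None, 6),                          # root: 3 translation + 3 rotation
    "head": ("torso", 3),                        # neck, assumed ball joint
    "left_upper_arm": ("torso", 3),              # shoulder, assumed ball joint
    "left_lower_arm": ("left_upper_arm", 1),     # elbow, assumed hinge
    "right_upper_arm": ("torso", 3),
    "right_lower_arm": ("right_upper_arm", 1),
    "left_upper_leg": ("torso", 3),              # hip, assumed ball joint
    "left_lower_leg": ("left_upper_leg", 1),     # knee, assumed hinge
    "right_upper_leg": ("torso", 3),
    "right_lower_leg": ("right_upper_leg", 1),
}


def total_dof(model):
    """Sum the DOF contributed by every part of the kinematic tree."""
    return sum(dof for _parent, dof in model.values())


def minimum_dof(num_parts):
    """Lower bound: six DOF for the root plus one per additional part."""
    return 6 + (num_parts - 1)


if __name__ == "__main__":
    print("parts:", len(BODY_MODEL))
    print("lower bound:", minimum_dof(len(BODY_MODEL)))
    print("assumed model:", total_dof(BODY_MODEL))
```

Under these assumed joint types the model has 25 DOF; a different choice of joints would give a different total.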
The typical articulated body pose estimation system involves a model-based approach, in which an observation is made and provided as input to the model to generate pose estimates. Different kinds of sensors have been explored for use in making the observation, including
- Visible wavelength imagery,
- Long-wave thermal infrared imagery,
- Time-of-flight imagery, and
- Laser range scanner imagery.
These sensors produce intermediate representations that are used directly by the model; the representations include
- Image appearance,
- Voxel (volume element) reconstruction,
- 3D surface point clouds, and
- 3D surface meshes.
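As a toy illustration of the model-based loop described above (an observation goes in, a pose estimate comes out), the sketch below fits a hypothetical planar two-link arm to observed 2D joint positions. The link lengths, the function names, and the coarse-to-fine grid search are all assumptions chosen for brevity; practical systems use far richer body models, observations, and optimizers.

```python
import math

LINK_LENGTHS = (1.0, 0.8)  # hypothetical upper-arm and forearm lengths


def forward_model(angles, lengths=LINK_LENGTHS):
    """Predict 2D elbow and wrist positions from two joint angles (radians)."""
    shoulder_angle, elbow_angle = angles
    l1, l2 = lengths
    elbow = (l1 * math.cos(shoulder_angle), l1 * math.sin(shoulder_angle))
    total = shoulder_angle + elbow_angle
    wrist = (elbow[0] + l2 * math.cos(total), elbow[1] + l2 * math.sin(total))
    return [elbow, wrist]


def observation_error(angles, observed):
    """Sum of squared distances between predicted and observed points."""
    predicted = forward_model(angles)
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predicted, observed))


def estimate_pose(observed, steps=24, refinements=12):
    """Coarse-to-fine grid search over the two joint angles."""
    center = (0.0, 0.0)
    half_range = math.pi  # the first pass covers every angle in [-pi, pi]
    for _ in range(refinements):
        candidates = [
            (center[0] + half_range * (2 * i / steps - 1),
             center[1] + half_range * (2 * j / steps - 1))
            for i in range(steps + 1)
            for j in range(steps + 1)
        ]
        center = min(candidates, key=lambda a: observation_error(a, observed))
        half_range /= 2  # halve the search window around the best candidate
    return center


if __name__ == "__main__":
    true_pose = (0.7, 0.9)
    observed = forward_model(true_pose)  # stands in for a real observation
    print(estimate_pose(observed))
```

Here the "observation" is synthesized from a known pose so that the estimate can be checked; with real imagery, the observed points would come from one of the intermediate representations listed above.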
Related technology
A commercially successful but specialized computer-vision-based articulated body pose estimation technique is optical motion capture. This approach involves placing markers on the individual at strategic locations to capture the six degrees of freedom of each body part.
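One simple way to see how markers can yield all six DOF of a rigid part is to build a coordinate frame from three non-collinear markers: the frame's origin gives the three translational DOF, and its axes give the three rotational DOF. The sketch below assumes exactly three markers per part and uses hypothetical function names; it does not describe how any particular commercial system computes pose.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        raise ValueError("markers are coincident or collinear")
    return tuple(x / norm for x in v)


def part_pose_from_markers(m0, m1, m2):
    """Return (origin, axes) for a rigid part carrying three markers.

    origin: position of marker m0 (3 translational DOF).
    axes:   the part's x, y, z unit axes in world coordinates
            (an orthonormal frame, i.e. 3 rotational DOF).
    """
    x_axis = _normalize(_sub(m1, m0))
    z_axis = _normalize(_cross(x_axis, _sub(m2, m0)))
    y_axis = _cross(z_axis, x_axis)  # unit length because z and x are orthonormal
    return tuple(m0), (x_axis, y_axis, z_axis)


if __name__ == "__main__":
    origin, axes = part_pose_from_markers(
        (1.0, 2.0, 3.0), (2.0, 2.0, 3.0), (1.0, 3.0, 3.0))
    print(origin, axes)
```

With more than three markers, or with noisy measurements, a least-squares rigid alignment would typically replace this direct construction.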
Active research groups
A number of groups are actively pursuing this topic, including groups at Brown University; Carnegie Mellon University; MPI Saarbruecken; Stanford University; the University of California, San Diego; the University of Toronto; and the Ecole Centrale de Paris.
References
External links
- Michael J. Black, Professor at Brown University
- Research Project Page of German Cheung at Carnegie Mellon University
- Homepage of Dr.-Ing at MPI Saarbruecken
- Markerless Motion Capture Project at Stanford
- Computer Vision and Robotics Research Laboratory at the University of California, San Diego
- Research Projects of David J. Fleet at the University of Toronto
- Ronald Poppe at the University of Twente
- Professor Nikos Paragios at the Ecole Centrale de Paris