Examine individual changes
Appearance
This page allows you to examine the variables generated by the Edit Filter for an individual change.
Variables generated for this change
Variable | Value |
---|---|
Whether or not the edit is marked as minor (no longer in use) (minor_edit ) | false |
Name of the user account (user_name ) | 'Matthewcvbec' |
Whether or not a user is editing through the mobile interface (user_mobile ) | false |
Page ID (page_id ) | 13274389 |
Page namespace (page_namespace ) | 0 |
Page title without namespace (page_title ) | 'Articulated body pose estimation' |
Full page title (page_prefixedtitle ) | 'Articulated body pose estimation' |
Action (action ) | 'edit' |
Edit summary/reason (summary ) | '/* Articulated Model M. Ding and G. Fan, "Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation" IEEE Transactions on Image Processing, Vol. 25, No. 2, Feb 2016 */ ' |
Old content model (old_content_model ) | 'wikitext' |
New content model (new_content_model ) | 'wikitext' |
Old page wikitext, before the edit (old_wikitext ) | ''''Articulated body pose estimation''', in [[computer vision]], is the study of [[algorithm]]s and systems that recover the pose of an articulated body, which consists of [[kinematic pair|joints]] and [[rigid body|rigid]] parts using image-based observations. It is one of longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.<ref>[http://citeseer.ist.psu.edu/moeslund01survey.html Survey of Computer Vision-Based Human Motion Capture (2001)]</ref><ref>[http://www.sciencedirect.com/science/article/B6WCX-4M1DB7H-1/2/8da6f6e7a8c8e07d9331bc7738c6d499 Survey of Advances in Computer Vision-based Human Motion Capture (2006)]</ref>
==Description==
Perception of human beings in neighboring environment is an important capability Robots must possess. If a person uses gestures to point to a particular object, then the interacting machine should be able to understand the situation in real world context. Hence pose estimation is an important and challenging problem in computer vision and many algorithms have been deployed in solving this problem over last two decades. Many state of the art models have been developed to solve this problem, models are first trained using large data sets and then used in real world application.
Pose estimation is a difficult problem and still a subject of research because the human body has 244 degrees of freedom with 230 joints. Not all movements between joints are evident, but the human body is composed of ten large parts with 20 degrees of freedom. Moreover, appearance differences due to clothing, body shape, scale (e.g. tall, fat, short body), and hairstyles cause large variability, and an algorithm must account for that. Partial occlusions due to self-articulation (hand covering the face or the frontal portion of the body) or occlusions due to external objects may cause ambiguities in the results obtained. Also, most of the algorithms estimate pose from two-dimensional images, the ones captured from the normal camera. These images lack the three-dimensional information of an actual body pose. There is recent work in this area wherein images from RGBD cameras provide information about color and depth.<ref>Droeschel, David, and Sven Behnke. "3D body pose estimation using an adaptive person model for articulated ICP." Intelligent Robotics and Applications. Springer Berlin Heidelberg, 2011. 157167.</ref>
There is a need to develop accurately[[tether]]-less, vision-based articulated body pose estimation systems to recover the pose of bodies such as the human body, a hand, or non-human creatures. Such a system have several foreseeable applications, including
* Marker-less [[motion capture]] for human-computer interfaces,
* [[Physiotherapy]],
* [[3D animation]],
* [[Ergonomics]] studies,
* [[Robot]] control, and
* Visual surveillance.
One of the major difficulties in recovering pose from images is the high number of degrees-of-freedom (DOF) in the body's movement that has to be recovered. Any rigid object requires six DOF to fully describe its pose, and each additional rigid object connected to it adds at least one DOF. A human body contains no less than 10 large body parts, equating to more than 20 DOF. This difficulty is compounded by the problem of self-occlusion, where body parts [[wikt:occlusion|occlude]] each other depending on the configuration of the parts. Other challenges involve dealing with varying lighting, which affect appearance; varying subject attire or body type; required camera configuration; and required computation time.
The typical articulated body pose estimation system involves a model-based approach, in which the pose estimation is achieved by maximizing/minimizing a similarity/dissimilarity between an observation (input) and a template model. Different kinds of sensors have been explored for use in making the observation, including
* Visible wavelength imagery,
* [[Long-wave]] thermal [[infrared]] imagery,<ref name="han13humanpose">{{cite book| author=Han, J.| author2=Gaszczak, A.| author3=Maciol, R.| author4=Barnes, S.E.| author5=Breckon, T.P.| chapter=Human Pose Classification within the Context of Near-IR Imagery Tracking| title=Proc. SPIE Optics and Photonics for Counterterrorism, Crime Fighting and Defence|date=September 2013| volume=8901| number=E| pages=1–10| publisher=SPIE| doi=10.1117/12.2028375| url=http://www.durham.ac.uk/toby.breckon/publications/papers/han13humanpose.pdf| accessdate=5 November 2013}}</ref>
* [[TOF-camera|Time-of-flight]] imagery, and
* [[Laser range scanner]] imagery.
These sensors produce intermediate representations that is directly used by the model; the representations include
* Image appearance,
* Voxel (volume element) reconstruction,
* 3D point clouds, and sum of Gaussian kernels<ref>M. Ding and G. Fan, "Generalized Sum of Gaussians for Real-Time Human Pose Tracking from a Single Depth Sensor" 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Jan 2015</ref>
* 3D surface meshes.
==Part Models==
The basic idea of part based model can be attributed to human skeleton. Any object having property of articulation can be disintegrated into smaller parts wherein each part can take different orientation resulting in different articulations of the same object. Hence different scales and orientation of the main object can be articulated to scales and orientation of the corresponding parts. To formulate the model so that it can be represented in mathematical terms, the parts are connected to each other using springs. Hence the model is also known as Spring model. The degree of closeness between each part is accounted by the compression and expansion of the springs. There is geometric constrain on the orientation of
springs. For example, limbs of legs cannot move 360 degrees. Hence parts cannot have that extreme orientation. This reduces the possible permutations.<ref>Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on computers 1 (1973): 6792.</ref>
The spring model forms a graph G(V,E) where V (nodes) corresponds to the parts and E (edges) represent springs connecting two neighboring parts. Each location in image can be reached by the <math>x</math> and <math>y</math> coordinates of the pixel location. Let <math>\mathbf{p}_{i}(x, \, y)</math> be point at <math>\mathbf{i}^{th}</math> location. Then the cost associated in joining the spring between <math>\mathbf{i}^{th}</math> and the <math>\mathbf{j}^{th}</math> point can be given by <math>S(\mathbf{p}_{i},\,\mathbf{p}_{j}) = S(\mathbf{p}_{i} - \mathbf{p}_{j})</math>. Hence the
total cost associated in placing <math>l</math> components at locations <math>\mathbf{P}_{l}</math> is given by
:<math>
S(\mathbf{P}_{l}) = \displaystyle\sum_{i=1}^{l} \; \displaystyle\sum_{j=1}^{i} \; \mathbf{s}_{ij}(\mathbf{p}_{i},\,\mathbf{p}_{j})
</math>
The above equation simply represents the spring model used to describe body pose. To estimate pose from images cost or energy function must be minimized. This energy function consists of two terms, one related to how each component matches image data and second deals with how much do the
oriented(deformed) parts match thus accounting for articulation along with object detection.<ref>Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Pictorial structures for object recognition." International Journal of Computer Vision 61.1 (2005): 5579.</ref>
The part models also known as pictorial structures is one the basic models on which other efficient models are built by slight modification. One such example is flexible mixture model which reduces the database of hundreds or thousands of deformed parts by exploiting the notion of local rigidity.<ref>Yang, Yi, and Deva Ramanan. "Articulated pose estimation with flexible mixturesofparts." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.</ref>
==Articulated Model <ref>M. Ding and G. Fan, "Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation" IEEE Transactions on Image Processing, Vol. 25, No. 2, Feb 2016</ref>==
The kinematic skeleton is constructed by a tree-structured chain, as illustrated in the Figure. Each rigid body segment has its local coordinate system that can be transformed to the world coordinate system via a 4×4 transformation matrix <math>T_l </math>,
:<math>
T_{l} = T_{par(l)}R_{l},
</math>
where <math>R_l</math> denotes the local transformation from body segment <math>S_l</math> to its parent <math>par(S_l)</math>. Each joint in the body has 3 degrees of freedom (DoF) rotation. If <math>l</math> is the root joint (the hip joint), <math>T_{root}</math> is the global transformation of the whole body. Given a transformation matrix <math>T_l</math> , the joint position at the T-pose can be transferred to its corresponding position in the world coordination. In many works <ref>M. Ding and G. Fan, "Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation" IEEE Transactions on Image Processing, Vol. 25, No. 2, Feb 2016</ref>, the 3D joint rotation is expressed as a normalized quaternion <math>[x,y,z,w]</math> due to its continuity that can facilitate gradient-based optimization in the parameter estimation.
==Applications==
===Assisted Living===
For old age homes, nursing homes soon personal care robots will be deployed in near future. Human detection, pose estimation with high accuracy is a requirement. Also the models used should not be memory intensive.
===Movies, 3D TV and animation===
Avatar animation is one such example where human pose helps in developing animation.
===Intelligent driver assisting system===
Car accidents account for 2.2% of death globally each year. Hence an intelligent system tracking drivers pose constantly will be of great use as emergency alerts can Successful pedestrian detection algorithms have proved to be useful for autonomous cars. Along with real time detection if nature or behavior of pedestrians can be predicted beforehand, the decision capability of driver assisting system would increase.
===Other Applications===
Other applications include Physical therapy, study of cognitive brain development of young children, video surveillance, animal tracking and behavior understanding to preserve endangered species, sign language detection,advanced human computer interaction, marker less motion capturing.
==Sensors Used==
Marker based systems, Laser range scanners, Structured Light, Time of flight sensors, MicrosoftKinect
sensor, Multi camera Systems.
==Related technology==
A commercially successful but specialized computer vision-based articulated body [[3D Pose Estimation|pose estimation]] technique is optical [[motion capture]]. This approach involves placing markers on the individual at strategic locations to capture the 6 degrees-of-freedom of each body part.
==Active Research Groups==
A number of groups are actively pursuing this topic, including groups in Brown University; Carnegie Mellon University; MPI Saarbruecken; Stanford University; the University of California, San Diego; the University of Toronto; the Ecole Centrale de Paris; ETH Zurich; National University of Sciences and Technology (NUST)<ref>http://rise.smme.nust.edu.pk/</ref> and UC Irvine;
==Implementation Links==
* http://groups.inf.ed.ac.uk/calvin/articulated_human_pose_estimation_code/
* http://www.ics.uci.edu/~dramanan/software/pose/
==References==
{{Reflist}}
==External links==
*[http://www.cs.brown.edu/people/black/ Michael J. Black, Professor at Brown University]
*[http://www.cs.cmu.edu/~german/research/HumanApp/humanapp.html Research Project Page of German Cheung at Carnegie Mellon University]
*[https://web.archive.org/web/20070612082024/http://www.mpi-inf.mpg.de/%7Erosenhahn/ Homepage of Dr.-Ing at MPI Saarbruecken]
*[https://web.archive.org/web/20070308123012/http://www.stanford.edu/group/biomotion/currentprojects_Markerless.html Markerless Motion Capture Project at Stanford]
*[http://cvrr.ucsd.edu Computer Vision and Robotics Research Laboratory at the University of California, San Diego]
*[http://www.cs.toronto.edu/~fleet/research/projects.html Research Projects of David J. Fleet at the University of Toronto]
*[http://hmi.ewi.utwente.nl/person/Ronald%20Poppe Ronald Poppe at the University of Twente].
*[http://vision.mas.ecp.fr Professor [[Nikos Paragios]] at the Ecole Centrale de Paris]
*[http://www.ics.uci.edu/~dramanan/software/pose Articulated Pose Estimation with Flexible Mixtures of Parts Project at UC Irvine]
*[http://ttic.uchicago.edu/~gregory/thesis/thesisChapter4.pdf http://screenrant.com/crazy3dtechnologyjamescameronavatarkofi3367/]
[[Category:Computer vision]]' |
New page wikitext, after the edit (new_wikitext ) | ''''Articulated body pose estimation''', in [[computer vision]], is the study of [[algorithm]]s and systems that recover the pose of an articulated body, which consists of [[kinematic pair|joints]] and [[rigid body|rigid]] parts using image-based observations. It is one of longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.<ref>[http://citeseer.ist.psu.edu/moeslund01survey.html Survey of Computer Vision-Based Human Motion Capture (2001)]</ref><ref>[http://www.sciencedirect.com/science/article/B6WCX-4M1DB7H-1/2/8da6f6e7a8c8e07d9331bc7738c6d499 Survey of Advances in Computer Vision-based Human Motion Capture (2006)]</ref>
==Description==
Perception of human beings in neighboring environment is an important capability Robots must possess. If a person uses gestures to point to a particular object, then the interacting machine should be able to understand the situation in real world context. Hence pose estimation is an important and challenging problem in computer vision and many algorithms have been deployed in solving this problem over last two decades. Many state of the art models have been developed to solve this problem, models are first trained using large data sets and then used in real world application.
Pose estimation is a difficult problem and still a subject of research because the human body has 244 degrees of freedom with 230 joints. Not all movements between joints are evident, but the human body is composed of ten large parts with 20 degrees of freedom. Moreover, appearance differences due to clothing, body shape, scale (e.g. tall, fat, short body), and hairstyles cause large variability, and an algorithm must account for that. Partial occlusions due to self-articulation (hand covering the face or the frontal portion of the body) or occlusions due to external objects may cause ambiguities in the results obtained. Also, most of the algorithms estimate pose from two-dimensional images, the ones captured from the normal camera. These images lack the three-dimensional information of an actual body pose. There is recent work in this area wherein images from RGBD cameras provide information about color and depth.<ref>Droeschel, David, and Sven Behnke. "3D body pose estimation using an adaptive person model for articulated ICP." Intelligent Robotics and Applications. Springer Berlin Heidelberg, 2011. 157167.</ref>
There is a need to develop accurately[[tether]]-less, vision-based articulated body pose estimation systems to recover the pose of bodies such as the human body, a hand, or non-human creatures. Such a system have several foreseeable applications, including
* Marker-less [[motion capture]] for human-computer interfaces,
* [[Physiotherapy]],
* [[3D animation]],
* [[Ergonomics]] studies,
* [[Robot]] control, and
* Visual surveillance.
One of the major difficulties in recovering pose from images is the high number of degrees-of-freedom (DOF) in the body's movement that has to be recovered. Any rigid object requires six DOF to fully describe its pose, and each additional rigid object connected to it adds at least one DOF. A human body contains no less than 10 large body parts, equating to more than 20 DOF. This difficulty is compounded by the problem of self-occlusion, where body parts [[wikt:occlusion|occlude]] each other depending on the configuration of the parts. Other challenges involve dealing with varying lighting, which affect appearance; varying subject attire or body type; required camera configuration; and required computation time.
The typical articulated body pose estimation system involves a model-based approach, in which the pose estimation is achieved by maximizing/minimizing a similarity/dissimilarity between an observation (input) and a template model. Different kinds of sensors have been explored for use in making the observation, including
* Visible wavelength imagery,
* [[Long-wave]] thermal [[infrared]] imagery,<ref name="han13humanpose">{{cite book| author=Han, J.| author2=Gaszczak, A.| author3=Maciol, R.| author4=Barnes, S.E.| author5=Breckon, T.P.| chapter=Human Pose Classification within the Context of Near-IR Imagery Tracking| title=Proc. SPIE Optics and Photonics for Counterterrorism, Crime Fighting and Defence|date=September 2013| volume=8901| number=E| pages=1–10| publisher=SPIE| doi=10.1117/12.2028375| url=http://www.durham.ac.uk/toby.breckon/publications/papers/han13humanpose.pdf| accessdate=5 November 2013}}</ref>
* [[TOF-camera|Time-of-flight]] imagery, and
* [[Laser range scanner]] imagery.
These sensors produce intermediate representations that is directly used by the model; the representations include
* Image appearance,
* Voxel (volume element) reconstruction,
* 3D point clouds, and sum of Gaussian kernels<ref>M. Ding and G. Fan, "Generalized Sum of Gaussians for Real-Time Human Pose Tracking from a Single Depth Sensor" 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Jan 2015</ref>
* 3D surface meshes.
==Part Models==
The basic idea of part based model can be attributed to human skeleton. Any object having property of articulation can be disintegrated into smaller parts wherein each part can take different orientation resulting in different articulations of the same object. Hence different scales and orientation of the main object can be articulated to scales and orientation of the corresponding parts. To formulate the model so that it can be represented in mathematical terms, the parts are connected to each other using springs. Hence the model is also known as Spring model. The degree of closeness between each part is accounted by the compression and expansion of the springs. There is geometric constrain on the orientation of
springs. For example, limbs of legs cannot move 360 degrees. Hence parts cannot have that extreme orientation. This reduces the possible permutations.<ref>Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on computers 1 (1973): 6792.</ref>
The spring model forms a graph G(V,E) where V (nodes) corresponds to the parts and E (edges) represent springs connecting two neighboring parts. Each location in image can be reached by the <math>x</math> and <math>y</math> coordinates of the pixel location. Let <math>\mathbf{p}_{i}(x, \, y)</math> be point at <math>\mathbf{i}^{th}</math> location. Then the cost associated in joining the spring between <math>\mathbf{i}^{th}</math> and the <math>\mathbf{j}^{th}</math> point can be given by <math>S(\mathbf{p}_{i},\,\mathbf{p}_{j}) = S(\mathbf{p}_{i} - \mathbf{p}_{j})</math>. Hence the
total cost associated in placing <math>l</math> components at locations <math>\mathbf{P}_{l}</math> is given by
:<math>
S(\mathbf{P}_{l}) = \displaystyle\sum_{i=1}^{l} \; \displaystyle\sum_{j=1}^{i} \; \mathbf{s}_{ij}(\mathbf{p}_{i},\,\mathbf{p}_{j})
</math>
The above equation simply represents the spring model used to describe body pose. To estimate pose from images cost or energy function must be minimized. This energy function consists of two terms, one related to how each component matches image data and second deals with how much do the
oriented(deformed) parts match thus accounting for articulation along with object detection.<ref>Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Pictorial structures for object recognition." International Journal of Computer Vision 61.1 (2005): 5579.</ref>
The part models also known as pictorial structures is one the basic models on which other efficient models are built by slight modification. One such example is flexible mixture model which reduces the database of hundreds or thousands of deformed parts by exploiting the notion of local rigidity.<ref>Yang, Yi, and Deva Ramanan. "Articulated pose estimation with flexible mixturesofparts." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.</ref>
==Articulated Model <ref>M. Ding and G. Fan, "Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation" IEEE Transactions on Image Processing, Vol. 25, No. 2, Feb 2016</ref>==
The kinematic skeleton is constructed by a tree-structured chain, as illustrated in the Figure. Each rigid body segment has its local coordinate system that can be transformed to the world coordinate system via a 4×4 transformation matrix <math>T_l </math>,
:<math>
T_{l} = T_{par(l)}R_{l},
</math>
where <math>R_l</math> denotes the local transformation from body segment <math>S_l</math> to its parent <math>par(S_l)</math>. Each joint in the body has 3 degrees of freedom (DoF) rotation. Given a transformation matrix <math>T_l</math> , the joint position at the T-pose can be transferred to its corresponding position in the world coordination. In many works, the 3D joint rotation is expressed as a normalized quaternion <math>[x,y,z,w]</math> due to its continuity that can facilitate gradient-based optimization in the parameter estimation.
==Applications==
===Assisted Living===
For old age homes, nursing homes soon personal care robots will be deployed in near future. Human detection, pose estimation with high accuracy is a requirement. Also the models used should not be memory intensive.
===Movies, 3D TV and animation===
Avatar animation is one such example where human pose helps in developing animation.
===Intelligent driver assisting system===
Car accidents account for 2.2% of death globally each year. Hence an intelligent system tracking drivers pose constantly will be of great use as emergency alerts can Successful pedestrian detection algorithms have proved to be useful for autonomous cars. Along with real time detection if nature or behavior of pedestrians can be predicted beforehand, the decision capability of driver assisting system would increase.
===Other Applications===
Other applications include Physical therapy, study of cognitive brain development of young children, video surveillance, animal tracking and behavior understanding to preserve endangered species, sign language detection,advanced human computer interaction, marker less motion capturing.
==Sensors Used==
Marker based systems, Laser range scanners, Structured Light, Time of flight sensors, MicrosoftKinect
sensor, Multi camera Systems.
==Related technology==
A commercially successful but specialized computer vision-based articulated body [[3D Pose Estimation|pose estimation]] technique is optical [[motion capture]]. This approach involves placing markers on the individual at strategic locations to capture the 6 degrees-of-freedom of each body part.
==Active Research Groups==
A number of groups are actively pursuing this topic, including groups in Brown University; Carnegie Mellon University; MPI Saarbruecken; Stanford University; the University of California, San Diego; the University of Toronto; the Ecole Centrale de Paris; ETH Zurich; National University of Sciences and Technology (NUST)<ref>http://rise.smme.nust.edu.pk/</ref> and UC Irvine;
==Implementation Links==
* http://groups.inf.ed.ac.uk/calvin/articulated_human_pose_estimation_code/
* http://www.ics.uci.edu/~dramanan/software/pose/
==References==
{{Reflist}}
==External links==
*[http://www.cs.brown.edu/people/black/ Michael J. Black, Professor at Brown University]
*[http://www.cs.cmu.edu/~german/research/HumanApp/humanapp.html Research Project Page of German Cheung at Carnegie Mellon University]
*[https://web.archive.org/web/20070612082024/http://www.mpi-inf.mpg.de/%7Erosenhahn/ Homepage of Dr.-Ing at MPI Saarbruecken]
*[https://web.archive.org/web/20070308123012/http://www.stanford.edu/group/biomotion/currentprojects_Markerless.html Markerless Motion Capture Project at Stanford]
*[http://cvrr.ucsd.edu Computer Vision and Robotics Research Laboratory at the University of California, San Diego]
*[http://www.cs.toronto.edu/~fleet/research/projects.html Research Projects of David J. Fleet at the University of Toronto]
*[http://hmi.ewi.utwente.nl/person/Ronald%20Poppe Ronald Poppe at the University of Twente].
*[http://vision.mas.ecp.fr Professor [[Nikos Paragios]] at the Ecole Centrale de Paris]
*[http://www.ics.uci.edu/~dramanan/software/pose Articulated Pose Estimation with Flexible Mixtures of Parts Project at UC Irvine]
*[http://ttic.uchicago.edu/~gregory/thesis/thesisChapter4.pdf http://screenrant.com/crazy3dtechnologyjamescameronavatarkofi3367/]
[[Category:Computer vision]]' |
Whether or not the change was made through a Tor exit node (tor_exit_node ) | 0 |
Unix timestamp of change (timestamp ) | 1482162565 |