
Haar-like feature

Latest revision as of 10:58, 30 October 2024

Haar-like features are digital image features used in object recognition. They owe their name to their intuitive similarity with Haar wavelets and were used in the first real-time face detector.[1]

Working directly with image intensities (i.e., the RGB pixel values at every pixel of the image) makes feature calculation computationally expensive. A publication by Papageorgiou et al.[2] discussed working with an alternate feature set based on Haar wavelets instead of the usual image intensities. Paul Viola and Michael Jones[1] adapted the idea of using Haar wavelets and developed the so-called Haar-like features. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the difference between these sums. This difference is then used to categorize subsections of an image. For example, in images of human faces, the region of the eyes is commonly darker than the region of the cheeks. Therefore, a common Haar-like feature for face detection is a set of two adjacent rectangles that lie over the eye region and the cheek region. The position of these rectangles is defined relative to a detection window that acts like a bounding box for the target object (the face in this case).
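The eye and cheek example can be illustrated with a minimal sketch that sums each rectangle pixel by pixel. The function names, the toy 4×4 patch, and the sign convention (lower rectangle minus upper rectangle) are assumptions made for illustration; they are not taken from Viola and Jones.

```python
def region_sum(image, top, left, height, width):
    """Sum the pixel intensities of a rectangle by visiting every pixel."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def two_rectangle_feature(image, top, left, height, width):
    """Sum of the lower rectangle minus the sum of the upper rectangle.

    Both rectangles are height x width; the lower one sits directly
    below the upper one. The sign convention is an assumption.
    """
    upper = region_sum(image, top, left, height, width)
    lower = region_sum(image, top + height, left, height, width)
    return lower - upper


# Hypothetical grayscale patch: dark rows ("eyes") above bright rows ("cheeks").
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]

value = two_rectangle_feature(patch, top=0, left=0, height=2, width=4)
print(value)  # 1600 - 80 = 1520
```

A large positive value suggests a dark-over-bright pattern at that location, while a uniform patch gives zero. Summing every pixel in this way costs time proportional to the rectangle area.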

In the detection phase of the Viola–Jones object detection framework, a window of the target size is moved over the input image, and the Haar-like feature is calculated for each subsection of the image. The resulting difference is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is slightly better than random guessing), a large number of Haar-like features are necessary to describe an object with sufficient accuracy. In the Viola–Jones object detection framework, the Haar-like features are therefore organized into a classifier cascade to form a strong learner or classifier.
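The window scan with a single thresholded feature might look like the following sketch. The threshold here is hand-picked rather than learned, the feature is a hypothetical "lower half minus upper half" of the window, and the function names are illustrative. A real detector would combine many such weak classifiers in a cascade.

```python
def window_feature(image, top, left, size):
    """Lower-half sum minus upper-half sum of a size x size window."""
    half = size // 2
    upper = sum(
        image[r][c]
        for r in range(top, top + half)
        for c in range(left, left + size)
    )
    lower = sum(
        image[r][c]
        for r in range(top + half, top + size)
        for c in range(left, left + size)
    )
    return lower - upper


def scan(image, size, threshold, step=1):
    """Slide the window over the image and keep positions above the threshold."""
    hits = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, step):
        for left in range(0, cols - size + 1, step):
            # Weak classifier: a single comparison against the threshold.
            if window_feature(image, top, left, size) > threshold:
                hits.append((top, left))
    return hits


# Hypothetical image with one dark-over-bright spot in the middle.
image = [
    [50, 50, 50, 50, 50, 50],
    [50, 50, 10, 10, 50, 50],
    [50, 50, 200, 200, 50, 50],
    [50, 50, 50, 50, 50, 50],
]

hits = scan(image, size=2, threshold=300)
print(hits)  # [(1, 2)]
```

Only the window whose top-left corner is at row 1, column 2 exceeds the assumed threshold of 300 (its feature value is 400 − 20 = 380).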

The key advantage of a Haar-like feature over most other features is its calculation speed. Due to the use of integral images, a Haar-like feature of any size can be calculated in constant time (approximately 60 microprocessor instructions for a 2-rectangle feature).

Rectangular Haar-like features

An example of early Haar-like features used by Viola and Jones in 2001.

A simple rectangular Haar-like feature can be defined as the difference between the sums of the pixels in two rectangular areas, which can be at any position and scale within the original image. This type of feature is called a 2-rectangle feature. Viola and Jones also defined 3-rectangle features and 4-rectangle features. The feature values indicate certain characteristics of a particular area of the image. Each feature type can indicate the existence (or absence) of certain characteristics in the image, such as edges or changes in texture. For example, a 2-rectangle feature can indicate where the border lies between a dark region and a light region.
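One way to picture the three feature types is as lists of signed rectangles whose weighted sums are added together. The sketch below follows that idea. The representation, the function names, and the sign conventions are assumptions for illustration, and the rectangle sums are computed by brute force.

```python
def rect_sum(image, top, left, height, width):
    """Brute-force sum of a rectangle of pixel intensities."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def feature_value(image, rects):
    """Weighted sum over (weight, top, left, height, width) rectangles."""
    return sum(w * rect_sum(image, t, l, h, wd) for w, t, l, h, wd in rects)


def two_rectangle(top, left, h, w):
    """Right rectangle minus left rectangle (responds to a vertical edge)."""
    return [(-1, top, left, h, w), (+1, top, left + w, h, w)]


def three_rectangle(top, left, h, w):
    """Centre rectangle minus the two outer rectangles (responds to a line)."""
    return [
        (-1, top, left, h, w),
        (+1, top, left + w, h, w),
        (-1, top, left + 2 * w, h, w),
    ]


def four_rectangle(top, left, h, w):
    """Difference between the two diagonal pairs of rectangles."""
    return [
        (+1, top, left, h, w),
        (-1, top, left + w, h, w),
        (-1, top + h, left, h, w),
        (+1, top + h, left + w, h, w),
    ]


# Hypothetical toy images, one per feature type.
edge = [[0, 0, 9, 9], [0, 0, 9, 9]]
line = [[0, 9, 0], [0, 9, 0]]
checker = [[9, 0], [0, 9]]

print(feature_value(edge, two_rectangle(0, 0, 2, 2)))      # 36
print(feature_value(line, three_rectangle(0, 0, 2, 1)))    # 18
print(feature_value(checker, four_rectangle(0, 0, 1, 1)))  # 18
```

Each toy image gives a strong response for the matching feature type. A flat image would give zero for the 2-rectangle and 4-rectangle features, whose positive and negative areas are equal.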

Fast computation of Haar-like features

Finding the sum of the shaded rectangular area

One of the contributions of Viola and Jones was to use summed-area tables,[3] which they called integral images. Integral images can be defined as two-dimensional lookup tables in the form of a matrix with the same size as the original image. Each element of the integral image contains the sum of all pixels located above and to the left of the element's position in the original image. This allows the sum of any rectangular area in the image, at any position or scale, to be computed using only four lookups:

$$\text{sum} = I(C) + I(A) - I(B) - I(D),$$

where points $A, B, C, D$ belong to the integral image $I$, as shown in the figure.
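The four-lookup rule can be sketched in a few lines of Python. This version pads the integral image with an extra row and column of zeros, which is an implementation convenience assumed here (the description above uses a table of the same size as the image). It also assumes that A, B, C, and D are the top-left, top-right, bottom-right, and bottom-left corners of the rectangle.

```python
def integral_image(image):
    """Build a zero-padded summed-area table.

    ii[r][c] holds the sum of image[0:r][0:c], so the first row and
    the first column are all zeros (padding is an assumption).
    """
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_total
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle using four lookups: I(C) + I(A) - I(B) - I(D)."""
    bottom, right = top + height, left + width
    a = ii[top][left]      # A: top-left corner (assumed labelling)
    b = ii[top][right]     # B: top-right corner
    c = ii[bottom][right]  # C: bottom-right corner
    d = ii[bottom][left]   # D: bottom-left corner
    return c + a - b - d


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)

print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
print(rect_sum(ii, 0, 0, 3, 3))  # whole image: 45
```

The table is built once per image in a single pass. After that, every rectangle sum costs the same four reads regardless of the rectangle's size.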

Each Haar-like feature may need more than four lookups, depending on how it was defined. Viola and Jones's 2-rectangle features need six lookups, 3-rectangle features need eight lookups, and 4-rectangle features need nine lookups.
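The count of six lookups for a 2-rectangle feature follows from the two rectangles sharing an edge, and therefore two corners. A minimal sketch, assuming a zero-padded integral image and two side-by-side rectangles (right minus left, an illustrative convention):

```python
def integral_image(image):
    """Zero-padded summed-area table: ii[r][c] = sum of image[0:r][0:c]."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_total
    return ii


def two_rectangle_six_lookups(ii, top, left, height, width):
    """Right rectangle minus left rectangle, each height x width."""
    bottom = top + height
    mid = left + width
    right = left + 2 * width
    # Six distinct reads: a 2 x 3 grid of corner points.
    a, b, c = ii[top][left], ii[top][mid], ii[top][right]
    d, e, f = ii[bottom][left], ii[bottom][mid], ii[bottom][right]
    left_sum = e + a - b - d   # corners a, b, e, d
    right_sum = f + b - c - e  # corners b, c, f, e (b and e are reused)
    return right_sum - left_sum


edge = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
ramp = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

print(two_rectangle_six_lookups(integral_image(edge), 0, 0, 2, 2))  # 36
print(two_rectangle_six_lookups(integral_image(ramp), 0, 0, 2, 2))  # 22 - 14 = 8
```

By the same counting, three rectangles in a row have a 2 × 4 grid of corner points (eight lookups), and four rectangles in a 2 × 2 arrangement have a 3 × 3 grid (nine lookups).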

Tilted Haar-like features

Haar-like features proposed by Lienhart and Maydt:[4] 4 edge features, 8 line features, and 2 center-surround features.

Lienhart and Maydt[4] introduced the concept of a tilted (45°) Haar-like feature. This was used to increase the dimensionality of the set of features in an attempt to improve the detection of objects in images. The attempt was successful, as some of these features describe the object better. For example, a 2-rectangle tilted Haar-like feature can indicate the existence of an edge at 45°.

Messom and Barczak[5] extended the idea to a generic rotated Haar-like feature. Although the idea is mathematically sound, practical problems prevent the use of Haar-like features at arbitrary angles. In order to be fast, detection algorithms use low-resolution images, which introduces rounding errors. For this reason, rotated Haar-like features are not commonly used.

References

  1. Viola and Jones, "Rapid object detection using a boosted cascade of simple features", Computer Vision and Pattern Recognition, 2001. https://www.merl.com/publications/docs/TR2004-043.pdf
  2. Papageorgiou, Oren and Poggio, "A general framework for object detection", International Conference on Computer Vision, 1998.
  3. Crow, F., "Summed-area tables for texture mapping", in Proceedings of SIGGRAPH, 18(3):207–212, 1984. https://web.archive.org/web/20170808221656/https://classes.soe.ucsc.edu/cmps160/Fall05/papers/p207-crow.pdf
  4. Lienhart, R. and Maydt, J., "An extended set of Haar-like features for rapid object detection", ICIP02, pp. I: 900–903, 2002. https://web.archive.org/web/20171215071721/http://www.videoanalysis.org/Prof._Dr._Rainer_Lienhart/Publications_files/ICIP2002.pdf
  5. Messom, C.H. and Barczak, A.L.C., "Fast and Efficient Rotated Haar-like Features Using Rotated Integral Images", Australian Conference on Robotics and Automation (ACRA2006), pp. 1–6, 2006. https://web.archive.org/web/20180105011331/https://pdfs.semanticscholar.org/6c95/6a107ba358d873a169a8678080145dcc3d5b.pdf

Further reading

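  * Haar A. Zur Theorie der orthogonalen Funktionensysteme, Mathematische Annalen, 69, pp. 331–371, 1910. http://www.laurent-duval.eu/Documents-WITS-starlet/Haarlets/Haar_A_1910_ma_zur_tofs-haarlet.pdf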