Difference between Z-test, F-test, and T-test

October 30, 2013 Leave a comment


A z-test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples whether you know the population standard deviation or not. It is also used for testing the proportion of some characteristic versus a standard proportion, or comparing the proportions of two populations.
Example:Comparing the average engineering salaries of men versus women.
Example: Comparing the fraction defectives from 2 production lines.

A t-test is used for testing the mean of one population against a standard or comparing the means of two populations if you do not know the populations’ standard deviation and when you have a limited sample (n < 30). If you know the populations’ standard deviation, you may use a z-test.
Example:Measuring the average diameter of shafts from a certain machine when you have a small sample.

An F-test is used to compare…

View original post 90 more words

Categories: 杂记

Understanding Harris corner detector

February 16, 2013 3 comments

Let’s first go over Harris detector a little bit. For a basic idea about Harris detector, check textbooks or opencv or blogs. For its intuition, check its precursor Movarec operator, which explains why we want to maximize the variation within a window to find a corner. The key to Harris detector is the variation of intensity within a sliding window w(x,y) (with displacement u in the x direction and v in the y direction)

E(u,v) = \sum_{x,y} w(x,y)[I(x+u, y+v) - I(x,y)]^2


  • w(x,y) is the window weighting function at position (x,y)
  • I(x,y) is the intensity at (x,y)
  • I(x+u, y+v) is the intensity at the moved window (x+u, y+v)

The aim is to maximize E(u,v). Applying Taylor expansion and some arithmetic operations, we can get

E(u,v) \approx \sum_{x,y} w(x,y) (u^2I_x^2+2uvI_xI_y+v^2I_y^2) \\ = (u,v)\left( \sum_{x,y}w(x,y) \begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2\end{bmatrix}\right) \begin{pmatrix}u\\v \end{pmatrix} \\ =(u,v)M\begin{pmatrix}u\\v \end{pmatrix}

where (I_x, I_y) is the gradient at (x,y).  Matrix M is known as structure tensor of a pixel.  It actually a characterization of information of all pixels within the window. Then we calculate the eigenvalues \lambda_1 and \lambda_2 of matrix M to determine if window corresponds to a corner. (I)

  • If \lambda_1 \approx 0 and \lambda_2 \approx 0 then this pixel (x,y) has no features of interest.
  • If \lambda_1 \approx 0 and  \lambda_2 has some large positive value, then an edge is found.
  • If \lambda_1 and \lambda_2 have large positive values, then a corner is found.

[Note: to avoid computing the eigenvalues which is computationally expensive, in original Harris detector, the authors suggested using the following score:

R = det(M) - k(trace(M))^{2}

to determine if a point is a corner.]

The question is WHY? Why does the matrix M have the capacity to determine the cornerness? Here, we look around to the covariance matrix \Sigma of gradient (I_x, I_y) at a pixel. Let \mu_{x}=E[ I_{x} ],\mu_{y}=E[ I_{y}]. By definition of covariance matrix, we have

\begin{array}{rl} \Sigma & = \begin{bmatrix} E[ (I_{x}-\mu_{x})(I_{x}-\mu_{x}) ] & E[ (I_{x}-\mu_{x})(I_{y}-\mu_{y}) ] \\ E[ (I_{y}-\mu_{y})(I_{x}-\mu_{x}) ] & E[ (I_{y}-\mu_{y})(I_{y}-\mu_{y}) ] \\ \end{bmatrix} \\ & = \begin{bmatrix} E[ I_{x}I_{x} ] - \mu_{x}^{2} & E[ I_{x}I_{y} ] - \mu_{x}\mu_{y}\\ E[ I_{y}I_{x} ] - \mu_{y}\mu_{x} & E[ I_{y}I_{y} ] - \mu_{y}^{2}\\ \end{bmatrix} \\ & \triangleq \begin{bmatrix} E[ I_{x}I_{x} ] & E[ I_{x}I_{y} ]\\ E[ I_{y}I_{x} ] & E[ I_{y}I_{y} ]\\ \end{bmatrix}\\ \end{array}

The last step is derived based on the the assumption that the mean of derivatives I_x and I_y is zeros, i.e. (\mu_{x},\mu_{y})=0.

On the other hand, the structure tensor M is as follows if we apply an averaging filter over the time dimension (= we give uniform weight over the pixels within the pixels):

\begin{array}{rl} M & = \frac{1}{f}\sum_{x,y} \begin{bmatrix} I_{x}^2 & I_{x}I_{y}\\ I_{x}I_{y} & I_{y}^2 \\ \end{bmatrix} \\ & = \begin{bmatrix} \widehat{I_{x}^2} & \widehat{I_{x}I_{y}}\\ \widehat{I_{x}I_{y}} & \widehat{I_{y}^2} \\ \end{bmatrix} \\ \end{array}

where f is total number of pixels within the window, and \widehat{f} is the average of f.

It can be seen that structure tensor M is actually an unbiased estimate of the covariance matrix of the gradients for the pixels within the window. Therefore, any findings/conclusions about covariance matrix could be used to analyze structure tensors.

Let’s look at an example showing how the derivatives are distributed for different types of image patches (i.e. the window).  The following figures are adapted from Robert Collins‘s slides. In this example, the image patches are from three different types: linear edge, flat area and corner. You can image that we are testing if the pixel at the center is a corner (feature point).

Screenshot-1       The first step is calculating the derivatives I_x and I_y for each pixel in the image, then the distributions of (I_x, I_y) are shown in the figure below. Obviously, the distributions are discriminative fro these three different type of pixels (on the edge, on flat area and a corner).(II)

  • For corners, the ranges in both I_x and I_y are large.
  • For points on an edge, one derivative has wide distribution while the other is almost all near zeros.
  • For points of a flat area, both derivatives are around zeros.

This observation is indeed the reason behind (I) we discussed above.


In Relationship between eigenvectors and directions of data distribution, we have shown that the eigenvectors of a covariance matrix are the same as the directions of principle component (i.e. the axes of a ellipsoid enclosing the data points), and the eigenvalues measure the variances of date points along eigenvectors. (Click Read more to see the relationship between eigenvalues and the semi-axes) Therefore, if the smallest eigenvalue is big enough, then it means the data have large variance along all different directions–it’s a corner!

In a word, the structure tensor is actually the covariance matrix of gradients for pixels around the pixel investigated, the the reason for current eigenvalue-based analysis is based on the property of covariance matrices.

Read more…

[OPENCV] Feature detectors and descriptors

February 16, 2013 1 comment

For a simple example of how to setup and use them, see [OPENCV] OpenCV+Eclipse|Qt|VS2010

For many computer vision tasks, the first step is usually extracting features from images. It consists of two parts: feature detection and description. By feature detection, we mean to find the pixels (or regions, objects) of interest in an image, which are represented by cv::KeyPoint. They are corresponding to the corners of objects in the image. OpenCV implements many popular feature points detection methods, such as SIFT, SURF, ORB, GoodFeatureToTrack and so on.

After these feature points are located, the next step is to describe them. An intuitive description of a feature point could be its intensity/color value. But a single intensity/color value does not provide sufficiently discriminative information so that this feature point can be uniquely matched to feature points in another image. Usually we use the information around a feature point to represent it, i.e. we use information of many pixels to represent one feature point. Based on Hubel & Wiesel’s research on information processing, orientation is the most sensitive factor to visual system. That’s why most feature techniques are based on orientation to describe feature points. For example, SIFT descriptor uses the normalized orientation histogram to describe a feature point. See figure below. Since this post is not about how the descriptors are designed, interested readers should read their papers.


To provide consistent programming interface, OpenCV designs the feature detectors and descriptor extractors following OO principle. Below is the class hierarchy of most detectors and descriptors for version 2.4.1. FeatureDetector and DescriptorExtractor are the base class for detectors and descriptors, respectively. Feature2d simplifies detection and extraction by  deriving from FeatureDetector and DescriptorExtractor.  But Feature2d is still an abstract class because it doesn’t implement the virtual function detectImpl() and computeImpl().


Classes SURF, SIFT, BRISK and ORB (and others) are concrete classes which are both detectors and descriptor extractors. In order to explicitly use them as detectors or descriptor extractors, aliases are defined. For example,

typedef SURF SurfFeatureDetector;
typedef SURF SurfDescriptorExtractor;

typedef SIFT SiftFeatureDetector;
typedef SIFT SiftDescriptorExtractor;

typedef ORB OrbFeatureDetector;
typedef ORB OrbDescriptorExtractor;

From the perspective of functionality, there’s no distinction between SurfFeatureDetector and SurfDescriptorExtractor. They are exactly the same thing except they have different names. The same is true for SiftFeatureDetecor and SiftDescriptorExtractor, and others.

From this class hierarchy diagram, it can also be seen that FAST, GFTTDetector and DenseFeatureDetector are only feature point detectors. After obtaining the feature points (std::vector<cv::KeyPoint>), we can use SURF, SIFT or any other descriptor extractors to get a description for these feature points. std::vector<cv::KeyPoint> is the data structure for data passing.

Categories: 计算机视觉, 杂记 Tags: ,

Relationship between eigenvectors and directions of data distribution

February 7, 2013 1 comment

For approaches based on dimension reduction, such as Principle Component Analysis, one critical question is “To which direction should we project the data?” Given N observations of variable X, we might compute the covariance matrix, and then use its eigenvectors as the direction of projection. Lots of approaches use this technique (e.g. Harris corner detector), but WHY? What’s the reason behind? This posts explains.g12

In the figure above, we illustrate a distribution of a 2D dataset. It’s already demeaned. The discussion below applies to any dimension size.  Suppose each data point is N-dimensional X. Intuitively, we want to find linear functions of X, which project X into a new space. Since the data is already demeaned, \bar{X}=0 which means the origin of the axes is now at the center of the data,  and the linear function has no intercept part.

We start by defining the first linear function (or projection): \alpha_1^TX, where \alpha_1 is the coefficients of the linear function (see the figure above for 2D projection where \alpha_1=(a,b)^T). Looking at the data plot above, the line of \alpha_1 should pass through the data points, and the projected points should have largest span. Mathematically speaking, we want to maximize the data variance after projecting  them to a line \alpha, i.e. to find \alpha_1 so that

                                             \alpha_1 = \arg\max_\alpha \{Var(\alpha^TX)\}

Example: For 2D data, X=(X_1, X_2), \alpha=(a,b). \alpha^TX=aX_1+bX_2. It’s easy to show that the expectation E(\alpha^TX)=aE(X_1)+bE(X_2)=a\mu_1+b\mu_2. The variance

\begin{array} {rl} Var(\alpha^TX) & =E[ (aX_1+bX_2)-(a\mu_1+b\mu_2) ]^2\\ & =E\left[a(X_1-\mu_1)+b(X_2-\mu_2)\right]^2 \\ & =E\left[a^2(X_1-\mu_1)^2+b^2(X_2-\mu_2)^2+2ab(X_1-\mu_1)(X_2-\mu_2)\right] \\ & =a^2Var(X_1)+b^2Var(X_2)+2abCov(X_1, X_2) \\ & =a^2\sigma_{11}+b^2\sigma_{22}+2ab\sigma_{12} \\ & =(a,b)\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}\\ & = \alpha^T\Sigma\alpha \end{array}

where \Sigma is the covariance matrix of the original data.  It is worth noticing that the finding Var(\alpha^TX)=\alpha^T\Sigma\alpha can be extended to any higher dimensional variable X. So right now our aim becomes

\alpha_1 = \arg\max_\alpha \alpha^T\Sigma\alpha

Without any constraint, we could always pick a larger \alpha so there’s no solution to \alpha_1. We choose normalized \alpha, which means \|\alpha\|^2=\alpha^T\alpha=1.

Solution: Now our problem becomes a optimization problem subject to a constraint. We use Lagrange multipliers to maximize the function


with respect to \alpha. This results in

\begin{array}{rl} \frac{d}{d\alpha}\left(\alpha\Sigma\alpha-\lambda(\alpha^T\alpha-1)\right) & =0\\ \Sigma\alpha-\lambda\alpha & = 0 \\ \Sigma\alpha & = \lambda\alpha\\ \end{array}

Suppose the solution to this equation is \alpha=\alpha^* and \lambda=\lambda_1.

It can be easily seen that the \alpha_1 = \alpha^* is the eigenvector of \Sigma. When the line (represented by \alpha) is in the same direction as the eigenvector associated with largest eigenvalue, the variance will be the largest when we project the original data onto this line, and the variable is:

\alpha_1^T\Sigma\alpha_1 = \alpha_1^T\lambda\alpha_1 = \lambda_1\alpha_1^T\alpha_1 = \lambda_1

Therefore, we should choose the eigenvector of the covariance matrix with largest eigenvalue as the basis of first dimension (first principle component), and the corresponding eigenvalue is indeed the variance of that projected data in that dimension.


Following the same deduction, we can conclude that the k-th eigenvector \alpha_k is the $k$-th principle component, and Var(\alpha^TX)=\lambda_k. For example, for k=1,

\begin{array}{rl} &\arg\max_{\alpha_2} \alpha_2^T\Sigma\alpha_2 \\ s.t. &\alpha_2^T\alpha_2 = 1 \\ &\alpha_2\alpha_1=0 \end{array}

(The second due to cov(\alpha_1^TX,\alpha_2^TX)=\alpha_1^T\Sigma\alpha_2=\alpha_2^T\Sigma\alpha_1=\alpha_2^T\lambda_1\alpha_1=\lambda_1\alpha_2\alpha_1=0 .)

Use Lagrange multiplier again, you will get

\Sigma\alpha_2 = \lambda_2\alpha_2


If you read here, I hopeit is clear now the reason why the eigenvector with largest eigenvalue coincide with the direction of biggest variance and why the eigenvectors of a covariance matrix are used as the directions of principle components.

Note: One more question in my mind is why we are so furtunate to have such a simple yet powerful solution, but I want to stop here today.

Categories: 统计, 学习&研究 Tags: , ,

Statistical Data Mining Basics

November 14, 2012 Leave a comment

Statistical Data Mining Tutorials
by Andrew Moore

The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.

Read more…

Categories: 统计 Tags:

Articles on Doctoral Education

November 8, 2012 Leave a comment
Categories: 杂记

[OPENGL]Some OpenGL Examples

August 30, 2012 Leave a comment

Here are some OpenGL programs written for course assignments, from the very basic one to some advanced application. The codes are hosted on github:https://github.com/gccheng/openGL-tutorials

1. Rotating pendulum

2. Hand & fingers

3. Solar System


4. Array operation (Rotation)

function from gc-cheng@hotmail.com on Vimeo.

5. Quaternion algebra

planet-quaternion algebra from gc-cheng@hotmail.com on Vimeo.

6. 分段贝塞尔(Bezier)曲线


实现原理:Piecewise Bezier Curve