I'm having a problem with OpenCV's various parameterization of coordinates used for **camera** calibration purposes. The problem is that three different sources of information on image distortion formulae apparently give three non-equivalent description of the parameters and equations involved:
(1) In their book "Learning OpenCV…" Bradski and Kaehler write regarding lens distortion (page 376):
xcorrected = x * ( 1 + k1 * r^2 + k2 * r^4 + k3 * r^6 ) + [ 2 * p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ],
ycorrected = y * ( 1 + k1 * r^2 + k2 * r^4 + k3 * r^6 ) + [ p1 * ( r^2 + 2 * y^2 ) + 2 * p2 * x * y ],
where r = sqrt( x^2 + y^2 ).
Assumably, (x, y) are the coordinates of pixels in the uncorrected captured image corresponding to world-point objects with coordinates (X, Y, Z), camera-frame referenced, for which
xcorrected = fx * ( X / Z ) + cx and ycorrected = fy * ( Y / Z ) + cy,
where fx, fy, cx, and cy, are the **camera**'s intrinsic parameters. So, having (x, y) from a captured image, we can obtain the desired coordinates ( xcorrected, ycorrected ) to produced an undistorted image of the captured world scene by applying the above first two correction expressions.
However...
(2) The complication arises as we look at OpenCV 2.0 C Reference entry under the **Camera** Calibration and 3D Reconstruction section. For ease of comparison we start with all world-point (X, Y, Z) coordinates being expressed with respect to the **camera**'s reference frame, just as in #1. Consequently, the transformation matrix [ R | t ] is of no concern.
In the C reference, it is expressed that:
x' = X / Z,
y' = Y / Z,
x'' = x' * ( 1 + k1 * r'^2 + k2 * r'^4 + k3 * r'^6 ) + [ 2 * p1 * x' * y' + p2 * ( r'^2 + 2 * x'^2 ) ],
y'' = y' * ( 1 + k1 * r'^2 + k2 * r'^4 + k3 * r'^6 ) + [ p1 * ( r'^2 + 2 * y'^2 ) + 2 * p2 * x' * y' ],
where r' = sqrt( x'^2 + y'^2 ), and finally that
u = fx * x'' + cx,
v = fy * y'' + cy.
As one can see these expressions are not equivalent to those presented in #1, with the result that the two sets of corrected coordinates ( xcorrected, ycorrected ) and ( u, v ) are not the same. Why the contradiction? It seems to me the first set makes more sense as I can attach physical meaning to each and every x and y in there, while I find no physical meaning in x' = X / Z and y' = Y / Z when the **camera** focal length is not exactly 1. Furthermore, one cannot compute x' and y' for we don't know (X, Y, Z).
(3) Unfortunately, things get even murkier when we refer to the writings in Intel's Open Source Computer Vision Library Reference Manual's section Lens Distortion (page 6-4), which states in part:
"Let ( u, v ) be true pixel image coordinates, that is, coordinates with ideal projection, and ( u ~, v ~ ) be corresponding real observed (distorted) image coordinates. Similarly, ( x, y ) are ideal (distortion-free) and ( x ~, y ~ ) are real (distorted) image physical coordinates. Taking into account two expansion terms gives the following:
x ~ = x * ( 1 + k1 * r^2 + k2 * r^4 ) + [ 2 p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ]
y ~ = y * ( 1 + k1 * r^2 + k2 * r^4 ] + [ 2 p2 * x * y + p2 * ( r^2 + 2 * y^2 ) ],
where r = sqrt( x^2 + y^2 ). ...
"Because u ~ = cx + fx * u and v ~ = cy + fy * v , … the resultant system can be rewritten as follows:
u ~ = u + ( u – cx ) * [ k1 * r^2 + k2 * r^4 + 2 * p1 * y + p2 * ( r^2 / x + 2 * x ) ]
v ~ = v + ( v – cy ) * [ k1 * r^2 + k2 * r^4 + 2 * p2 * x + p1 * ( r^2 / y + 2 * y ) ]
The latter relations are used to undistort images from the **camera**."
Well, it would appear that the expressions involving x ~ and y ~ coincided with the two expressions given at the top of this writing involving xcorrected and ycorrected. However, x ~ and y ~ do not refer to corrected coordinates, according to the given description. I don't understand the distinction between the meaning of the coordinates ( x ~, y ~ ) and ( u ~, v ~ ), or for that matter, between the pairs ( x, y ) and ( u, v ). From their descriptions it appears their only distinction is that ( x ~, y ~ ) and ( x, y ) refer to 'physical' coordinates while ( u ~, v ~ ) and ( u, v ) do not. What is this distinction all about? Aren't they all physical coordinates? I'm lost!
Thanks for any input!