ALE
Image Processing Software: Deblurring, Anti-aliasing, and Superresolution.
ALE combines a series of input frames into a single output image possibly having reduced noise, increased tonal resolution, and increased spatial extents.
This page discusses related work, models of program input, renderers used by ALE, the alignment algorithm used, an overview of the structure of the program, and examples for which the above output image characteristics are satisfied.
ALE's approach to exposure registration and certainty-based rendering is not discussed in the current version of this document. See the source code for implementation details.
Note: This document uses HTML 4 character entities. Sigma is 'Σ'; summation is '∑'.
The drizzling renderer used in ALE is based on Richard Hook and Andrew Fruchter's drizzling technique.
Steve Mann's discussions (e.g. here and here) of increased spatial extents, projective transformations, image processing on linear data, and weighting by certainty functions, have influenced features incorporated by ALE.
ALE incorporates an iterative solver based on Michal Irani and Shmuel Peleg's iterative deblurring and super-resolution technique (see here).
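As a rough illustration of the technique (not ALE's implementation; all names below are ours), iterative back-projection in one dimension simulates the imaging of a current estimate, compares the simulation against the observed data, and projects the error back onto the estimate:

```python
import numpy as np

def blur(signal, psf):
    """Simulate the imaging process as convolution with a point-spread function."""
    return np.convolve(signal, psf, mode="same")

def irani_peleg_step(estimate, observed, psf, back_psf):
    """One iteration of iterative back-projection: image the current estimate,
    then project the simulation error back through the back-projection kernel."""
    error = observed - blur(estimate, psf)
    return estimate + np.convolve(error, back_psf, mode="same")

# Toy 1-D example: sharpen a signal blurred by a known 3-tap PSF.
psf = np.array([0.25, 0.5, 0.25])
truth = np.zeros(32)
truth[10:20] = 1.0
observed = blur(truth, psf)

estimate = observed.copy()            # start from the blurred observation
for _ in range(50):
    estimate = irani_peleg_step(estimate, observed, psf, psf)
```

Each iteration reduces the discrepancy between the observed data and the simulated imaging of the estimate; frequencies to which the PSF has low response are recovered slowly or not at all, which motivates the "Low Response Approximation" predicate used later in this document.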
The following conventions are used in this document to denote sets of scalar values:
Symbol | Meaning |
---|---|
NN | Set of integers ≥ 0. |
R | Set of real numbers. |
R+ | Set of real numbers ≥ 0. |
For d1, d2 ∈ NN, a discrete image D of size (d1, d2) is a function
D: {0, 1, …, d1 - 1}×{0, 1, …, d2 - 1} → R+×R+×R+
The set of all discrete images is denoted DD.
For c1, c2 ∈ R+, a continuous image I of size (c1, c2) is a function
I: [0, c1]×[0, c2] → R+×R+×R+
The set of all continuous images is denoted CC.
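As an illustrative sketch (the representations are ours, not ALE's), a discrete image can be stored as an array of nonnegative RGB triples, and a continuous image can be obtained from it by, for example, bilinear interpolation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete image D of size (d1, d2): nonnegative RGB triples on the grid
# {0, ..., d1 - 1} x {0, ..., d2 - 1}.
d1, d2 = 4, 3
D = rng.random((d1, d2, 3))

def I(x, y):
    """A continuous image on [0, d1 - 1] x [0, d2 - 1], obtained here by
    bilinear interpolation of D (one of many possible choices)."""
    x0 = int(min(x, d1 - 1 - 1e-9))
    y0 = int(min(y, d2 - 1 - 1e-9))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * D[x0, y0] +
            fx * (1 - fy) * D[x0 + 1, y0] +
            (1 - fx) * fy * D[x0, y0 + 1] +
            fx * fy * D[x0 + 1, y0 + 1])
```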
A position is a point in Euclidean 3-space R3.
A direction is a point on the unit sphere S2 in R3.
A scene S is a function
S: R3 × S2 → R+×R+×R+
mapping a position and direction to an RGB triple. The set of all scenes is denoted SS.
A view V is a function
V: SS → CC
A camera snapshot is defined as a triple (S, V, d) consisting of a scene S, a view V, and a function d mapping continuous images to discrete images (d: CC → DD).
For positive integer N, a sequence of camera snapshots { K1, K2, …, KN }, defined by the triples { Kj = (Sj, Vj, dj) } is a camera input frame sequence if, for all j and j', Sj = Sj'.
A projective transformation is a transformation that maps lines to lines. For more details, see:
Heckbert, Paul. "Projective Mappings for Image Warping." Excerpted from his Master's Thesis (UC Berkeley, 1989). 1995. http://www.cs.cmu.edu/afs/cs/project/classes-ph/862.95/www/notes/proj.ps
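A planar projective mapping can be written with eight parameters, as in Heckbert's treatment; the sketch below (parameter names are ours) applies such a map and checks the line-preservation property on three collinear points:

```python
def projective(params, x, y):
    """Apply the planar projective map
    (x, y) -> ((a*x + b*y + c) / (g*x + h*y + 1),
               (d*x + e*y + f) / (g*x + h*y + 1))."""
    a, b, c, d, e, f, g, h = params
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)

# Projective maps carry lines to lines: three collinear input points
# (here on the line y = 2x + 1) remain collinear after mapping.
params = (1.1, -0.2, 0.3, 0.05, 0.9, -0.1, 0.01, 0.02)
pts = [projective(params, t, 2.0 * t + 1.0) for t in (0.0, 1.0, 2.0)]
(x0, y0), (x1, y1), (x2, y2) = pts
cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)  # ~0 when collinear
```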
A projective snapshot is defined as a triple (Σ, q, d) consisting of a continuous image Σ, a projective transformation q with restricted domain, and a function d mapping continuous images to discrete images.
For positive integer N, a sequence of projective snapshots { P1, P2, …, PN }, defined by the n-tuples { Pj = (Σj, qj, dj) }, is a projective input frame sequence if, for all j and j', Σj = Σj'.
The first frame in the sequence of input frames is called the original frame, and subsequent frames are called supplemental frames.
Given a camera input frame sequence { Kj = (S, Vj, dj) }, if there exists a continuous image Σ and a sequence { qj } of projective transformations with restricted domain such that, for all j, Vj(S) = qj(Σ), then this camera input frame sequence admits a corresponding projective input frame sequence { Pj = (Σ, qj, dj) }.
Informally, two cases where such a construction is possible for an ideal pinhole camera are: all frames are taken from the same camera position (the camera rotates about its optical center), or the scene photographed is planar.
For more information about the properties of projective transformations, see the first of Steve Mann's papers referenced in the Related Work section above.
For a projective input frame sequence { Pj = (Σ, qj, dj) }, a projective renderer without extension is an algorithm that outputs a discrete image approximation of composite(Σ, q1). The assumptions used in calculating the approximation vary across rendering methods.
For a projective input frame sequence { Pj = (Σ, qj, dj) }, a projective rendering method with extension is an algorithm that outputs a discrete image approximation of composite(Σ', q1'), where q1' extends the domain of q1 so that its range is a superset of the domain of Σ, and Σ' extends Σ to match the range of q1'. The assumptions used in calculating the approximation vary across rendering methods.
All renderers can be used with or without extension (according to whether the --extend flag is used). The target image for approximation (either composite(Σ, q1) or composite(Σ', q1')) is generically called T.
Renderers can be of incremental or non-incremental type. Incremental renderers update the rendering as each new frame is loaded, while non-incremental renderers update the rendering only after all frames have been loaded.
Incremental renderers contain two data structures that are updated with each new frame: an accumulated image A with elements Ax, y and the associated weight array W with elements Wx, y. The accumulated image stores the current rendering result, while the weight array stores information about contributions to each accumulated image pixel.
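A minimal sketch of this bookkeeping, simplified to identically-aligned frames with uniform weights (ALE's actual update rules vary by renderer):

```python
import numpy as np

class IncrementalRenderer:
    """Toy incremental renderer: A holds the weighted mean of contributions,
    W holds the total weight accumulated at each pixel."""

    def __init__(self, height, width):
        self.A = np.zeros((height, width, 3))   # accumulated image
        self.W = np.zeros((height, width))      # weight array

    def add_frame(self, frame, weight=1.0):
        """Merge a new aligned frame into the accumulated image as a
        weighted running average: A <- (W*A + weight*frame) / (W + weight)."""
        w_new = self.W + weight
        self.A = (self.W[..., None] * self.A + weight * frame) / w_new[..., None]
        self.W = w_new

r = IncrementalRenderer(2, 2)
r.add_frame(np.full((2, 2, 3), 0.2))
r.add_frame(np.full((2, 2, 3), 0.4))
```

After two equally-weighted frames, every pixel of A holds the mean of the two contributions, and every entry of W is 2.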
The following table lists predicates which may be useful in determining whether the discrete-image output of a rendering method approximates T. The next section lists, for each renderer, a collection of predicates which should result in T being approximated.
Predicate | Explanation |
---|---|
Alignment | The projective input frame transformations qj are known. |
Translation | All projective input frame transformations qj are translations. |
Point sampling | (∀j) (∀x ∈ Domain[qj]) (dj(composite(T, qj))(x) = composite(T, qj)(x)). |
Box Filter Approximation | An acceptable discrete approximation of T can be achieved by partitioning it into a square grid and representing each square region by its mean value. |
Jittering Assumption 1 | The average of several point samples drawn uniformly from a region of T is an acceptable approximation for the mean value of the region. |
Jittering Assumption 2 | Each region in T corresponding to an output pixel has been sampled several times at points drawn uniformly from the region. |
Small radius | The radius parameter used for drizzling is chosen to be sufficiently small. |
Bartlett filter approximation | Convolution of T with a Bartlett (aka triangle) filter remains an acceptable approximation of T. |
Linear PSF only | There is no non-linear PSF component. |
USM approximation | The Unsharp Mask technique provides an acceptable approximation of the inverse convolution for the given linear point-spread function. |
Correct PSF | The behavior of dj is equivalent to convolution with the given point-spread functions. |
Low Response Approximation | Frequencies to which dj has low response need not be accurately reconstructed in the program output. |
Convergence | The Irani-Peleg technique converges to the desired output for the given input sequence. This condition is proven for special cases in the source paper. |
For each renderer, the following table gives a collection of rendering predicates that should result in rendered output being an acceptable approximation of T. Note that renderers may produce acceptable output even when these predicates are not satisfied. Justification for the entries in this table should appear in the detailed descriptions; if this is not the case, then the values given should be considered unreliable.
Predicate | M D U V I |
---|---|
Alignment | X X X X X |
Translation | X X X |
Point sampling | X X |
Box Filter Approximation | X X X X |
Jittering Assumption 1 | X X X X |
Jittering Assumption 2 | X X X X |
Small radius | X X |
Bartlett filter approximation | X X |
Linear PSF | X X |
USM approximation | X X |
Correct PSF | X X X |
Low Response Approximation | X X X X X |
Convergence | X |
First, a merging renderer is instantiated. Then, program flags are used to determine what other renderers should be instantiated.
An iterative loop supplies to the renderers each of the frames in sequence, beginning with the original frame. The drizzling and merging renderers are incremental renderers, and immediately update their renderings with each new frame, while the USM and Irani-Peleg renderers do not act until the final frame has been received.
In the case of the incremental renderers, the original frame is used without transformation, and each supplemental frame is transformed according to the results of the alignment algorithm, which aligns each new frame with the current rendering of the merging renderer.
Once all frames have been aligned and merged, non-incremental renderers produce renderings based on input frames, alignment information, and the output of other renderers.
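The control flow described above might be sketched as follows; the classes and the align function are illustrative stand-ins, not ALE's interfaces:

```python
class Merging:
    """Stand-in for an incremental (merging) renderer."""
    def __init__(self):
        self.frames_seen = 0
    def update(self, frame, transform):
        # An incremental renderer updates its rendering as each frame arrives.
        self.frames_seen += 1
    def rendering(self):
        return self.frames_seen

class IraniPeleg:
    """Stand-in for a non-incremental renderer."""
    def __init__(self):
        self.ran = False
    def render(self, frames, transforms, merged):
        # Non-incremental renderers act only after the final frame is received.
        self.ran = True

def align(frame, current_rendering):
    """Placeholder for the alignment algorithm."""
    return "transform"

def render_all(frames, merging, non_incrementals):
    transforms = []
    for i, frame in enumerate(frames):
        # The original frame is used without transformation; supplemental
        # frames are aligned against the merging renderer's current rendering.
        t = None if i == 0 else align(frame, merging.rendering())
        transforms.append(t)
        merging.update(frame, t)
    for r in non_incrementals:
        r.render(frames, transforms, merging.rendering())
    return merging.rendering()

merging = Merging()
irani_peleg = IraniPeleg()
result = render_all(["frame0", "frame1", "frame2"], merging, [irani_peleg])
```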
For a projective input frame sequence { Pk = (Σ, q, dk) }, these examples use the shorthand
I = composite(Σ, q) and Dk = dk(I).
Assume a projective input frame sequence { Pk } of K frames, defined by n-tuples { Pk = (Σ, q, dk) }, such that (∃a > 0) (∀x ∈ Domain[q]) (∀y ∈ {R, G, B}) (I(x)y > a/2) and, for one such a, (∀x) (∀y) (Dk(x)y = I(x)y + nk,x,y), where nk,x,y are i.i.d. additive noise terms having uniform distribution over the interval (-a/2, a/2).
In this case, rendering input frames by merging or drizzling averages the additive noise terms of the input images to produce an output additive noise term; when K ≥ 2, this output noise has a smaller expected absolute value than that of any given input noise term.
To prove this, first observe that rendered output image O (for drizzling or merging) averages the K inputs with equal weight:
O(x)y = ∑k (Dk(x)y / K)

Substituting for Dk:

O(x)y = ∑k [(I(x)y + nk,x,y) / K]

Moving the constant outside of the sum:

O(x)y = (I(x)y / K) * K + ∑k (nk,x,y / K)

O(x)y = I(x)y + ∑k (nk,x,y / K)

The last term in the equation above is the noise term for the output image. Leaving location and channel implicit, the expected absolute value for this term is:

Eout = E[|∑k (nk / K)|]

Since there is nonzero probability that both strictly negative and strictly positive terms appear in the sum, this gives rise to the strict inequality:

Eout < E[∑k |nk / K|]

By linearity of expectation, this is equivalent to:

Eout < ∑k E[|nk / K|]

Eout < ∑k E[|nk|] / K

Since the input noise distributions are identical, this reduces to:

Eout < Ein * K / K

Eout < Ein

where Ein = E[|nk|] for any choice of k.
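This inequality can be checked numerically. A quick Monte Carlo sketch with K = 4 frames and uniform noise on (-a/2, a/2):

```python
import random

random.seed(0)
K, a, trials = 4, 1.0, 20000

e_in = 0.0    # estimates Ein  = E[|n_k|] for a single input frame
e_out = 0.0   # estimates Eout = E[|sum_k n_k / K|] for the merged output
for _ in range(trials):
    noise = [random.uniform(-a / 2, a / 2) for _ in range(K)]
    e_in += abs(noise[0]) / trials
    e_out += abs(sum(noise) / K) / trials
```

For uniform noise, Ein = a/4, and the estimated Eout comes out markedly smaller, consistent with the strict inequality derived above.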
Assume a projective input frame sequence { Pk } of K frames, defined by n-tuples { Pk = (Σ, q, dk) }, such that components of I assume a real value in the interval [0, 1] and (∀ x, y, k) (Probability[Dk(x)y = 1] = 1 - Probability[Dk(x)y = 0] = I(x)y).
Since the transformations of all frames are identical, rendering with merging or drizzling averages the values for each pixel Dk(x) over all values of k. Since the expected value of Dk(x) is equal to I(x), the expected value of the corresponding pixel in the rendered output is always equal to I(x). However, the probability of obtaining a value within a small neighborhood of I(x) is generally not high for small numbers of frames.
As the number of frames increases, however, the probability of obtaining a value within a neighborhood of any given size approaches 1. Hence, arbitrarily high tonal resolution can be achieved with arbitrarily small (but non-zero) likelihood of error. The exact probability can be determined according to the binomial distribution.
(The above constitutes an outline of a proof, but there may be holes in it.)
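The exact probability can be computed from the binomial distribution; the sketch below (the function name is ours) shows the probability of the K-frame average falling within ε of the true intensity approaching 1 as K grows:

```python
from math import comb

def prob_within(p, K, eps):
    """P(|mean of K Bernoulli(p) samples - p| <= eps), summed from the
    binomial pmf; the small slack on eps guards against float rounding
    at the interval boundary."""
    return sum(comb(K, k) * p**k * (1 - p)**(K - k)
               for k in range(K + 1)
               if abs(k / K - p) <= eps + 1e-9)

# Probability of the K-frame average landing within 0.1 of a true
# intensity of 0.3 rises toward 1 as the number of frames grows.
p_few = prob_within(0.3, 10, 0.1)
p_many = prob_within(0.3, 1000, 0.1)
```

With 10 frames the average lands within the ±0.1 neighborhood only about 70% of the time; with 1000 frames the probability is essentially 1.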
Assume a projective input frame sequence such that each snapshot can be associated with a point in the scene invisible from that snapshot, but visible from some other snapshot.
In this case, for every snapshot in the input sequence, the output rendering will include at least one point not present in that snapshot.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.