ALE
Image Processing Software: Deblurring, Anti-aliasing, and Superresolution.
ALE combines a series of input frames into a single output image possibly having reduced noise, increased tonal resolution, and increased spatial extents.
This page discusses related work, models of program input, renderers used by ALE, the alignment algorithm used, an overview of the structure of the program, and examples for which the above output image characteristics are satisfied.
ALE's approach to exposure registration and certainty-based rendering is not discussed in the current version of this document. See the source code for implementation details.
Note: This document uses HTML 4 character entities. Sigma is 'Σ'; summation is '∑'.
The drizzling renderer used in ALE is based on Richard Hook and Andrew Fruchter's drizzling technique.
Steve Mann's discussions (e.g. here and here) of increased spatial extents, projective transformations, image processing on linear data, and weighting by certainty functions, have influenced features incorporated by ALE.
ALE incorporates an iterative solver based on Michal Irani and Shmuel Peleg's iterative deblurring and super-resolution technique (see here).
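As a rough illustration of the technique (not ALE's implementation; all names below are ours), iterative back-projection in one dimension simulates the imaging of a current estimate, compares the simulation against the observed data, and projects the error back onto the estimate:

```python
import numpy as np

def blur(signal, psf):
    """Simulate the imaging process as convolution with a point-spread function."""
    return np.convolve(signal, psf, mode="same")

def irani_peleg_step(estimate, observed, psf, back_psf):
    """One iteration of iterative back-projection: image the current estimate,
    then project the simulation error back through the back-projection kernel."""
    error = observed - blur(estimate, psf)
    return estimate + np.convolve(error, back_psf, mode="same")

# Toy 1-D example: sharpen a signal blurred by a known 3-tap PSF.
psf = np.array([0.25, 0.5, 0.25])
truth = np.zeros(32)
truth[10:20] = 1.0
observed = blur(truth, psf)

estimate = observed.copy()            # start from the blurred observation
for _ in range(50):
    estimate = irani_peleg_step(estimate, observed, psf, psf)
```

Each iteration reduces the discrepancy between the observed data and the simulated imaging of the estimate; frequencies to which the PSF has low response are recovered slowly or not at all, which motivates the "Low Response Approximation" predicate used later in this document.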
The following conventions are used in this document to denote sets of scalar values:
Symbol | Meaning |
---|---|
NN | Set of integers ≥ 0. |
R | Set of real numbers. |
R+ | Set of real numbers ≥ 0. |
For d1, d2 ∈ NN, a discrete image D of size (d1, d2) is a function
D: {0, 1, …, d1 - 1}×{0, 1, …, d2 - 1} → R+×R+×R+
The set of all discrete images is denoted DD.
For c1, c2 ∈ R+, a continuous image I of size (c1, c2) is a function
I: [0, c1]×[0, c2] → R+×R+×R+
The set of all continuous images is denoted CC.
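As an illustrative sketch (the representations are ours, not ALE's), a discrete image can be stored as an array of nonnegative RGB triples, and a continuous image can be obtained from it by, for example, bilinear interpolation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete image D of size (d1, d2): nonnegative RGB triples on the grid
# {0, ..., d1 - 1} x {0, ..., d2 - 1}.
d1, d2 = 4, 3
D = rng.random((d1, d2, 3))

def I(x, y):
    """A continuous image on [0, d1 - 1] x [0, d2 - 1], obtained here by
    bilinear interpolation of D (one of many possible choices)."""
    x0 = int(min(x, d1 - 1 - 1e-9))
    y0 = int(min(y, d2 - 1 - 1e-9))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * D[x0, y0] +
            fx * (1 - fy) * D[x0 + 1, y0] +
            (1 - fx) * fy * D[x0, y0 + 1] +
            fx * fy * D[x0 + 1, y0 + 1])
```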
A position is a point in Euclidean 3-space R3.
A direction is a point on the unit sphere S2 in R3.
A scene S is a function
S: R3 × S2 → R+×R+×R+
mapping a position and direction to an RGB triple. The set of all scenes is denoted SS.
A view V is a function
V: SS → CC
A camera snapshot is defined as a triple (S, V, d) consisting of a scene S, a view V, and a function d mapping continuous images to discrete images (d: CC → DD).
For positive integer N, a sequence of camera snapshots { K1, K2, …, KN }, defined by the triples { Kj = (Sj, Vj, dj) } is a camera input frame sequence if, for all j and j', Sj = Sj'.
A projective transformation is a transformation that maps lines to lines. For more details, see:
Heckbert, Paul. "Projective Mappings for Image Warping." Excerpted from his Master's Thesis (UC Berkeley, 1989). 1995. http://www.cs.cmu.edu/afs/cs/project/classes-ph/862.95/www/notes/proj.ps
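A planar projective mapping can be written with eight parameters, as in Heckbert's treatment; the sketch below (parameter names are ours) applies such a map and checks the line-preservation property on three collinear points:

```python
def projective(params, x, y):
    """Apply the planar projective map
    (x, y) -> ((a*x + b*y + c) / (g*x + h*y + 1),
               (d*x + e*y + f) / (g*x + h*y + 1))."""
    a, b, c, d, e, f, g, h = params
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)

# Projective maps carry lines to lines: three collinear input points
# (here on the line y = 2x + 1) remain collinear after mapping.
params = (1.1, -0.2, 0.3, 0.05, 0.9, -0.1, 0.01, 0.02)
pts = [projective(params, t, 2.0 * t + 1.0) for t in (0.0, 1.0, 2.0)]
(x0, y0), (x1, y1), (x2, y2) = pts
cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)  # ~0 when collinear
```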
A projective snapshot is defined as a triple (Σ, q, d) consisting of a continuous image Σ, a projective transformation q with restricted domain, and a function d mapping continuous images to discrete images.
For positive integer N, a sequence of projective snapshots { P1, P2, …, PN }, defined by the n-tuples { Pj = (Σj, qj, dj) }, is a projective input frame sequence if, for all j and j', Σj = Σj'.
The first frame in the sequence of input frames is called the original frame, and subsequent frames are called supplemental frames.
Given a camera input frame sequence { Kj = (S, Vj, dj) }, if there exists a continuous image Σ and a sequence { qj } of projective transformations with restricted domain such that, for all j, Vj(S) = qj(Σ), then this camera input frame sequence admits a corresponding projective input frame sequence { Pj = (Σ, qj, dj) }.
Informally, two cases where such a construction is possible for an ideal pinhole camera are: all frames are taken from the same camera position (the camera rotates about its optical center), or the scene photographed is planar.
For more information about the properties of projective transformations, see the first of Steve Mann's papers referenced in the Related Work section above.
For a projective input frame sequence { Pj = (Σ, qj, dj) }, a projective renderer without extension is an algorithm that outputs a discrete image approximation of composite(Σ, q1). The assumptions used in calculating the approximation vary across rendering methods.
For a projective input frame sequence { Pj = (Σ, qj, dj) }, a projective rendering method with extension is an algorithm that outputs a discrete image approximation of composite(Σ', q1'), where q1' extends the domain of q1 so that its range is a superset of the domain of Σ, and Σ' extends Σ to match the range of q1'. The assumptions used in calculating the approximation vary across rendering methods.
All renderers can be used with or without extension (according to whether the --extend flag is used). The target image for approximation (either composite(Σ, q1) or composite(Σ', q1')) is generically called T.
Renderers can be of incremental or non-incremental type. Incremental renderers update the rendering as each new frame is loaded, while non-incremental renderers update the rendering only after all frames have been loaded.
Incremental renderers contain two data structures that are updated with each new frame: an accumulated image A with elements Ax, y and the associated weight array W with elements Wx, y. The accumulated image stores the current rendering result, while the weight array stores information about contributions to each accumulated image pixel.
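A minimal sketch of this bookkeeping, simplified to identically-aligned frames with uniform weights (ALE's actual update rules vary by renderer):

```python
import numpy as np

class IncrementalRenderer:
    """Toy incremental renderer: A holds the weighted mean of contributions,
    W holds the total weight accumulated at each pixel."""

    def __init__(self, height, width):
        self.A = np.zeros((height, width, 3))   # accumulated image
        self.W = np.zeros((height, width))      # weight array

    def add_frame(self, frame, weight=1.0):
        """Merge a new aligned frame into the accumulated image as a
        weighted running average: A <- (W*A + weight*frame) / (W + weight)."""
        w_new = self.W + weight
        self.A = (self.W[..., None] * self.A + weight * frame) / w_new[..., None]
        self.W = w_new

r = IncrementalRenderer(2, 2)
r.add_frame(np.full((2, 2, 3), 0.2))
r.add_frame(np.full((2, 2, 3), 0.4))
```

After two equally-weighted frames, every pixel of A holds the mean of the two contributions, and every entry of W is 2.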
The following table lists predicates which may be useful in determining whether the discrete-image output of a rendering method approximates T. The next section lists, for each renderer, a collection of predicates which should result in T being approximated.
Predicate | Explanation |
---|---|
Alignment | The projective input frame transformations qj are known. |
Translation | All projective input frame transformations qj are translations. |
Point sampling | (∀j) (∀x ∈ Domain[qj]) (dj(composite(T, qj))(x) = composite(T, qj)(x)). |
Box Filter Approximation | An acceptable discrete approximation of T can be achieved by partitioning it into a square grid and representing each square region by its mean value. |
Jittering Assumption 1 | The average of several point samples drawn uniformly from a region of T is an acceptable approximation for the mean value of the region. |
Jittering Assumption 2 | Each region in T corresponding to an output pixel has been sampled several times at points drawn uniformly from the region. |
Small radius | The radius parameter used for drizzling is chosen to be sufficiently small. |
Bartlett filter approximation | Convolution of T with a Bartlett (aka triangle) filter remains an acceptable approximation of T. |
Linear PSF only | There is no non-linear PSF component. |
USM approximation | The Unsharp Mask technique provides an acceptable approximation of the inverse convolution for the given linear point-spread function. |
Correct PSF | The behavior of dj is equivalent to convolution with the given point-spread functions. |
Low Response Approximation | Frequencies to which dj has low response need not be accurately reconstructed in the program output. |
Convergence | The Irani-Peleg technique converges to the desired output for the given input sequence. This condition is proven for special cases in the source paper. |
For each renderer, the following table gives a collection of rendering predicates that should result in rendered output being an acceptable approximation of T. Note that renderers may produce acceptable output even when these predicates are not satisfied. Justification for the entries in this table should appear in the detailed descriptions; if this is not the case, then the values given should be considered unreliable.
Predicate | M D U V I |
---|---|
Alignment | X X X X X |
Translation | X X X |
Point sampling | X X |
Box Filter Approximation | X X X X |
Jittering Assumption 1 | X X X X |
Jittering Assumption 2 | X X X X |
Small radius | X X |
Bartlett filter approximation | X X |
Linear PSF | X X |
USM approximation | X X |
Correct PSF | X X X |
Low Response Approximation | X X X X X |
Convergence | X |
First, a merging renderer is instantiated. Then, program flags are used to determine what other renderers should be instantiated.
An iterative loop supplies to the renderers each of the frames in sequence, beginning with the original frame. The drizzling and merging renderers are incremental renderers, and immediately update their renderings with each new frame, while the USM and Irani-Peleg renderers do not act until the final frame has been received.
In the case of the incremental renderers, the original frame is used without transformation, and each supplemental frame is transformed according to the results of the alignment algorithm, which aligns each new frame with the current rendering of the merging renderer.
Once all frames have been aligned and merged, non-incremental renderers produce renderings based on input frames, alignment information, and the output of other renderers.
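The control flow described above might be sketched as follows; the classes and the align function are illustrative stand-ins, not ALE's interfaces:

```python
class Merging:
    """Stand-in for an incremental (merging) renderer."""
    def __init__(self):
        self.frames_seen = 0
    def update(self, frame, transform):
        # An incremental renderer updates its rendering as each frame arrives.
        self.frames_seen += 1
    def rendering(self):
        return self.frames_seen

class IraniPeleg:
    """Stand-in for a non-incremental renderer."""
    def __init__(self):
        self.ran = False
    def render(self, frames, transforms, merged):
        # Non-incremental renderers act only after the final frame is received.
        self.ran = True

def align(frame, current_rendering):
    """Placeholder for the alignment algorithm."""
    return "transform"

def render_all(frames, merging, non_incrementals):
    transforms = []
    for i, frame in enumerate(frames):
        # The original frame is used without transformation; supplemental
        # frames are aligned against the merging renderer's current rendering.
        t = None if i == 0 else align(frame, merging.rendering())
        transforms.append(t)
        merging.update(frame, t)
    for r in non_incrementals:
        r.render(frames, transforms, merging.rendering())
    return merging.rendering()

merging = Merging()
irani_peleg = IraniPeleg()
result = render_all(["frame0", "frame1", "frame2"], merging, [irani_peleg])
```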
For a projective input frame sequence { Pk = (Σ, q, dk) }, these examples use the shorthand
I = composite(Σ, q) and Dk = dk(I).
Assume a projective input frame sequence { Pk } of K frames, defined by n-tuples { Pk = (Σ, q, dk) }, such that (∃a > 0) (∀x ∈ Domain[q]) (∀y ∈ {R, G, B}) (I(x)y > a/2) and, for one such a, (∀x) (∀y) (Dk(x)y = I(x)y + nk,x,y), where nk,x,y are i.i.d. additive noise terms having uniform distribution over the interval (-a/2, a/2).
In this case, rendering input frames by merging or drizzling averages the additive noise terms of the input images to produce an output additive noise term; when K ≥ 2, this output noise has a smaller expected absolute value than that of any given input noise term.
To prove this, first observe that rendered output image O (for drizzling or merging) averages the K inputs with equal weight:
O(x)y = ∑k (Dk(x)y / K)

Substituting for Dk:

O(x)y = ∑k [(I(x)y + nk,x,y) / K]

Moving the constant outside of the sum:

O(x)y = (I(x)y / K) * K + ∑k (nk,x,y / K)

O(x)y = I(x)y + ∑k (nk,x,y / K)

The last term in the equation above is the noise term for the output image. Leaving location and channel implicit, the expected absolute value for this term is:

Eout = E[|∑k (nk / K)|]

Since there is nonzero probability that both strictly negative and strictly positive terms appear in the sum, this gives rise to the strict inequality:

Eout < E[∑k |nk / K|]

By linearity of expectation, this is equivalent to:

Eout < ∑k E[|nk / K|]

Eout < ∑k E[|nk|] / K

Since the input noise distributions are identical, this reduces to:

Eout < Ein * K / K

Eout < Ein

where Ein = E[|nk|] for any choice of k.
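This inequality can be checked numerically. A quick Monte Carlo sketch with K = 4 frames and uniform noise on (-a/2, a/2):

```python
import random

random.seed(0)
K, a, trials = 4, 1.0, 20000

e_in = 0.0    # estimates Ein  = E[|n_k|] for a single input frame
e_out = 0.0   # estimates Eout = E[|sum_k n_k / K|] for the merged output
for _ in range(trials):
    noise = [random.uniform(-a / 2, a / 2) for _ in range(K)]
    e_in += abs(noise[0]) / trials
    e_out += abs(sum(noise) / K) / trials
```

For uniform noise, Ein = a/4, and the estimated Eout comes out markedly smaller, consistent with the strict inequality derived above.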
Assume a projective input frame sequence { Pk } of K frames, defined by n-tuples { Pk = (Σ, q, dk) }, such that components of I assume a real value in the interval [0, 1] and (∀ x, y, k) (Probability[Dk(x)y = 1] = 1 - Probability[Dk(x)y = 0] = I(x)y).
Since the transformations of all frames are identical, rendering with merging or drizzling averages the values for each pixel Dk(x) over all values of k. Since the expected value of Dk(x) is equal to I(x), the expected value of the corresponding pixel in the rendered output is always equal to I(x). However, the probability of obtaining a value within a small neighborhood of I(x) is generally not high for small numbers of frames.
As the number of frames increases, however, the probability of obtaining a value within a neighborhood of any given size approaches 1. Hence, arbitrarily high tonal resolution can be achieved with arbitrarily small (but non-zero) likelihood of error. The exact probability can be determined according to the binomial distribution.
(The above constitutes an outline of a proof, but there may be holes in it.)
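The exact probability can be computed from the binomial distribution; the sketch below (the function name is ours) shows the probability of the K-frame average falling within ε of the true intensity approaching 1 as K grows:

```python
from math import comb

def prob_within(p, K, eps):
    """P(|mean of K Bernoulli(p) samples - p| <= eps), summed from the
    binomial pmf; the small slack on eps guards against float rounding
    at the interval boundary."""
    return sum(comb(K, k) * p**k * (1 - p)**(K - k)
               for k in range(K + 1)
               if abs(k / K - p) <= eps + 1e-9)

# Probability of the K-frame average landing within 0.1 of a true
# intensity of 0.3 rises toward 1 as the number of frames grows.
p_few = prob_within(0.3, 10, 0.1)
p_many = prob_within(0.3, 1000, 0.1)
```

With 10 frames the average lands within the ±0.1 neighborhood only about 70% of the time; with 1000 frames the probability is essentially 1.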
Assume a projective input frame sequence such that each snapshot can be associated with a point in the scene invisible from that snapshot, but visible from some other snapshot.
In this case, for every snapshot in the input sequence, the output rendering will include at least one point not present in that snapshot.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.