ALE
Image Processing Software: Deblurring, Anti-aliasing, and Superresolution
ALE combines a series of input frames into a single output image, possibly having reduced blur, reduced aliasing, increased resolution, or increased spatial extents.
This page provides information on related work, models of program input, an outline of renderers, and an overview of the algorithm used in ALE.
Steve Mann's work in Video Orbits on increased spatial extents and the use of projective transformations has influenced features incorporated by ALE.
ALE incorporates an iterative solver based on the work of Michal Irani and Shmuel Peleg on image reconstruction.
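In outline, an Irani-Peleg style solver alternates between simulating the input frames from the current estimate and back-projecting the residual error onto the estimate. The C++ sketch below shows this loop for a single frame and a single channel; the `degrade` and `back_project` operators are caller-supplied stand-ins for illustration, not ALE's actual projection filters.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Image = std::vector<double>;
using Operator = std::function<Image(const Image&)>;

// Schematic Irani-Peleg style iteration for one frame and one channel.
// 'degrade' simulates the imaging process applied to the current
// estimate; 'back_project' spreads the residual error back onto the
// estimate. Both operators are stand-ins; only the loop is shown.
Image iterate(Image estimate, const Image& observed,
              const Operator& degrade, const Operator& back_project,
              int iterations) {
    for (int k = 0; k < iterations; ++k) {
        Image simulated = degrade(estimate);        // predicted frame
        Image error(observed.size());
        for (std::size_t p = 0; p < observed.size(); ++p)
            error[p] = observed[p] - simulated[p];  // residual
        Image correction = back_project(error);
        for (std::size_t p = 0; p < estimate.size(); ++p)
            estimate[p] += correction[p];           // update estimate
    }
    return estimate;
}
```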
Using R+ to represent the non-negative real numbers, a discrete image D of size (d1, d2) is a function

D: {0, 1, …, d1 - 1} × {0, 1, …, d2 - 1} → R+ × R+ × R+

A continuous image I of size (c1, c2) is a function

I: [0, c1] × [0, c2] → R+ × R+ × R+
In this document, a member of the set R+×R+×R+ is sometimes called an RGB triple.
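As a concrete, purely illustrative rendering of these definitions, a discrete image can be stored as a flat array of RGB triples. The type names below are hypothetical and do not correspond to ALE's internals.

```cpp
#include <cstddef>
#include <vector>

// An RGB triple: a member of R+ x R+ x R+.
struct RGBTriple {
    double r, g, b;  // non-negative components
};

// A discrete image D of size (d1, d2): a function from the integer
// grid {0..d1-1} x {0..d2-1} to RGB triples, stored as a flat array.
class DiscreteImage {
public:
    DiscreteImage(std::size_t d1, std::size_t d2)
        : d1_(d1), d2_(d2), data_(d1 * d2) {}

    // D(x, y) for 0 <= x < d1 and 0 <= y < d2.
    RGBTriple& operator()(std::size_t x, std::size_t y) {
        return data_[y * d1_ + x];
    }

private:
    std::size_t d1_, d2_;
    std::vector<RGBTriple> data_;
};
```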
A camera snapshot is defined as an n-tuple consisting of:
S represents a physical scene.
R represents the viewing volume of a physical camera.
I represents a continuous image. The value I(x, y) is the RGB triple representing the radiance that would be recorded from S by a directional light sensor located at the apex of R and aimed at the point (x, y) on the base of R. The only constraint on the sensor is that, given a fixed scene S, it must return a unique value for a given position and orientation. This sensor is assumed to be the same for all camera snapshots, and is called the canonical sensor.
[Figure: Positioning of the canonical sensor]
D represents the discrete pixel values reported by the camera.
The composite function d ∘ i represents the optical and electronic properties of the camera.
For positive integer N, a sequence of camera snapshots { C1, C2, …, CN }, defined by the n-tuples { Cj = (Sj, Rj, Ij, Dj, ij, dj) } is a camera input frame sequence if, for all j and j', Sj = Sj' and ij = ij'.
If the view pyramids { R1, R2, …, RN } of a sequence of N camera input frames all share a common apex and can be enclosed in a single rectangular-base pyramid R sharing the same apex and having base edges parallel to the base edges of R1, then the smallest such R is the extended pyramid. Otherwise, the extended pyramid is undefined.
If a camera input frame sequence has an extended pyramid R, then an extended image is defined from R in a manner analogous to the definition of the image I from the view pyramid R in the definition of a camera snapshot.
A projective snapshot is defined as an n-tuple consisting of:
Σ represents the subject of the snapshot (somewhat analogous to S in the camera snapshot).
D represents discrete pixel values reported by the physical imaging device.
For positive integer N, a sequence of projective snapshots { P1, P2, …, PN }, defined by the n-tuples { Pj = (Σj, Ij, Dj, qj, dj) } is a projective input frame sequence if, for all j and j', Σj = Σj'.
The first frame in the sequence of input frames is called the original frame; subsequent frames are called supplemental frames.
From a camera input frame sequence, define a continuous image Σ as follows: if the sequence has an extended pyramid, Σ is the corresponding extended image. Each camera input frame

Cj = (S, Rj, Ij, Dj, i, dj)

admits a projective input frame

Pj = (Σ, Ij, Dj, qj, dj)

for some transformation qj, and these { Pj } form a projective input frame sequence.
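Since the transformations qj are projective, one common representation is a 3×3 homography acting on homogeneous coordinates. This representation is assumed here for illustration; the document does not specify ALE's internal representation.

```cpp
#include <array>
#include <utility>

// A planar projective transformation (homography) as a 3x3 matrix
// acting on homogeneous coordinates. Assumed representation for
// illustration; ALE's internal representation may differ.
struct ProjectiveTransform {
    std::array<std::array<double, 3>, 3> m;

    // Map the point (x, y): multiply (x, y, 1) by the matrix, then
    // divide through by the homogeneous coordinate w.
    std::pair<double, double> apply(double x, double y) const {
        double u = m[0][0] * x + m[0][1] * y + m[0][2];
        double v = m[1][0] * x + m[1][1] * y + m[1][2];
        double w = m[2][0] * x + m[2][1] * y + m[2][2];
        return {u / w, v / w};
    }
};
```

In this representation, the translation case (relevant to the Translation predicate below) corresponds to matrices of the form {{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}.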
For a projective input frame sequence { Pj = (Σ, Ij, Dj, qj, dj) }, a projective renderer without extension is an algorithm that outputs a discrete image approximation of I1. The assumptions used in calculating the approximation vary across rendering methods.
For a projective input frame sequence { Pj = (Σ, Ij, Dj, qj, dj) }, a projective renderer with extension is an algorithm that outputs a discrete image approximation of Σ. The assumptions used in calculating the approximation vary across rendering methods.
All renderers can be used with or without extension (according to whether the --extend flag is used). The target image for approximation (either Σ or I1) is generically called T.
Renderers can be of incremental or non-incremental type. Incremental renderers update the rendering as each new frame is loaded, while non-incremental renderers update the rendering only after all frames have been loaded.
Incremental renderers contain two data structures that are updated with each new frame: an accumulated image A, with elements A(x, y), and an associated weight array W, with elements W(x, y). The accumulated image stores the current rendering result, while the weight array stores information about contributions to each accumulated-image pixel.
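A minimal sketch of such an update, assuming the simplest rule in which each contribution is folded into a weighted running average; this is illustrative only, not ALE's actual merging or drizzling rule.

```cpp
#include <cstddef>
#include <vector>

// Illustrative incremental accumulation: each contribution of weight
// w > 0 folds into a weighted running average, and the weight array
// records the total weight seen so far at each pixel.
struct Accumulator {
    std::vector<double> A;  // accumulated image (one channel, flattened)
    std::vector<double> W;  // per-pixel accumulated weight

    explicit Accumulator(std::size_t n) : A(n, 0.0), W(n, 0.0) {}

    // Fold one contribution into pixel p:
    //   A[p] <- (W[p] * A[p] + w * value) / (W[p] + w)
    //   W[p] <- W[p] + w
    void accumulate(std::size_t p, double value, double w) {
        A[p] = (W[p] * A[p] + w * value) / (W[p] + w);
        W[p] += w;
    }
};
```

Merging and drizzling differ in which pixels contribute and how their weights are computed, not in this basic accumulate-and-weigh structure.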
Renderers should output approximations of T when certain predicates are satisfied. Not all of these predicates are required for all renderers, and renderers may produce acceptable output even when their predicates are not satisfied.
| Predicate | Explanation |
|---|---|
| Alignment | The projective input frame transformations qj are known. |
| Translation | All projective input frame transformations qj are translations. |
| Point sampling with simple optics | dj assigns Dj(x) = Ij(x). |
| Very large, uniform input sequence | A large number of input frames are provided, uniformly sampling the domain of T. |
| Small radius | The radius parameter used with the rendering method is chosen to be sufficiently small. |
| Bartlett filter approximation | Convolution of T with a Bartlett filter remains an acceptable approximation of T. |
| USM approximation | Applying the unsharp mask employed by the ALE --hf-enhance option to the output of drizzling or merging produces an acceptable approximation of T. |
| Correct Projection Filter | The projection filter used in Irani-Peleg rendering approximates dj. |
| Low Response Approximation | Frequencies having low response in the Fourier-domain representations of dj need not be accurately reconstructed in the Fourier-domain representation of program output. |
| Convergence | Iterating Irani-Peleg on the input frames will eventually produce an acceptable approximation of T, and the number of iterations chosen is adequate to achieve this. This predicate may entail the very large, uniform input sequence predicate. |
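Under the point-sampling predicate, for example, dj reduces to evaluating Ij at the integer lattice points of the discrete image's domain. A hedged sketch, with all names illustrative:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative stand-ins: a continuous image is any callable returning
// an RGB value at real coordinates.
struct RGB { double r, g, b; };
using ContinuousImage = std::function<RGB(double, double)>;

// Point sampling with simple optics: D(x, y) = I(x, y) at each
// integer lattice point.
std::vector<RGB> point_sample(const ContinuousImage& I,
                              std::size_t d1, std::size_t d2) {
    std::vector<RGB> D(d1 * d2);
    for (std::size_t y = 0; y < d2; ++y)
        for (std::size_t x = 0; x < d1; ++x)
            D[y * d1 + x] = I(static_cast<double>(x),
                              static_cast<double>(y));
    return D;
}
```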
The following table indicates which rendering predicates are associated with each renderer. Note that renderers may produce acceptable output even when these predicates are not satisfied. Justification for non-obvious entries in this table should appear in the detailed descriptions; for entries where this is not the case, the value given should be considered unreliable.
The renderers are abbreviated M (merging), D (drizzling), H (high-frequency enhancement), and I (Irani-Peleg).

| Predicate | M | D | H | I |
|---|---|---|---|---|
| Alignment | X | X | | X |
| Translation | X | | | |
| Point sampling with simple optics | X | X | | |
| Very large, uniform input sequence | X | X | | |
| Small radius | | X | | |
| Bartlett filter approximation | X | | | |
| USM approximation | | | X | |
| Correct Projection Filter | | | | X |
| Low Response Approximation | X | X | X | X |
| Convergence | | | | X |
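For background on the USM approximation predicate: a classic unsharp mask computes out = in + k · (in − blur(in)). The sketch below uses a 1-D signal and a 3-tap box blur purely for illustration; ALE's --hf-enhance filter parameters are not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Classic unsharp mask on a 1-D signal: out = in + k * (in - blurred),
// here with a 3-tap box blur clamped at the boundaries. The blur and
// the gain k are illustrative choices only.
std::vector<double> unsharp_mask(const std::vector<double>& in, double k) {
    std::vector<double> out(in.size());
    for (std::size_t p = 0; p < in.size(); ++p) {
        std::size_t lo = (p == 0) ? p : p - 1;
        std::size_t hi = (p + 1 == in.size()) ? p : p + 1;
        double blurred = (in[lo] + in[p] + in[hi]) / 3.0;
        out[p] = in[p] + k * (in[p] - blurred);
    }
    return out;
}
```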
First, a merging renderer is instantiated. Then, program flags are used to determine what other renderers should be instantiated.
An iterative loop supplies each frame in sequence to the renderers, beginning with the original frame. The drizzling and merging renderers are incremental, and immediately update their renderings with each new frame, while the high-frequency enhancement and Irani-Peleg renderers do not act until the final frame has been received.
In the case of the incremental renderers, the original frame is used without transformation, and each supplemental frame is transformed according to the results of the alignment algorithm, which aligns each new frame with the current rendering of the merging renderer.
Once all frames have been aligned and merged, non-incremental renderers produce renderings based on input frames, alignment information, and the output of other renderers.
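Putting the pieces together, the control flow can be sketched as follows. The interfaces are hypothetical, not ALE's actual classes, and the alignment step itself is elided.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interfaces for illustrating control flow only.
struct Frame {};
struct Transform {};

struct IncrementalRenderer {
    virtual void update(const Frame& f, const Transform& t) = 0;
    virtual ~IncrementalRenderer() = default;
};

struct NonIncrementalRenderer {
    virtual void render_all(const std::vector<Frame>& frames,
                            const std::vector<Transform>& transforms) = 0;
    virtual ~NonIncrementalRenderer() = default;
};

// Incremental renderers (merging, drizzling) update as each frame
// arrives; non-incremental renderers (high-frequency enhancement,
// Irani-Peleg) run once after all frames are loaded.
void run(const std::vector<Frame>& frames,
         const std::vector<IncrementalRenderer*>& incremental,
         const std::vector<NonIncrementalRenderer*>& non_incremental) {
    std::vector<Transform> transforms;
    for (std::size_t j = 0; j < frames.size(); ++j) {
        // The original frame (j == 0) is used untransformed; each
        // supplemental frame would be aligned against the current
        // merged rendering (the alignment step is elided here).
        Transform t{};
        transforms.push_back(t);
        for (IncrementalRenderer* r : incremental)
            r->update(frames[j], t);
    }
    for (NonIncrementalRenderer* r : non_incremental)
        r->render_all(frames, transforms);
}
```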
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.