Extraction of features for matching
Feature extraction was applied after image pre-processing in order to detect match points. The extracted match features should be as dense as possible, which makes blunder detection and correction easier and reduces the propagation of mismatches to the lower levels. Our aim was to develop new algorithms, or to improve existing ones, so as to gain in speed and robustness. Since each image can be kilometres long, corresponding to more than 1 Gbyte, processing time should be short and memory should be managed efficiently. We performed several tests with existing operators, namely Foerstner and Harris (for extracting points) and Canny and Susan (for extracting edges), and developed an edge extractor which performs a radiometric analysis of the image content and calculates statistical measures and thresholds depending on a user-defined value. A comparison of the two point detectors showed similar results, with Foerstner permitting feature localisation, while Harris was 50% faster. Results showed that Canny and the in-house algorithm performed better than the other algorithms in terms of processing time and number of match points. Compared to Canny, the in-house method needs one third of the processing time, and its extracted edges are more than one pixel wide, allowing better modelling of discontinuities. During edge extraction, additional information on the edges, such as orientation and sign, is derived and can be used in further matching stages. The edge extraction can be combined with the derivation of the image types mentioned at the end of Sec. 3.1. All algorithms were modified to work also on 14-bit data, so that the full image information is used during processing. Figure 2 shows the points extracted with the Harris operator and the edges extracted with the Canny operator.
The feature extraction is either performed in all channels used in matching or only in the template image (if the grey value images are used and not the other image types).
Figure 2. Left: extracted points with Harris operator. Right: extracted edges with Canny operator.
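To illustrate how edge thresholds can be derived from image statistics and a user-defined value, the following minimal Python sketch marks pixels whose gradient magnitude exceeds the mean plus k standard deviations, and stores an orientation and a sign for each edge pixel. It is not the in-house extractor, whose details are not given here: the central-difference gradient, the rule mean + k * std, and the names `extract_edges` and `k` are assumptions made for the example.

```python
import math
import statistics


def extract_edges(image, k=1.0):
    """Return {(row, col): (magnitude, orientation_deg, sign)} for edge pixels.

    image : list of rows of grey values (any bit depth, e.g. 8 or 14 bit)
    k     : user-defined factor; threshold = mean + k * std of the magnitudes
            (this rule is an assumption, not the method of the paper)
    """
    rows, cols = len(image), len(image[0])
    gradients = {}
    # Central differences on interior pixels only.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            gradients[(r, c)] = (gx, gy, math.hypot(gx, gy))

    magnitudes = [mag for _, _, mag in gradients.values()]
    threshold = statistics.mean(magnitudes) + k * statistics.pstdev(magnitudes)

    edges = {}
    for (r, c), (gx, gy, mag) in gradients.items():
        if mag > threshold:
            # Direction of the gradient, folded to [0, 180) degrees.
            orientation = math.degrees(math.atan2(gy, gx)) % 180.0
            # Sign: +1 if grey values increase towards larger columns
            # (or towards larger rows when gx is zero), else -1.
            sign = 1 if (gx > 0 or (gx == 0 and gy > 0)) else -1
            edges[(r, c)] = (mag, orientation, sign)
    return edges
```

For a vertical step edge, this sketch returns a two-pixel-wide band of edge pixels with orientation 0 and a sign that flips when the bright and dark sides are exchanged. Orientation and sign stored in this way could, for instance, be compared between template and patch images in later matching stages.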
Approximate values and use of doublets in image pyramids
The concept of image pyramids is utilised in the algorithm, so that parallaxes become small and matching is applied in small search ranges. Level 1 images have small y-parallaxes, e.g. less than 100 pixels maximum. For Level 0 images, parallaxes may be larger, although the known sensor orientation is taken into account and offsets are calculated depending on the viewing angles of the channels that are used. The main disadvantage of image pyramids is that texture may disappear in the upper levels. Therefore, we avoid the commonly used average filter for pyramid generation and apply optimal filters that maintain features as much as possible in the upper levels (Baltsavias, 1991). In addition, the images are enhanced in each level and radiometrically balanced using the Wallis filter.
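The effect of the pyramid on the search range can be shown with a small calculation. The sketch below assumes a reduction factor of 2 between consecutive pyramid levels (the factor is not stated in the text) and numbers the pyramid levels from 0 for the full-resolution image; this numbering is unrelated to the Level 0 and Level 1 image products mentioned above. The function name `parallax_per_level` is hypothetical.

```python
import math


def parallax_per_level(max_parallax_px, num_levels, reduction=2):
    """Upper bound of the parallax (in pixels) at each pyramid level.

    Pyramid level 0 is the full-resolution image; each further level is
    assumed to be smaller by the factor `reduction`.
    """
    return [math.ceil(max_parallax_px / reduction ** level)
            for level in range(num_levels)]


# A 100-pixel parallax at full resolution, seen through 6 pyramid levels.
print(parallax_per_level(100, 6))  # [100, 50, 25, 13, 7, 4]
```

Under these assumptions, a parallax of up to 100 pixels shrinks to about 4 pixels at the top of a 6-level pyramid, which is why a small search range is sufficient there.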
Most matching methods employ a time-consuming interpolation to pass coarse matching results to the lower levels. To avoid this, a doublet strategy is used. This strategy aims at reducing processing time and the interpolation of match results from one pyramid level to the next, thus also restricting the propagation of match errors. A doublet consists of 2 consecutive pyramid levels. Features are extracted in the lower level of each doublet and transferred one level up, where they are kept only if an extracted feature (e.g. an edge) also exists on that level. Matching is then performed in both levels of the doublet, from top to bottom. Thus, within a doublet, interpolation and propagation of errors to neighbouring points are avoided. For example, when 6 pyramid levels are used, instead of interpolating 5 times between the different levels, interpolation is applied only between the 3 doublets, i.e. 2 interpolations in total.
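The bookkeeping of the doublet strategy can be sketched in a few lines of Python. The sketch assumes an even number of pyramid levels, a reduction factor of 2 between levels, and features stored as (row, column) positions; the function names are hypothetical and the code only illustrates the grouping and the feature transfer, not the matching itself.

```python
def make_doublets(num_levels):
    """Group pyramid levels (0 = full resolution) into doublets, top first."""
    if num_levels % 2:
        raise ValueError("this sketch assumes an even number of levels")
    return [(upper, upper - 1) for upper in range(num_levels - 1, 0, -2)]


def interpolation_counts(num_levels):
    """Interpolations needed (level by level, doublet by doublet)."""
    return num_levels - 1, len(make_doublets(num_levels)) - 1


def transfer_features(lower_features, upper_features, reduction=2):
    """Keep a lower-level feature only if the upper level of the doublet
    also has a feature at the corresponding (reduced) position."""
    kept = []
    for row, col in lower_features:
        if (row // reduction, col // reduction) in upper_features:
            kept.append((row, col))
    return kept


print(make_doublets(6))         # [(5, 4), (3, 2), (1, 0)]
print(interpolation_counts(6))  # (5, 2)
```

With 6 levels the sketch reproduces the numbers given above: 3 doublets, and 2 interpolations instead of 5. In `transfer_features`, `upper_features` is expected to be a set of positions so that the lookup stays cheap for dense features.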
Multi-patch matching, use of geometric constraints and combination of multiple channels
In matching, a multi-patch approach is adopted, in which 3 matching passes are performed at each match point with different parameters (such as search range and patch dimensions). The multi-patch approach is justified because the larger patches aim at a reliable coarse solution and the smaller ones at accuracy. A large patch is less sensitive to noise, occlusions, multiple solutions etc., while a small one is more accurate and better preserves height discontinuities. Furthermore, the matching results of the 3 passes can be compared and used in quality control for error detection.
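The multi-patch idea can be illustrated on a one-dimensional signal. The following Python sketch runs 3 passes with decreasing patch size and search range, each pass starting from the result of the previous one, and then checks whether the passes agree. Normalised cross-correlation as similarity measure, the patch and search-range values, and all function names are assumptions made for the example; the text does not specify them.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def match_pass(template, search, centre, approx, half_patch, search_range):
    """Best shift within approx +/- search_range for one patch size.

    Returns (shift, score); if no candidate fits inside `search`,
    the approximation is returned with score -2.0.
    """
    patch = template[centre - half_patch:centre + half_patch + 1]
    best_score, best_shift = -2.0, approx
    for shift in range(approx - search_range, approx + search_range + 1):
        start = centre + shift - half_patch
        if start < 0 or start + len(patch) > len(search):
            continue
        score = ncc(patch, search[start:start + len(patch)])
        if score > best_score:
            best_score, best_shift = score, shift
    return best_shift, best_score


def multi_patch_match(template, search, centre,
                      passes=((8, 6), (5, 3), (2, 1))):
    """Run the passes (half_patch, search_range) from coarse to fine."""
    shift, results = 0, []
    for half_patch, search_range in passes:
        shift, score = match_pass(template, search, centre, shift,
                                  half_patch, search_range)
        results.append((shift, score))
    return results


def passes_agree(results, tol=1):
    """Simple consistency check between the shifts of all passes."""
    shifts = [shift for shift, _ in results]
    return max(shifts) - min(shifts) <= tol
```

A point for which `passes_agree` returns False could be flagged for closer inspection in quality control; the tolerance of one pixel is again only an illustrative choice.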
Constraints may also be used in matching. First, the behaviour of epipolar lines in rectified images was investigated. It is assumed that a feature is defined in the so-called template image and the corresponding features are searched for in the remaining images, called patch images. Strictly speaking, epipolar lines do not exist, since each image line is acquired with its own position and attitude. However, by projecting a point of the template image to different heights and back-projecting it onto the patch image, the trajectory of the corresponding point, i.e. the epipolar curve, can be defined. For rectified images the epipolar curve, if it is not too long, can be approximated by a line. Two different approaches have been adopted for geometrically constrained matching. The first method intersects the template image ray with height planes at a step of 1 m, back-projects these object points onto each patch image and fits a straight line to the resulting image points. The second method finds an approximate height by an initial forward intersection (pairwise, using the template and each patch image). The search range in image space is then transformed to a height search range on the template ray, and the points selected for matching are the back projections of 3D points along this ray, within the height search range, with a height step that corresponds to a one-pixel step in image space. Both methods give similar results for the matched points, but the second method is faster than the first.
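The first method can be sketched as follows. Since the sensor model is not part of this section, the sketch takes two hypothetical callables supplied by the user: `ray_point_at_height(z)`, which returns the object point (X, Y, Z) where the template ray meets the height plane z, and `back_project(X, Y, Z)`, which returns the (column, row) position in a patch image. The ordinary least-squares fit of row against column is also an assumption; it is adequate only if the curve is not parallel to the row axis.

```python
def fit_line(points):
    """Least-squares line row = slope * col + intercept through (col, row)."""
    n = len(points)
    mean_c = sum(c for c, _ in points) / n
    mean_r = sum(r for _, r in points) / n
    s_cc = sum((c - mean_c) ** 2 for c, _ in points)
    s_cr = sum((c - mean_c) * (r - mean_r) for c, r in points)
    if s_cc == 0:
        raise ValueError("all points share one column; fit col against row")
    slope = s_cr / s_cc
    return slope, mean_r - slope * mean_c


def epipolar_line(ray_point_at_height, back_project, z_min, z_max, z_step=1.0):
    """Approximate the epipolar curve in a patch image by a straight line.

    Returns (slope, intercept, max_residual); max_residual is the largest
    row deviation of the back-projected points from the fitted line and
    indicates how well the curve is approximated by a line.
    """
    steps = int(round((z_max - z_min) / z_step))
    points = [back_project(*ray_point_at_height(z_min + k * z_step))
              for k in range(steps + 1)]
    slope, intercept = fit_line(points)
    max_residual = max(abs(r - (slope * c + intercept)) for c, r in points)
    return slope, intercept, max_residual
```

If `max_residual` exceeds a tolerance chosen by the user, the height range is probably too long for a single line, which corresponds to the remark above that the approximation holds only for curves that are not too long.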
In the tests so far, 3 PAN lines have been used. The template image is the one with the fewest occlusions compared to the remaining ones, in this case the nadir. The forward and backward channels can be matched with the nadir either separately or simultaneously. In the first case, matching is performed for backward and nadir, and the extracted 3D points are back-projected onto the forward channel and used as initial approximations for the matching. In the second case, backward and forward are matched simultaneously. In both cases a quality control procedure checks each ray, and the final 3D point coordinates are calculated by forward intersection using only the good rays. We are currently investigating the integration of the red or green and the near-infrared channels in the matching process. The RGB and NIR channels are positioned on the focal plane on either side of the nadir and can thus be combined pairwise with the backward and forward channels to model surfaces that are occluded in the other channels. Among the RGB channels, red or green is preferable due to the better sensitivity of the CCD in these spectral ranges.
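The final step, forward intersection with the good rays only, can be sketched as a small least-squares problem. The sketch below assumes that each ray is given by an origin and a direction in object space and that the quality control has already produced one boolean flag per ray. It computes the point with the smallest sum of squared distances to the accepted rays, which is one common formulation and not necessarily the one used in the tests; the function names are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def forward_intersection(rays, good):
    """3D point closest (in the least-squares sense) to all good rays.

    rays : list of (origin, direction), each a 3-tuple
    good : list of booleans from quality control, one per ray
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    used = 0
    for (origin, direction), ok in zip(rays, good):
        if not ok:
            continue
        norm = math.sqrt(sum(c * c for c in direction))
        d = [c / norm for c in direction]
        # Accumulate (I - d d^T) and (I - d d^T) * origin.
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * origin[j]
        used += 1
    if used < 2:
        raise ValueError("at least two good rays are required")
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("good rays are (nearly) parallel")
    # Solve A x = b with Cramer's rule.
    point = []
    for k in range(3):
        A_k = [[b[i] if j == k else A[i][j] for j in range(3)]
               for i in range(3)]
        point.append(_det3(A_k) / det)
    return tuple(point)
```

With the backward, nadir and forward rays of one match point and the flags `[True, True, False]`, for example, the forward ray is ignored and the point is intersected from the remaining two rays.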