next up previous
Next: 5.2 Zone segmentation when data uniqueness Up: 5. Computing the topological zone segmentation Previous: 5. Computing the topological zone segmentation

5.1 Computing zones when data uniqueness is satisfied

The algorithm proceeds by making several passes. We are most concerned with limiting the I/O performed and limiting the number of data readings that must be in memory simultaneously. We will assume that cell edge adjacency information for irregularly gridded data is stored on the disk. The algorithm proceeds by assigning labels to data points. Now the zone files will include both data points inside the zones and data points that are adjacent to the zone, i.e., where the zone boundary crosses a cell edge containing the point. Zones are continuous geometric objects and we need all cells through which a zone boundary passes to be included in a zone file. This is important; it is required for our applications, and is one reason why the data points on a superarc of an augmented contour tree [4] will not be sufficient for our purposes. Note also that some cells will be placed in multiple zones, so that our segmentation is not a true partition of the data.

Phase 1 : One passes the data and identifies the critical points, and labels each by its own coordinates, i.e., assigns l(p) = p, and creates a file for p. One places each critical point on a max-heap (ordered by value, with priority given to critical points to break ties). Critical disambiguation points are regarded as adjacent to the Highs in the 4-hit face.

Phase 2 : One then iterates the following step until a criticality is selected. Remove a maximally valued r (with label l(r) = p) from the heap. Examine the neighbors of r. If a neighbor q of r is unlabeled then assign l(q) = l(r) (label q by p) and place q on the heap. The label of r is now finalized and r is output to the file for criticality p. When a critical point q is removed from the the heap we examine the neighbors of q, and for each neighbor r of q that is labeled by a different criticality $p \neq q$, so that l(r) = p, make q the parent of p. At this point all the points on the heap with label p are relabeled by q, a copy of each is written to the file for p, and the algorithm iterates as above (see discussion below for efficient relabeling).

Phase 3 : One divides each zone file into its connected components. This can be done by employing sparse matrix technology and using either depth first search or the well known set union-find data structure. During this process, one insures that all data points for each cell that crosses a zone boundary are included (by checking the appropriate cell edge for each of the relabeled boundary points). Pointers are created from the node in the criticality tree to the zone components.

Observe that the algorithm labels all data readings and criticalities, by a highest valued first labeling, through the entire data set. At any point in time when precisely all points greater than or equal to a particular have been labeled, these are precisely the data points that comprise components of . Moreover, when phase 2 terminates, the data points labeled by p are precisely the data points contained in , and both these and the data points adjacent to are in the zone file for p. When the algorithm terminates, a zone component file will contain all cells that intersect the zone component.

Note that it is not required to presort the data set as in [6]. The most expensive step from phase 2 is the relabelling of the boundaries of a zone once it has been completed. Similiar steps (either contour merging or join component merging) are performed in [4], [5]. We can instead relabel as the points are later removed from the heap, by tracing the parent pointers of the label in the partially built criticality tree (forest) to a root node, and the data point is copied to the zone files of each node on this path (these points straddle several actual zone boundaries). Now we could have used the set union-find data structure to maintain the zone components during phase 2 (as is done with join components in [4]), but we chose to compute the components in a separate pass (phase 3) so that only a single zone needs to be maintained as a collection of sets in memory at any one time. In phase 2, data points are output once they are removed from the heap, so that the number of of nodes in core is proportional to the contour size. We can further reduce the memory by sorting the criticaliites and only labeling one zone at a time.


next up previous
Next: 5.2 Zone segmentation when data uniqueness Up: 5. Computing the topological zone segmentation Previous: 5. Computing the topological zone segmentation
Dr. Jim Cox
1999-12-14