The work reported in this thesis has sought to determine if a scale-specific notation for location can benefit GIS and cartography. After discussing some of the problems caused by relying on standard coordinate notations, the quaternary triangular mesh (QTM), a hierarchical coordinate system based on a planetary polyhedral tessellation is presented. This model encodes terrestrial locations as leaf nodes in a forest of eight triangular quadtrees, each of which can have up to 29 levels of detail (doublings of scale). Its multi-resolution properties are described in relation to common planar and spherical coordinate systems used for geographic location coding. One of QTM’s advantages derives from its inherent ability to document the locational accuracy of every point in a spatial dataset. Another is its ability to retrieve such data across a range of spatial resolutions, enabling users to evaluate the fitness of map data for use at particular scales. In addition, the tessellation’s spatial partitions can be exploited to identify regions where crowding of features occurs at certain scales. After describing this system’s origins and comparing it to related data models, techniques for encoding and decoding locations, accuracy assessment, spatial conflict detection and other operations are presented. The focus then moves to the use of QTM to generalize digital map data for display at smaller scales. After a brief overview of current approaches to generalization, a strategy for using QTM for line simplification and conflict detection is outlined. A basic strategy for using QTM for line simplification was developed that is embodied in an algorithm having eight control parameters. This method is capable of using embedded spatial metadata to guide selection of retained points, and a way to encode such metadata is described; exploitation of point-specific metadata was seen as one of the more successful and promising aspects of the strategy. A series of empirical tests of this algorithm using several island shoreline datasets is described and illustrated. Outcomes of these experiments are compared with results produced by an existing, widely-used line simplification algorithm. Similarities and differences in results are discussed, attempting to identify reasons for failure or success of either method. The thesis concludes with an evaluation of QTM’s potential contributions to this application area, along with a description of some of the current challenges and limitations to spatial data handling that QTM may be able to address. Additional overviews of contents may be found by consulting Scope of Work in chapter 1, as well as the summaries at the end of each chapter.
Zusammenfassung
The present work investigates whether a new, scale-independent notation for positions (coordinates) could be of benefit to geographic information systems (GIS) and to cartography. It first examines some of the problems of conventional coordinate notations, and then introduces the Quaternary Triangular Mesh (QTM) as an alternative for describing and storing spatial data. QTM is a hierarchical coordinate system based on a planetary polyhedral tessellation. Starting from an octahedron, it describes positions on the globe as leaves in a forest of eight triangular quadtrees (corresponding to the eight octahedron faces). Each quadtree can comprise up to 29 levels of resolution (the scale doubling at each level). The particular properties of this hierarchical data structure, above all its ability to represent multiple resolutions within a single structure, are described in relation to conventional planar and spherical coordinate systems. One essential advantage of this data model is that its structure can document the accuracy of every data point; another is that data can be extracted at different resolutions, making it straightforward to evaluate the suitability of data for different map scales. The multi-level partitioning of space is also well suited to identifying regions that carry too much detail within a given range of scales. After discussing the origins of the QTM model and comparing it with related systems, the thesis describes several algorithms for encoding and decoding positions, estimating accuracy, detecting spatial conflicts (for cartographic generalization), and performing other basic operations. The work then turns chiefly to the use of QTM for the digital generalization of cartographic data. Following a brief discussion of existing approaches to generalization, a strategy for using QTM for line simplification and for detecting potential conflicts between cartographic objects is presented. Among other things, a method for QTM-based line simplification was developed that led to an algorithm controlled by eight parameters. This generalization algorithm can be guided by metadata stored in the QTM, thereby ensuring that essential points of a line are retained. A method for extracting metadata describing the sinuosity of cartographic lines and encoding it in the QTM is also described. The use of point-specific metadata to steer the generalization algorithms proved to be a very advantageous and promising aspect of the present strategy. This is further illuminated by a series of empirical tests carried out on several coastline datasets. The results of these investigations are compared with those of a conventional, widely used line simplification algorithm. Similarities and differences in the results of the two approaches are discussed in order to identify the reasons for the success or failure of each method. The work concludes with a discussion of concrete application areas for QTM and of its potential contribution to solving current problems in geographic information systems and in cartography.
Further pointers to the contents of this work can be found in chapter 1 (section Scope of Work) and in the short summaries at the end of each chapter.
Preface
It is not uncommon for a student to take considerable time to perform doctoral-level research and write a Ph.D. dissertation. I have seen people struggle for five or more years to get through this frequently excruciating process. And when the work drags on a few more years than that, something has usually gone bad, and it becomes less and less likely that the task will ever be completed. I consider myself lucky in this regard: my Ph.D. project has its origins in work begun fifteen years ago, in what now seems like a different epoch of history. Yet here I am, polishing my dissertation, not bored, discouraged or worn down; rather, I am still intrigued with the work, and wish I had more time to improve on it. Two years is scarcely enough to complete a project of this sort, but I feel I have been able to use that time well, and have been able to check a fair number of items off my research agenda. Perhaps that is because I have mused about my topic for enough time, made enough notes, written enough code, and published enough papers to keep the ball in the air, even though I sometimes took my eye off it or temporarily lost interest in the game.
Working on a Ph.D. can be lonely, frustrating, discouraging. The main reason for my “luck” in being able to bring this project to some sort of closure is my connection to vital communities of extraordinary people who, if they did not actively encourage and support my efforts, gracefully tolerated my obscure and idiosyncratic visions with good humor, and possibly even respect. And during the past two years in Switzerland, I have been able to work steadily and uninterrupted, refreshed by periodic holidays with my family back in America. The unconditional support of my wife, Aygül Balcioglu Dutton, and of my mother, Sophie Dutton, has nourished my spirit throughout my expatriation. I love — and owe — both of you so much.
There are so many others who have helped, in many ways. While this work has been a very personal odyssey, others have collaborated with me from time to time and supported the project. I am especially grateful to the National Center for Geographic Information and Analysis (NCGIA) for encouraging this work, by including me in specialist meetings for Research Initiatives 1, 3, 7 and 8, which prompted me to explore interesting avenues related to this project and resulted in several publications. In particular, I wish to thank Michael Goodchild (Santa Barbara), Barbara Buttenfield (Buffalo, now at the University of Colorado), and Kate Beard (Orono) for their efforts. It was Mike who encouraged Yang Shiren and others to develop the first working implementation of my model, brought us together to work on it, and reported on our progress in several venues. Dr. Shiren was a fountainhead of clever, practical ideas for implementing my data model, and I am highly appreciative of his collaboration. The resulting software prototypes demonstrated to me possibilities I had not considered, and convinced me of the essential value and practicality of these concepts. I am equally indebted to Barbara Buttenfield for inviting me to NCGIA Buffalo on several occasions as a visiting scholar; there the central concepts that led to this project were distilled and algorithms were first developed for map generalization, some of which remain at the core of the current project.
All of the above would have come to naught had it not been for my professors, Kurt Brassel and Robert Weibel. It took a while (three years, actually) to come together, and at times it seemed hopeless (we learned that Bern does not give student visas to middle-aged grad students), but they did succeed in orchestrating the logistics and paperwork for what I call my sabbatical, and have been supportive and encouraging to me all the way. While in residence, my principal advisor has been Dr. Weibel, whose keen and detailed criticism and useful suggestions have helped to keep me on the right track. I am also grateful to the Swiss National Science Foundation for financial support for this research, through grants 2000-037649.93 and 2100-43502.95 to the Department of Geography. I am indebted to my colleague Frank Brazile as well; his cheerful skepticism prevented me from getting carried away too far with each new putative insight, and he has also brought to my attention tools that have been useful and could become more so should this project continue. One of these he used to compute some of the spatial orderings presented in appendix E, which were critical to the interpretation of that material.
Despite the disgrace of my pathetic command of the German language, I found the department to offer a friendly atmosphere and something close to an ideal environment in which to work: I was provided with a workstation, a computer account, access to libraries and software packages, and basically left alone to follow my research agenda. There has been no coursework or exams to distract me, just occasional interesting colloquia, a few administrative chores and the task at hand. Having seized this opportunity, I hope that this and other results of my efforts justify my advisors' and my family's faith in my capabilities.
Still, I have not managed to accomplish everything I had hoped for when this project started two years ago. My goals then — and now — included developing a more comprehensive set of tools to solve contextual map generalization problems, and while a good amount of conceptual progress was made toward this end, its implementation is far from complete, indeed barely begun. But this has been a work in progress for close to fifteen years now, and there is no reason for me to expect a sudden dénouement. It is my fond hope that others will find inspiration in my work, and will take up the challenge of making it more operational and useful in an increasing variety of contexts. If I can help any of these efforts move forward, it would be my pleasure to collaborate.
Foreword
Think global, act loco.
— Zerbina
I was born as the Allies were mopping up the Axis powers, at the leading edge of the baby boom. My awareness of the world was kindled in the early 1950’s, and some of my earliest memories include images of freezing GI’s in Korea drifting across snowy television sets in quiet parlors. I can still hear my paternal grandfather lecturing us at the dinner table on the danger of reds in the State Department and the schools and the complicity of “one-world” liberals in the ongoing, insidious Soviet-orchestrated subversion of western civilization. Even then, I had a problem with the implicit assertion that my species occupied more than a single planet, or possibly might be sharing it with one or more alien races. What was so politically incorrect about acknowledging that we’re all in this lifeboat together?
Woodstock and Vietnam happened when I was in my twenties, and I knew which was for me. At the same time as Nixon was socking it to us, the first snapshots of our planet blew me away, the Whole Earth Catalog forged a community of enviro-nerds, and Gaia emerged from the mists of myth. I read promiscuously — geography texts, systems theory, science fiction, futurist speculation, Buckminster Fuller, Teilhard de Chardin — and started writing a novel in which Nixon’s enemies were forcibly dematerialized, teleported into a highly-connected electronic limbo which eventually came to be called the internet. So clear was it to me that we had and were abusing one world — why wasn’t it everywhere obvious? Why did our “leaders” identify the more compassionate and articulate social critics and communitarians as arch-enemies of civilization, saboteurs of progress? The fifties still had us by the throat, and thirty years later not much has changed except for the media and the spin.
Sometime during the seventies, while laboring in a lab at Harvard, I was seduced by a time-shared computer, and became a captive geek of computer cartography and spatial information technology. Fanning out from earlier studies with Warntz in macrogeography and spatial analysis, my interest was drawn to thematic mapping, and from there on to terrain modeling, which led to working on GIS data modeling in general, with coffee, Mandelbrot and a fractal practicum along the way. By the early 80’s I was envisioning planetary data models for terrain, and had yet to seriously contemplate map generalization (although I had created algorithms for its inverse, enhancement). In that decade, a number of my Harvard colleagues went on to achieve commercial and academic success. Eventually I too came to be regarded as a high priest of GIS-esoterica. But, by failing to trumpet my limited knowledge as holy gospel, I committed the greatest possible American sin — failing to cash in. In those days, I was motivated by the romantic concept that my (brilliant) ideas were what really mattered, not how much money I made. What a jerk I was.
When I started pursuing the line of inquiry reported here, Ronald Reagan was just learning his lines as Acting President. Before long, I sensed I was about to be let go from Harvard, and was developing morning-in-America-sickness in a big way. Peering through the corrupt corporatist smog of the 1980s, I could see that some of the technologies that America, Inc. was drably deploying to dominate global affairs could also be instruments of world-wide liberation. It seemed logical to me that a global perspective was the best antidote to the societal poison of self-interest, and I set out to implement this hope as a data model for all the world to see and use. Thus began a spatial obsession that generated about a dozen papers, most of which were written while I had other things to do during the day. Almost all those publications focused on minutia of handling global geospatial data in hierarchical fashion, solving one little problem at a time. Hardly anyone but me cared what the answers were.
But I did — and do still — care, and perhaps because I kept plugging away at it have seen others look into these concepts, taking their own slants and following their own preoccupations, carrying the ball, at least for a while. For example, at least one major software vendor has installed my model within the bowels of software that manages data for a CD-ROM atlas. That is the sort of development I had hoped my ideas would lead to, along with credit of authorship and some royalties. But I guess you can’t have everything.
Geoffrey Dutton
Zürich, October 1997
1133013130312011310
Chapter 1 Introduction and Problem Statement
One real world is enough.
— Santayana
When a mapping agency updates information on topographic maps or navigational charts, usually more than one map is affected by a given change. Often, the largest (most detailed) scale maps are edited first, and the changes are then transferred to the appropriate map at the next smaller scale, until either the smallest scale is reached or the edited feature becomes too small to represent. As a result of changes in the world — or interpretations of it — marks on maps may change their shape or symbology, relative prominence, locations and labels, according to organizational standards and procedures as well as expert judgement, also known as “cartographic license.” This tedious graphic editing process has been described as “working through the scales,” as it systematically propagates edits from more detailed to less detailed cartographic representations. It is one strategy for attacking problems in the art of map generalization, one that specialists from many nations are struggling to understand, formalize and automate.
Until about a decade ago, map generalization was practiced almost exclusively by trained cartographers, who learned it by tutelage, example and intuition, deliberately, even rigorously, but not necessarily formally. Manual cartographic generalization involves an unruly set of techniques and precepts that, traditionally, were mainly practiced in public agencies and map publishing houses, and each organization tended to have its own procedures, standards, guidelines and aesthetics for compiling maps in fulfilment of its mission. Only recently has this situation begun to change, but it has done so dramatically and definitively, as maps — along with every other form of human communication — have become digital. And at the same time, the media, data, tools and enterprises involved in compiling and communicating maps are rapidly becoming global. The author of this thesis, in describing, demonstrating and assessing a specific digital approach to working through scales, hopes to contribute to progress in both map generalization specifically and spatial data handling generally.
1.1 Motivation
While the foregoing describes its application focus, this thesis has a deeper, more general concern: improving the representation of geographic space — the world at large — in digital computers. The latter topic must be considered because addressing the former will achieve only limited success without re-thinking how encoding conventions for geospatial data bias the development, and constrain the results, of methods used to generalize it. This may not appear obvious or even necessary to many who work with environmental, cartographic and other spatial datasets on a regular basis, as the software technology for handling them seems to be relatively mature and generally effective. Furthermore, a number of standards for specifying and exchanging such data are in use; these may differ in detail, but generally assume certain encoding conventions for basic map elements that are rarely questioned anymore. As a consequence, this thesis argues, progress in map generalization, spatial data interoperability and analytic applications has been hampered by a number of vexing difficulties, many of which stem from inadequate descriptions of both geographic phenomena and the space they inhabit.
1.2 Limitations to Detail and Accuracy of Spatial Data
We start by confronting a paradox: when geographic phenomena that are spatially continuous — such as terrain, temperature or rainfall — are numerically modeled, an inherently discontinuous data structure (regular sampling grids, or rasters, sometimes called fields)1 is often employed. Yet when the geometry of inherently discontinuous, linear or sharp-edged phenomena — such as administrative boundaries, land ownership, hydrography or transportation — is encoded, it tends to be done using a continuous data structure capable of representing the shapes and connections between entities and their neighbors in great detail (so-called vector-topological models). Some highly interpreted geospatial data, such as land use, soils, geologic structures, classified remote-sensing imagery and results of migration and accessibility models may be represented in either vector or raster format, although making either choice requires certain compromises.
If they have not worked in the raster domain, many users of geographic information systems (GIS) may not be aware that digital map data they use are not as definitive and concrete as they may appear to be. Those who work with raster-based GIS and image processing (IP) systems, which store and manipulate images and data grids, need little reminding of the finite resolution of their databases. Each grid cell and pixel in every dataset they handle covers a specific amount of territory on, above or under the surface of the earth, and cannot disclose details of variations within these tiny but specific domains. And raster data models — even multi-resolution schemes such as quadtrees — cannot hide the fact that only a certain amount of detail is stored, amounting to very many discrete (but not independent) observations of single-valued attributes.
In non-raster GIS environments — with which this project is almost exclusively concerned — the discrete nature of data and the limits of its precision and accuracy tend to be hidden, and may go unnoticed, but not without consequence. Systems that employ vector-topological data models actually introduce two types of discretization, illustrated in fig. 1.1:
1 Continuous contours of spatial entities are represented by finite, ordered sets of vertex locations — although sometimes, continuous functions (such as splines or elliptical arcs) may be exploited to represent certain types of curves. The accuracy of such data is mainly limited by sampling during capture;
2 Based on how much computer storage is assigned to primitive data types (integers and real numbers, which in practice usually occupy either 32 or 64 bits), the precision of individual spatial coordinates (vertex locations) is always finite.
Figure 1.1: Two types of discretization errors, along with human line tracing error
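To make these two kinds of discretization concrete, the following minimal Python sketch (not part of the original thesis; the curve, vertex count and grid spacing are arbitrary assumptions) first samples a continuous boundary at a finite set of vertices, then stores those vertices at limited coordinate precision and reports the displacement this introduces:

```python
import math

# Type 1 (scale-limited): sample a continuous boundary -- here a circular
# arc of radius 1000 m -- at a finite number of vertices; whatever happens
# between the sampled vertices is lost.
def sample_arc(radius_m, n_vertices):
    step = (math.pi / 2) / (n_vertices - 1)
    return [(radius_m * math.cos(i * step), radius_m * math.sin(i * step))
            for i in range(n_vertices)]

# Type 2 (precision-limited): snap each coordinate to a grid of cell_m
# metres, mimicking storage with a fixed number of significant digits.
def quantize(points, cell_m):
    return [(round(x / cell_m) * cell_m, round(y / cell_m) * cell_m)
            for x, y in points]

vertices = sample_arc(1000.0, 25)      # only 25 vertices describe the arc
stored = quantize(vertices, 0.5)       # coordinates held to a 0.5 m grid

# Worst-case vertex displacement caused by the second discretization alone
shift = max(math.hypot(xq - x, yq - y)
            for (x, y), (xq, yq) in zip(vertices, stored))
print(f"maximum displacement due to quantization: {shift:.3f} m")
```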
1.2.1 Scale Limitations
The first type, which we will denote as scale-limited2 representation, caricatures lines and polygonizes regions by sampling networks and boundaries at small (and usually irregular) intervals. The size of intervals is mainly influenced by data capture technology and procedures and the scale and resolution of the graphic source material, which typically are aerial photographs or printed maps, captured as digital data (digitized) either automatically or manually. If assumptions are made regarding the smallest mark or object that a map or a display is able to represent, one may numerically relate spatial accuracy and resolution to map scale (Tobler 1988, Goodchild 1991a, Dutton 1996). The spatial data model explored in this thesis makes use of such equivalencies (see table 2.1).
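Such equivalencies can be sketched in a few lines of Python. The figures below are illustrative assumptions only (a 0.4 mm minimum map mark and an octant edge of roughly 10,000 km, with facet size taken to halve at each QTM level); the thesis's own values appear in table 2.1:

```python
import math

MIN_MARK_M = 0.0004        # assumed smallest distinguishable mark: 0.4 mm
OCTANT_EDGE_M = 1.0e7      # roughly 90 degrees of arc along the equator

def ground_resolution_m(scale_denominator):
    """Smallest ground distance resolvable on a map at 1:scale_denominator."""
    return MIN_MARK_M * scale_denominator

def qtm_level_for(resolution_m):
    """QTM depth whose facet size roughly matches a ground resolution,
    assuming facet edge length halves with each subdivision."""
    return max(0, math.ceil(math.log2(OCTANT_EDGE_M / resolution_m)))

for denom in (25_000, 100_000, 1_000_000):
    res = ground_resolution_m(denom)
    print(f"1:{denom:<9,} -> ~{res:6.0f} m resolution -> ~QTM level {qtm_level_for(res)}")
```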
GIS users and cartographers are well-conditioned by now to line segment approximations of map features, and have some understanding of their limitations and of the consequences for merging or overlaying digital map data. GIS designers have come up with a variety of solutions to the “sliver problem,” the generation of small, spurious shapes resulting from mismatches of line-work that represents the same features using slightly different geometric descriptions. Paradoxically, as Goodchild (1978) has shown, the more diligent and detailed one’s digitization effort, the more troublesome this problem becomes. No practical, robust GIS solutions are yet available for automatically changing the scale of digital map data, or for combining data from maps having major differences in the scales at which their features were captured. This is partially because GISs — even when metadata are available to them — do not formally model the information content of the data they handle, even the critical yet restricted aspects of data quality involving spatial resolution, accuracy and positional error. In a Ph.D. thesis concerned with handling scale- and resolution-limited GIS data, Bruegger proposes an approach to data integration in which:
... mapping a representation to a coarser resolution becomes well-defined since the source and target knowledge content are precisely known. This compares to current GIS practice where the knowledge content of a representation is only vaguely known to the user and totally inaccessible to machine interpretation. Figure [1.2] illustrates this with an example of two vector representations of the same coast line. What knowledge about the world do the representations really contain? For example, what is found in location p? The vague definition of the knowledge content is closely related to the problem of precisely defining what it means to transform a representation from one scale to another. (Bruegger 1994: 4)
Figure 1.2: Scale-limited representational problems (from Bruegger 1994: 4)
Bruegger’s goal was to develop a “format-independent” representation to enable conversion between formats (specifically, raster and vector) to take place without unnecessary loss of information. Such a notation could specify the positional uncertainty of the geometry contained in a dataset as metadata equally applicable to each type of representation. This, he asserted, would avoid “problems of comparing incompatible concepts such as raster resolution and vector ‘scale’” (Bruegger 1994: 4). The latter statement is arguable, we maintain; there is nothing incompatible between these two concepts if one makes certain assumptions about what they mean in an operational sense, as they both are consequences of discretization processes, one with respect to space and the other with respect to phenomena. This aside, the goals of this thesis are in harmony with Bruegger’s, even though its approach, formalisms and application concerns are somewhat different.
1.2.2 Precision Limitations
We call the second type of discretization precision-limited3, as it is constrained by how many digits of precision computer hardware or software can generate or represent. This applies to data capture hardware (e.g., scanners and digitizing tables), which have finite-precision grids through which points are filtered, as well as to the word sizes supported by a computer's CPU (central processing unit) or implemented in software by programming languages. The basic issue is that geometric coordinates are conceptually real numbers but are represented by finite floating point quantities, which can fail to distinguish data points that are very close together and can introduce numerical errors and instabilities, causing algorithms to make incorrect decisions when testing and comparing computed values. A good example of how geometric imprecision can lead to topological trouble is described and illustrated by Franklin (1984). In figure 1.3, scaling is applied to a triangle and a point, moving the point from inside the triangle to outside it. The figure exaggerates the effect by using integer coordinates, but the principle holds true for floating point ones as well.
Figure 1.3: Scaling moves point P outside triangle (from Franklin 1984)
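The effect is easy to reproduce. The sketch below is a simplified re-creation, not Franklin's own example: its triangle, point and scale factor are invented for illustration. Scaling the coordinates and then truncating them to an integer grid (as a digitizer or fixed-precision storage effectively does) moves a point that was inside the triangle to the outside:

```python
def cross(a, b, p):
    """Twice the signed area of (a, b, p); its sign tells on which side of line ab point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    a, b, c = tri
    signs = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

def scale_and_truncate(pt, factor):
    # scale, then keep only the integer part of each coordinate
    return (int(pt[0] * factor), int(pt[1] * factor))

triangle = [(0, 0), (8, 3), (1, 9)]
point = (5, 2)
print(inside(triangle, point))                  # True: the point starts inside

factor = 0.7
triangle2 = [scale_and_truncate(v, factor) for v in triangle]
point2 = scale_and_truncate(point, factor)
print(inside(triangle2, point2))                # False: truncation pushed it outside
```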
Franklin’s paper provides other sobering lessons in the hazards of computational geometry, and offers a look at some ways to avoid them. Although numerical methods have improved considerably since he wrote, his conclusions remain valid:
Existing computer numbers violate all the axioms of the number systems they are designed to model, with the resulting geometric inaccuracies causing topological errors wherein the database violates its design specifications. (Franklin 1984: 206)
It has proven devilishly difficult to guarantee the robustness of algorithms that process geometric data, unless limiting assumptions are made about the nature and quality of input data or approximate answers suffice. Considerable effort in computer graphics and computational geometry has been devoted to handling the consequences of finite-precision computations in processing geometric data (Schirra 1997). While these issues are relatively well understood, and robust techniques for dealing with some of them exist, this is not the end of the story where processing geospatial data is concerned. One reason is that many of the techniques for avoiding errors in geometric computations rely on software representations of numbers that extend their intrinsic precision, either generally or as required by specific computations. While they give more accurate results than floating point computations done in hardware, they are hundreds or thousands of times slower to execute. This may be tolerable for some kinds of geometric applications, but GIS applications tend to be too data-intensive to make such approaches practical.
Also, even if GIS processing modules could be made provably correct in handling all the precision available for map and other locational data, that precision tends to be fixed, and is rarely sensitive to variations in the actual precision of coordinates within datasets. Finite precision is basically regarded as a nuisance, to be overcome by force, usually by allocating more bits than will ever be needed to store and process coordinate data (e.g., double-precision floating point words). This almost always fails to reflect the true amount of information that a coordinate contains about a geographic location, tending to exaggerate it.
Standardizing (hence falsifying) the precision of locations to machine word sizes may result in false confidence in the quality of computations performed on them, even if all numerical instabilities are carefully avoided. It may lead one to believe, for example, that a boundary location is known to within 2 cm, when in fact the measurement is at best accurate to about 20 meters. Some modules of some GISs do provide ways to simulate reductions to precision (by storing accuracy parameters and introducing error tolerances); these, however, are almost always based on global estimates or information about measurements (error attributes or metadata), rather than on the actual precision of the measurements themselves. Another common strategy is to employ rational arithmetic, using integral numerators and denominators to represent real numbers. While this can ensure unambiguous results for computations such as line intersection or point-in-polygon tests, the results tend to be only as good as the least accurate coordinate involved.
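A minimal sketch of this trade-off, using Python's built-in rational numbers (the coordinates are contrived for illustration; production systems use more sophisticated adaptive-precision predicates):

```python
from fractions import Fraction

def orient(ax, ay, bx, by, cx, cy):
    """Signed-area test: zero means points a, b and c are collinear."""
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

# Point c = (0.1 + 0.2, 0.3) lies exactly on the line y = x through
# a = (0, 0) and b = (1, 1), but 0.1, 0.2 and 0.3 have no exact binary
# floating point representation, so the predicate returns a tiny nonzero
# value and misclassifies the three points as non-collinear.
print(orient(0.0, 0.0, 1.0, 1.0, 0.1 + 0.2, 0.3))      # roughly -5.6e-17, not 0

# With rational coordinates the same test is exact, at the cost of
# arbitrary-length integer arithmetic that runs far slower than hardware
# floating point.
x = Fraction(1, 10) + Fraction(2, 10)
print(orient(Fraction(0), Fraction(0), Fraction(1), Fraction(1), x, Fraction(3, 10)))  # 0
```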