Why use a database? (Problems that database-processing systems have solved)
1. Data is separated and isolated. If related data from several files is needed, the system manager must determine which parts of each file are needed, decide how the files are related, and co-ordinate the processing of the files so that the correct data is extracted. In a database-processing system, data is stored in one place and the database management system accesses the stored data. The general structure of all database applications is therefore: users - database application - DBMS - database. (Kroenke, 1995)
2. Data is often duplicated. This wastes file space and creates data-integrity problems. If duplicated data items differ, they produce inconsistent results, making it difficult to determine which value is correct and thus reducing the credibility of the data. (Kroenke, 1995)
3. Application programs are dependent on the file format. If changes are made to the file format, the application programs must also be changed.
4. Files are often incompatible with one another. The file format depends on the programming language or product used to generate it, e.g. a COBOL program and a C program produce different file formats. (Kroenke, 1995)
5. It is difficult to present the data the way the users view it.
Benefits of relational model
The data is stored, at least conceptually, in a way the user can readily understand (Kroenke, 1995). Data is stored in tables, and the relationships between the rows of the tables are visible in the data itself. Unlike earlier database models, where the DBMS stored the relationships in system data such as indexes (which hid the relationships), an RDBMS enables the user to obtain information from the database without the assistance of a professional, because the relationships are always stored in user-visible data. RDBMSs are particularly useful in decision-support systems (DSS).
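As a minimal sketch of this point (the tables and column names here are hypothetical, not taken from Kroenke), the relationship between two tables is carried by an ordinary, user-visible column rather than hidden in system data:

    -- Hypothetical sketch: the DEPT_ID value in EMPLOYEE is ordinary,
    -- user-visible data that expresses the relationship to DEPARTMENT.
    CREATE TABLE DEPARTMENT (
        DEPT_ID   INTEGER PRIMARY KEY,
        DEPT_NAME VARCHAR(40)
    );

    CREATE TABLE EMPLOYEE (
        EMP_ID   INTEGER PRIMARY KEY,
        EMP_NAME VARCHAR(40),
        DEPT_ID  INTEGER REFERENCES DEPARTMENT (DEPT_ID)
    );

    -- A user can follow the relationship directly from the stored data.
    SELECT E.EMP_NAME, D.DEPT_NAME
    FROM   EMPLOYEE E, DEPARTMENT D
    WHERE  E.DEPT_ID = D.DEPT_ID;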
Microcomputer DBMS products
There has been much development in microcomputer DBMS products, starting with dBase II, which was not a true DBMS; dBase III, which was a DBMS; and dBase IV, which was an RDBMS. Today's DBMS products provide rich and robust user interfaces using graphical user interfaces such as Microsoft Windows. Two such products are Microsoft's Access and Borland's Paradox for Windows (Kroenke, 1995).
Client-Server Database Applications
The development of local area networks (LANs), which linked microcomputer CPUs so that they could work simultaneously, proved both advantageous (greater performance) and more problematic, and it led to a new style of database processing called the client-server database architecture (Kroenke, 1995).
Distributed Database processing
Organisational database applications address the problems of file processing and allow more integrated processing of organisational data. Personal and work-group database systems bring database technology even closer to the user by allowing access to locally managed databases. Distributed databases combine these types of database processing by allowing personal, work-group, and organisational databases to be combined into an integrated but distributed system (Kroenke, 1995).
Object-oriented DBMS (OODBMS)
These are DBMSs which came about as a result of the development of a new style of programming called object-oriented programming. They are difficult to use and handle different types of data structures.
Data modelling
Data modelling is the process of creating a representation of the users' view of the data.
There are two data modelling tools: the entity-relationship approach and the semantic object approach.
Development of database
Components of database systems
The database: user's data, metadata, overhead data (indexes, linked lists), application metadata
The DBMS features and functions:
1. Design tools subsystem (used by the application developer) - table creation tool, form creation tool, query creation tool, report creation tool, procedural language compiler
2. Run-time subsystem (used by application users) - form processor, query processor, report writer, procedural language run-time program
3. DBMS engine
Application programs
figure from (Kroenke, 1995)
General strategies
To develop a database, we build a data model that identifies the things to be stored in the database and defines their structure and the relationships among them. This familiarity must be obtained early in the development process, by interviewing the users and building the requirements.
There are two general strategies for developing a database: Top-down development and Bottom-up development (Kroenke, 1995)
Top-down development proceeds from the general to the specific. It begins with the goals of the organisation, the means by which those goals can be accomplished, and the information requirements that must be satisfied to reach those goals; from these, an abstract data model is constructed. Using this high-level model, the development team progressively works downwards towards more and more detailed descriptions and models. Intermediate-level models are expanded with more detail until the particular databases and related applications can be identified. One or more applications are selected for development. Over time, the entire high-level data model is transformed into lower-level models, and the indicated systems, databases, and applications are created. Bottom-up development is the reverse. The entity-relationship approach is more effective with top-down development, and the semantic object approach is more effective with bottom-up development.
FUNDAMENTAL DATA BASE ELEMENTS
Elements of reality modelled in a GIS data base have two identities: entity and object.
Entity
An entity is the element in reality. An entity is something that can be identified in the user's work environment, something important to the users of the system, e.g. a particular person identified by name (Kroenke, 1995). An entity is "a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind". E.g. a city could be considered an entity: it can be subdivided into component parts, but these parts would not be called cities (they could be districts, neighbourhoods, etc.), therefore a city is an entity. A forest, however, could be subdivided into smaller forests, therefore it would not be an entity.
Similar phenomena to be stored in a data base are identified as entity types. An entity type is any grouping of similar phenomena that should eventually get represented and stored in a uniform way, e.g. roads, rivers, elevations and vegetation. Entities are grouped into entity classes, or collection of entities of the same type (Kroenke, 1995).
An entity class is the general form or description of a thing, whereas an instance of an entity class is the representation of a particular entity, e.g. car 123.
Attributes
Entities have attributes or, as they are sometimes called, properties, which describe the entity's characteristics.
An attribute is a characteristic of an entity selected for representation. It is usually non-spatial though some may be related to the spatial character of the phenomena under study (e.g. area and perimeter of a region).
The actual value of the attribute that has been measured (sampled) and stored in the database is called attribute value.
An entity type is almost always labelled and known by attributes (e.g. a road usually has a name and is identified according to its class such as freeway, state road, etc.).
Attribute values are often conceptually organised in attribute tables, which list individual entities in the rows and attributes in the columns. The entry in each cell of the table is the attribute value of a specific attribute for a specific entity.
Identifiers
Entity instances have names that identify them. The identifier of an instance is one or more of its attributes (Kroenke, 1995). An identifier may be either unique or non-unique; if it is unique, its value identifies exactly one entity instance.
Relationships
Entities can be associated with one another in relationships. The E-R model contains both relationship classes and relationship instances. Relationship classes are associations among entity classes, and relationship instances are associations among entity instances. Relationships can have attributes (Kroenke, 1995).
A relationship can include many entities; the number of entities in a relationship is the degree of the relationship. Although the E-R model allows relationships of any degree, most applications of the model involve only relationships of degree two.
A relationship is an association established between common fields (columns) in two or more tables. Relationships of degree two are sometimes called binary relationships. A relationship can be one-to-one, one-to-many, or many-to-many (1:1, 1:N, N:M).
A one-to-one relationship
In a one-to-one relationship, each record in Table A can have only one matching record in Table B, and each record in Table B can have only one matching record in Table A. This type of relationship is not common, because most information related in this way would be in one table. You might use a one-to-one relationship to divide a table with many fields, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table. For example, you might want to create a table to track employees participating in a fundraising soccer game.
A one-to-many relationship
A one-to-many relationship is the most common type of relationship. In a one-to-many relationship, a record in Table A can have many matching records in Table B, but a record in Table B has only one matching record in Table A.
A many-to-many relationship
In a many-to-many relationship between two tables, one record in either table can relate to many records in the other table. In a many-to-many relationship, a record in Table A can have many matching records in Table B, and a record in Table B can have many matching records in Table A. This type of relationship is only possible by defining a third table (called a junction table) whose primary key consists of two fields - the foreign keys from both Tables A and B. A many-to-many relationship is really two one-to-many relationships with a third table.
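A brief SQL sketch of the junction-table idea (the student/course names are illustrative only): the junction table's primary key is made of the two foreign keys, and each foreign key is the "many" side of an ordinary one-to-many relationship.

    CREATE TABLE STUDENTS (
        STUDENT_ID INTEGER PRIMARY KEY,
        NAME       VARCHAR(60)
    );

    CREATE TABLE COURSES (
        COURSE_ID INTEGER PRIMARY KEY,
        TITLE     VARCHAR(60)
    );

    -- Junction table: the many-to-many relationship is really two
    -- one-to-many relationships meeting in this third table.
    CREATE TABLE ENROLMENTS (
        STUDENT_ID INTEGER REFERENCES STUDENTS (STUDENT_ID),
        COURSE_ID  INTEGER REFERENCES COURSES (COURSE_ID),
        PRIMARY KEY (STUDENT_ID, COURSE_ID)
    );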
Weak entities
These are entities whose presence in the database depends on the presence of another entity, e.g. an apartment depends on its building. The E-R model includes a special type of weak entity called an ID-dependent entity.
Object
An object is the element as it is represented in the data base. An object is "a digital representation of all or part of an entity".
The method of digital representation of a phenomenon varies according to scale, purpose and other factors. E.g. a city could be represented geographically as a point if the area under consideration were continental in scale, while the same city could be geographically represented as an area if we are dealing with a geographical data base for a state or a country.
The digital representation of entity types in a spatial data base requires the selection of appropriate spatial object types. The object types are listed in Table 3-1 (and illustrated in Figure 3-1) based on the following definition of spatial dimensions.
0-D: an object having a position in space, but no length (e.g. a point).
1-D: an object having length, composed of two or more 0-D objects (e.g. a line).
2-D: an object having length and width, bounded by at least three 1-D line segment objects (e.g. an area).
3-D: an object having length, width and height/depth, bounded by at least four 2-D objects (e.g. a volume).
An object class is the set of objects which represent the set of entities, e.g. the set of points representing the set of wells.
0-dimensional object types
point - specifies geometric location
node - a topological junction or end point, may specify location
1-dimensional object types
line - a one dimensional object
line segment - a direct line between two points
string - a sequence of line segments
arc - a locus of points that forms a curve that is defined by a mathematical function
link - a connection between two nodes
directed link - a link with one direction specified
chain - a directed sequence of nonintersecting line segments and/or arcs with nodes at each end
ring - a sequence of non-intersecting chains, strings, links or arcs with closure
2-dimensional object types
area - a bounded continuous object which may or may not include its boundary
interior area - an area not including its boundary
polygon - an area consisting of an interior area, one outer ring and zero or more non-intersecting, nonnested inner rings
pixel - a picture element that is the smallest nondivisible element of an image
Figure 3-1. Spatial object types.
Data base model
A data base model is a conceptual description of a database defining entity types and associated attributes.
Each entity type is represented by specific spatial objects.
After the database is constructed, the data base model is a view of the database, which the system can present to the user.
Examples of data base models can be grouped by application areas; e.g. transportation applications require different data base models than do natural resource applications.
Layers
Spatial objects can be grouped into layers, also called overlays, coverages or themes.
One layer may represent a single entity type or a group of conceptually related entity types. E.g. a layer may have only stream segments or may have streams, lakes, coastline and swamps.
What kind of objects do you use to represent these entity types?
SPATIAL OBJECTS AND DATA BASE MODELS
The objects in a spatial database are representations of real-world entities with associated attributes. The power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities. Thus a GIS data base is much more than a collection of objects and attributes.
How are lines linked together to form complex hydrologic or transportation networks?
How can points, lines, or areas be used to represent more complex entities like surfaces?
Point Data
Points represent the simplest type of spatial object.
Choice of entities which will be represented as points depends on the scale of the map or application. e.g. on a large scale map, building structures are encoded as point locations. e.g. on a small scale map - cities are encoded as point locations.
The coordinates of each point can be stored as two additional attributes.
Information on a set of points can be viewed as an extended attribute table.
Each row (or record) represents a point recording all information about the point.
Each column is an attribute (or a field), two of which are the x, y coordinates.
Each point is independent of each other, represented as a separate row (Figure 3-2).
Figure 3-2. Point data attribute table.
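A point attribute table of the kind shown in Figure 3-2 can be held as an ordinary relational table; the table and field names below are hypothetical:

    -- Each row is one point entity; X and Y are stored as two extra attributes.
    CREATE TABLE WELLS (
        WELL_ID INTEGER PRIMARY KEY,   -- identifier of the point entity
        DEPTH_M NUMERIC(8,2),          -- ordinary (non-spatial) attribute
        OWNER   VARCHAR(60),
        X       NUMERIC(12,3),         -- easting / longitude
        Y       NUMERIC(12,3)          -- northing / latitude
    );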
Line Data
Lines represent network entities such as:
infrastructure networks
transportation networks - highway and railway
utility networks - gas, electricity, telephone and water pipe
airline networks - hubs and routes
natural networks
river channels
Network characteristics (Figure 3-3):
A network is composed of nodes and links.
The valency of a node is the number of links at the node. e.g.
ends of dangling lines are "1-valent"
4-valent nodes are most common in street networks
3-valent nodes are most common in hydrology
A tree network has only one path between any pair of nodes, no loops or circuits are possible. Most river networks are trees.
Figure 3-3. Nodes and links in network entities.
Attributes of network entities:
Link attributes
transportation: direction of traffic, length, number of lanes, time to pass
pipe lines: diameter of pipe, direction of gas flow
electricity: voltage of electrical transmission line, height of towers
Node attributes
transportation: presence of traffic lights and overpass, names of intersecting streets
electricity: presence of shutoff valves, transformers
Area Data
Area data are represented on area class maps, or choropleth maps. Boundaries may be defined by natural phenomena such as lakes, or by man such as forest stands, CENSUS zones.
Types of areas that can be represented include:
Environmental/natural resource zones: land cover, forest, soil, water bodies
Socio-economic zones: CENSUS tracts, postcodes
Land records: land parcel boundaries, land ownership
Area coverage (Figure 3-4):
Type 1: Entities are isolated areas, possibly overlapping
Any place can be within any number of entities, or none.
Areas do not exhaust the space.
Type 2: Any place is within exactly one entity
Areas exhaust the space.
Every boundary line separates two areas, except for the outer boundary.
Areas may not overlap.
Any layer of the first type can be converted to one of the second type.
Holes and islands (Figure 3-5):
Areas often have "holes" or areas of different attributes wholly enclosed within them.
More than one primitive single-boundary area (islands) can be grouped into an area object.
Figure 3-4. Area coverage: (a) Entities are separate; (b) Entities fill the space; (c) First type represented as second type.
Figure 3-5. Holes and islands.
Representation of Continuous Surfaces
Examples of continuous surface:
elevation (as part of topographic data)
rainfall, pressure, temperature
population density
General nature of surfaces
Critical points
peaks and pits - highest and lowest points
ridge lines, valley bottoms - lines across which slope reverses suddenly
passes - convergence of 2 ridges and 2 valleys
faults - sharp discontinuity of elevation - cliffs
fronts - sharp discontinuity of slope
slopes and aspects can be derived from elevations
Data structures for representing surfaces
Traditional data models do not have a method for representing surfaces, therefore surfaces are represented by the use of points, lines or areas.
Points - grid of elevations
Lines - digitised contours
Areas - TIN (Triangulated irregular network)
THE VECTOR GIS
What is a vector data model?
based on vectors (as opposed to space-occupancy raster structures) (Figure 3-6)
fundamental primitive is a point
objects are created by connecting points with straight lines
areas are defined by sets of lines (polygons)
Figure 3-6. Example of vector GIS data.
Arcs
When planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of a layer.
Every piece of boundary line is a common boundary between two areas.
The stretch of common boundary between two junctions (nodes) may be called edge, chain, or arc.
Arcs have attributes which identify the polygons on either side (e.g. "left" and "right" polygons)
In what direction do we move along the arc to define "left" and "right"?
Arcs (chains/edges) are fundamental in vector GIS (Figure 3-7).
Figure 3-7. An arc in vector GIS.
Node
A node is the beginning or ending point of an arc, the location of a point feature, or the intersection of two arcs.
Polygon
A polygon is a closed chain of arcs; polygons represent all area features.
Data Base Creation
Data base creation involves several stages:
input of the spatial data
input of attribute data
linking spatial and attribute data
Once points are entered and geometric lines are created, the topology of the spatial object must be "built" (Figure 3-8).
Building topology involves calculating and encoding relationships between the points, lines and areas.
This information may be automatically coded into tables of information in the data base.
Topology is recorded in three data tables, one for each type of spatial element: an arc topology table, a node topology table and a polygon topology table; a fourth table is used to store the coordinates and vertices.
Figure 3-8. Example of "built" topology
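A rough sketch of how such topology tables might look in relational form (the table and column names are illustrative; actual GIS products use their own internal formats):

    -- Node topology: each node with its coordinates.
    CREATE TABLE NODES (
        NODE_ID INTEGER PRIMARY KEY,
        X       NUMERIC(12,3),
        Y       NUMERIC(12,3)
    );

    -- Arc topology: every arc knows its end nodes and the polygons on
    -- its left and right, which makes the relationships explicit.
    CREATE TABLE ARCS (
        ARC_ID     INTEGER PRIMARY KEY,
        FROM_NODE  INTEGER REFERENCES NODES (NODE_ID),
        TO_NODE    INTEGER REFERENCES NODES (NODE_ID),
        LEFT_POLY  INTEGER,
        RIGHT_POLY INTEGER
    );

    -- Polygon topology: which arcs make up each polygon boundary.
    CREATE TABLE POLYGON_ARCS (
        POLY_ID INTEGER,
        ARC_ID  INTEGER REFERENCES ARCS (ARC_ID),
        PRIMARY KEY (POLY_ID, ARC_ID)
    );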
Editing
During the topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically.
Automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined. Tolerance value is related to the precision with which locations can be digitised.
Adding Attributes
Once the objects have been formed by building topology, attributes can be keyed in or imported from other digital data bases.
Once added to the data base, attributes must be linked to the different objects.
Attribute data is stored and manipulated in entirely separate ways from the locational data. Usually a Relational Data Base Management System (RDBMS) is used to store and manage attribute data and their links to the corresponding spatial objects.
Things to consider in Database
Essential Differences between raster and vector (Hazelton, 1999)
The essential difference between the two approaches is how they deal with representing things in space.
Raster systems are built on the premise that there is something at all points of interest, so we will record something for every location. The space under consideration is exhausted by a regular data structure. The concern is not with the boundaries between objects as much as with the objects themselves, their interiors, or the phenomena they are representing. Spatial resolution is not as important as complete coverage. The presence of something is more important than its exact extent.
Vector systems are built on the premise that we only need to record and deal with the essential points. If there isn’t something of significance at a location, don’t record anything. Not all locations in space are referenced, and many are simply referenced indirectly, as being inside a polygon. The data structure supports irregular objects and very high resolution. We are interested in the boundaries between objects at least as much as with the objects themselves, often more so. We need precision representation of linear objects, and this need overrides other needs for surface and area modeling of all but the simplest kind. Precision is the watchword in vector GIS, together with making spatial relationship explicit.
The explicit nature of the relationships in vector GIS requires ‘topology’. It also allows much easier analysis of these kinds of relationships, especially connectivity between locations (points), which is done with lines. In raster GIS, we can figure out which cells are the eight surrounding the one we are currently in, so connectivity is implicit in the data structure, and we don’t need all this extra stuff.
Spaghetti (Hazelton, 1999)
We drew a polygon by just making lines; we never explicitly say in the database ‘this is a polygon’. We often call this kind of representation the Spaghetti data model, after the way that a plate of spaghetti looks (and is structured).
While the spaghetti looks fine, it doesn’t really satisfy our needs. As long as we just need to see a picture, it’s fine. As soon as we need to do anything beyond looking, when we need to get the machine to do some analysis rather than the user, we run into problems.
A simple question might be: “If I follow this road (a line), what other roads join it?” This is rather an important question if you want the machine to tell you how to get from A to B. If we have a spaghetti model for our data storage, we can find out the answer to the question. However, we need to look at every single line and co-ordinate pair in the database and compare it to every line segment in the line we are starting from.
Why is this? At any moment, another line might cross ours, and we won’t know it unless we test every single segment of that line against all the segments of our line, to see if they cross or meet. If there is a crossing or meeting, we can tag that line and keep going, but every such query requires a complete search and test for the entire database. As most GIS databases are fairly large, this is horribly inefficient. We can only consider it in a small system, such as MapInfo, and even there we do give the system a few hints to make things simpler. For big GIS, we use a thing called topology.
Topology (Hazelton, 1999)
When we look into the display of the spaghetti, we quickly see the polygons, the intersections and the like. This is because our brains are very powerful parallel-processing systems adapted to make sense of visual data very rapidly. Our lives depend on this capability, today and since the dawn of the species. But computers are painfully slow and awkward at this operation. They are good at crunching numbers, so we have to make the structure in the mess of spaghetti obvious to the machine, in a numerical form. We call this ‘topology.’
Topology is a branch of mathematics that deals with very basic geometric concepts. Way before we think about angle and distance and size, there are more fundamental properties of objects, properties which don’t change when we do a wide range of things to the object. For example, no matter how we manipulate an object, provided we don’t tear it, we don’t change the number of pieces there are of it, or the number of complete holes there are in it. If two objects are connected, the connectedness remains constant no matter how much we rotate, scale, move or otherwise manipulate them. One object is inside another until we tear open the outer one to remove the inner. So you can see that there are some very basic properties of objects that remain constant (or invariant) under a range of things that can be done to them (operations).
Extending this to vector GIS databases, we find that an object is a polygon no matter how many sides it has (beyond a basic minimum); that the holes in an object remain the same no matter how it is transformed into different map projections; that a line goes from one end to the other at all times (i.e., there is a direction associated with it). We can build these things into the GIS by making them explicit. To do this requires a more developed data structure than just the spaghetti.
Topological Relationships (Hazelton, 1999)
1) Ownership and Component-ness
The most fundamental topological relationship is ‘owns’ or ‘is a component of’. (You will note that these two relationships are actually two sides of the one relationship.) This allows us to build a definite structure into all the objects in the vector GIS database. It works like this.
We still have the great line strings of points, with the nodes at the ends. But we now give the line strings a definite identifier, a unique number (to keep the computer happy). The nodes we also keep a special note of in a separate area, linked back to the original data (which includes the co-ordinates). As we build up things like polygon boundaries, we link the line strings into chains, of which we also keep a special note. So we have a series of two-way relationships being built here. The nodes are a component of the lines, which are a component of the chains. The chains own the lines, which in turn own the nodes. We make this explicit in our database, so that we can find where everything is.
One of the beauties of this system is that ownership and component-ship are not exclusive. A node can be the end-point of several lines, for example, and a line obviously forms a boundary between (and is owned by) two polygons. This means that we only need to store things once in the database, reducing redundancy.
At the next level, we take collections of chains that form a closed loop, and reference these as the boundary of a polygon. The polygon then explicitly owns this collection of chains as its boundary. A polygon may have an outer boundary and several inner ones, such as islands in a lake.
We need to make sure that the relationships are two-way, so we need to have the chains refer to the polygons that own them. Each chain can only be owned by a maximum of two polygons in a 2-D representation, so we make explicit which two these are.
You will notice that in all the work to this point, co-ordinates have not been involved. This is an essential part of topology, and it means that the relationships hold true for any map projection of the data.
2) Direction
Another important topological relationship is the direction of a line. We take this to be from its ‘from-node’ to its ‘to-node’, naturally. In most cases, the from-node is just where the line started to be entered. The user never sees this direction, as it doesn’t affect anything outside the topological structure. In many GIS, you can add a direction to a line, making it a directed line, where the direction has a special meaning, such as a one-way street. The GIS will store this information, but the direction of the line that the user wants can be either direction.
With direction, an interesting property occurs in 2-D (but not in 3-D). As we move along the line in its proper direction, we find that the two polygons that the line bounds are on either side, to the left and to the right. So we record the left and right polygon identifiers with the line data, and we use this to provide a link from the line to each polygon.
In fact, because of this role of the line being linked to all the other components, the line assumes major importance in 2-D GIS. In addition, there is a topological property called ‘duality’, which means that there are strong relationships between the components in both directions which must be made explicit for the data structure to work properly. By working with the lines as the basis of the data structure, we have a single step to get to all the other components.
3) Connectivity
The topological relationships we now have as explicit in the database enable us to tell very quickly what lines meet at which nodes. We can choose a single node and find all the lines to which it belongs, and so all the polygons in whose boundary it is a component. This makes it very quick to determine how pipeline networks inter-connect. This simplifies a lot of the kinds of queries that are involved in network analysis.
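Using the illustrative ARCS table sketched earlier in the Data Base Creation section, such a connectivity query becomes a simple lookup rather than a scan of every coordinate pair:

    -- All arcs (and hence lines) that meet at node 17 (hypothetical node id).
    SELECT ARC_ID
    FROM   ARCS
    WHERE  FROM_NODE = 17 OR TO_NODE = 17;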
4) Adjacency
The other aspect of connectivity is the relationship between polygons. If two polygons share a boundary, they are adjacent. If they share just a common point, their adjacency is of a lower order. But by making the fundamental relationships explicit in the database, it is quick and easy to determine this adjacency and its degree. This helps in a number of different applications of spatial analysis.
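Adjacency can likewise be read straight from the illustrative arc table, because every arc records the polygons on its left and right:

    -- All polygons adjacent to polygon 5 (hypothetical polygon id): any arc
    -- with polygon 5 on one side names its neighbour on the other side.
    SELECT RIGHT_POLY AS NEIGHBOUR FROM ARCS WHERE LEFT_POLY  = 5
    UNION
    SELECT LEFT_POLY  AS NEIGHBOUR FROM ARCS WHERE RIGHT_POLY = 5;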
5) Nestedness
Another topological relationship is that of having things inside other things. The database handles this by referring to closed loops of chains as boundaries, and noting which of them are internal boundaries, i.e. inside another polygon, and which are external. It is then very simple to search for common boundaries and seek nested objects.
6) How Many More?
In a 2-D GIS, there are quite a few different topological relationships. There are those between polygons, between polygons and lines, between lines and lines, between lines and points, between polygons and points, and between points and points. Some are quite simple; others are more complex.
As far as polygons and lines are concerned, a topologically sound database can handle only two kinds of relationships (fundamentally). Either the two objects touch along a common boundary (polygons meet each other at lines and points, for example) or they do not touch at all. This is the state in a database when we have ‘built’ the topology.
Improper Relationships (Topological Division) (Hazelton, 1999)
When we enter data into a GIS, we can have ‘improper’ topological relationships occur. For instance, we may digitize two polygons in a single layer that overlap, so that at one point we will have two values for a single attribute. This presents us with problems in analysis, so we don’t want this to happen. With spaghetti we can’t control this, but if we build the topology correctly, we can ensure that for each attribute layer, we have single values at any point.
(Remember that part of the basic idea of a vector GIS is that within a polygon, the attribute value is constant, changing sharply at the boundary. This is exactly the same as with a raster GIS: sharp changes at boundaries, no change within. We have not yet got to the point of a GIS that allows continuous variability.)
A similar circumstance arises when we undertake topological overlay. Here we have two layers that while having ‘good’ topology, will naturally have overlapping polygons. It is perfectly reasonable to have a map where you have overlapping polygons representing different attributes. However, when we want to build the topology in the newly created layer (created by the overlay operation), we need to start breaking the large polygons down into small polygons, such that any one attribute has just one value within the polygon.
So in all cases of importance to us in this course, when we have the data structure in the state where all polygons are in one of the two basic topological states, and the same for lines, etc., we have eliminated all the improper relationships and can now proceed with analysis. We can be confident that we don’t have any ambiguities in the spatial relationships expressed in the database, so that analysis will work properly.
Scale, Accuracy, Precision, Resolution (Hazelton, 1999)
An interesting myth that has grown around vector GIS is that of the scale-less database. The argument runs that since we can represent locations to fractions of a millimeter, we can work on a 1 : 1 scale, and so avoid the problems of scale in maps and the like. While this is a nice idea, even with GPS it is still a long way off.
The question is, how good is the input data? If I digitize locations from a 1:24,000 map, a location is good to about 12 meters. If I digitize a 1:500 map, the locations are good to about 0·25 meter. Note that for maps, this data quality is for ‘well-defined points’ only. Points that aren’t well-defined, lines, polygons and the like don’t count! So only a very small part of the map will actually be to that precision. How good is the rest? There are no standards for that part of the map.
I can measure objects on the Earth with GPS and get precision to 0·1 meter. With good surveying gear, I may even be able to get to 0·01 meter. We are still a long way from a millimeter, let alone a fraction of a millimeter. Yet it is easy to pull up co-ordinates to whatever number of decimal places one wishes.
The quality of the data, its accuracy if you like, is based very much on the precision of the measurements used in the database. But there is nothing in a GIS, in almost every case, to let the user know how good the data actually is while it is being used. You pick a point, and read out the co-ordinates to the fraction of a millimeter, and nothing springs up to say “Well actually, that’s only good to ± 50 meters, you know.” It is very misleading.
As GIS users, you need to be very aware of this issue. It is so easy to be led astray here, and many of your less well-educated users may fall into these pitfalls. Remember the Garbage In, Gospel Out situation. Here is a very easy place to see it happen.
The resolution of the computer hardware is also an issue here. ARC/INFO works with real numbers (floating point) for co-ordinates, and these can be either single precision or double precision. Single precision is good to 6 or 7 significant figures, while double precision is good to 14 to 16 significant figures. If you are recording locations using UTM co-ordinates, you will only get meter resolution if you use single precision and the full co-ordinates. MGE, on the other hand, uses integers, so that every location is ultimately expressed as a number between 0 and 4·2 billion. There can be questions of the fine-grained nature of this, but if you are aware of the differences and what is happening, things will be OK.
Problems with Vector GIS (Hazelton, 1999)
A full-blown vector GIS, especially with an associated raster component, is an awesome system. ARC/INFO sports over 3,000 commands, while MGE is not far behind it (although it has a much more mouse-windows oriented interface). It is very easy to get lost in the complexity and intricacies of these products.
We have already looked at questions of precision, accuracy, resolution and scale. Vector systems have no built-in ‘check’ of the raster cell size as a give-away about their resolution. In many cases there is no metadata or data quality information to let you know about the data in the database. You may never know that part of your database was digitized from a 1:500,000 map, while all the rest is 1:24,000, and yet that difference could play havoc with analyses performed on the data.
Another issue we haven’t touched on is the question of data conversion from raster to vector. We often need to do this to help a vector analysis. When the vectors are produced, there may be nothing (in the lineage part of data quality) to let you know that these were converted from a raster dataset of some resolution. When the vectors are smoothed and the data is included, how will we know there is anything different about those lines and polygons?
Similar problems going from vector to raster are less of an issue, as the vector looks like it should be of a higher resolution and converts easily. But was the vector as good as the raster resolution? How can you tell? It is surprising how many raster GIS have the same resolution as a vector system.
As with all science, you can avoid fooling other people if you first don’t fool yourself. If you know about the system, its capabilities, the data and what it should be able to achieve, you can do well with vector GIS.
Conclusions (Hazelton, 1999)
Vector GIS is a powerful tool for spatial representation and analysis. Yet it is open to misuse and abuse, like any other information system. Some of the potential traps have been pointed out, and you must be aware of them.
If you can focus on the application rather than the hardware and software, you will do a good job with GIS in general
GIS database design
Before actually building the tables, forms, and other objects that will make up your database, it is important to take time to design your database. A good database design is the keystone to creating a database that does what you want it to do effectively, accurately, and efficiently.
Database design stages
1. Conceptual database design (focus on the content of the database, i.e. the users' GIS data needs, in order to establish the GIS functional and data requirements; it is done by listing the database elements)
2. Physical database design (the actual structure of the database is developed and documented based on the content (features and attributes) identified above)
3. Database implementation (the actual coding of the physical database)
These are the basic steps in designing a database:
1 Determine the purpose of your database.
2 Determine the tables you need in the database.
3 Determine the fields you need in the tables.
4 Identify fields with unique values.
5 Determine the relationships between tables.
6 Refine your design.
7 Add data and create other database objects (tables, queries, forms, reports, macros, and modules).
8 Use Microsoft Access analysis tools.
Determine the purpose of your database
The first step in designing a database is to determine the purpose of the database and how it's to be used. You need to know what information you want from the database. From that, you can determine what subjects you need to store facts about (the tables) and what facts you need to store about each subject (the fields in the tables).
Talk to people who will use the database. Brainstorm about the questions you'd like the database to answer. Sketch out the reports you'd like it to produce. Gather the forms you currently use to record your data. Examine well-designed databases similar to the one you are designing.
Determine the tables you need
Determining the tables can be the trickiest step in the database design process. That's because the results you want from your database — the reports you want to print, the forms you want to use, the questions you want answered — don't necessarily provide clues about the structure of the tables that produce them.
Table is the fundamental structure of a relational database management system. A table is an object that stores data in records (rows) and fields (columns). The data is usually about a particular category of things, such as employees or orders.
A table should not contain duplicate information, and information should not be duplicated between tables.
When each piece of information is stored in only one table, you update it in one place. This is more efficient, and also eliminates the possibility of duplicate entries that contain different information. For example, you would want to store each customer address and phone number once, in one table.
Each table should contain information about one subject.
When each table contains facts about only one subject, you can maintain information about each subject independently from other subjects. For example, you would store customer addresses in a different table from the customers' orders, so that you could delete one order and still maintain the customer information.
In table Datasheet view, you can add, edit, or view the data in a table. You can also check the spelling and print your table's data, filter or sort records, change the datasheet's appearance, or change the table's structure by adding or deleting columns.
You can sort, filter, or find records in the rows of your datasheet by the data in one or more adjacent columns.
You use a unique tag called a primary key to identify each record in your table. Just as a license plate number identifies a car, the primary key uniquely identifies a record. A table's primary key is used to refer to a table's records in other tables.
Determine the fields you need
Each table contains information about the same subject, and each field in a table contains individual facts about the table's subject. For example, a customer table may include company name, address, city, state, and phone number fields.
Field is an element of a table that contains a specific item of information, such as last name. A field is represented by a column or cell in a datasheet.
When sketching out the fields for each table, keep these tips in mind:
Relate each field directly to the subject of the table.
Don't include derived or calculated data (data that is the result of an expression).
Include all the information you need.
Store information in its smallest logical parts (for example, First Name and Last Name, rather than Name.)
Identify fields with unique values
In order for DBMS to connect information stored in separate tables - for example, to connect a customer with all the customer's orders- each table in your database must include a field or set of fields that uniquely identifies each individual record in the table. Such a field or set of fields is called a primary key.
The power of a relational database system such as Microsoft Access comes from its ability to quickly find and bring together information stored in separate tables using queries, forms, and reports. Once you designate a primary key for a table, to ensure uniqueness, DBMS will prevent any duplicate or Null values from being entered in the primary key fields.
A query is a question about the data stored in your tables, or a request to perform an action on the data. A query can bring together data from multiple tables to use as the source of data for a form or report.
A form is a database object on which you place controls for taking actions or for entering, displaying, and editing data in fields.
A report is a database object that presents information formatted and organized according to your specifications. Examples of reports are sales summaries, phone lists, and mailing labels.
There are three kinds of primary keys that can be defined in Microsoft Access: AutoNumber, single-field, and multiple-field.
Determine the relationships between tables
Now that you've divided your information into tables and identified primary key fields, you need a way to tell DBMS how to bring related information back together again in meaningful ways. To do this, you define relationships between tables.
Foreign key is one or more table fields that refer to the primary key field or fields in another table. A foreign key indicates how the tables are related - the data in the foreign key and primary key fields must match.
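A minimal sketch of the customer/orders case described above (table and column names are illustrative, not taken from Microsoft Access):

    -- CUSTOMER_ID is the primary key of CUSTOMERS and reappears in ORDERS
    -- as a foreign key; the values in the two fields must match.
    CREATE TABLE CUSTOMERS (
        CUSTOMER_ID INTEGER PRIMARY KEY,
        COMPANY     VARCHAR(60),
        PHONE       VARCHAR(20)
    );

    CREATE TABLE ORDERS (
        ORDER_ID    INTEGER PRIMARY KEY,
        ORDER_DATE  DATE,
        CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMERS (CUSTOMER_ID)
    );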
Refine the design
After you have designed the tables, fields, and relationships you need, it's time to study the design and detect any flaws that might remain. It is easier to change your database design now, rather than after you have filled the tables with data.
Use Microsoft Access to create your tables, specify relationships between the tables, and enter a few records of data in each table. See if you can use the database to get the answers you want. Create rough drafts of your forms and reports and see if they show the data you expect. Look for unnecessary duplications of data and eliminate them.
Enter data and create other database objects
When you are satisfied that the table structures meet the design goals described here, then it's time to go ahead and add all your existing data to the tables. You can then create any queries, forms, reports, macros, and modules that you may want.
Use Microsoft Access analysis tools
Microsoft Access includes two tools that can help you to refine your database design. The Table Analyzer Wizard can analyze the design of one table at a time, can propose new table structures and relationships if appropriate, and can restructure a table into new related tables if that makes sense.
The Performance Analyzer can analyze your entire database and make recommendations and suggestions for improving it. The wizard can also implement these recommendations and suggestions.
For additional ideas on designing a database, you may want to look at the Northwind sample database and the database schemas for one or more of the databases that you can create with the Database Wizard.
What is involved in the design of a database
the logic elements (they provide for the positional (x,y) reference structure that holds graphic information, and are designated as nodes, links, chains, and areas)
the graphic elements, which are assigned to logic elements (i.e. to design the graphics elements for the features that have to be represented graphically, maintained, and accessed in graphic fashion)
the attributes (alphanumeric data), which are assigned/linked to the features, and the display rules for the attributes are also included
GIS data relationship (i.e., relation between feature classes and their attribute types, relation among attribute types, and relationships among features.)
the digital data to be included in the database e.g. raster images, satellite imagery, existing digital landbased data(maps) and facilities/assets data
the database has to be logically structured to relate similar data types to each other, either through layering or object-based approaches
Should I use a macro or Visual Basic?
In Microsoft Access, you can accomplish many tasks with macros or through the user interface. In many other database programs, the same tasks require programming. Whether to use a macro or Visual Basic for Applications often depends on what you want to do.
When should I use a macro?
Macros are an easy way to take care of simple details such as opening and closing forms, showing and hiding toolbars, and running reports. You can quickly and easily tie together the database objects you've created because there's little syntax to remember; the arguments for each action are displayed in the lower part of the Macro window.
In addition to the ease of use macros provide, you must use macros to:
· Make global key assignments.
· Carry out an action or series of actions when a database first opens. However, you can use the Startup dialog box to cause certain things to occur when a database opens, such as open a form.
When should I use Visual Basic?
You should use Visual Basic instead of macros if you want to:
· Make your database easier to maintain. Because macros are separate objects from the forms and reports that use them, a database containing many macros that respond to events on forms and reports can be difficult to maintain. In contrast, Visual Basic event procedures are built into the form's or report's definition. If you move a form or report from one database to another, the event procedures built into the form or report move with it.
· Create your own functions. Microsoft Access includes many built-in functions, such as the IPmt function, which calculates an interest payment. You can use these functions to perform calculations without having to create complicated expressions. Using Visual Basic, you can also create your own functions either to perform calculations that exceed the capability of an expression or to replace complex expressions. In addition, you can use the functions you create in expressions to apply a common operation to more than one object.
· Mask error messages. When something unexpected happens while a user is working with your database, and Microsoft Access displays an error message, the message can be quite mysterious to the user, especially if the user isn't familiar with Microsoft Access. Using Visual Basic, you can detect the error when it occurs and either display your own message or take some action.
· Create or manipulate objects. In most cases, you'll find that it's easiest to create and modify an object in that object's Design view. In some situations, however, you may want to manipulate the definition of an object in code. Using Visual Basic, you can manipulate all the objects in a database, as well as the database itself.
· Perform system-level actions. You can carry out the RunApp action in a macro to run another Windows-based or MS-DOS–based application from your application, but you can't use a macro to do much else outside Microsoft Access. Using Visual Basic, you can check to see if a file exists on the system, use Automation or dynamic data exchange (DDE) to communicate with other Windows-based applications such as Microsoft Excel, and call functions in Windows dynamic-link libraries (DLLs).
· Manipulate records one at a time. You can use Visual Basic to step through a set of records one record at a time and perform an operation on each record. In contrast, macros work with entire sets of records at once.
· Pass arguments to your Visual Basic procedures. You can set arguments for macro actions in the lower part of the Macro window when you create the macro, but you can't change them when the macro is running. With Visual Basic, however, you can pass arguments to your code at the time it is run or you can use variables for arguments — something you can't do in macros. This gives you a great deal of flexibility in how your Visual Basic procedures run.
Structured Query Language (SQL)
A language used in querying, updating, and managing relational databases. SQL can be used to retrieve, sort, and filter specific data to be extracted from the database.
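For example, a retrieval that filters and sorts data might look like the following (the table and columns are hypothetical):

    -- Retrieve, filter and sort: households with more than five members,
    -- newest survey records first.
    SELECT HOUSEHOLD_ID, VILLAGE, MEMBERS
    FROM   HOUSEHOLDS
    WHERE  MEMBERS > 5
    ORDER BY SURVEY_DATE DESC;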
The design of the database had to take care of both basic data elements (micro data) and aggregated data (macro data), since data was obtained in both formats. Records of individual persons and households collected in the survey, in their raw form as well as in their final corrected form, together with the results of processing in the form of aggregations, are stored with a view to preserving them for the future and making access as easy as possible at all times. Some of the main advantages of a micro-database are the possibility of retrieving data at theoretically any level of detail and of building sampling frames. Since micro data could be used illegally in efforts to disclose sensitive information, privacy concerns must always be taken into consideration; in this case the names were restricted and removed from the general display. Aggregated census data were stored in aggregate format to preserve earlier aggregations and to provide readily usable information, while micro data were saved to allow aggregations to be made that were not programmed initially.
Demographic Geocoding/Georeferencing
Georeferencing being the process of assigning a geographic location (e.g. latitude and longitude) to a geographic feature based on its address, this was carried out to be able to convert existing addresses automatically into a GIS database. For this to be accomplishable, the digital record for the feature must have a field which can be linked to a geographic base file with known geographic coordinates.
Considering the way population data is collected using field surveys, which are the main source of data, this data has to be georeferenced (geocoded) before it is analyzed in GIS. Demographic data is usually referenced by point and by area, and the integration of the two, as highlighted by Bracken (1994), can be done in three ways. First, point address locations may be allocated to census zones so that, in effect, address data becomes another aggregate field of the zonal record. Second, each address location can be assigned data from its enveloping zone, so that the point takes on the attributes of its surrounding area. Third, both types of data can be re-represented geographically onto a neutral base in the form of a georeferenced grid. It is for this third alternative that Bracken (1994) developed a surface model which generates a spatial distribution of population as a fine and variable resolution geographical grid, an approach advocated and developed further by Bracken and Martin (1995). It is this technique that is partly employed here to derive surfaces from points to represent polygons, but this time using buildings as the georeferencing spatial units, since among the information recorded in population data collection is the place of residence, usually a building number. This is done to provide a way to disaggregate demographic analysis so that it can easily be combined with other spatial analyses. It is accomplished by transferring the attributes of the larger feature (e.g. a road) onto the smaller features (buildings) so that individuals are geocoded on the right road; this is explained further below.
There are many techniques of georeferencing (Cowen, 1997); this thesis employed three: 1) assigning an entirely new unique field; 2) using database queries to read fields from tables and join them to other tables; 3) georeferencing any set of addresses by joining them to the geographic base file on the basis of common fields.
With that we are in a position to carry out direct queries of the spatial database. One may also determine the coordinates of a single address with a direct query of the GIS database by entering the address in a dialog box; once the particular address is located on the map, the coordinates can usually be read directly from the screen. For the buildings, the road number was added to each building record. The georeferencing itself can simply be a relational database join in which the geographic coordinates of the basemap are linked to the address records and made spatial.
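A sketch of such a join, assuming an address table and a geocoded basemap table that share a common building-number field (all names here are hypothetical):

    -- Link each address record to the basemap coordinates through the
    -- shared BUILDING_NO field, making the address records spatial.
    SELECT A.PERSON_ID,
           A.BUILDING_NO,
           B.ROAD_NO,
           B.X,
           B.Y
    FROM   ADDRESSES A
    JOIN   BASEMAP_BUILDINGS B
           ON A.BUILDING_NO = B.BUILDING_NO;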
Computer-Based Analysis for Public Management
Relational Database Design
Thomas H. Grayson
2000
Relational Database Design
In this study a relational database is used and a relational model is designed: all data have been represented as tables. Tables consist of rows and columns; rows and columns are unordered (i.e., the order in which rows and columns are referenced does not matter). Each table has a primary key, a unique identifier constructed from one or more columns. A table is linked to another by including the other table's primary key; such an included column is called a foreign key. Regarding how the primary keys were created: for the individual data, each record was given a unique identifier number, which is the first column in the table; a corresponding identifier was used for the building data.
Qualities of a Good Database Design
Reflects real-world structure of the problem
Can represent all expected data over time
Avoids redundant storage of data items
Provides efficient access to data
Supports the maintenance of data integrity over time
Clean, consistent, and easy to understand
Introduction to Entity-Relationship Modeling
Entity-Relationship (E-R) Modeling: A method for designing databases
A simplified version is presented here
Represents the data by entities that have attributes.
An entity is a class of distinct identifiable objects or concepts
Entities have relationships with one another
Result of the process is a normalized database that facilitates access and avoids duplicate data
E-R Modeling Process
Identify the entities that your database must represent
Determine the cardinality relationships among the entities and classify them as one of
One-to-one (e.g., a parcel has one address)
One-to-many (e.g., a parcel may be involved in many fires)
Many-to-many (e.g., parcel sales: a parcel may be sold by many owners, and an individual owner may sell many parcels)
Draw the entity-relationship diagram
Determine the attributes of each entity
Define the (unique) primary key of each entity
From E-R Model to Database Design
Entities with one-to-one relationships should be merged into a single entity
Each remaining entity is modeled by a table with a primary key and attributes, some of which may be foreign keys
One-to-many relationships are modeled by a foreign key attribute in the table representing the entity on the "many" side of the relationship (e.g., the FIRES table has a foreign key that refers to the PARCELS table; see the schema sketch after this list)
Many-to-many relationships between two entities are modeled by a third table that has foreign keys that refer to the entities. These foreign keys should be included in the relationship table's primary key, if appropriate
Commercially available tools can automate the process of converting an E-R model to a database schema
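A rough SQL sketch of these conversion rules applied to the parcels example used later in this section (the key columns follow the example tables; the data types are assumed, not given in the source):

    CREATE TABLE OWNERS (
        OWNERNUM INTEGER PRIMARY KEY,
        NAME     VARCHAR(60)
    );

    -- One-to-many: each parcel carries a foreign key to its owner.
    CREATE TABLE PARCELS (
        PID  INTEGER,
        WPB  INTEGER,
        ONUM INTEGER REFERENCES OWNERS (OWNERNUM),
        PRIMARY KEY (PID, WPB)
    );

    -- One-to-many: a parcel may be involved in many fires.
    CREATE TABLE FIRES (
        PID   INTEGER,
        WPB   INTEGER,
        FDATE DATE,
        PRIMARY KEY (PID, WPB, FDATE),
        FOREIGN KEY (PID, WPB) REFERENCES PARCELS (PID, WPB)
    );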
Database Design Rules of Thumb
Keep data items atomic (e.g., first and last names are separate). Concatenating columns together later on-the-fly is generally easy, but separating them is not.
What is an example of where parsing subfields from a column may go awry?
When might you want to include the combined fields in a column anyway?
Define the primary key first. Use a descriptive name (PARCELID, not ID)
In fact, use descriptive names that give a new user a decent chance of guessing what they mean for all your columns! (E.g., use PARCEL_COUNT rather than PACT)
Use a single column for the primary key whenever possible; multi-column primary keys are appropriate for many-to-many relationships
Use lookup tables rather than storing long values (a sketch of this appears after this list)
Use numeric keys whenever possible (What about ZIP codes?)
Avoid intelligent keys (exception: lookup tables)
Avoid using multiple columns to represent a one-to-many relationship (e.g., columns such as CHILD1, CHILD2 in a table called PARENT rather than putting the children in a separate table).
For readability, use the primary key name for foreign keys unless the same foreign key is used multiple times in the same table (e.g., state of work and state of residence for a person might both be foreign keys that reference a table of states)
Do not include two columns whose values are linked together (e.g., county name and county ID) unless one of the columns is the primary key of the table
Avoid allowing NULL values in columns that have a discrete range of possible values (e.g., integers between 1 and 10, inclusive)
(not applicable to DBF files, which do not support NULLs)
Avoid using multiple tables with similar structures that represent minor variants on the same entity (e.g., putting Boston parcels and Cambridge parcels in separate tables).
Why is this rule often hard to practice with GIS?
Plan ahead for transferring data to a different database. For example, you may want to move data from Oracle to DBF, or Microsoft Access to Oracle.
Avoid column names with characters with other than UPPER CASE letters (A-Z), digits (0-9), and the underscore (_). Other characters may not be accepted by a database. Some database systems may be case sensitive with regard to column names, while others are not.
Keep your column names relatively short. Different databases support different numbers of characters in column names (e.g., 30 for Oracle, 64 for Microsoft Access, 10 for DBF). Try to make column names differ in the first few characters rather than at the end to avoid column name duplication if the names are truncated during the conversion process (e.g., use COL1 and COL2, not LONG_COLUMN_NAME_1 and LONG_COLUMN_NAME_2).
Note that keeping column names short may be at odds with keeping your column names meaningful for neophytes. Be aware that you are making a tradeoff!
Remember that these are rules of thumb, not absolute laws! Bend the rules if you must but have a justification for your decision. The limitations of a GIS software package often provide a good reason.
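As a small illustration of the lookup-table rule above (all table and column names are invented for this sketch): rather than storing a long land-use description in every row, store a short numeric code that references a lookup table and join when the description is needed.

    -- Lookup table: long values stored once, referenced by a numeric key.
    CREATE TABLE LANDUSE_CODES (
        LANDUSE_ID  INTEGER PRIMARY KEY,
        DESCRIPTION VARCHAR(80)
    );

    -- The data table stores only the short numeric key.
    CREATE TABLE LAND_PARCELS (
        PARCELID   INTEGER PRIMARY KEY,
        LANDUSE_ID INTEGER REFERENCES LANDUSE_CODES (LANDUSE_ID)
    );

    -- The long value is recovered with a join when needed.
    SELECT P.PARCELID, C.DESCRIPTION
    FROM   LAND_PARCELS P
    JOIN   LANDUSE_CODES C ON P.LANDUSE_ID = C.LANDUSE_ID;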
Example: The Parcels Database
Table      Primary Key
PARCEL     PID, WPB
OWNERS     OWNERNUM
FIRES      PID, WPB, FDATE
TAX        PID, WPB
Cardinality Relationships
Primary Table Columns       Foreign Table Columns      Cardinality
OWNERS.OWNERNUM             PARCEL.ONUM                One-to-many
PARCEL.PID, PARCEL.WPB      FIRES.PID, FIRES.WPB       One-to-many
PARCEL.PID, PARCEL.WPB      TAX.PID, TAX.WPB           One-to-one
Parcels Database Enhancements
Eliminate intelligent key for parcel ID
Make the primary key of PARCEL a single column (e.g., PARCELID)
Merge the TAX and PARCEL tables or add the year to the tax table to keep track of taxes over time (changes the relationship to one-to-many)
Rename PARCEL table to PARCELS for consistency with other tables
Rename PARCEL foreign key ONUM to be consistent with the OWNERS table
Improve column names
Last modified 29 October 2000 by Wadembere, M. I.