DATABASES, DESIGN, AND ORGANISATION
Database management system
A database is a collection of information that's related to a particular subject or purpose, such as tracking residential population or maintaining a music collection. If your database isn't stored on a computer, or only parts of it are, you may be tracking information from a variety of sources that you're having to coordinate and organize yourself.
Within a database, divide your data into separate storage containers called tables; view, add, and update table data by using online forms; find and retrieve just the data you want by using queries; and analyse or print data in a specific layout by using reports. Allow users to view, update, or analyse the database's data from the Internet or an intranet by creating data access pages.
To store your data, create one table for each type of information that you track. To bring the data from multiple tables together in a query, form, report, or data access page, define relationships between the tables.
To find and retrieve just the data that meets conditions that you specify, including data from multiple tables, create a query. A query can also update or delete multiple records at the same time, and perform predefined or custom calculations on your data. To easily view, enter, and change data directly in a table, create a form.
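The ideas above (separate tables, a relationship between them, and queries that retrieve, calculate and update) can be sketched with Python's built-in sqlite3 module. This is only an illustration for the music collection mentioned earlier; the table names, column names and sample records are assumptions made for the example and do not come from any particular database package.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE albums (
        album_id  INTEGER PRIMARY KEY,
        artist_id INTEGER REFERENCES artists(artist_id),
        title     TEXT,
        price     REAL
    );
""")
conn.executemany("INSERT INTO artists VALUES (?, ?)",
                 [(1, "Artist A"), (2, "Artist B")])
conn.executemany("INSERT INTO albums VALUES (?, ?, ?, ?)",
                 [(1, 1, "First", 10.0), (2, 1, "Second", 12.0),
                  (3, 2, "Third", 8.0)])

# A single query statement updates several records at the same time.
conn.execute("UPDATE albums SET price = price + 1 WHERE artist_id = 1")


def albums_by(name):
    """Retrieve data from both tables through the artist_id relationship."""
    rows = conn.execute(
        "SELECT al.title FROM albums al "
        "JOIN artists ar ON al.artist_id = ar.artist_id "
        "WHERE ar.name = ? ORDER BY al.title", (name,)).fetchall()
    return [row[0] for row in rows]


def total_price(name):
    """Let the query perform a calculation (a sum) on the selected records."""
    return conn.execute(
        "SELECT SUM(al.price) FROM albums al "
        "JOIN artists ar ON al.artist_id = ar.artist_id "
        "WHERE ar.name = ?", (name,)).fetchone()[0]
```

Calling albums_by("Artist A") returns only the titles linked to that artist, which is the kind of conditional, multi-table retrieval that a query provides.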
The issue of designing and organising a GIS database has to be considered in its entirety and needs a conceptual understanding of different disciplines - cartography and mapmaking, geography, GIS, databases etc. Here, an overview of the design procedure that could be adopted is given and the organisational issues are addressed. The updating of the database and the linkage of the GIS database to other databases are also addressed.
The Geographical Information System (GIS) has two distinct utilisation capabilities - the first pertaining to querying and obtaining information and the second pertaining to targeted analytical modelling. The importance of the GIS database stems from the fact that the data elements of the database are closely interrelated and thus need to be structured for easy integration and retrieval. The GIS database also has to cater to the different needs of applications. In general, a proper database organisation needs to ensure the following [Healey, 1991; NCGIA, 1990]:
a) Flexibility in the design to adapt to the needs of different users.
b) A controlled and standardised approach to data input and updation.
c) A system of validation checks to maintain the integrity and consistency of the data elements.
d) A level of security for minimising damage to the data.
e) Minimising redundancy in data storage.
THE DATA IN GIS
Broadly categorised, the basic data for the GIS database has two components:
a) Spatial data - consisting of maps which have been prepared either by field surveys or by the interpretation of Remotely Sensed (RS) data. Some examples are the soil survey map, geological map, landuse map from RS data, village map etc. Many of these maps are available in analog form and it is only of late that some map information has become available directly in digital format. Thus, the incorporation of these maps into a GIS depends upon whether they are in analog or digital format - each of which has to be handled differently.
b) Non-spatial data - attributes which are complementary to the spatial data and describe what is at a point, along a line or in a polygon, as well as socio-economic characteristics from census and other sources. The attributes of a soil category could be the depth of soil, texture, erosion, drainage etc and for a geological category could be the rock type, its age, major composition etc. The socio-economic characteristics could be the demographic data and occupation data for a village, or traffic volume data for roads in a city etc. The non-spatial data is mainly available as tabular records in analog form and needs to be converted into digital format for incorporation in GIS. However, the 1991 census data is now available in digital mode and thus direct incorporation into the GIS database is possible.
2.1 MEASUREMENT OF GEOGRAPHICAL DATA
The data in a GIS generally has a geographical connotation and thus carries the normal characteristics of geographical data. The measurement of the data pertains to the description of what the data represents - a naming, legending or classification function - and the calculation of its quantity - a counting, scaling or measurement function. Thus, scaling of the data is important while organising a GIS database. There are four scales by which data is represented [Brien, 1992]:
a) nominal, where the data is principally classified into mutually exclusive sets or levels based on relevant characteristics. The landuse information on a map representing the different categories of landuses is a nominal representation of data. The nominal scale is the commonly used measure for spatial data.
b) ordinal, which is a more sophisticated measurement as the classes are placed into some form of rank order based on a logical property of magnitude. A Ground water prospect map showing different classes of prospects and categorised from "high prospect" to "low prospect" is an ordinal scale measurement.
c) interval, which is a continuous scale of measurement and is a crude representation of numeric data on a scale. Here, the class definition is a rank order where the differences between the ranks are quantified. The representation of population density in rank order is an example of interval data.
d) ratio, which is also a continuous scale where the origin of the scale is real and not imaginary. Further, the ratio scale represents the scaling between individual observations in the dataset and not just between datasets. An example of the ratio scale is when each value is normalised against a reference - generally an average, maximum or minimum.
The above four scales have been defined as a hierarchy and thus the ratio scale exhibits all the defining operations while those further down the hierarchy possess fewer. Thus, ratio data may be re-expressed as interval, ordinal or nominal data but nominal data cannot be expressed as ratios. Further, the nominal and ordinal scales are used to define categorical data - which is the method of representing maps or spatial data - and the interval and ratio scales are used to define continuous data. TABLE 1 shows the characteristics of the scales.
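The hierarchy of scales can be expressed as a small lookup. The sketch below is illustrative only: the operation names attached to each scale (equality, ranking, difference, ratio) follow the usual textbook description of the four scales and are an assumption here, since the text does not list the defining operations individually.

```python
# Scales ordered from the bottom to the top of the hierarchy.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Assumed operation that each scale adds to those of the scales below it.
ADDED_OPERATION = {
    "nominal": "equality",     # same class or a different class
    "ordinal": "ranking",      # higher or lower rank
    "interval": "difference",  # quantified difference between ranks
    "ratio": "ratio",          # meaningful ratios, because the origin is real
}


def permitted_operations(scale):
    """Operations of a scale: its own plus those of every scale below it."""
    level = SCALES.index(scale)
    return {ADDED_OPERATION[name] for name in SCALES[: level + 1]}


def can_reexpress(source, target):
    """Data can be re-expressed only on the same or a lower scale."""
    return SCALES.index(target) <= SCALES.index(source)


def is_categorical(scale):
    """Nominal and ordinal scales define categorical data."""
    return scale in ("nominal", "ordinal")
```

For instance, can_reexpress("ratio", "nominal") holds while can_reexpress("nominal", "ratio") does not, which mirrors the statement that nominal data cannot be expressed as ratios.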
GIS database design
Just as in any normal database activity, the GIS database also needs to be designed so as to cater to the needs of the application that proposes to utilise it. Apart from this the design would also:
a) provide a comprehensive framework of the database.
b) allow the database to be viewed in its entirety so that interaction and linkages between elements can be defined and evaluated.
c) permit identification of potential bottlenecks and problem areas so that design alternatives can be considered.
d) identify the essential and correct data and filter out irrelevant data.
e) define updation procedures so that newer data can be incorporated in future.
The design of the GIS database will include three major elements [NCGIA, 1990]:
a) Conceptual design, basically laying down the application requirements and specifying the end-utilisation of the database. The conceptual design is independent of hardware and software and could be a wish-list of utilisation goals.
b) Logical design, which is the specification of the database vis-a-vis a particular GIS package. This design sets out the logical structure of the database elements determined by the GIS package.
c) Physical design, which pertains to the hardware and software characteristics and requires consideration of file structure, memory and disk space, access and speed etc.
Each stage is interrelated to the next stage of the design and impacts the organisation in a major way. For example, if the concepts are clearly defined, the logical design is easier to do, and if the logical design is clear, the physical design is also easy. FIGURE 1 shows a framework of the design elements and their relationship. The success or failure of a GIS project is determined by the strength of the design and a good deal of time must be allocated to the design activity. SAC has evolved a set of design guidelines for GIS database creation [Rao et al (1990)] which has been adopted for the implementation of GIS projects for the Bombay Metropolitan Region (BMR) [SAC and BMRDA, 1992]; regional planning at district level for Bharatpur [SAC and TCPO, 1992]; and wasteland development for Dungarpur [SAC, 1993]. Much of what is discussed here is based on these design guidelines and on the experience gained in the execution of the different GIS projects. To illustrate the design aspects of a GIS database, examples from the design of the Bharatpur district database will be referred to.
Designing a database
Good database design is the keystone to creating a database that does what you want it to do effectively, accurately, and efficiently.
Steps in designing a database
Determine the purpose of your database
Determine the tables you need
Determine the fields you need
Identify the field or fields with unique values in each record
Determine the relationships between tables
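As a rough sketch of these steps, the example below again uses Python's sqlite3 module: two tables are chosen, their fields are listed, a field with unique values is declared as the key of each table, and the relationship between the tables is declared as a foreign key. The village and taluk tables, their fields and the sample records are hypothetical and serve only to show the steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One table per type of information, each with a unique key field.
    CREATE TABLE taluks (
        taluk_code INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE villages (
        census_code INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        population  INTEGER,
        -- Relationship: every village belongs to one existing taluk.
        taluk_code  INTEGER NOT NULL REFERENCES taluks(taluk_code)
    );
""")


def try_insert(sql, params):
    """Insert one record; return False if a key or relationship rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO taluks VALUES (?, ?)", (1, "Taluk A")),
    try_insert("INSERT INTO villages VALUES (?, ?, ?, ?)", (101, "Village P", 500, 1)),
    # Same census_code as the previous record: violates the unique key.
    try_insert("INSERT INTO villages VALUES (?, ?, ?, ?)", (101, "Village Q", 300, 1)),
    # Refers to a taluk that does not exist: violates the relationship.
    try_insert("INSERT INTO villages VALUES (?, ?, ?, ?)", (102, "Village R", 300, 99)),
]
```

The first two inserts are accepted and the last two are rejected, so the key and the relationship defined at design time act as validation checks on later data entry.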
3.1 GIS - CORE OF THE DATABASE
The Geographical Information System (GIS) package is the core of the GIS database as both spatial and non-spatial databases have to be handled. The GIS package offers efficient utilities for handling both these datasets and allows for: spatial database organisation; non-spatial dataset organisation - mainly as attributes of the spatial elements; analysis and transformation for obtaining the required information; output of information in specific formats (cartographic quality outputs and reports); and organisation of a user-friendly query system. Different types of GIS packages are available and the GIS database organisation depends on the GIS package that is to be utilised. Apart from the basic functionality of a GIS package, some of the crucial aspects that impact the GIS database organisation are as follows:
a) data structure of the GIS package. Most GIS packages adopt either a raster or vector structure, or their variants, internally to organise spatial data and represent real-world features.
b) attribute data management. Most of the GIS packages have embedded linkage to a Data Base Management System (DBMS) to manage the attribute data as tables.
c) a tiled concept of spatial data handling, which is fundamental to the way maps are represented in the real world. For example, sixteen SOI 1:50,000 map sheets make up one 1:250,000 sheet and sixteen 1:250,000 sheets make up one 1:1,000,000 sheet. This map tile graticule could also be represented in a GIS and some GIS packages allow tile-data handling.
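The tile arithmetic in item c) can be sketched as follows. The factor of sixteen comes from the text; the arrangement of the sixteen child sheets as a 4 x 4 grid of equal squares, and the corner coordinates used when calling the function, are assumptions made for illustration.

```python
SHEETS_PER_PARENT = 16  # from the text: sixteen child sheets make one parent sheet
GRID = 4                # assumption: the sixteen children form a 4 x 4 grid


def sheets_needed(levels_down):
    """Number of sheets that cover one parent sheet, levels_down levels finer."""
    return SHEETS_PER_PARENT ** levels_down


def child_extents(west, south, size_deg):
    """Split a square parent sheet into 4 x 4 child tiles.

    Each tile is returned as (west, south, east, north) in degrees, listed
    row by row starting from the south-west corner of the parent.
    """
    step = size_deg / GRID
    tiles = []
    for row in range(GRID):
        for col in range(GRID):
            w = west + col * step
            s = south + row * step
            tiles.append((w, s, w + step, s + step))
    return tiles
```

With these assumptions, sheets_needed(2) gives the number of 1:50,000 sheets inside a single 1:1,000,000 sheet, and child_extents returns the graticule that a tile-aware GIS package would need to store.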
4.0 GIS DATABASE - CONCEPTUAL DESIGN
The Conceptual Design (CD) of a GIS database defines the application needs and the end objective of the database. Generally, this is a statement of end needs and is defined fuzzily. It crystallises and evolves as the GIS database progresses, but within the framework of the broad statement of intentions. The clearer and better defined the CD, the easier the logical design of the GIS database becomes. Some of the key issues that merit consideration for the CD are:
a) Specifying the ultimate use of the GIS database as a single statement. Some examples could be GIS DATABASE FOR URBAN PLANNING AT MICRO-LEVEL; GIS DATABASE FOR WATER SUPPLY MANAGEMENT; GIS DATABASE FOR WILDLIFE HABITAT MANAGEMENT. The important aspect here is the management of a particular resource, facility etc and thus the statement would generally include the management activity.
b) Level or detail of GIS database which indicates the scale or level of the data contents of the database. A database designed for MICRO-LEVEL would require far more details than one designed for MACRO-LEVEL applications. TABLE 1 illustrates the relationship between level and applications which could be used as a guideline. In most of the cases the level or detail is implicit in the statement of end use.
c) Spatial elements of GIS database, which depend upon the end use and define the spatial datasets that will populate the database. The spatial elements are application specific and are mainly made up of maps obtained from different sources.
The spatial elements could be categorised into primary elements, which are the ones that are digitised or entered into the database, and derived elements, which are derived from the primary elements by a GIS operation. For example, the contours/elevation points could be primary elements but the slope that is derived from the contours/elevation points is a derived element. This distinction between primary and derived elements is useful in estimating the database creation load and also in scheduling GIS operations. TABLE 2 illustrates some of the primary elements and derived elements of a GIS database for district level planning applications.
d) Non-spatial elements of GIS database which are the non-spatial datasets that would populate the GIS database. The actual definition of the non-spatial elements would depend upon the end use and is application specific. For example, non-spatial data for forest applications would include data on tree species, age, production etc and non-spatial data for urban applications would include wardwise population, services and facilities data and so on. TABLE 3 shows some of the typical non-spatial data elements for a district planning application. Much of the non-spatial data comes from sources like the Census department, municipalities, resource survey agencies etc.
e) Source of spatial and non-spatial data is an important design issue as it brings out the details of the data collection activity and also helps identify the need for data generation. Most of the spatial data or thematic maps are available from the central and state survey agencies and non-spatial data is available as Census records or from the survey departments.
f) Age of data is an important design issue as it, in turn, defines the age of the database - making it either useful or useless for a particular end application. For example, if the application is to study the impact of pollution in an urban area then the pollution data needs to be current and the use of past data would render the impact analysis ineffective.
g) Spatial data domain, pertaining to the basic framework of the spatial datasets. Most of the spatial data sets follow the Survey of India (SOI) latitude-longitude coordinate system (as is given in the SOI maps) and thus, the spatial data base needs to follow the standards of the SOI mapsheets.
h) Impact of study area extent, defining the actual geographical area for which the GIS database is to be organised. Mostly, if the SOI framework is adopted, the coverage will be in non-overlapping SOI map sheets - the study area covers some mapsheets only partially and others fully. The extent definition also lays down the limits of the database and helps in the logical design of the spatial elements.
i) Spatial registration framework: it is essential to adopt a standard registration procedure for the database. This is generally done by the use of registration points - also called TIC points in GIS. These registration points could be the corners of the graticule of the spatial domain - say the four corners of the SOI mapsheet at 1:50,000 scale - or control points that can be discerned - road intersections, railline-road intersections, bridges etc - in each spatial element that is to populate the database. Unique identifiers for each registration point help in locating and registering the database. FIGURE 2 shows the scheme of registration points used for the Bharatpur project. This scheme is a "shared" method of points where each registration point is a part of more than one mapsheet. This helps in the map joining/mosaicking and sheet-by-sheet data digitisation process.
j) Non-spatial data domain specifying the levels of non-spatial data. The non-spatial datasets are available at different levels and it is essential to organise the non-spatial data at the lowest unit. The higher levels could then be abstracted from the lowest unit whenever required. For example, for the Bharatpur database non-spatial data was available at different levels of administrative units - district, taluk and village. The village was the lowest unit at which the non-spatial data was available and thus the non-spatial data domain was considered at the village level.
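The abstraction of higher levels from the lowest unit, described in item j), amounts to a grouped sum. The sketch below assumes a hypothetical record layout and made-up population figures; it is not taken from the Bharatpur database.

```python
from collections import defaultdict

# Hypothetical village records: (census_code, taluk, district, population).
VILLAGES = [
    (1001, "Taluk A", "District X", 1200),
    (1002, "Taluk A", "District X", 800),
    (1003, "Taluk B", "District X", 1500),
]


def aggregate(records, level):
    """Sum village populations up to the 'taluk' or 'district' level."""
    column = {"taluk": 1, "district": 2}[level]
    totals = defaultdict(int)
    for record in records:
        totals[record[column]] += record[3]
    return dict(totals)
```

Because the data is stored only at village level, taluk and district totals are computed on demand with aggregate(VILLAGES, "taluk") or aggregate(VILLAGES, "district") and never have to be stored or kept in step separately.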
5.0 GIS DATABASE - LOGICAL DESIGN
The Logical Design of the GIS database pertains to the logical definition of the database and is a more detailed organisation activity in a GIS. Most of the design issues are specific to GIS and thus the scope varies with the type and kind of GIS package to be utilised. However, in an overall manner most of these issues are common to the different GIS packages. SAC has evolved a set of guidelines for the logical design of the GIS database which have been adopted in the organisation of GIS databases for BMR, Bharatpur, Dungarpur etc. TABLE 4 shows some of the critical design guidelines which could be adopted for the GIS database organisation. Some of the key issues are:
a) Coordinate system for database, which determines the way coordinates are to be stored in the GIS package. Most GIS packages offer a range of coordinate systems depending on what projection systems are employed. The coordinate system for the GIS database needs to be in appropriate units that represent the geographic features in their true shape and size. The coordinate system would generally get defined by the spatial domain of the GIS database. For example, if the SOI 1:50,000 graticule has been adopted for the database, it is essential to have the same coordinate/projection system that SOI adopts. All SOI toposheets on 1:50,000 scale adopt the Polyconic projection system. Further, the units of the polyconic projection are represented in actual ground distances - metres. As a result all spatial elements of the GIS database are referenced in a uniform coordinate system. This would allow for easy integration of spatial datasets as part of the analysis and also maintain homogeneity in the GIS database.
b) Spatial Tile design pertains to the concept of a set of map tiles composing the total extent. For example, the district of Bharatpur is organised in 19 map tiles of SOI sheets at 1:50,000 scale. Certain GIS packages allow for the organisation of tiles which facilitates the systematic data entry on a tile-by-tile basis and also the horizontal organisation of spatial data.
FIGURE 3 shows the concept of horizontal and vertical organisation of the spatial data in the database.
c) Defining attribute data dictionary: The data dictionary is an organised collection of attribute data records containing information on the feature attribute codes and names used for the spatial database. The dictionary consists of descriptions of the attribute code for each spatial data element. TABLE 5 shows a partial listing of the attribute data dictionary adopted for the Bharatpur database.
d) Spatial data normalisation is akin to the normalisation of relations and pertains to finding the simplest structure of the spatial data and identifying the dependency between spatial elements. Normalisation avoids loss of general information and also reduces redundancy. A process of normalisation of the spatial data is also essential to identify master templates and component templates. This normalisation process ensures that the coincident component features of the various elements are coordinate coincident - thus limiting overlay sliver problems. It also reduces redundancy in the digitisation process as master templates are digitised only once and form a part of all elements. For example, in the Bharatpur database, the following features have been identified as master templates:
- district/taluka boundary
- rivers/streams
- water bodies
These features have been chosen because they need to occur in each spatial element and need to be coordinate coincident.
e) Tolerances definitions are an important aspect of the GIS database design. The tolerances specify the error-level associated with each spatial element. The different tolerances that need to be considered are:
- Coordinate Movement Tolerance (CMT) which specifies the limit up to which coordinates could move as part of a GIS operation. If the tolerance is not stringent then repeated GIS operations could move the coordinates significantly so as to distort the size and shape of the features.
- Weed Tolerance (WT) which pertains to the minimum separation between coordinates while digitising. For example a straight line could be represented by two vertices and intermediate vertices are redundant. A proper weed tolerance would not create the intermediate vertices at all and thus not populate the database unnecessarily.
- Minimum Spatial Unit (MSU) which indicates the smallest representable area in the database. Any polygon feature having lesser area than the MSU would be aggregated. The MSU is an indication of the resolution of the database. The concept of MSU is pertinent for vector GIS databases and is not applicable for raster GIS databases, as in a raster GIS a raster/grid cell becomes the MSU and no features below it can be resolved.
The tolerances are all dependent on the scale or level of the database. Some of the general guidelines suggested for different scales are listed in TABLE 7.
f) Spatial and non-spatial data linkage where the interlinkages of the spatial and non-spatial data are defined. These linkages and interrelationships are an important element of the GIS database organisation as they define the user relations or user views that can be created. There are two major linkage aspects involved:
- for all spatial data sets representing resources information or thematic information and those other than administrative maps, the linkage is achieved through the data dictionary feature code at the time of creation/digitisation itself.
- for administrative maps - village and taluk maps, the linkage is achieved on a one-to-one relation based on a unique code for each village or the taluk. For example, in Bharatpur database this code has been identified as the census code for the 1463 villages/settlements in the district. Thus the internal organisation of the spatial village/taluk boundaries is flexible to relate to the village-wise non-spatial database on a one-to-one basis. FIGURE 4 shows the type of relation that was adopted for the Bharatpur database.
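The two kinds of linkage in item f) can be sketched with plain Python dictionaries. Every code, name and figure below is hypothetical; the sketch only shows a feature code being resolved through a data dictionary, and a one-to-one match between village polygons and village records being verified before they are joined.

```python
# Hypothetical data dictionary for a landuse map: feature code -> class name.
LANDUSE_DICTIONARY = {1: "Built-up land", 2: "Agricultural land", 3: "Forest"}

# Hypothetical spatial features carrying only codes, as assigned at digitisation.
landuse_polygons = [{"polygon_id": 10, "code": 2}, {"polygon_id": 11, "code": 3}]
village_polygons = [{"polygon_id": 1, "census_code": 1001},
                    {"polygon_id": 2, "census_code": 1002}]

# Hypothetical village-wise non-spatial table keyed by census code.
village_table = {1001: {"population": 1200}, 1002: {"population": 800}}


def describe(polygon):
    """Thematic linkage: resolve a feature code through the data dictionary."""
    return LANDUSE_DICTIONARY[polygon["code"]]


def link_one_to_one(polygons, table):
    """Administrative linkage: join on census code, insisting on one-to-one."""
    codes = [p["census_code"] for p in polygons]
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate census code among polygons")
    if set(codes) != set(table):
        raise ValueError("polygons and attribute records do not match one-to-one")
    return {p["polygon_id"]: table[p["census_code"]] for p in polygons}
```

Raising an error on a duplicate or unmatched code is a design choice of this sketch: it surfaces coding mistakes at load time so that they do not silently produce villages without attributes.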
6.0 GIS DATABASE PHYSICAL DESIGN
The Physical Design (PD) pertains to the assessment of the load, disk space requirement, memory requirement, access and speed requirements etc for the GIS. Much of these pertain to the hardware platform on which the GIS will operate. There are no standards on PD aspects available and much of the design has to be based on experience. However, some of the key issues are as follows:
a) Disk space requirement is a major concern for GIS database designers. The paradigm THERE IS NO END TO A GIS DATABASE sums it all up, as most GIS database designers have realised how fast their disk space estimates go awry. As an illustration, the Bharatpur database takes up about 54 MB of space for the actual data. Any further integrated analysis which would create intermediate outputs would take anywhere between 3 and 4 times the normal space. Thus, experience shows that for a district database a 300 MB disk is just sufficient and a larger disk would be appropriate. The difference in space utilisation between GIS packages is reflected in a benchmark application run on the PC-ARC/INFO and ISROGIS packages: PC-ARC/INFO utilised 26.84 MB of space while, for the same dataset, ISROGIS utilised 35.22 MB [Rao et al, 1993]. This illustrates the range of variation in space utilisation.
b) Load of database is also difficult to determine as there is no way of estimating the number of points, lines and polygons in each spatial element. However, broad guidelines could be evolved and estimates made. For example, the National Capital Region Planning Board (NCRPB) has adopted a two-way categorisation of spatial elements - a three-level qualitative density based categorisation of maps and a two-level full or partial coverage based categorisation - for the 67 maps covering the NCR. Based upon this, the total number of spatial maps to be organised is estimated as 657 map sheets.
c) Access and speed requirements are more oriented towards the ability to handle large and dense maps rather than the time involved in processing. GIS applications are not real-time applications and thus the access time or speed becomes a secondary aspect. A benchmark study on PC-ARC/INFO and ISROGIS has shown that the time taken for an overall application consisting of various steps is 11.5 hours and 7.5 hours respectively [Rao et al, 1993]. The point to be noted is that even though there is a 4-hour difference, it has little implication for the application as the application is not real-time.
d) File and data organisation in GIS is an activity which is taken care of by the GIS package itself and no design aspects need be considered for the physical organisation of files. Each GIS package has its own file system organisation, which could be either a single file or a set of files and is transparent to the user.
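The disk space reasoning in item a) can be written out as a small calculation. The sketch assumes that the intermediate outputs need 3 to 4 times the base data in addition to the base data itself; the text does not say whether the factor includes the base data, so this reading is an assumption, although it agrees with the remark that a 300 MB disk is just sufficient.

```python
def estimate_disk_mb(data_mb, low_factor=3, high_factor=4):
    """Return (low, high) total disk space in MB.

    Assumption: intermediate outputs occupy low_factor to high_factor times
    the base data, on top of the base data itself.
    """
    return (data_mb * (1 + low_factor), data_mb * (1 + high_factor))


def ratio(a, b):
    """Simple ratio used to compare the two benchmark figures."""
    return a / b


# Figures quoted in the text.
bharatpur_low, bharatpur_high = estimate_disk_mb(54)
space_ratio = ratio(35.22, 26.84)  # ISROGIS space relative to PC-ARC/INFO
time_ratio = ratio(11.5, 7.5)      # PC-ARC/INFO time relative to ISROGIS
```

Under this assumption the Bharatpur estimate ranges from 216 MB to 270 MB, and the benchmark ratios show that the package needing more disk space was also the one that finished sooner.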