Databases, design, and organisation

Date	11.05.2018

7.0 GIS DATABASE CREATION

7.1 Spatial database creation

Based on the design, the steps of database creation are worked out and a procedure laid down. The procedure for the spatial database creation is described below:

a) Master template creation: As discussed earlier, a master template is created as a reference layer consisting of the district boundary, rivers etc. This template is then used for digitising the component themes.

b) Thematic map manuscript preparation - Based on the spatial domain (in the Bharatpur database it was the SOI graticule at 1:50,000 scale), the theme-oriented information is transferred from the base map to a mylar/transparent sheet. Spatial data manuscripts are mylars containing the features that are to be digitised. These manuscripts are prepared on a sheet-by-sheet basis for digitisation. They carry "instructions" for digitisation or scanning, which include:

- registration point locations and identifiers
- feature codes as per the dictionary defined earlier
- feature boundaries
- tolerance specifications
- any other digitisation/scanning instructions

c) Digitisation of features: The theme features of the spatial dataset are then digitised/scanned using the GIS package. The digitisation is done for each mapsheet of the spatial reference. The master registration-point references are used for the digitisation. Each theme is digitised as a component into a copy of the master template layer.

d) Coverage editing - The digitised coverage is processed for digitisation errors such as dangles (overshoots or undershoots) and polygon label errors. This involves obtaining a report of these errors and then manually editing the affected features. Finally the coverage is processed for topology creation. As in the case of digitisation, the editing also has to be done on a mapsheet basis.

In the case of raster GIS packages, the topology construction may not be relevant. However, a clumping process to identify the clump of rasters having similar characteristics is essential.
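The dangle report mentioned under coverage editing can be illustrated with a small sketch. It assumes, purely for illustration, that each digitised arc is a list of (x, y) vertices and that a dangle is an arc endpoint shared with no other arc; the function name `find_dangles` and the sample coordinates are hypothetical, and a real GIS package would also apply a snapping tolerance and check endpoints that touch the middle of another arc.

```python
from collections import Counter

def find_dangles(lines, precision=6):
    """Return arc endpoints that are used by exactly one arc end (candidate dangles)."""
    def key(pt):
        # Round coordinates so that nearly identical endpoints compare equal.
        return (round(pt[0], precision), round(pt[1], precision))

    counts = Counter()
    for line in lines:
        counts[key(line[0])] += 1    # start node of the arc
        counts[key(line[-1])] += 1   # end node of the arc
    return sorted(pt for pt, n in counts.items() if n == 1)

# Three arcs closing a triangle, plus one short overshoot (hypothetical data).
lines = [
    [(0, 0), (4, 0)],
    [(4, 0), (2, 3)],
    [(2, 3), (0, 0)],
    [(4, 0), (5, 0.5)],  # overshoot: its free end is at (5, 0.5)
]
dangles = find_dangles(lines)
print(dangles)
```

The reported points are only candidates for manual editing; whether each one is an overshoot, an undershoot or a legitimate line end still has to be judged by the operator.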

e) Appending of mapsheet thematic features: The next step in the procedure is the appending or mosaicking of the different mapsheets into a single theme map for the whole extent. The graticule of registration points is used for this purpose.

f) Attribute coding verification: The attribute codes for the different categories then need to be verified, and additional attributes (feature name, description etc.) are added into the feature database. It is only after this procedure that the theme coverage is ready for GIS analysis. FIGURE 5 shows the procedure for spatial database creation.

7.2 Non-spatial database organisation

Non-spatial data elements are listed in TABLE 3 and most of these are available in analog mode, specifically the census data of 1981 and earlier. To convert them into digital mode, a suitable application package could be used to configure a data entry system. At SAC, a dBASE interface module has been developed for census data capture. It is a user-friendly module for easy entry and editing of census data and its organisation into a database, and is based on the taluk-village hierarchy of the district. To this end it makes use of a primary file containing taluk-wise village names and their census codes as listed in the census abstract. The module can be used directly for entering the census data into sector-wise databases, which are created as secondary files. These secondary files are related to the primary file of villages using the census village code as the key item [Rangwala et al, 1988]. Using this module, the census data for Bharatpur district has been organised into different sectoral databases. The Census data of 1991 is available in digital format as a set of database files. These database files could be structured into a sectoral organisation so that incorporation in GIS is easier.

7.3 Defining relations between spatial and non-spatial data

The GIS allows the spatial data and the non-spatial data to be related or linked based upon a defined relationship. A relation in the GIS is a method of relating the same spatial entity to different non-spatial entities based on a link key.

The linkages are particularly pertinent for village-wise data, where the village-boundary theme or settlement theme represents the spatial distribution of villages or settlements, and a one-to-one relationship can be defined between each village/settlement entity and its non-spatial data. Apart from this, the village-taluk hierarchy can also be "forced" into all spatial datasets so as to be able to extract taluk-wise spatial feature information, either in spatial format or as non-spatial tabular output.
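A minimal sketch of such a link, using Python's built-in sqlite3 module, is given here. The table names, column names and values are hypothetical; the only idea taken from the text is that the census village code acts as the link key between the village polygons and the census table, and that the taluk carried on each polygon allows taluk-wise tabular output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE village_polygons (      -- spatial side: one row per village polygon
    polygon_id  INTEGER PRIMARY KEY,
    census_code TEXT UNIQUE,         -- link key (hypothetical codes)
    taluk       TEXT,
    area_ha     REAL
);
CREATE TABLE census_demography (     -- non-spatial side: one row per village
    census_code TEXT PRIMARY KEY,
    population  INTEGER
);
INSERT INTO village_polygons VALUES
    (1, 'V001', 'Taluk A', 120.0),
    (2, 'V002', 'Taluk A',  80.0),
    (3, 'V003', 'Taluk B', 200.0);
INSERT INTO census_demography VALUES
    ('V001', 1500), ('V002', 900), ('V003', 2100);
""")

# One-to-one link: each polygon picks up its village's census attributes.
village_rows = conn.execute("""
    SELECT p.polygon_id, p.census_code, c.population
    FROM village_polygons AS p
    JOIN census_demography AS c ON c.census_code = p.census_code
    ORDER BY p.polygon_id
""").fetchall()

# Village-taluk hierarchy: aggregate the linked data to taluk level.
taluk_rows = conn.execute("""
    SELECT p.taluk, SUM(c.population), SUM(p.area_ha)
    FROM village_polygons AS p
    JOIN census_demography AS c ON c.census_code = p.census_code
    GROUP BY p.taluk
    ORDER BY p.taluk
""").fetchall()

print(village_rows)
print(taluk_rows)
```

The same join pattern would apply to any sectoral census table, as long as it carries the same village code.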

7.4 Integration of village boundaries - Issues

One of the important aspects of a GIS database for districts/regions is the combined analysis of the tabular socioeconomic data and the thematic natural resources data. These two discrete datasets have different characteristics. The socioeconomic and developmental data is mainly the data collected by the Census on a village-wise basis. This dataset is based on a village-taluk-district hierarchy and is mainly tabular. As against this, the thematic data on natural resources is based on a spatial framework. These datasets follow the SOI toposheet graticule and thus are based on the Polyconic projection system. An integrated planning exercise would require that these two datasets be combined/analysed together to derive meaningful plan inputs. The integration would be to:

a) merge the attributes of the villages and the natural resources for generating plan scenarios

b) represent spatially the non-spatial tabular attributes of the villages

c) aggregate and abstract the village attributes and the natural resources to the village-taluk-district hierarchy and the SOI graticule (for example 1:50,000 and 1:250,000 scale)

d) generate the village/taluk-wise information on natural resources for tabular updation.

A methodology for integrating the village boundary to a SOI mapbase has been developed at SAC and is based on projection of census village boundaries from a transparency to a standard SOI map base and transfer of village boundaries to the base [SAC and TCPO, 1992].

8.0 DATABASE UPDATION AND LINKAGES

Both the spatial and non-spatial databases will have to be updated frequently so as to have the latest data for further analysis/modelling. Some of the data elements could be relatively static and thus could be created once and updated only when there are changes. Such elements are mainly administrative boundaries, elevation points, drainage maps etc. However, the data elements that have to be more frequently updated are as follows:

a) Spatial database: The updation of the spatial database will have to be based mainly on inputs from RS data and on the periodic surveys carried out by different agencies. Updation can be categorised as follows:

- RS data based updation - mainly landuse/cover (every year); forest type/density maps (once in two years); urban landuse maps (once a year for major cities and once in 3 years for towns/small cities); geological maps (once in 10 years); geomorphological/hydrogeomorphological maps (once in 3 years); GW potential maps (once in 2 years or whenever drought occurs); flood maps (pre- and post-flood season every year) etc.
- Updation based on survey agency maps - mainly soil maps; forest maps; detailed geological and mineral data; road maps etc. These maps could be acquired from the respective agency and digitised into the database. These could be taken up whenever available - ideally once in 10 years.
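The RS-based update cycle can be written down as a simple lookup, which is one way such a schedule could be tracked. The sketch below is only illustrative: the interval values are the ones quoted in the list, while the function name `themes_due` and the "last updated" years are hypothetical.

```python
# Update intervals in years, as quoted for RS-based updation.
UPDATE_INTERVAL_YEARS = {
    "landuse/cover": 1,
    "forest type/density": 2,
    "urban landuse (major cities)": 1,
    "urban landuse (towns/small cities)": 3,
    "geological": 10,
    "geomorphological/hydrogeomorphological": 3,
    "groundwater potential": 2,
}

def themes_due(last_updated, current_year):
    """Return the themes whose update interval has elapsed by current_year."""
    return sorted(
        theme
        for theme, year in last_updated.items()
        if current_year - year >= UPDATE_INTERVAL_YEARS[theme]
    )

# Hypothetical record of when each theme was last updated.
last_updated = {
    "landuse/cover": 1994,
    "forest type/density": 1994,
    "geological": 1990,
}
due = themes_due(last_updated, 1995)
print(due)
```

Event-driven updates, such as groundwater potential maps after a drought or the pre- and post-flood maps, do not fit a fixed interval and would need to be triggered separately.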

b) Non-spatial data: Much of the non-spatial data is based on census records and would thus be updated once every 10 years. However, it would be better if some of the non-spatial data were available more frequently, say once every five years, so as to be optimal for the planning process. A ten-year schedule is not commensurate with ongoing development, as the database needs to reflect intermediate developments. Otherwise, decade-old data would be used in the planning process, and the plans suggested might propose developments that have already taken place.

Exchange of data from the GIS database to other computerised databases at district level can be done so as to provide data for further use. This exchange would mean:

a) a non-spatial data exchange as the district does not have the capability to handle data in spatial format. In case the capability to handle spatial data is available then the spatial data exchange can also be visualised.

b) the non-spatial representation of all datasets in the GIS database. This non-spatial representation of data could be on a taluk-basis or village-basis.

Database management systems

DBMS

The origins of DBMS data models lie in computer science (Clarke, 1997).

A DBMS contains:

- A data definition language
- A data dictionary
- A data entry module
- A data update module
- A report generator
- A query language

Data definition language (DDL):

DDL is the language used to describe the contents of the database (Modarres, 1998). DDL is the part of the DBMS that allows the user to set up a new database, to specify how many attributes there will be, what the types and lengths or numerical ranges of each attribute will be, and how much editing the user is allowed to do (Clarke, 1997).

It is used to describe, for example, attribute names (field names), data types, location in the database, etc.

This establishes the data dictionary, a catalog of all of the attributes with their legal values and ranges.
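As an illustration, the following sketch uses SQL's CREATE TABLE statement as the data definition language, run through Python's built-in sqlite3 module. The table `soil_unit`, its attributes and their legal ranges are hypothetical; SQLite's `PRAGMA table_info` is used here only as a simple stand-in for reading back part of the data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: attribute names, data types and legal values/ranges (all hypothetical).
conn.execute("""
    CREATE TABLE soil_unit (
        unit_code TEXT PRIMARY KEY,
        texture   TEXT NOT NULL CHECK (texture IN ('sand', 'loam', 'clay')),
        depth_cm  INTEGER NOT NULL CHECK (depth_cm BETWEEN 0 AND 300)
    )
""")

# Read the catalog back: each row is (cid, name, type, notnull, default, pk).
dictionary = [
    (name, col_type)
    for _cid, name, col_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(soil_unit)")
]
print(dictionary)
```

The CHECK clauses themselves are kept in the stored table definition rather than in `table_info`, so a fuller data dictionary would also record the legal values and ranges.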

The most basic management function is data entry. Since most entry of attribute data is monotonous and may be by transcription from paper records, the DBMS's data-entry system should be able to enforce the ranges and limits entered into the data dictionary by the data definition language.

All data entry is subject to error, so the first step after entry should be verification; after that, the data are updated as needed to reflect change.
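A minimal sketch of range enforcement at data entry, again using Python's sqlite3 module, might look as follows. The `well` table, the 0 to 500 m depth range and the sample entries are hypothetical; the point is only that a value outside the declared range is rejected at entry and can be set aside for verification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE well (
        well_id TEXT PRIMARY KEY,
        depth_m REAL CHECK (depth_m BETWEEN 0 AND 500)  -- hypothetical legal range
    )
""")

# Entries as they might be keyed in from paper records (hypothetical values).
entries = [("W1", 42.5), ("W2", -3.0), ("W3", 120.0)]

accepted, rejected = [], []
for well_id, depth in entries:
    try:
        conn.execute("INSERT INTO well VALUES (?, ?)", (well_id, depth))
        accepted.append(well_id)
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: keep the record aside for verification.
        rejected.append(well_id)

print(accepted, rejected)
```

Range checks catch only impossible values; a plausible but wrong entry still has to be found by verification against the source record.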

Then the DBMS can be used to perform functions such as sorting, reordering, subsetting, and searching. Doing so requires the use of a query language, the part of the DBMS that allows the user to interact with the data to perform those tasks (Clarke, 1997).

Data manipulation and query language: Normally a fourth-generation language (4GL) is supported by a DBMS to form commands for input, edit, analysis, output, reformatting, etc. Some degree of standardisation has been achieved with SQL (Structured Query Language) (Modarres, 1998).

Typical DBMS queries include sorting, renumbering, subsetting, and searching.

The query language is the user interface for searching.
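The three most common query types can be sketched in SQL, here run through Python's sqlite3 module. The `village` table and its rows are hypothetical and serve only to show sorting (ORDER BY), subsetting (WHERE on a category) and searching (WHERE on a specific value).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE village (name TEXT, taluk TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO village VALUES (?, ?, ?)",
    [("Alpha", "North", 1200), ("Beta", "South", 450), ("Gamma", "North", 3100)],
)

# Sorting: order all records by an attribute.
sorted_names = [row[0] for row in conn.execute(
    "SELECT name FROM village ORDER BY population DESC")]

# Subsetting: keep only the records that satisfy a condition.
north = [row[0] for row in conn.execute(
    "SELECT name FROM village WHERE taluk = ? ORDER BY name", ("North",))]

# Searching: look up one particular record.
found = conn.execute(
    "SELECT population FROM village WHERE name = ?", ("Beta",)).fetchone()

print(sorted_names, north, found)
```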

GIS DATABASE

INTRODUCTION

The real world is too complex for our immediate and direct understanding. We create "models" of reality that are intended to have some similarity with selected aspects of the real world. Data bases are created from these "models" as a fundamental step in coming to know the nature and status of that reality (Modarres, 1998).

The Geographical Information System (GIS) has two distinct utilisation capabilities - the first pertaining to querying and obtaining information and the second pertaining to integrated analytical modelling. However, both these capabilities depend upon the core of the GIS - the database that has been organised. Many GIS applications have been limited because of improper database organisation. The importance of the GIS database stems from the fact that the data elements of the database are closely interrelated and thus need to be structured for easy integration and retrieval. The GIS database also has to cater to the different needs of applications. In general, a proper database organisation needs to ensure the following [Healey, 1991; NCGIA, 1990]:

- flexibility in the design to adapt to the needs of different users.
- a controlled and standardised approach to data input and updation.
- a system of validation checks to maintain the integrity and consistency of the data elements.
- a level of security for minimising damage to the data.
- minimising redundancy in data storage.

While the above is a general consideration for database organisation, in a GIS domain the considerations are pertinent with the different types and nature of data that need to be organised and stored.

What is a database?

Database: a large collection of data in a computer system, organized so that it can be expanded, updated, and retrieved rapidly for various uses. It could be a file or a set of files (Ronli, 1999).

File: a collection of organized records of information. A record usually has a record number and record content. The file has a name given by the system or the user (Ronli, 1999).

A database is a collection of information related to a particular subject or purpose, such as tracking customer orders or maintaining a music collection (Microsoft, 1997).

Database: a self-describing collection of integrated records (Kroenke, 1995).

Database is self-describing: It contains, in addition to the user's source data, a description of its structure. This description is called a data dictionary (or data directory or metadata). It is the data dictionary that makes program/data independence possible.

A database is a collection of integrated records: Bits are aggregated into bytes or characters; characters are aggregated into fields; fields are aggregated into records; and records into files. Bits - characters - fields - records - files

In addition to the user's data and the metadata, a database contains indexes, which are used to represent relationships among the data and to improve the performance of database applications. The database often also contains data about the applications that use it: the structure of a data entry form or a report is sometimes part of the database, and is called application metadata. Thus, a database contains four types of data: files of the user's data, metadata, indexes, and application metadata.

Files + metadata + indexes + application metadata = Database.

A spatial data base is a collection of spatially referenced data that acts as a model of reality.

Spatial database: stores georeferenced data. For example, wells with their locations, bank account holders with addresses, and property taxes with boundaries.

INTRODUCTION TO DATABASE PROCESSING

A successful GIS begins with a database, so it is important to first take a look at databases.

In GIS, the database is important as its creation will often account for up to three-quarters of the time and effort involved in developing a geographic information system. (Kenneth et al, 1996).

It is important, however, to view these GIS databases as more than simple stores of information. The database is used to abstract very specific sorts of information about reality and organize it in a way that will prove useful. The database should be viewed as a representation or model of the world developed for a very specific application (Kenneth et al, 1996). There are very many things involved in the design of a database.

GIS has become more powerful as database products have become more powerful and database technology more accessible (Kroenke, 1995). This has been so because:

- Personal computer DBMS products have become more powerful and easier to use, and their price has decreased substantially. Products such as Microsoft Access not only provide the power of a true relational DBMS on a PC, but also include facilities for developing GUI-based forms, reports, and menus.
- New modelling methodologies and tools, especially those based on object-oriented thinking, have become available. Studies show semantic object modelling (say, with SALSA) to be far superior to older techniques such as entity-relationship modelling (say, with IEF): users are able to create better models, faster, and with greater satisfaction.
- Client-server processing in general, and client-server database processing in particular, has emerged. This enables companies to download mainframe data to a server database on a PC, making it easier for personnel to access the database.

THE DATA IN DATABASE

Broadly categorised, the basic data for the GIS database has two components:

a) Spatial data - consisting of maps which have been prepared either by field surveys or by the interpretation of Remotely Sensed (RS) data. Some examples of the maps are the soil survey map, geological map, landuse map from RS data, village map etc. Many of these maps are available in analog form, and it is only of late that some map information has become available directly in digital format. Thus, the incorporation of these maps into a GIS depends upon whether they are in analog or digital format - each of which has to be handled differently.

b) Non-spatial data - attributes that are complementary to the spatial data and describe what is at a point, along a line or in a polygon, as well as socio-economic characteristics from census and other sources. The attributes of a soil category could be the depth of soil, texture, erosion, drainage etc, and for a geological category could be the rock type, its age, major composition etc. The socio-economic characteristics could be the demographic data or occupation data for a village, or traffic volume data for roads in a city etc. The non-spatial data is mainly available as tabular records in analog form and needs to be converted into digital format for incorporation in GIS. However, the 1991 census data is now available in digital mode and thus direct incorporation into the GIS database is possible.

Data input in database

There are many methods of data entry into a database. These include advancing technologies such as scanning, feature recognition, raster-to-vector conversion, and image processing, along with traditional digitizing and key entry methods (Kroenke, 1995).

Digitizing on a tablet captures map data by tracing lines from a map by hand, using a cursor and an electronically-sensitive tablet. The result is a string of points with (x, y) values.

Scanning places a map on a glass plate, and passes a light beam over it measuring the reflected light intensity. The result is a grid of pixels. Image size and resolution are important to scanning. Small features on the map can drop out if the pixels are too big.
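The effect of pixel size can be shown with a toy rasterisation. This is a sketch under simplifying assumptions, not a model of a real scanner: the feature is an axis-aligned rectangle, a pixel is marked only if its centre falls inside the feature, and the function name `rasterise`, the 8 x 8 map extent and the "pond" coordinates are all hypothetical.

```python
def rasterise(feature, extent, cell):
    """Return a grid of 0/1 values; a cell is 1 if its centre lies in the feature."""
    xmin, ymin, xmax, ymax = feature
    width, height = extent
    cols, rows = int(width // cell), int(height // cell)
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell  # cell centre
            inside = xmin <= cx <= xmax and ymin <= cy <= ymax
            row.append(1 if inside else 0)
        grid.append(row)
    return grid

pond = (2.2, 2.2, 2.8, 2.8)  # hypothetical small feature, 0.6 x 0.6 map units

fine = rasterise(pond, (8, 8), 0.5)    # 16 x 16 pixels
coarse = rasterise(pond, (8, 8), 2.0)  # 4 x 4 pixels

fine_count = sum(map(sum, fine))
coarse_count = sum(map(sum, coarse))
print(fine_count, coarse_count)
```

With the fine grid the pond is captured by a few pixels, while with the coarse grid no pixel centre falls inside it and the feature drops out entirely.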

Attribute data can be thought of as being contained in a flat file. This is a table of attributes by records, with entries called values.

How data is represented in database

It is important to realize that this non-spatial data can be filed away in several different forms depending on how it needs to be used and accessed. Perhaps the simplest method is the flat file or spreadsheet, where each geographic feature is matched to one row of data (Kenneth et al, 1996).

Flat Files and Spreadsheets

A flat file or spreadsheet is a simple method for storing data. All records in this data base have the same number of "fields". Individual records have different data in each field with one field serving as a key to locate a particular record (Kenneth et al, 1996). For a person, or a tract of land there could be hundreds of fields associated with the record. When the number of fields becomes lengthy a flat file is cumbersome to search. Also the key field is usually determined by the programmer and searching by other determinants may be difficult for the user. Although this type of database is simple in its structure, expanding the number of fields usually entails reprogramming. Additionally, adding new records is time consuming, particularly when there are numerous fields. Other methods offer more flexibility and responsiveness in GIS.
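A flat file can be sketched in a few lines of Python. The parcel records and field names below are hypothetical; the sketch shows that a lookup through the key field is direct, whereas a search on any other field has to scan every record.

```python
import csv
import io

# A flat file: every record has the same fields (hypothetical parcel data).
FLAT_FILE = """parcel_id,owner,landuse,area_ha
P001,Rao,crop,2.5
P002,Singh,forest,10.0
P003,Mehta,crop,1.2
"""

records = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# Access by the key field: build an index once, then look records up directly.
by_key = {rec["parcel_id"]: rec for rec in records}
owner = by_key["P002"]["owner"]

# Access by any other field: the whole file has to be scanned.
crop_parcels = [rec["parcel_id"] for rec in records if rec["landuse"] == "crop"]

print(owner, crop_parcels)
```

Adding a new field here would mean rewriting every row of the file, which is the expansion problem noted in the paragraph.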

Hierarchical Files

Hierarchical files store data in more than one type of record. This method is usually described as a "parent-child, one-to-many" relationship (Kenneth et al, 1996). One field is key to all records, but data in one record does not have to be repeated in another. This system allows records with similar attributes to be associated together. The records are linked to each other by a key field in a hierarchy of files. Each record, except for the master record, has a higher level record file linked by a key field "pointer". In other words, one record may lead to another and so on in a relatively descending pattern. An advantage is that when the relationship is clearly defined, and queries follow a standard routine, a very efficient data structure results. The database is arranged according to its use and needs. Access to different records is readily available, or easy to deny to a user by not furnishing that particular file of the database. One of the disadvantages is one must access the master record, with the key field determinant, in order to link "downward" to other records.
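The parent-child structure can be sketched with nested dictionaries, using the district-taluk-village hierarchy as a hypothetical example. All names and values are invented for illustration; the two functions contrast a query that follows the pointer path downward from the master record with one that does not fit the hierarchy.

```python
# Master record (district) -> child records (taluks) -> child records (villages).
district = {
    "name": "District X",
    "taluks": {
        "T1": {
            "name": "Taluk One",
            "villages": {
                "V1": {"name": "Village A", "population": 800},
                "V2": {"name": "Village B", "population": 1500},
            },
        },
        "T2": {
            "name": "Taluk Two",
            "villages": {
                "V3": {"name": "Village C", "population": 600},
            },
        },
    },
}

def village_population(root, taluk_key, village_key):
    """Follow the key path downward from the master record."""
    return root["taluks"][taluk_key]["villages"][village_key]["population"]

def find_village_by_name(root, name):
    """A query that ignores the hierarchy must walk every branch."""
    for taluk_key, taluk in root["taluks"].items():
        for village_key, village in taluk["villages"].items():
            if village["name"] == name:
                return (taluk_key, village_key)
    return None

print(village_population(district, "T1", "V2"))
print(find_village_by_name(district, "Village C"))
```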

Relational Files

Relational files connect different files or tables (relations) without using internal pointers or keys. Instead a common link of data is used to join or associate records. The link is not hierarchical (Kenneth et al, 1996). A "matrix of tables" is used to store the information. As long as the tables have a common link they may be combined by the user to form new inquiries and data output. This is the most flexible system and is particularly suited to SQL (Structured Query Language). Queries are not limited by a hierarchy of files, but instead are based on relationships from one type of record to another that the user establishes. Because of its flexibility this system is the most popular database model for GIS. Relational systems remain the dominant form of DBMS today (Clarke, 1997).

They are simple and, from the user's standpoint, an extension of the flat file model. The major difference is that a database can consist of several flat files, and each can contain different attributes associated with a record.
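A minimal relational sketch, using Python's sqlite3 module, shows two flat tables combined through a common attribute. The `parcel` and `soil` tables and their contents are hypothetical; the query is an example of an inquiry that was not planned when the tables were set up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, soil_code TEXT, area_ha REAL);
CREATE TABLE soil   (soil_code TEXT PRIMARY KEY, texture TEXT, drainage TEXT);
INSERT INTO parcel VALUES ('P1', 'S1', 2.0), ('P2', 'S2', 5.0), ('P3', 'S1', 1.5);
INSERT INTO soil   VALUES ('S1', 'loam', 'good'), ('S2', 'clay', 'poor');
""")

# Join the two tables on the shared soil_code and total the area per drainage class.
rows = conn.execute("""
    SELECT s.drainage, SUM(p.area_ha)
    FROM parcel AS p
    JOIN soil AS s ON s.soil_code = p.soil_code
    GROUP BY s.drainage
    ORDER BY s.drainage
""").fetchall()

print(rows)
```

No pointer path is involved: any other pairing of tables that share an attribute could be joined in the same way.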

Flat, Hierarchical, and Relational Files Compared

Structure	Advantages	Disadvantages
Flat Files	Fast data retrieval; simple structure and easy to program	Difficult to process multiple values of a data item; adding new data categories requires reprogramming; slow data retrieval without the key
Hierarchical Files	Adding and deleting records is easy; fast data retrieval through higher level records; multiple associations with like records in different files	Pointer path restricts access; each association requires repetitive data in other records; pointers require a large amount of computer storage
Relational Files	Easy access and minimal technical training for users, as data is kept in different files; flexibility for unforeseen inquiries, as any combination of attributes and records can be assembled as long as they are linked by a key attribute; easy modification and addition of new relationships, data, and records; physical storage of data can change without affecting relationships between records	New relations can require considerable processing; sequential access is slow; method of storage on disk impacts processing time; easy to make logical mistakes due to flexibility of relationships between records

Now, let us consider a couple of examples of matching applications to database structures.

Exploratory research--flat files are easy to organize, space is not a particular problem

Government agencies--hierarchical systems are particularly attractive

Planning and development--relational might be justified for flexibility
