Mining real estate listings using oracle data warehousing and predictive regression method





MINING REAL ESTATE LISTINGS USING

ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD

Wuri Wedyawati

B. S., Sekolah Tinggi Teknik Surabaya, Indonesia, 2000


PROJECT

Submitted in partial satisfaction of

the requirements for the degree of
Master of Science

in

Computer Science



at

CALIFORNIA STATE UNIVERSITY, SACRAMENTO

SPRING

2004


MINING REAL ESTATE LISTINGS USING

ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD

A Project

By

Wuri Wedyawati


Approved by:




, Committee Chair

Dr. Meiliu Lu



, Second Reader

Dr. Don Warner


Date:

Student: Wuri Wedyawati

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the project.

Dr. Cui Zhang, Graduate Coordinator Date

Department of Computer Science

Abstract
of


MINING REAL ESTATE LISTINGS USING

ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD


by
Wuri Wedyawati
Statistical data analysis is the most well-established set of methodologies for data mining. Statistics offers a variety of methods for data mining, including different types of regression analysis. The objective of this project is to develop a knowledge discovery system that helps prospective real estate sellers and buyers determine a property's price based on the sold listings available in an area. The prediction of continuous values, such as a property's selling price, is modeled by a statistical technique called predictive regression.
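As a rough illustration of predictive regression, the sketch below fits a least-squares line to a handful of sold listings and uses it to estimate a price. It assumes a single predictor (approximate square footage); the sample figures, the function names fit_line and predict, and the choice of Python are all hypothetical and are not taken from the project, which is implemented in Visual Basic .NET against Oracle.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical sold listings in one area: square footage and selling price.
sqft = [1000, 1500, 2000, 2500]
price = [200000, 275000, 350000, 425000]

intercept, slope = fit_line(sqft, price)
estimate = predict(intercept, slope, 1800)
print(f"price = {intercept:.2f} + {slope:.2f} * sqft")
print(f"estimated price for 1800 sq ft: {estimate:.2f}")
```

In practice the input pairs would come from a query against the sold listings in the chosen area rather than from a hard-coded list.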

A prerequisite of data mining is the design of a data warehouse that contains a wide variety of real estate listings. The source data is extracted from a Multiple Listing Service (MLS) database, then cleansed and transformed in the staging area. The data warehouse design for this project is a star schema, with one large fact table surrounded by several dimension tables. Loading the data into the warehouse is the final step in creating the data warehouse in preparation for data mining.
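The star schema idea can be sketched with a small in-memory database. The example below uses SQLite through Python's standard sqlite3 module purely for illustration, not Oracle; the table names echo the Area, Agent, and Residential tables listed in the table of contents, but every column name and row is an assumption made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table (residential) referencing two dimension tables (area, agent).
# All column names here are hypothetical.
conn.executescript("""
CREATE TABLE area  (area_id  INTEGER PRIMARY KEY, area_name  TEXT);
CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
CREATE TABLE residential (
    listing_id INTEGER PRIMARY KEY,
    area_id    INTEGER REFERENCES area(area_id),
    agent_id   INTEGER REFERENCES agent(agent_id),
    sqft       INTEGER,
    sold_price REAL
);
""")

# Load a few made-up rows, dimensions first, then the fact table.
conn.executemany("INSERT INTO area VALUES (?, ?)", [(1, "Area 1"), (2, "Area 2")])
conn.executemany("INSERT INTO agent VALUES (?, ?)", [(1, "Agent A")])
conn.executemany(
    "INSERT INTO residential VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1000, 200000.0), (2, 1, 1, 1500, 300000.0), (3, 2, 1, 2000, 400000.0)],
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT a.area_name, COUNT(*), AVG(r.sold_price)
    FROM residential r
    JOIN area a ON a.area_id = r.area_id
    GROUP BY a.area_name
    ORDER BY a.area_name
""").fetchall()

for area_name, listings, avg_price in rows:
    print(area_name, listings, avg_price)
```

Keeping descriptive attributes in the dimension tables and measures such as the selling price in the fact table is what lets a query like this stay a simple join plus aggregation.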

This project uses Visual Basic .NET to create the user interface. Communication between Oracle and the .NET Framework is established by adding the Oracle Provider for OLE DB (OraOLEDB) component as a reference in the project. The result of this project can serve as a case study for a data warehousing and data mining course, such as CSC 196K, Introduction to Data Mining and Data Warehousing, at California State University, Sacramento.

, Committee Chair
Dr. Meiliu Lu

Date:


Acknowledgements
I would like to thank Dr. Meiliu Lu, my master's project advisor, who accepted my proposal and agreed to advise me. She guided and supported me from the beginning to the end of this project and spent her invaluable time reviewing and revising this report. Her valuable input and feedback made this report steadily better. She also gave me the opportunity to present my master's project in her class. I am very grateful for her constant encouragement and help.

I would also like to thank Dr. Don Warner, my second reader, who accepted my master's project proposal and agreed to serve as second reader. I appreciate the valuable time he spent reviewing and revising this report. I am also thankful to him as the Computer Science Department Chair; his leadership makes us very proud to be CSUS Computer Science alumni.

I am thankful to Dr. Cui Zhang, my Graduate Coordinator, who always cares about all Computer Science students. I would also like to express my appreciation to all the Computer Science faculty and staff for their support and help.

I would like to thank my wonderful husband, Agus Hartono, for all his love, support, and encouragement, so that we could complete our master's projects and graduate at the same time. I want to thank my dearest friends in KTM Saint Anne, who always support me in prayer. Finally, I thank God for His exceptional strength and love in every step of my life.

Table of Contents
Acknowledgements

List of Tables

List of Figures

Real Estate Price Prediction System Requirements

Chapter

1. Introduction

2. Data Warehouse Methodology

    2.1 Extraction

    2.2 Transformation and Cleansing

    2.3 Modeling

    2.4 Transport

3. MASTERDW Data Warehouse Design and Implementation

    3.1 Extraction

    3.2 Transformation and Cleansing

        3.2.1 Transformation and Cleansing 1

        3.2.2 Update Process for the Result of Transformation and Cleansing 1

        3.2.3 Transformation and Cleansing 2

        3.2.4 Duplication Detection for Office and Agent Records

    3.3 Modeling

        3.3.1 Setting Up the Environment

        3.3.2 Creating the Tablespaces and Data Files

        3.3.3 Creating the Tables, Constraints, and Indexes

        3.3.4 Defining Security

    3.4 Transport

4. Real Estate Price Prediction

5. Results

6. Conclusion

Appendix A. List of Residential Fields

Appendix B. Transformation and Cleansing 1 Source Code

Appendix C. Transformation and Cleansing Log File

Appendix D. Transformation and Cleansing 2 Source Code

Appendix E. Duplicate Agent and Office Records Detection Source Code

Appendix F. Area Table Load

Appendix G. Office Table Load

Appendix H. Agent Table Load

Appendix I. Residential Table Load

Appendix J. Real Estate Price Prediction Source Code

Bibliography

List of Tables
Table 3.1 MASTERDW Tablespaces, Datafiles, and Its Content Meaning

Table 4.1 Result of an Example Query for Predictive Regression Calculation

Table 4.2 The Value of X and Y of Table 4.1

List of Figures


Figure 2.1 An Example of Star Schema

Figure 2.2 An Example of Snowflaking Schema

Figure 3.1 Building MASTERDW Data Warehouse Process

Figure 3.2 Star Schema for MASTERDW Data Warehouse

Figure 3.3 MASTERDW Data Warehouse Diagram

Figure 5.1 Real Estate Price Prediction Initial Form

Figure 5.2 Area Number Error Message Box

Figure 5.3 Approximate Square Footage Error Message Box

Figure 5.4 Number of Bedrooms Error Message Box

Figure 5.5 Number of Full Bathrooms Error Message Box

Figure 5.6 Number of Half Bathrooms Error Message Box

Figure 5.7 Number of Bedrooms Error Message Box

Figure 5.8 No Record Found Error Message Box

Figure 5.9 Real Estate Price Prediction Result Form


Real Estate Price Prediction System Requirements
Software requirements:

  • Operating System: Microsoft® Windows 98 or higher.

  • .NET™ Framework, for any Windows version that does not include it.

  • Microsoft® Visual Studio .NET™ (to compile the source code).

  • Oracle Provider for OLE DB (Interop.ORAOLEDBLib.dll).

  • Oracle 8i

Minimum hardware requirements:



  • All the minimum hardware requirements for installing and running Microsoft® Windows 98 or higher.

  • Additional hard disk space for .NET™ Framework (20 MB).

  • Additional hard disk space for Microsoft® Visual Studio .NET™ (to compile the source code). The space required depends on the Visual Studio .NET™ version; the complete Professional version needs about 1.7 GB.

  • Additional hard disk space for Oracle 8i (about 2 GB)

  • Additional hard disk space for MASTERDW data warehouse (minimum 100 MB)

Recommended hardware requirements:



  • Processor: Pentium® II or higher.

  • Memory (RAM): 64 MB or higher.

  • VGA card that supports at least 1024 x 768 resolution in 16-bit color mode.

  • Screen resolution: 1024 x 768.


