MINING REAL ESTATE LISTINGS USING
ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD
Wuri Wedyawati
B. S., Sekolah Tinggi Teknik Surabaya, Indonesia, 2000
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
Master of Science
in
Computer Science
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
SPRING
2004
MINING REAL ESTATE LISTINGS USING
ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD
A Project
By
Wuri Wedyawati
Approved by:
Dr. Meiliu Lu, Committee Chair
Dr. Don Warner, Second Reader
Date:
Student: Wuri Wedyawati
I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the project.
Dr. Cui Zhang, Graduate Coordinator Date
Department of Computer Science
Abstract
of
MINING REAL ESTATE LISTINGS USING
ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION METHOD
by
Wuri Wedyawati
Statistical data analysis is the most well-established set of methodologies for data mining. Statistics offers a variety of methods for data mining, including different types of regression analysis. The objective of this project is to develop a knowledge discovery system that helps prospective real estate sellers and buyers determine a property's price based on the available sold listings in an area. The prediction of continuous values, such as a property's selling price, is modeled by a statistical technique called predictive regression.
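As an illustration of the predictive regression idea, the following minimal Python sketch fits a least-squares line to a handful of made-up sold listings and predicts a price for a new property. The use of square footage as the single predictor, the function names `fit_line` and `predict`, and all numbers are assumptions chosen for illustration; they are not taken from the MLS data or from this project's implementation.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical sold listings: approximate square footage and selling price.
sqft = [1000.0, 1500.0, 2000.0, 2500.0]
price = [150000.0, 200000.0, 250000.0, 300000.0]

a, b = fit_line(sqft, price)
estimate = predict(a, b, 1800.0)
print(f"price = {a:.2f} + {b:.2f} * sqft; estimate for 1800 sqft: {estimate:.2f}")
```

In this sketch the sold listings play the role of the training data, and the fitted line is then evaluated at the square footage of the property whose price is to be estimated.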
A prerequisite for data mining is a data warehouse that contains a wide variety of real estate listings. The source data is extracted from a Multiple Listing Service (MLS) database, then cleansed and transformed in the staging area. The data warehouse design for this project is a star schema with one large fact table surrounded by many dimension tables. Loading the data into the warehouse is the final step in creating the data warehouse in preparation for data mining.
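The star schema layout can be sketched in a few lines of Python. This example uses an in-memory SQLite database rather than Oracle so that it runs anywhere. The table names echo the Area, Agent, and Residential tables listed in the appendices, but every column name and data value here is hypothetical and does not reflect the actual MASTERDW design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two small dimension tables and one fact table that references them.
conn.executescript("""
CREATE TABLE area  (area_id  INTEGER PRIMARY KEY, area_name  TEXT);
CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
CREATE TABLE residential (
    listing_id  INTEGER PRIMARY KEY,
    area_id     INTEGER REFERENCES area(area_id),
    agent_id    INTEGER REFERENCES agent(agent_id),
    square_feet INTEGER,
    sold_price  INTEGER
);
""")

# Load made-up rows: dimensions first, then the fact table.
conn.executemany("INSERT INTO area VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO agent VALUES (?, ?)", [(10, "Agent A"), (11, "Agent B")])
conn.executemany(
    "INSERT INTO residential VALUES (?, ?, ?, ?, ?)",
    [
        (100, 1, 10, 1500, 200000),
        (101, 1, 11, 2000, 250000),
        (102, 2, 10, 1800, 210000),
    ],
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
SELECT a.area_name, COUNT(*), AVG(r.sold_price)
FROM residential r
JOIN area a ON a.area_id = r.area_id
GROUP BY a.area_name
ORDER BY a.area_name
""").fetchall()
conn.close()

for area_name, listings, avg_price in rows:
    print(area_name, listings, avg_price)
```

The fact table holds the measures (here, square footage and sold price) together with foreign keys, while each dimension table holds descriptive attributes that queries group or filter by.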
This project uses Visual Basic .NET to create the user interface. Communication between Oracle and the .NET Framework is established by adding the Oracle Provider for OLE DB (OraOLEDB) component as a reference in the project. The result of this project can serve as a case study for a data warehousing and data mining course, such as CSC 196K, Introduction to Data Mining and Data Warehousing, at California State University, Sacramento.
Dr. Meiliu Lu, Committee Chair
Date:
Acknowledgements
I would like to thank Dr. Meiliu Lu, my master's project advisor, who accepted my proposal and agreed to advise me. She guided and supported me from the beginning to the end of this project and spent her invaluable time reviewing and revising this report. Her valuable input and feedback made this report much better. She also gave me the opportunity to present my master's project in her class. I am very grateful to her for her constant encouragement and help.
I would also like to thank Dr. Don Warner, my second reader, who accepted my master's project proposal and agreed to serve as second reader. I appreciate the valuable time he spent reviewing and revising this report. I am also thankful to him as the Computer Science Department Chair, for he has been a good leader who makes us very proud to be CSUS Computer Science alumni.
I am thankful to Dr. Cui Zhang, my Graduate Coordinator, who always cares about all Computer Science students. I would like to express my appreciation to all Computer Science faculty and staff for their support and help.
I would like to thank my wonderful husband, Agus Hartono, for all his love, support, and encouragement, so that we could do our master's projects and graduate at the same time. I want to thank my dearest friends in KTM Saint Anne, who always support me in prayer. Finally, I thank God for His exceptional strength and love in every step of my life.
Table of Contents
Acknowledgments
List of Tables
List of Figures
Real Estate Price Prediction System Requirements
Chapter
1. Introduction
2. Data Warehouse Methodology
2.1 Extraction
2.2 Transformation and Cleansing
2.3 Modeling
2.4 Transport
3. MasterDW Data Warehouse Design and Implementation
3.1 Extraction
3.2 Transformation and Cleansing
3.2.1 Transformation and Cleansing 1
3.2.2 Update Process for the Result of Transformation and Cleansing 1
3.2.3 Transformation and Cleansing 2
3.2.4 Duplication Detection for Office and Agent Records
3.3 Modeling
3.3.1 Setting Up the Environment
3.3.2 Creating the Tablespaces and Data Files
3.3.3 Creating the Tables, Constraints, and Indexes
3.3.4 Defining Security
3.4 Transport
4. Real Estate Price Prediction
5. Results
6. Conclusion
Appendix A. List of Residential Fields
Appendix B. Transformation and Cleansing 1 Source Code
Appendix C. Transformation and Cleansing Log File
Appendix D. Transformation and Cleansing 2 Source Code
Appendix E. Duplicate Agent and Office Records Detection Source Code
Appendix F. Area Table Load
Appendix G. Office Table Load
Appendix H. Agent Table Load
Appendix I. Residential Table Load
Appendix J. Real Estate Price Prediction Source Code
Bibliography
List of Tables
Table 3.1 MASTERDW Tablespaces, Datafiles, and Its Content Meaning
Table 4.1 Result of An Example Query for Predictive Regression Calculation
Table 4.2 The Value of X and Y of Table 4.1
List of Figures
Figure 2.1 An Example of Star Schema
Figure 2.2 An Example of Snowflaking Schema
Figure 3.1 Building MASTERDW Data Warehouse Process
Figure 3.2 Star Schema for MASTERDW Data Warehouse
Figure 3.3 MASTERDW Data Warehouse Diagram
Figure 5.1 Real Estate Price Prediction Initial Form
Figure 5.2 Area Number Error Message Box
Figure 5.3 Approximate Square Footage Error Message Box
Figure 5.4 Number of Bedrooms Error Message Box
Figure 5.5 Number of Full Bathrooms Error Message Box
Figure 5.6 Number of Half Bathrooms Error Message Box
Figure 5.7 Number of Bedrooms Error Message Box
Figure 5.8 No Record Found Error Message Box
Figure 5.9 Real Estate Price Prediction Result Form
Real Estate Price Prediction System Requirements
Software requirements:
- Operating System: Microsoft® Windows 98 or higher.
- .NET™ Framework, for any Windows version that does not include it.
- Microsoft® Visual Studio .NET™ (to compile the code).
- Oracle Provider for OLE DB (Interop.ORAOLEDBLib.dll).
- Oracle 8i.
Minimum hardware requirements:
- All the minimum hardware requirements for installing and running Microsoft® Windows 98 or higher.
- Additional hard disk space for the .NET™ Framework (20 MB).
- Additional hard disk space for Microsoft® Visual Studio .NET™ (to compile the code). The space required depends on the Visual Studio .NET™ version; the complete Professional version needs about 1.7 GB.
- Additional hard disk space for Oracle 8i (about 2 GB).
- Additional hard disk space for the MASTERDW data warehouse (minimum 100 MB).
Recommended hardware requirements:
- Processor: Pentium® II or higher.
- Memory (RAM): 64 MB or higher.
- VGA card that supports at least 1024 x 768 resolution in 16-bit color mode.
- Screen resolution: 1024 x 768.