FIGURES
Figure 1: Results produced by exploratory search on ‘fishing’ using clusty.com
Figure 2: Results produced by exploratory search on ‘fishing’ using kartoo.com
Figure 3: Results produced by exploratory search on ‘US locations fishing’ using clusty.com
Figure 4: Hits for ‘fishing’ by state, from a CDB populated with American cities
Figure 5: Population of categories with relevant documents
Figure 6: Screen from our software prototype, showing the word suggestion tool, which lets the user select an expansion or disambiguation of the current search term
Figure 7: Computing hits for each category
Figure 8: Relative prevalence of the terms “smoothness”, “strength”, and [“wet” OR “damp”] in various stone quarrying industry segments. Each industry has three bars, from left to right: “smoothness”, “strength”, and [“wet” OR “damp”].
Figure 9: Excel-based collapsible tree view provided for the exploration of aggregate statistics (e.g. hits) per category in various taxonomies
Figure 10: Web-based collapsible tree view provided for the exploration of aggregate statistics (e.g. hits) per category in the UNSPSC taxonomy
Figure 11: Collaborative annotation interface for sharing of human observations on interesting categories amongst a team: illustration of users sharing comments on possible applications of a biodegradable molecule with foam reduction properties.
Categories
Column Name | Data Type
ParentID | Int(11) (Primary Key)
CategoryID | Int(11)
CategoryName | Char(255)
Flag | Int(11)
DateCompleted | DateTime
CategoryAssignment
Column Name | Data Type
CategoryID | Int(11) (Primary Key)
DateAndTimeAssigned | Timestamp
IPAddressOfServant | Char(255)
ServantComputerName | Char(255)
CategoryCompleted
Column Name | Data Type
CategoryID | Int(11) (Primary Key)
DateAndTimeCompleted | Timestamp
IPAddressOfServant | Char(255)
ServantComputerName | Char(255)
Documents
Column Name | Data Type
DocumentID | Int(11) (Primary Key)
DocumentURL | Text
CategoryID | Int(11)
DateCompleted | DateTime
Lexicon
Column Name | Data Type
WordSenseID | Int(11) (Primary Key)
WordText | Char(255)
Words
Column Name | Data Type
DocumentID | Int(11) (Primary Key)
WordSenseID | Int(11)
WordPosition | Int(11)
Figure 12: Database tables used to represent and index documents in the CDB
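The following is a minimal sketch, not the original implementation, of how per-category hits might be computed from the Documents, Lexicon and Words tables. It assumes SQLite in place of whatever database engine the CDB actually uses, omits the date columns, and does not declare Words.DocumentID as a primary key, on the assumption that one document contributes many word rows. The function names `build_index` and `hits_per_category` and the sample rows are hypothetical.

```python
import sqlite3


def build_index(conn):
    """Create simplified versions of the Documents, Lexicon and Words tables."""
    conn.executescript("""
        CREATE TABLE Documents (
            DocumentID INTEGER PRIMARY KEY,
            DocumentURL TEXT,
            CategoryID INTEGER
        );
        CREATE TABLE Lexicon (
            WordSenseID INTEGER PRIMARY KEY,
            WordText TEXT
        );
        -- Assumption: many rows per document, so no primary key here.
        CREATE TABLE Words (
            DocumentID INTEGER,
            WordSenseID INTEGER,
            WordPosition INTEGER
        );
    """)


def hits_per_category(conn, term):
    """Return {CategoryID: (word_count, document_count)} for one search term."""
    rows = conn.execute("""
        SELECT d.CategoryID, COUNT(*), COUNT(DISTINCT d.DocumentID)
        FROM Words w
        JOIN Lexicon l ON l.WordSenseID = w.WordSenseID
        JOIN Documents d ON d.DocumentID = w.DocumentID
        WHERE l.WordText = ?
        GROUP BY d.CategoryID
    """, (term,)).fetchall()
    return {cat: (words, docs) for cat, words, docs in rows}


# Hypothetical sample data: three documents in two categories.
conn = sqlite3.connect(":memory:")
build_index(conn)
conn.executemany("INSERT INTO Documents VALUES (?, ?, ?)", [
    (1, "http://example.com/a", 10),
    (2, "http://example.com/b", 10),
    (3, "http://example.com/c", 20),
])
conn.executemany("INSERT INTO Lexicon VALUES (?, ?)", [(1, "fishing"), (2, "boat")])
conn.executemany("INSERT INTO Words VALUES (?, ?, ?)", [
    (1, 1, 0), (1, 1, 5), (1, 2, 1), (2, 1, 3), (3, 2, 0),
])
hits = hits_per_category(conn, "fishing")
```

Here `hits` maps each category that mentions the term to a pair of counts: total occurrences and number of distinct documents. Categories with no occurrences are simply absent from the result.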
Request
Column Name | Data Type
RequestID | Int(11) (Primary Key)
UserID | Char(255)
TopCategoryID | Int(11)
SearchPhrase | Char(255)
DateAndTimeRequested | TimeStamp
DateAndTimeAssigned | DateTime
DateAndTimePopulated | DateTime
RequestedData
Column Name | Data Type
RequestID | Int(11) (Primary Key)
CategoryID | Int(11)
SearchPhrase | Char(255)
WordCount | Int(11)
DocumentCount | Int(11)
Servant | Char(255)
DateAndTime | DateTime
Figure 13: Database tables used to store and cache query results from the CDB
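As an illustration of how the RequestedData table could act as a cache, the sketch below looks up stored counts for a (CategoryID, SearchPhrase) pair and only calls the expensive computation when no row exists. This is an assumed usage pattern, not a description of the original system. SQLite stands in for the real database, and `open_cache`, `cached_counts`, `fake_compute` and the `servant` default are hypothetical names.

```python
import sqlite3
from datetime import datetime, timezone


def open_cache():
    """Create an in-memory stand-in for the RequestedData table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE RequestedData (
            RequestID INTEGER,
            CategoryID INTEGER,
            SearchPhrase TEXT,
            WordCount INTEGER,
            DocumentCount INTEGER,
            Servant TEXT,
            DateAndTime TEXT
        )
    """)
    return conn


def cached_counts(conn, request_id, category_id, phrase, compute, servant="local"):
    """Return (word_count, document_count), computing and storing on a miss."""
    row = conn.execute(
        "SELECT WordCount, DocumentCount FROM RequestedData "
        "WHERE CategoryID = ? AND SearchPhrase = ?",
        (category_id, phrase),
    ).fetchone()
    if row is not None:
        return row  # cache hit: reuse the stored counts
    word_count, doc_count = compute(category_id, phrase)
    conn.execute(
        "INSERT INTO RequestedData VALUES (?, ?, ?, ?, ?, ?, ?)",
        (request_id, category_id, phrase, word_count, doc_count,
         servant, datetime.now(timezone.utc).isoformat()),
    )
    return (word_count, doc_count)


# Hypothetical stand-in for the real index query; records each call.
calls = []

def fake_compute(category_id, phrase):
    calls.append((category_id, phrase))
    return (3, 2)


conn = open_cache()
first = cached_counts(conn, 1, 10, "fishing", fake_compute)
second = cached_counts(conn, 2, 10, "fishing", fake_compute)
```

The second call is answered from the stored row, so `fake_compute` runs only once even though two requests asked for the same category and phrase.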
User
Column Name | Data Type
UserID | Int(11) (Primary Key)
Username | Char(255)
Company | Char(255)
Email | Char(255)
Project
Column Name | Data Type
ProjectID | Int(11) (Primary Key)
Project Name | Char(255)
CreatedByUserID | Int(11)
Annotation
Column Name | Data Type
Annotation_ID | Int(11) (Primary Key)
Category_ID | Int(11)
Project_ID | Int(11)
CreatedByUserID | Int(11)
Application | Varchar(500)
Notes | Varchar(600)
Review | Varchar(60)
Rank | Int(11)
DateAndTime | DateTime
SharedAnnotation
Column Name | Data Type
AnnotationID | Int(11) (Primary Key)
SharedWithUserID | Int(11)
Figure 14: Database tables used for collaborative annotation features of the CDB
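One way the Annotation and SharedAnnotation tables might be queried is sketched below: a user sees annotations on a category that they created themselves or that were shared with them. This is an assumption about intended use, not the original code. SQLite stands in for the real database, only a subset of the Annotation columns is included, SharedAnnotation is declared without a primary key so one annotation can be shared with several users, and `visible_annotations` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Annotation (
        Annotation_ID INTEGER PRIMARY KEY,
        Category_ID INTEGER,
        Project_ID INTEGER,
        CreatedByUserID INTEGER,
        Notes TEXT
    );
    -- Assumption: one row per (annotation, recipient) pair.
    CREATE TABLE SharedAnnotation (
        AnnotationID INTEGER,
        SharedWithUserID INTEGER
    );
""")


def visible_annotations(conn, user_id, category_id):
    """List (Annotation_ID, Notes) the user created or had shared with them."""
    return conn.execute("""
        SELECT DISTINCT a.Annotation_ID, a.Notes
        FROM Annotation a
        LEFT JOIN SharedAnnotation s ON s.AnnotationID = a.Annotation_ID
        WHERE a.Category_ID = ?
          AND (a.CreatedByUserID = ? OR s.SharedWithUserID = ?)
        ORDER BY a.Annotation_ID
    """, (category_id, user_id, user_id)).fetchall()


# Hypothetical sample data: user 1 shares one of two notes with user 2.
conn.executemany("INSERT INTO Annotation VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 1, "possible use in packaging foam"),
    (2, 10, 100, 1, "private follow-up"),
    (3, 10, 100, 3, "unrelated private note"),
])
conn.execute("INSERT INTO SharedAnnotation VALUES (?, ?)", (1, 2))

for_user_1 = visible_annotations(conn, 1, 10)
for_user_2 = visible_annotations(conn, 2, 10)
```

The LEFT JOIN keeps unshared annotations in the result for their author, while DISTINCT prevents an annotation shared with several users from appearing more than once.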
Figure 15: Bubble chart showing integration of aggregate data gleaned from text with structured data. The X-axis represents the asset turnover for the industry (i.e. category). The Y-axis is the relative prevalence of the search term “biodegradable” in each category.
Figure 16: Sample of publicly traded companies in the plastics packaging industry, obtained from the SEC, after a CDB exploration revealed that documents in the plastics packaging industry frequently mention a compound being marketed by a chemical industry salesperson.