Creating and exploiting aggregate information from text using a novel Categorized Document Base abstract




FIGURES


Figure 1: Results produced by exploratory search on ‘fishing’ using clusty.com



Figure 2: Results produced by exploratory search on ‘fishing’ using kartoo.com



Figure 3: Results produced by exploratory search on ‘US locations fishing’ using clusty.com



Figure 4: Hits for ‘fishing’ by state, from a CDB populated with American cities


Figure 5: Population of categories with relevant documents

Figure 6: Screen from our software prototype, showing the word suggestion tool, which lets the user select an expansion or disambiguation of the current search term


Figure 7: Computing hits for each category


Figure 8: Relative prevalence of the terms “smoothness”, “strength” and [“wet” OR “damp”] in various stone quarrying industry segments. Three bars are shown for each industry, from left to right: “smoothness”, “strength”, and [“wet” OR “damp”].



Figure 9: Excel-based collapsible tree view provided for the exploration of aggregate statistics (e.g. hits) per category in various taxonomies


Figure 10: Web-based collapsible tree view provided for the exploration of aggregate statistics (e.g. hits) per category, in the UNSPSC taxonomy




Figure 11: Collaborative annotation interface for sharing of human observations on interesting categories amongst a team: illustration of users sharing comments on possible applications of a biodegradable molecule with foam reduction properties.



Categories

Column Name    | Data Type
ParentID       | Int(11) (Primary Key)
CategoryID     | Int(11)
CategoryName   | Char(255)
Flag           | Int(11)
DateCompleted  | DateTime


CategoryAssignment

Column Name          | Data Type
CategoryID           | Int(11) (Primary Key)
DateAndTimeAssigned  | Timestamp
IPAddressOfServant   | Char(255)
ServantComputerName  | Char(255)


CategoryCompleted

Column Name           | Data Type
CategoryID            | Int(11) (Primary Key)
DateAndTimeCompleted  | Timestamp
IPAddressOfServant    | Char(255)
ServantComputerName   | Char(255)


Documents

Column Name    | Data Type
DocumentID     | Int(11) (Primary Key)
DocumentURL    | Text
CategoryID     | Int(11)
DateCompleted  | DateTime


Lexicon

Column Name  | Data Type
WordSenseID  | Int(11) (Primary Key)
WordText     | Char(255)


Words

Column Name   | Data Type
DocumentID    | Int(11) (Primary Key)
WordSenseID   | Int(11)
WordPosition  | Int(11)


Figure 12: Database tables used to represent and index documents in the CDB
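
As a minimal sketch of how the tables in Figure 12 could support per-category hit counts, the following Python example rebuilds the Documents, Lexicon and Words tables in an in-memory SQLite database and counts word occurrences and matching documents per category. The SQLite type mapping, the composite key on Words, the `hits_per_category` helper and the sample rows are assumptions made for illustration; this is not the prototype's actual implementation.

```python
import sqlite3


def build_cdb_index(conn):
    """Create simplified versions of the Figure 12 indexing tables."""
    conn.executescript("""
        CREATE TABLE Documents (
            DocumentID    INTEGER PRIMARY KEY,
            DocumentURL   TEXT,
            CategoryID    INTEGER,
            DateCompleted TEXT
        );
        CREATE TABLE Lexicon (
            WordSenseID INTEGER PRIMARY KEY,
            WordText    TEXT
        );
        -- Assumption: one row per word position, so the key is composite.
        CREATE TABLE Words (
            DocumentID   INTEGER,
            WordSenseID  INTEGER,
            WordPosition INTEGER,
            PRIMARY KEY (DocumentID, WordPosition)
        );
    """)


def hits_per_category(conn, word):
    """Return {CategoryID: (word_count, document_count)} for one word."""
    rows = conn.execute("""
        SELECT d.CategoryID,
               COUNT(*)                     AS WordCount,
               COUNT(DISTINCT d.DocumentID) AS DocumentCount
        FROM Words w
        JOIN Lexicon   l ON l.WordSenseID = w.WordSenseID
        JOIN Documents d ON d.DocumentID  = w.DocumentID
        WHERE l.WordText = ?
        GROUP BY d.CategoryID
        ORDER BY d.CategoryID
    """, (word,)).fetchall()
    return {cat: (wc, dc) for cat, wc, dc in rows}


# Hypothetical sample data: three documents in two categories.
conn = sqlite3.connect(":memory:")
build_cdb_index(conn)
conn.executemany(
    "INSERT INTO Documents VALUES (?, ?, ?, NULL)",
    [(1, "http://example.com/a", 10),
     (2, "http://example.com/b", 10),
     (3, "http://example.com/c", 20)],
)
conn.executemany(
    "INSERT INTO Lexicon VALUES (?, ?)",
    [(1, "fishing"), (2, "boat")],
)
conn.executemany(
    "INSERT INTO Words VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 1, 2), (2, 2, 0), (3, 1, 0)],
)

hits = hits_per_category(conn, "fishing")
print(hits)
```

The two counts returned per category correspond to the WordCount and DocumentCount columns of the RequestedData table in Figure 13.
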
Request

Column Name           | Data Type
RequestID             | Int(11) (Primary Key)
UserID                | Char(255)
TopCategoryID         | Int(11)
SearchPhrase          | Char(255)
DateAndTimeRequested  | Timestamp
DateAndTimeAssigned   | DateTime
DateAndTimePopulated  | DateTime


RequestedData

Column Name    | Data Type
RequestID      | Int(11) (Primary Key)
CategoryID     | Int(11)
SearchPhrase   | Char(255)
WordCount      | Int(11)
DocumentCount  | Int(11)
Servant        | Char(255)
DateAndTime    | DateTime


Figure 13: Database tables used to store and cache query results from the CDB
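
One way the RequestedData table in Figure 13 might act as a cache is sketched below in Python with SQLite: counts for a (category, search phrase) pair are looked up first and only computed when no stored row exists. The uniqueness constraint on (CategoryID, SearchPhrase), the `cached_counts` helper and the `fake_compute` stand-in are hypothetical and introduced only for this illustration; the source does not describe the lookup logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE RequestedData (
        RequestID     INTEGER,
        CategoryID    INTEGER,
        SearchPhrase  TEXT,
        WordCount     INTEGER,
        DocumentCount INTEGER,
        Servant       TEXT,
        DateAndTime   TEXT,
        -- Assumption: at most one cached result per category and phrase.
        UNIQUE (CategoryID, SearchPhrase)
    )
""")


def cached_counts(conn, category_id, phrase, compute):
    """Return (word_count, document_count), computing it only on a cache miss."""
    row = conn.execute(
        "SELECT WordCount, DocumentCount FROM RequestedData "
        "WHERE CategoryID = ? AND SearchPhrase = ?",
        (category_id, phrase),
    ).fetchone()
    if row is not None:
        return row  # cache hit: reuse the stored counts
    word_count, doc_count = compute(category_id, phrase)
    conn.execute(
        "INSERT INTO RequestedData "
        "(CategoryID, SearchPhrase, WordCount, DocumentCount, DateAndTime) "
        "VALUES (?, ?, ?, ?, datetime('now'))",
        (category_id, phrase, word_count, doc_count),
    )
    return (word_count, doc_count)


calls = []


def fake_compute(category_id, phrase):
    """Hypothetical stand-in for the real (expensive) index query."""
    calls.append((category_id, phrase))
    return (7, 3)


first = cached_counts(conn, 10, "fishing", fake_compute)
second = cached_counts(conn, 10, "fishing", fake_compute)
print(first, second, len(calls))
```

In this sketch the second call is served from the table, so the stand-in computation runs only once.
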
User

Column Name  | Data Type
UserID       | Int(11) (Primary Key)
Username     | Char(255)
Company      | Char(255)
Email        | Char(255)


Project

Column Name      | Data Type
ProjectID        | Int(11) (Primary Key)
Project Name     | Char(255)
CreatedByUserID  | Int(11)


Annotation

Column Name      | Data Type
Annotation_ID    | Int(11) (Primary Key)
Category_ID      | Int(11)
Project_ID       | Int(11)
CreatedByUserID  | Int(11)
Application      | Varchar(500)
Notes            | Varchar(600)
Review           | Varchar(60)
Rank             | Int(11)
DateAndTime      | DateTime


SharedAnnotation

Column Name       | Data Type
AnnotationID      | Int(11) (Primary Key)
SharedWithUserID  | Int(11)


Figure 14: Database tables used for collaborative annotation features of the CDB
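
The sharing relationship implied by the Annotation and SharedAnnotation tables in Figure 14 can be illustrated with a short Python and SQLite sketch that lists the annotations a given user can see: those the user created plus those shared with the user. The visibility rule, the composite key on SharedAnnotation, the `visible_annotations` helper and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Annotation (
        Annotation_ID   INTEGER PRIMARY KEY,
        Category_ID     INTEGER,
        Project_ID      INTEGER,
        CreatedByUserID INTEGER,
        Application     TEXT,
        Notes           TEXT,
        Review          TEXT,
        "Rank"          INTEGER,
        DateAndTime     TEXT
    );
    -- Assumption: composite key, so one annotation can be shared with many users.
    CREATE TABLE SharedAnnotation (
        AnnotationID     INTEGER,
        SharedWithUserID INTEGER,
        PRIMARY KEY (AnnotationID, SharedWithUserID)
    );
""")


def visible_annotations(conn, user_id):
    """Return IDs of annotations created by, or shared with, the given user."""
    rows = conn.execute("""
        SELECT a.Annotation_ID
        FROM Annotation a
        WHERE a.CreatedByUserID = ?
           OR EXISTS (SELECT 1
                      FROM SharedAnnotation s
                      WHERE s.AnnotationID = a.Annotation_ID
                        AND s.SharedWithUserID = ?)
        ORDER BY a.Annotation_ID
    """, (user_id, user_id)).fetchall()
    return [r[0] for r in rows]


# Hypothetical sample data: two users, one annotation each, one share.
conn.executemany(
    "INSERT INTO Annotation "
    "(Annotation_ID, Category_ID, Project_ID, CreatedByUserID, Notes) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 5, 1, 1, "hypothetical note"),
     (2, 6, 1, 2, "hypothetical note")],
)
conn.execute("INSERT INTO SharedAnnotation VALUES (1, 2)")

print(visible_annotations(conn, 1))
print(visible_annotations(conn, 2))
```
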

Figure 15: Bubble chart showing integration of aggregate data gleaned from text with structured data. The X-axis represents the Asset Turnover for the industry (i.e. category). The Y-axis is the relative prevalence of the search term “biodegradable” in each category.

Figure 16: Sample of publicly traded companies in the plastics packaging industry, obtained from the SEC, after a CDB exploration revealed that documents in the plastics packaging industry frequently mention a compound being marketed by a chemical industry salesperson.




