TABLES
| Type of Taxonomy | Examples |
| --- | --- |
| Product hierarchies | United Nations Standard Products and Services Code (UNSPSC) * |
| | United Nations Central Product Classification (CPC) |
| | United States Patent Classification (USPTO USPC) * |
| | International Patent Classification (IPC) |
| | Proprietary corporate product catalogues (e.g. Amazon, Wal-Mart, Sears, or any other catalogue defined by any large or small company) |
| Industry taxonomies | North American Industry Classification System (NAICS) † |
| | United States Standard Industrial Classification (SIC) † |
| | International Standard Industrial Classification (ISIC) |
| | Standard International Trade Classification (SITC3) |
| Company classifications | Fortune 500 † and Fortune 1000 |
| | S&P 500 † |
| | Inc. 500 † and Inc. 5000 |
| | Entrepreneur Magazine’s Franchise 500 |
| | Internet Retailer 500 |
| Activity taxonomies | WordNet (Verb relationships) † |
| | United States Bureau of Labor Statistics Standard Occupational Classification System (SOC) † |
| Place (Location) taxonomies | United States General Services Administration Geographic Locator Codes (US GSA GLC) / Geographic Names Service * |
| | United States Designated Market Areas (DMA) † |
| | Getty Thesaurus of Geographic Names (TGN) |
| Time taxonomies | |
| Topic taxonomies | Library of Congress Classification system (LoC) † |
| | UK Joint Academic Coding System (JACS) † |
| | UK Higher Education Statistics Agency Coding (HESACODE) † |
| Medical taxonomies | International Classification of Diseases (e.g. ICD10) † |
| | International Classification of Primary Care (ICPC) |
| | Current Procedural Terminology (CPT) |
| | US FDA Classification of Medical Devices † |
Table 1: Popular Classification Schemes
† indicates that the taxonomy (category names and relations) was imported into our prototype system, and a random selection of approximately 10% of categories was populated with documents.
* indicates that the taxonomy was imported into our prototype system and all categories were populated with documents.
Table 2: Absolute Hits for a Number of Search Terms, by Document Category
| Type of Taxonomy | Examples of data indexed using standard taxonomies |
| --- | --- |
| Product hierarchies | Sales data for each product category, from an internal company database, indexed by product category (e.g. UNSPSC, or UCC Stock Keeping Unit [SKU]). |
| Industry taxonomies | Industry size figures from the Bureau of Economic Analysis (BEA.gov), or from the Internal Revenue Service (IRS.gov), indexed by NAICS code. |
| Company classifications | Company profit figures, from the Securities and Exchange Commission (SEC.gov), indexed by NAICS code. |
| Activity / Employee taxonomies | Salary data for each profession, from the Bureau of Labor Statistics (BLS.gov), indexed by SOC occupation classification. |
| Place (Location) taxonomies | Population, land area, and other geographic data from the United States Geological Survey (USGS.gov), indexed by United States General Services Administration Geographic Locator Code (US GSA GLC). |
| Time taxonomies | Sales data for each date, from an internal company database, indexed by time. |
| Topic taxonomies | Enrollment data for each academic subject, from the National Center for Education Statistics (NCES.ed.gov), indexed by educational field. |
| Medical taxonomies | Infection rate, for each illness, in each area, indexed by ICD9 or ICD10 disease code. |
Table 3: Structured Data Sources for Various Taxonomies
| Corporate Franchise | Total US Franchise Outlets | CDB Search Terms Used | Pearson’s r: per-capita franchise outlets per state vs. CDB search term frequency for state [44] | Population Correlation [45] |
| --- | --- | --- | --- | --- |
| McDonalds | 11,318 | “burger” | -0.25 | 0.98 |
| | | “hamburgers” | -0.15 | |
| Pizza Hut | 5,676 | “pizza” | 0.04 | 0.34 |
| KFC | 4,378 | “chicken” | -0.17 | 0.95 |
| Intercontinental | 3,023 | “hotels” | -0.34 | 0.93 |
| Starbucks | 9,869 | “coffee” | -0.21 | 0.90 |
| RE/MAX | 4,628 | “property” | 0.15 | 0.95 |
| Supercuts | 1,644 | “hair” | -0.35 | 0.79 |
| Jackson Hewitt | 2,475 | “tax” | 0.10 | 0.87 |
| Carlson Wagonlit | 340 | “travel” | 0.26 | 0.87 |
| | | “flight” | 0.13 | |
| Jiffy Lube | 1,923 | “car” | -0.10 | 0.82 |
| Miracle Ear | 1,349 | “hearing” | 0.02 | 0.89 |
Table 4: Summary of Experimental Results – Selected Population-Sensitive Industries
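The Pearson column in Table 4 compares, state by state, per-capita franchise outlets with the frequency of a search term. A minimal sketch of that calculation follows. The state codes, outlet counts, populations, and term frequencies are invented for illustration and are not the study's data; treating the search term frequency as a single relative value per state is also an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-state inputs (made-up values, NOT the study's data).
outlets = {"AA": 120, "BB": 300, "CC": 45, "DD": 210}
population = {"AA": 1_000_000, "BB": 4_000_000, "CC": 500_000, "DD": 2_000_000}
term_frequency = {"AA": 0.012, "BB": 0.009, "CC": 0.015, "DD": 0.011}

# Align both series on the same state order before correlating.
states = sorted(outlets)
per_capita_outlets = [outlets[s] / population[s] for s in states]
frequency = [term_frequency[s] for s in states]

r_term = pearson_r(per_capita_outlets, frequency)
print(f"Pearson's r (per-capita outlets vs. term frequency): {r_term:.2f}")
```

Dividing outlet counts by population before correlating removes the effect of state size, which is why the per-capita figures can correlate weakly with term frequency even when raw counts track population closely.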
| Industry | External Data Used | CDB Search Term Used | Pearson’s r [46] | Population Correlation [47] |
| --- | --- | --- | --- | --- |
| Wind energy | DoE wind generating capacity | “windy” | 0.07 | 0.10 |
| | NREL wind resource availability | | 0.25 | -0.36 |
| Solar energy | Thermomax solar energy (BTUs) | “warm” | -0.11 | -0.09 |
| | | “sunny” | -0.28 | |
| | | “sunshine” | 0.22 | |
| Rain | NOAA precipitation per square mile 2008 | “rain” | 0.29 | -0.02 |
| | NationalAtlas.gov 1961-1990 | | 0.27 | 0.01 |
| Fishing | USFWS Non-resident fishing licenses sold | “fishing” | 0.46 | 0.19 |
| Coal | NMA Number of coal mines | “coal” | 0.75 | 0.18 |
| | NMA Coal production | | 0.74 | 0.08 |
| Gemstone | NMA Gemstone production | “gemstone” | 0.30 | 0.09 |
| Gold | NMA Gold revenues | “gold” | 0.29 | -0.06 |
| Forests | NFS Forest area | “forest” | 0.30 | 0.18 |
| Oil | EIA Oil production | “oil” | 0.39 | -0.04 |
| Mountain climbing | USGS Elevation Data | “mountain climbing” | 0.65 | -0.10 |
| Eco-tourism | USBLS Number of eco-tourism employees | “ecotourism” | 0.39 | 0.35 |
| Gaming | USBLS Number of game dealers | “gambling” | 0.29 | 0.32 |
Table 5: Summary of Experimental Results – Non-Population-Sensitive Industries