Creating and exploiting aggregate information from text using a novel Categorized Document Base abstract



Date: 18.10.2016

6.3 Discussion


The CDB appears to produce comparisons of varying validity across population-sensitive and non-population-sensitive industries, with seemingly better performance on the latter. We speculate that this is because the differences between states are far starker in non-population-sensitive industries than in population-sensitive industries, and the CDB is a relatively approximate instrument that is capable of discerning only stark differences. For instance, oil production and mountain climbing vary considerably between states, whereas hamburger consumption does not, and the CDB is incapable of ascertaining the more subtle distinction.

The reader may notice from the experiments that a number of important challenges remain for the CDB, such as disambiguating multiple word senses, and determining whether the statistics generated are indicative of consumer demand, or of market supply. We leave these challenges for future work (see Section 8).

Attempting to provide quantitative assessments of the magnitude of a market phenomenon from qualitative text is not new. For instance, Romano et al. [112] developed a qualitative data analysis methodology that successfully predicts box office opening success based on pre-release free-form comments about the movie on the web. Their work showed promising results. Our findings above concur, and affirm Romano et al.’s contention that an appropriate methodology for the analysis of free-form text can reveal meaningful evaluative information about market phenomena.

7. APPLICATIONS


CDBs have a number of useful applications, including market research, sales lead prospecting, competitor or substitute identification, and the exploration of unfamiliar collections of topics or items.

For market research, the experiments shown in the previous section indicate that CDBs can be a plausible means of assessing industry penetration by state in a number of industries, as the quantitative data for certain industries has been shown to correlate with the CDB rankings for descriptive terms for that industry. While the assessments produced by the CDB are certainly flawed, they are nevertheless demonstrably better than random, and therefore possess some information value. The CDB method should be generally useful when one wants a ranking of categories in non-population-sensitive industries, can tolerate some error, and has no independent data of good quality. While we would not recommend that investments in product roll-out be fashioned directly around the CDB’s findings, we see the CDB as a useful exploratory tool that is able to suggest locations of interest for further investigation or trials. We have employed the CDB in an engagement with a growing pet insurance company, PetPlan USA (http://www.gopetplan.com/). In our engagement, we compared the locations of the company’s current customers with hit counts for ‘dog’ across all US locations, to determine promising locations of future interest. This information, in conjunction with other intelligence gathered by the organization, is used to inform PetPlan’s marketing strategy. However, as the CDB provides only an incidental contribution to the overall marketing decisions, it is not possible to attribute a specific dollar benefit to the information obtained from the CDB in this case.
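The comparison of customer locations with hit counts described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the state figures are invented, the function names are hypothetical, and the "share gap" heuristic is our own simplification rather than the method used in the PetPlan engagement.

```python
# Hypothetical figures for illustration only; not data from the engagement.
cdb_hits = {"CA": 1200, "TX": 900, "NY": 800, "CO": 400, "VT": 100}  # hits for 'dog'
customers = {"CA": 300, "TX": 50, "NY": 200, "CO": 20, "VT": 30}     # current customers


def shares(counts):
    """Convert raw counts into each location's share of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def promising_locations(hits, current_customers):
    """Return locations whose share of hits exceeds their share of customers.

    Locations are ordered from the largest gap to the smallest. A positive gap
    is treated here (as an assumption) as a sign of interest that the existing
    customer base does not yet reflect.
    """
    hit_share = shares(hits)
    customer_share = shares(current_customers)
    gaps = {loc: hit_share[loc] - customer_share.get(loc, 0.0) for loc in hit_share}
    candidates = (loc for loc in gaps if gaps[loc] > 0)
    return sorted(candidates, key=lambda loc: gaps[loc], reverse=True)


print(promising_locations(cdb_hits, customers))
```

Working with shares rather than raw counts keeps the comparison from being dominated by the largest states, although, as discussed in Section 6.3, the CDB is likely to separate only stark differences.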

In the area of sales lead prospecting, Du Pont Corporation, a Fortune 500 chemical company, has experimented with our CDB for the identification of prospective markets and customers for their products. In one of their exploratory investigations, Du Pont made use of a taxonomy of industries, the North American Industry Classification System (NAICS), and searched for hits for various attributes of a chemical surfactant they manufacture across those industries. To assist the team of business development managers and engineers with their investigations, we implemented web-based collaboration features to allow users to capture and share their comments on particular industries that showed high scores (see §4.6 and Figure 11). For instance, a business analyst who comments “this industry is a large market with few barriers to entry and should be investigated further” may receive a response from a chemical engineer stating “this industry is unfortunately not feasible as the surfactant is not food-safe”. Though we cannot attribute any specific new revenue to the CDB, there is anecdotal evidence that the CDB uncovered industries of interest: trial users at Du Pont reported that the CDB uncovered unusual industries they had not previously considered as potential markets. One trial user also reported receiving an unexpected contact from a company in an industry identified by the CDB as interesting.

For reasons of confidentiality, the following example is fabricated, but it is illustrative of the process that can be followed to find new sales prospects. Assume that a salesperson has identified, using a CDB, that documents from the plastics packaging industry mention the attributes of the chemical that she is trying to sell with unusual frequency in comparison to other industries. The salesperson concludes that companies in the plastics packaging industry may be interested in her compound. She is able to use the NAICS or SIC code for the plastics packaging industry to retrieve a list of potential clients from a public source, such as the United States Securities and Exchange Commission (SEC) (see footnote 43). Figure 16 shows a portion of the company listing she obtained in this way. The “Navigate” button in Figure 16 allows the salesperson to select a prospect from the list and quickly navigate to the company’s financial reports in order to further qualify the prospect. The salesperson has successfully integrated knowledge gleaned from the CDB (unstructured data indicating that a certain industry mentions her product’s attributes with unusual frequency) with a structured data source (a list of companies in the identified candidate industry from the SEC). She has rapidly been able to identify a previously unrealized and potentially lucrative target market, and to construct a list of specific potential prospects.
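The integration step in this fabricated example, joining CDB industry scores to a structured company listing by industry code, might look roughly like the sketch below. Everything in it is hypothetical: the scores, the placeholder industry codes, the company names and the `prospects` function are invented for illustration, and no real SEC interface is called.

```python
from collections import defaultdict

# Hypothetical CDB output: relative frequency with which documents in each
# industry mention the product's attributes. Codes are placeholders, not
# real NAICS or SIC codes.
industry_scores = {"code-A": 0.42, "code-B": 0.05, "code-C": 0.11}

# Hypothetical structured listing, standing in for a list of companies
# retrieved by industry code from a public source.
companies = [
    {"name": "Example Packaging Co", "industry_code": "code-A"},
    {"name": "Example Refining Ltd", "industry_code": "code-B"},
    {"name": "Example Films Inc", "industry_code": "code-A"},
]


def prospects(scores, listing, top_n=1):
    """Return company names grouped under the top_n highest-scoring industries."""
    top_codes = sorted(scores, key=scores.get, reverse=True)[:top_n]
    by_code = defaultdict(list)
    for company in listing:
        by_code[company["industry_code"]].append(company["name"])
    # Industries with no listed companies map to an empty list.
    return {code: sorted(by_code[code]) for code in top_codes}


print(prospects(industry_scores, companies))
```

The industry code acts as the join key between the unstructured side (CDB scores) and the structured side (the company listing), which is why a shared taxonomy such as NAICS or SIC matters for this use.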

In the area of competitor and substitute identification, we speculate that, when used in conjunction with a taxonomy of industries or companies or products, CDBs can be used to identify particular industries or companies or products that mention certain attributes with unusual frequency. We have not, however, yet undertaken any academic or commercial trials in this sphere and are currently seeking research partners to progress such studies.

In the area of exploring unfamiliar collections of topics or items, we speculate that the CDB may be useful for uncovering topics or items with particular attributes amongst a large set of unfamiliar topics or items. For example, when populated with the most relevant pages for a list of hospitals, the CDB could be helpful in identifying hospitals with particular specialties (e.g. ‘cardiology’). Similarly, when populated with the top pages for a list of universities or schools, the CDB could conceivably identify those with particular attributes (e.g. universities with a specialty in ‘chemical engineering’, or schools that frequently mention students going on to ‘Ivy League colleges’). The CDB would be especially useful if the aggregate data it produced from text were combined (‘mashed up’) with structured data from other sources (see Section 4.5), to allow for multi-criterion decision-making. This would, for example, allow a student to compare colleges offering ‘chemical engineering’ while simultaneously looking at the annual fees and geographic locations for those colleges. Similarly, a middle-school parent who would like to relocate nationally to a better school district for their child may be able to use a CDB to identify high schools reporting students going on to Ivy League colleges, while simultaneously looking at the median house price in the school’s neighborhood and the property taxes for the county to assess affordability. Again, these applications are conjectured, and no exploratory trials have been performed.
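As a rough illustration of the conjectured ‘mash-up’ for multi-criterion decision-making, the sketch below combines per-college hit counts with structured attributes and filters on both. The college names, hit counts, fees and thresholds are all invented, and `shortlist` is a hypothetical helper; since no trials have been performed, this is not a description of an existing CDB feature.

```python
from dataclasses import dataclass


@dataclass
class College:
    name: str
    state: str
    annual_fees: int  # USD; hypothetical values


# Hypothetical CDB hit counts for the term 'chemical engineering'.
cdb_hits = {"College A": 40, "College B": 3, "College C": 25}

# Hypothetical structured data from another source.
colleges = [
    College("College A", "PA", 52000),
    College("College B", "OH", 18000),
    College("College C", "OH", 21000),
]


def shortlist(hits, records, min_hits, max_fees):
    """Join hit counts to structured records and keep rows meeting both criteria."""
    rows = [(r.name, r.state, r.annual_fees, hits.get(r.name, 0)) for r in records]
    kept = [row for row in rows if row[3] >= min_hits and row[2] <= max_fees]
    # Strongest textual signal first.
    return sorted(kept, key=lambda row: row[3], reverse=True)


print(shortlist(cdb_hits, colleges, min_hits=10, max_fees=30000))
```

Here College A is excluded on fees and College B on hit count, leaving only College C; the same pattern would apply to the school example with median house price and property tax as the structured criteria.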



