Description of the workshop topic and the associated research issues

ICDM 2005 Workshop Proposal
Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources
Description of the workshop topic and the associated research issues
Recent advances in high performance computing, high speed and high bandwidth communication, massive storage, and software (e.g., web services) that can be remotely invoked on the Internet present unprecedented opportunities for data-driven knowledge acquisition in a broad range of applications in virtually all areas of human endeavor, including collaborative cross-disciplinary discovery in e-science, bioinformatics, e-government, environmental informatics, health informatics, security informatics, e-business, education, and social informatics, among others. Given the explosive growth in the number and diversity of potentially useful information sources in many domains, there is an urgent need for sound approaches to integrative and collaborative analysis and interpretation of distributed, autonomous (and hence, inevitably semantically heterogeneous) data sources.
Machine learning offers some of the most cost-effective approaches to automated or semi-automated knowledge acquisition (discovery of features, correlations, and other complex relationships and hypotheses that describe potentially interesting regularities from large data sets) in many data rich application domains. However, applying current approaches to machine learning in emerging data rich application domains presents several challenges in practice:

Centralized access to data (assumed by most machine learning algorithms) is infeasible because of the large size and/or access restrictions imposed by the autonomous data sources. Hence, there is a need for knowledge acquisition systems that can perform the necessary analysis of data at the locations where the data and the computational resources are available and transmit the results of analysis (knowledge acquired from the data) to the locations where they are needed.
Ontological commitments associated with a data source (that is, assumptions concerning the objects that exist in the world, the properties or attributes of the objects, the possible values of attributes, and their intended meaning) are determined by the intended use of the data repository (at design time). In addition, data sources that are created for use in one context often find use in other contexts or applications. Therefore, semantic differences among autonomous data sources are simply unavoidable. Because users often need to analyze data in different contexts from different perspectives, there is no single privileged ontology that can serve all users, or for that matter, even a single user, in every context. Effective use of multiple sources of data in a given context requires reconciliation of such semantic differences from the user’s point of view.
Explicitly associating ontologies with data repositories results in partially specified data, i.e., data that are described in terms of attribute values at different levels of abstraction. For example, the program of a student in a data source can be specified as Graduate, while the program of a different student in the same data source (or even a different data source) can be specified as Doctoral.
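
The last point can be made concrete with a minimal sketch. It assumes a small, hypothetical attribute value taxonomy for the program attribute (the values Student, Undergraduate, and Masters, and all function names, are illustrative assumptions; only Graduate and Doctoral come from the example above). Relative to a coarse level of abstraction, every record can be resolved; relative to a finer level, the record specified as Graduate is only partially specified.

```python
# Hypothetical attribute value taxonomy for the "program" attribute.
# Each value maps to its parent; the root has parent None.
PARENT = {
    "Student": None,
    "Undergraduate": "Student",
    "Graduate": "Student",
    "Masters": "Graduate",
    "Doctoral": "Graduate",
}


def ancestors(value):
    """Return the value followed by its ancestors, up to the root."""
    chain = []
    while value is not None:
        chain.append(value)
        value = PARENT[value]
    return chain


def abstract_to(value, cut):
    """Map a value to the member of `cut` that is the value itself or one of its ancestors.

    Returns None when no such member exists, i.e. the value is more abstract
    than the chosen cut and is therefore only partially specified.
    """
    for node in ancestors(value):
        if node in cut:
            return node
    return None


records = ["Graduate", "Doctoral", "Masters", "Undergraduate"]
coarse_cut = {"Undergraduate", "Graduate"}
fine_cut = {"Undergraduate", "Masters", "Doctoral"}

coarse = [abstract_to(v, coarse_cut) for v in records]
fine = [abstract_to(v, fine_cut) for v in records]
```

Under the coarse cut, all four records resolve to a value in the cut. Under the fine cut, the Graduate record has no counterpart, which is the situation a learning algorithm for partially specified data would have to handle.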

Against this background, the proposed workshop seeks to bring together researchers in relevant areas of artificial intelligence (machine learning, data mining, knowledge representation, ontologies), information systems (information integration, databases, semantic web), distributed computing, and selected application areas (e.g., bioinformatics, security informatics, environmental informatics) to address questions such as:

What are some of the research challenges presented by emerging data-rich application domains such as bioinformatics, health informatics, security informatics, social informatics, environmental informatics?
How can we perform knowledge discovery from distributed data (assuming different types of data fragmentation, e.g., horizontal or vertical data fragmentation; different hypothesis classes, e.g., naïve Bayes, decision tree, support vector machine classifiers; different performance criteria, e.g., accuracy versus complexity versus reliability of the model generated, etc.)?
How can we make semantically heterogeneous data sources self-describing (e.g., by explicitly associating ontologies with data sources and mappings between them) in order to support collaborative science that draws on autonomous information sources?
How can we represent, manipulate, and reason with ontologies and mappings between ontologies?
How can we learn ontologies from data (e.g., attribute value taxonomies)?
How can we learn mappings between semantically heterogeneous data source schemas and between their associated ontologies?
How can we perform knowledge discovery in the presence of ontologies (e.g., attribute value taxonomies) and partially specified data (data that are described at different levels of abstraction within an ontology)?
How can we achieve online query relaxation when an initial query posed to the data sources fails (i.e., returns no tuples)? That is, how do we perform a query-driven mining of the individual sources that will result in knowledge that can be used for query relaxation?
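
As one illustration of the question on knowledge discovery from distributed data, the following minimal sketch assumes horizontally fragmented data (each source holds a subset of the rows) and a naive Bayes classifier, whose parameters are estimated from class counts and attribute-value-class counts. Each source computes its counts locally and transmits only those counts; a coordinator adds them up. The data, attribute names, and function names are illustrative assumptions, not part of the proposal.

```python
from collections import Counter


def local_counts(rows):
    """Run at a data source: count classes and (attribute, value, class) triples."""
    class_counts = Counter()
    joint_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for attr, val in features.items():
            joint_counts[(attr, val, label)] += 1
    return class_counts, joint_counts


def merge(stats):
    """Run at the coordinator: add up the counts sent by each source."""
    total_class, total_joint = Counter(), Counter()
    for class_counts, joint_counts in stats:
        total_class.update(class_counts)   # Counter.update adds counts
        total_joint.update(joint_counts)
    return total_class, total_joint


# Two hypothetical horizontal fragments of the same table.
site_a = [({"program": "Doctoral"}, "funded"), ({"program": "Masters"}, "unfunded")]
site_b = [({"program": "Doctoral"}, "funded"), ({"program": "Doctoral"}, "unfunded")]

merged = merge([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)
```

In this setting the merged counts are identical to the counts computed from the pooled data, so the resulting classifier is the same as the one learned with centralized access, while only the counts leave each source. Vertical fragmentation and other hypothesis classes generally need different statistics and are not covered by this sketch.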

Reasons why an ICDM workshop on this topic should take place
As noted above, given the explosive growth in the number and diversity of potentially useful information sources in many domains, there is an urgent need for sound approaches to integrative and collaborative analysis and interpretation of distributed, autonomous (and hence, inevitably semantically heterogeneous) data sources. At present, while there are several research conferences focused on well-established research areas (e.g., machine learning, data mining, knowledge representation, databases), there is relatively little interaction among the different research communities. For example, machine learning researchers working on algorithms for learning predictive models from distributed data are isolated from the large community of database researchers working on data integration, and from the community of artificial intelligence researchers focused on knowledge representation and inference. Researchers in this area can also benefit from a better understanding of specific challenges posed by emerging informatics-enabled application domains such as bioinformatics, health informatics, security informatics, and environmental informatics.
Fundamental advances in collaborative approaches to knowledge acquisition and data-driven decision making from distributed, autonomous, semantically heterogeneous data and knowledge sources require synergistic synthesis of research advances, insights, algorithms, and results in multiple areas of:

artificial intelligence – especially machine learning, data mining, knowledge representation and inference, intelligent agents and multi-agent systems;
information systems – especially databases, information integration, semantic web;
distributed computing (e.g., service-oriented computing).

The proposed workshop aims to bring these areas together in order to enable discussion of research problems, approaches, insights, and results drawn from multiple, and at present largely disparate, areas of artificial intelligence, computer science, and emerging informatics-enabled disciplines. At present, there is no annual conference or workshop dedicated to this topic. It is hoped that the resulting exchanges will stimulate further interaction between these communities and result in the development of new approaches that advance the current state of the art in systems for collaborative analysis, interpretation, and decision making from distributed, autonomous, semantically heterogeneous data and knowledge sources.

Workshop Format
The workshop will consist of:

An opening session for introducing the workshop topics, goals, participants, and expected outcomes
A small number of invited talks carefully intermixed with presentation of contributed papers. The invited talks will give overviews of the key topics (learning from distributed data, semantic Web, ontology-based information integration, distributed description logics, selected applications, etc.). A possible list of invited speakers:

Alex Borgida (ontologies and databases) - Rutgers University
Katy Borner (information visualization) - Indiana University
Foster Provost (machine learning and data mining) - New York University
James Hendler (semantic web) - University of Maryland at College Park
Alon Halevy (information integration) - University of Washington
Dieter Fensel (ontologies) - University of Innsbruck
Tom Dietterich (machine learning and environmental informatics) - Oregon State University
H. Jagadish (biological data management) - University of Michigan
Daphne Koller (probabilistic models) - Stanford University
Munindar Singh (service-oriented computing) - North Carolina State University
Michael Pazzani (intelligent information systems) - National Science Foundation

Presentations of contributed papers that represent completed work.
Breaks between sessions, meant to encourage informal discussions related to the topics discussed in the sessions and to create opportunities for collaborations.
A panel discussion on challenges and future research directions
A wrap-up session summarizing the workshop (including formal or informal discussions).

Description of the anticipated target group(s) of attendees
The workshop is of interest to researchers, students, and practitioners in a number of areas of artificial intelligence, information systems, and related areas, including machine learning and data mining, information extraction, information integration, knowledge representation, semantic web, software agents and multi-agent systems, and service-oriented computing. The workshop is also of interest to researchers and practitioners in emerging informatics-enabled application domains such as bioinformatics, environmental informatics, health informatics, security informatics, e-business, and social informatics.
The organizers will make an effort to ensure a good mix of established researchers as well as graduate students and junior researchers on the one hand and academic and industrial participants on the other.
Potential authors and attendees
A number of researchers who were informally contacted have expressed an interest in the proposed workshop, and we have put together a short list of potential participants. (The list below does not include members of the program committee or the potential invited speakers named above.) We expect the workshop to have around 40 participants, a size that allows for fruitful interaction and discussion in an informal setting.
AnHai Doan - University of Illinois at Urbana-Champaign

Lise Getoor - University of Maryland

Barbara Eckman - IBM Life Sciences Solution Development

George Forman - Hewlett Packard Labs

Simon Kasif - Boston University

Zoe Lacroix - Arizona State University

Pat Langley - Stanford University

Bertram Ludaescher - University of California, Davis and San Diego Supercomputer Center

Sanjay Madria - University of Missouri-Rolla

Nina Mishra - Stanford University and IBM

Vibhu Mittal - Google

Joyce Mitchell - University of Utah

Katia Sycara - Carnegie Mellon University

Lee Giles - Pennsylvania State University

Peter Tarczy-Hornoch - University of Washington
Workshop Organizing Committee – Contact Information
Dr. Doina Caragea (Contact Person)

226 Atanasoff Hall

Department of Computer Science

Iowa State University

Ames, IA 50011-1040 USA

dcaragea@cs.iastate.edu

Phone: 1-515-292-3704

Professor Vasant Honavar

226 Atanasoff Hall

Department of Computer Science

Iowa State University

Ames, IA 50011-1040 USA

honavar@cs.iastate.edu

Phone: 1-515-294-4377

Dr. Ion Muslea

Language Weaver, Inc.

4640 Admiralty Way

Suite 1210

Marina del Rey, CA 90292

imuslea@languageweaver.com

Phone: 1-310-437-7300

Professor Raghu Ramakrishnan

Department of Computer Sciences

University of Wisconsin-Madison

1210 West Dayton Street

Madison, WI 53706-1685 USA

raghu@cs.wisc.edu

Phone: 1-608-262-9759

Preliminary Program Committee
Naoki Abe - IBM

Liviu Badea - ICI, Romania

Marie desJardins - University of Maryland, Baltimore County

Tim Finin - University of Maryland, Baltimore County

Joydeep Ghosh - University of Texas

Hillol Kargupta - University of Maryland, Baltimore County

Sally McClean - University of Ulster at Coleraine

Dragos Margineantu - Boeing

Bamshad Mobasher - DePaul University

Jay Modi - Carnegie Mellon University

C. David Page Jr. - University of Wisconsin, Madison

Alexandrin Popescul - Ask Jeeves, Inc.

Adrian Silvescu - Iowa State University

Steffen Staab - University of Koblenz

Previously Organized Related Workshops

IJCAI-2001 Workshop on “Knowledge Discovery from Heterogeneous, Distributed, Autonomous, Dynamic Data and Knowledge Sources”. Vasant Honavar, Chair.
AAAI-2004 workshop on "Adaptive Text Extraction and Mining". Ion Muslea, Chair.

IJCAI-2001 workshop on "Adaptive Text Extraction and Mining". Ion Muslea, Co-chair.
AAAI-99 workshop on "Machine Learning for Information Extraction". Ion Muslea, Co-Chair.

Call for Papers
ICDM 2005 Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources

November 27th, New Orleans, Louisiana, USA

Important Dates
Aug. 12th: Paper Due

Sept. 4th: Notification

Sept. 26th: Camera Ready

Nov. 27th: Workshop

Organizing Committee
Doina Caragea

Iowa State University

dcaragea@cs.iastate.edu
Vasant Honavar

Iowa State University

honavar@cs.iastate.edu
Ion Muslea

Language Weaver, Inc.

imuslea@languageweaver.com
Raghu Ramakrishnan

University of Wisconsin-Madison

raghu@cs.wisc.edu
Program Committee
Naoki Abe, IBM

Liviu Badea, ICI, Romania

Doina Caragea, Iowa State Univ.

AnHai Doan, UIUC

Marie desJardins, UMBC

Joydeep Ghosh, Univ. of Texas

C. Lee Giles, Penn State Univ.

Vasant Honavar, Iowa State Univ.

Hillol Kargupta, UMBC

Sally McClean, U. of Ulster, UK

Bamshad Mobasher, DePaul U.

Jay Modi, Carnegie Mellon Univ.

C. David Page, Univ. of Wisconsin

Alexandrin Popescul, Ask Jeeves

Raghu Ramakrishnan, Univ. of Wisconsin

Zbigniew Ras, UNC-Charlotte

Steffen Staab, Univ. of Koblenz

Workshop Goals
The workshop aims to bring together researchers in relevant areas of artificial intelligence (machine learning, data mining, knowledge representation, ontologies), information systems (information integration, databases, semantic web), distributed computing, and selected application areas (e.g., bioinformatics, security informatics, environmental informatics) to address several questions that arise in the process of knowledge acquisition from distributed, autonomous, semantically heterogeneous data and knowledge sources.
Topics of Interest
Topics of interest include, but are not restricted to:

Challenges presented by emerging data-rich application domains such as bioinformatics, health informatics, security informatics, social informatics, environmental informatics.

Knowledge discovery from distributed data (assuming different types of data fragmentation, e.g., horizontal or vertical data fragmentation; different hypothesis classes, e.g., naïve Bayes, decision tree; different performance criteria, e.g., accuracy versus complexity versus reliability of the model generated, etc.).
Making semantically heterogeneous data sources self-describing (e.g., by explicitly associating ontologies with data sources and mappings between them) in order to support collaborative science.
Representation, manipulation, and reasoning with ontologies and mappings between ontologies.
Learning ontologies from data (e.g., attribute value taxonomies).
Learning mappings between semantically heterogeneous data source schemas and between their associated ontologies.
Knowledge discovery in the presence of ontologies (e.g., attribute value taxonomies) and partially specified data (data described at different levels of abstraction within an ontology).
Online query relaxation when an initial query posed to the data sources fails (i.e., returns no tuples), or equivalently, query-driven mining of the individual sources that will result in knowledge that can be used for query relaxation.

Submission Instructions
Postscript or PDF versions of papers, no more than 10 pages long (including figures, tables, and references) in the ICDM camera-ready format (IEEE 2-column format), should be submitted electronically to dcaragea@cs.iastate.edu by August 12th. Each paper will be rigorously refereed by at least 2 reviewers for technical soundness, originality, and clarity of presentation. Accepted papers will be included in informal workshop proceedings published by ICDM and distributed at the workshop. More details about the workshop can be found at www.cs.iastate.edu/~dcaragea/ICDM-KA.
