Modern organizations increasingly rely on computerized information processes and infrastructures, but as the underlying systems evolve and become more sophisticated, their users and managers face an exponentially growing volume of complex data. Exploring, analyzing and interpreting this information increasingly requires capabilities that conventional, standalone data mining solutions cannot provide. Based around eleven international real-life case studies and including contributions from leading experts in the field, this groundbreaking book explores the need for grid-enabling data mining applications and provides a comprehensive study of the technology, techniques and management skills necessary to create them.
List of contributors.
1. Data mining meets grid computing: time to dance
1.2 Data mining.
1.3 Grid computing.
1.4 Data mining grid - mining grid data.
1.6 Summary of chapters in this volume.
2. Data analysis services in the Knowledge Grid
2.3 Knowledge Grid services.
2.4 Data analysis services.
2.5 Design of Knowledge Grid applications.
3. GridMiner: an advanced support for e-science analytics
3.2 Rationale behind the design and development of GridMiner.
3.3 Use case.
3.4 Knowledge discovery process and its support by GridMiner.
3.5 Graphical user interface.
3.6 Future developments.
4. ADaM services: scientific data mining in the service-oriented architecture paradigm
4.2 ADaM system overview.
4.3 ADaM toolkit overview.
4.4 Mining in a service-oriented architecture.
4.5 Mining Web services.
4.6 Mining grid services.
5. Mining for misconfigured machines in grid systems
5.2 Preliminaries and related work.
5.3 Acquiring, pre-processing and storing data.
5.4 Data analysis.
5.5 The GMS.
5.7 Conclusions and future work.
6. FAEHIM: Federated Analysis Environment for Heterogeneous Intelligent Mining
6.2 Requirements of a distributed knowledge discovery framework.
6.3 Workflow-based knowledge discovery.
6.4 Data mining toolkit.
6.5 Data mining service framework.
6.6 Distributed data mining services.
6.7 Data manipulation tools.
6.9 Empirical experiments.
7. Scalable and privacy preserving distributed data analysis over a service-oriented platform
7.2 A service-oriented solution.
7.4 Model-based scalable, privacy preserving, distributed data analysis.
7.5 Modelling distributed data mining and workflow processes.
7.6 Lessons learned.
7.7 Further research directions.
8. Building and using analytical workflows in Discovery Net
8.2 Discovery Net system.
8.3 Architecture for Discovery Net.
8.4 Data management.
8.5 Example of a workflow study.
8.6 Future directions.
9. Building workflows that traverse the bioinformatics data landscape
9.2 The bioinformatics data landscape.
9.3 The bioinformatics experiment landscape.
9.4 Taverna for bioinformatics experiments.
9.5 Building workflows in Taverna.
9.6 Workflow case study.
10. Specification of distributed data mining workflows with DataMiningGrid
10.2 DataMiningGrid environment.
10.3 Operations for workflow construction.
10.5 Case studies.
10.6 Discussion and related work.
10.7 Open issues.
11. Anteater: service-oriented data mining
11.2 The architecture.
11.3 Runtime framework.
11.4 Parallel algorithms for data mining.
11.5 Visual metaphors.
11.6 Case studies.
11.7 Future developments.
11.8 Conclusions and future work.
12. DMGA: a generic brokering-based data mining grid architecture
12.2 DMGA overview.
12.3 Horizontal composition.
12.4 Vertical composition.
12.5 The need for brokering.
12.6 Brokering-based data mining grid architecture.
12.7 Use cases: Apriori, ID3 and J4.8 algorithms.
12.8 Related work.
13. Grid-based data mining with the Environmental Scenario Search Engine (ESSE)
13.1 Environmental data source: NCEP/NCAR reanalysis data set.
13.2 Fuzzy search engine.
13.3 Software architecture.
14. Data pre-processing using OGSA-DAI
14.2 Data pre-processing for grid-enabled data mining.
14.3 Using OGSA-DAI to support data mining applications.
14.4 Data pre-processing scenarios in data mining applications.
14.5 State of the art solutions for grid data management.
14.7 Open issues.
Primary: developers and users of data mining and grid technology across a wide range of subject areas including biomedical sciences, finance, manufacturing and marketing.
Secondary: advanced undergraduate and postgraduate students on a variety of courses such as data mining, grid computing, high-performance computing, knowledge-based systems, bioinformatics and medical informatics.
Course names: bioinformatics, medical informatics and systems biology.