Introduction to Data Mining (English Edition) 拾光::有路网




Title: Introduction to Data Mining (English Edition)


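Authors/Translators: [US] Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Publisher: Posts & Telecom Press (人民邮电出版社)

Publication date: January 2006
ISBN: 9787115141446 [ISBN-10: 7115141444]
Pages: 516
List price: ￥59.00
Store price: ￥14.70 (you save ￥44.30)
Store stock: 7 copies
Note: You are buying from a third-party store on the site, not directly from 有路网.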



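Summary of "Introduction to Data Mining (English Edition)":

This book provides a comprehensive introduction to data mining and aims to give readers the knowledge needed to apply data mining to real problems. It covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms in more depth. The goal is to give readers a thorough understanding of the foundations of data mining while also covering more of the important advanced topics. The book also provides numerous examples, figures, and exercises.
The book is suitable as a textbook for data mining courses for senior undergraduates and graduate students in related fields, and as a reference for practitioners engaged in data mining research and application development.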

Table of contents of "Introduction to Data Mining (English Edition)":

1 Introduction 1
1.1 What Is Data Mining? 2
1.2 Motivating Challenges 3
1.3 The Origins of Data Mining 4
1.4 Data Mining Tasks 5
1.5 Scope and Organization of the Book 8
1.6 Bibliographic Notes 9
1.7 Exercises 12

2 Data 13
2.1 Types of Data 15
2.1.1 Attributes and Measurement 15
2.1.2 Types of Data Sets 20
2.2 Data Quality 25
2.2.1 Measurement and Data Collection Issues 26
2.2.2 Issues Related to Applications 31
2.3 Data Preprocessing 32
2.3.1 Aggregation 32
2.3.2 Sampling 34
2.3.3 Dimensionality Reduction 36
2.3.4 Feature Subset Selection 37
2.3.5 Feature Creation 39
2.3.6 Discretization and Binarization 41
2.3.7 Variable Transformation 45
2.4 Measures of Similarity and Dissimilarity 47
2.4.1 Basics 47
2.4.2 Similarity and Dissimilarity between Simple Attributes 49
2.4.3 Dissimilarities between Data Objects 50
2.4.4 Similarities between Data Objects 52
2.4.5 Examples of Proximity Measures 53
2.4.6 Issues in Proximity Calculation 58
2.4.7 Selecting the Right Proximity Measure 60
2.5 Bibliographic Notes 61
2.6 Exercises 64

3 Exploring Data 71
3.1 The Iris Data Set 71
3.2 Summary Statistics 72
3.2.1 Frequencies and the Mode 72
3.2.2 Percentiles 73
3.2.3 Measures of Location: Mean and Median 73
3.2.4 Measures of Spread: Range and Variance 75
3.2.5 Multivariate Summary Statistics 76
3.2.6 Other Ways to Summarize the Data 77
3.3 Visualization 77
3.3.1 Motivations for Visualization 77
3.3.2 General Concepts 78
3.3.3 Techniques 81
3.3.4 Visualizing Higher-Dimensional Data 90
3.3.5 Do's and Don'ts 94
3.4 OLAP and Multidimensional Data Analysis 95
3.4.1 Representing Iris Data as a Multidimensional Array 95
3.4.2 Multidimensional Data: The General Case 97
3.4.3 Analyzing Multidimensional Data 98
3.4.4 Final Comments on Multidimensional Data Analysis 101
3.5 Bibliographic Notes 102
3.6 Exercises 103

4 Classification: Basic Concepts, Decision Trees, and Model Evaluation 105
4.1 Preliminaries 105
4.2 General Approach to Solving a Classification Problem 107
4.3 Decision Tree Induction 108
4.3.1 How a Decision Tree Works 108
4.3.2 How to Build a Decision Tree 110
4.3.3 Methods for Expressing Attribute Test Conditions 112
4.3.4 Measures for Selecting the Best Split 114
4.3.5 Algorithm for Decision Tree Induction 119
4.3.6 An Example: Web Robot Detection 120
4.3.7 Characteristics of Decision Tree Induction 122
4.4 Model Overfitting 125
4.4.1 Overfitting Due to Presence of Noise 127
4.4.2 Overfitting Due to Lack of Representative Samples 129
4.4.3 Overfitting and the Multiple Comparison Procedure 129
4.4.4 Estimation of Generalization Errors 131
4.4.5 Handling Overfitting in Decision Tree Induction 134
4.5 Evaluating the Performance of a Classifier 135
4.5.1 Holdout Method 136
4.5.2 Random Subsampling 136
4.5.3 Cross-Validation 136
4.5.4 Bootstrap 137
4.6 Methods for Comparing Classifiers 137
4.6.1 Estimating a Confidence Interval for Accuracy 138
4.6.2 Comparing the Performance of Two Models 139
4.6.3 Comparing the Performance of Two Classifiers 140
4.7 Bibliographic Notes 141
4.8 Exercises 144

5 Classification: Alternative Techniques 151
5.1 Rule-Based Classifier 151
5.1.1 How a Rule-Based Classifier Works 153
5.1.2 Rule-Ordering Schemes 154
5.1.3 How to Build a Rule-Based Classifier 155
5.1.4 Direct Methods for Rule Extraction 155
5.1.5 Indirect Methods for Rule Extraction 161
5.1.6 Characteristics of Rule-Based Classifiers 163
5.2 Nearest-Neighbor Classifiers 163
5.2.1 Algorithm 165
5.2.2 Characteristics of Nearest-Neighbor Classifiers 165
5.3 Bayesian Classifiers 166
5.3.1 Bayes Theorem 166
5.3.2 Using the Bayes Theorem for Classification 168
5.3.3 Naïve Bayes Classifier 169
5.3.4 Bayes Error Rate 175
5.3.5 Bayesian Belief Networks 176
5.4 Artificial Neural Network (ANN) 181
5.4.1 Perceptron 181
5.4.2 Multilayer Artificial Neural Network 184
5.4.3 Characteristics of ANN 187
5.5 Support Vector Machine (SVM) 188
5.5.1 Maximum Margin Hyperplanes 188
5.5.2 Linear SVM: Separable Case 190
5.5.3 Linear SVM: Nonseparable Case 195
5.5.4 Nonlinear SVM 198
5.5.5 Characteristics of SVM 203
5.6 Ensemble Methods 203
5.6.1 Rationale for Ensemble Method 203
5.6.2 Methods for Constructing an Ensemble Classifier 204
5.6.3 Bias-Variance Decomposition 206
5.6.4 Bagging 209
5.6.5 Boosting 211
5.6.6 Random Forests 215
5.6.7 Empirical Comparison among Ensemble Methods 216
5.7 Class Imbalance Problem 217
5.7.1 Alternative Metrics 218
5.7.2 The Receiver Operating Characteristic Curve 220
5.7.3 Cost-Sensitive Learning 223
5.7.4 Sampling-Based Approaches 225
5.8 Multiclass Problem 226
5.9 Bibliographic Notes 228
5.10 Exercises 233

6 Association Analysis: Basic Concepts and Algorithms 241
6.1 Problem Definition 242
6.2 Frequent Itemset Generation 244
6.2.1 The Apriori Principle 246
6.2.2 Frequent Itemset Generation in the Apriori Algorithm 247
6.2.3 Candidate Generation and Pruning 249
6.2.4 Support Counting 252
6.2.5 Computational Complexity 255
6.3 Rule Generation 257
6.3.1 Confidence-Based Pruning 258
6.3.2 Rule Generation in Apriori Algorithm 258
6.3.3 An Example: Congressional Voting Records 259
6.4 Compact Representation of Frequent Itemsets 260
6.4.1 Maximal Frequent Itemsets 260
6.4.2 Closed Frequent Itemsets 262
6.5 Alternative Methods for Generating Frequent Itemsets 264
6.6 FP-Growth Algorithm 268
6.6.1 FP-Tree Representation 268
6.6.2 Frequent Itemset Generation in FP-Growth Algorithm 270
6.7 Evaluation of Association Patterns 273
6.7.1 Objective Measures of Interestingness 274
6.7.2 Measures beyond Pairs of Binary Variables 282
6.7.3 Simpson's Paradox 283
6.8 Effect of Skewed Support Distribution 285
6.9 Bibliographic Notes 288
6.10 Exercises 298

7 Association Analysis: Advanced Concepts 307
7.1 Handling Categorical Attributes 307
7.2 Handling Continuous Attributes 309
7.2.1 Discretization-Based Methods 310
7.2.2 Statistics-Based Methods 312
7.2.3 Non-discretization Methods 314
7.3 Handling a Concept Hierarchy 316
7.4 Sequential Patterns 318
7.4.1 Problem Formulation 318
7.4.2 Sequential Pattern Discovery 320
7.4.3 Timing Constraints 323
7.4.4 Alternative Counting Schemes 327
7.5 Subgraph Patterns 328
7.5.1 Graphs and Subgraphs 329
7.5.2 Frequent Subgraph Mining 330
7.5.3 Apriori-like Method 332
7.5.4 Candidate Generation 333
7.5.5 Candidate Pruning 338
7.5.6 Support Counting 340
7.6 Infrequent Patterns 340
7.6.1 Negative Patterns 341
7.6.2 Negatively Correlated Patterns 342
7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns 343
7.6.4 Techniques for Mining Interesting Infrequent Patterns 344
7.6.5 Techniques Based on Mining Negative Patterns 345
7.6.6 Techniques Based on Support Expectation 347
7.7 Bibliographic Notes 350
7.8 Exercises 353

8 Cluster Analysis: Basic Concepts and Algorithms 363
8.1 Overview 365
8.1.1 What Is Cluster Analysis? 365
8.1.2 Different Types of Clusterings 366
8.1.3 Different Types of Clusters 368
8.2 K-means 370
8.2.1 The Basic K-means Algorithm 371
8.2.2 K-means: Additional Issues 378
8.2.3 Bisecting K-means 380
8.2.4 K-means and Different Types of Clusters 381
8.2.5 Strengths and Weaknesses 383
8.2.6 K-means as an Optimization Problem 383
8.3 Agglomerative Hierarchical Clustering 385
8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm 385
8.3.2 Specific Techniques 387
8.3.3 The Lance-Williams Formula for Cluster Proximity 391
8.3.4 Key Issues in Hierarchical Clustering 391
8.3.5 Strengths and Weaknesses 393
8.4 DBSCAN 393
8.4.1 Traditional Density: Center-Based Approach 393
8.4.2 The DBSCAN Algorithm 394
8.4.3 Strengths and Weaknesses 398
8.5 Cluster Evaluation 398
8.5.1 Overview 399
8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation 401
8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix 406
8.5.4 Unsupervised Evaluation of Hierarchical Clustering 408
8.5.5 Determining the Correct Number of Clusters 409
8.5.6 Clustering Tendency 410
8.5.7 Supervised Measures of Cluster Validity 411
8.5.8 Assessing the Significance of Cluster Validity Measures 414
8.6 Bibliographic Notes 416
8.7 Exercises 419

9 Cluster Analysis: Additional Issues and Algorithms 427
9.1 Characteristics of Data, Clusters, and Clustering Algorithms 427
9.1.1 Example: Comparing K-means and DBSCAN 428
9.1.2 Data Characteristics 429
9.1.3 Cluster Characteristics 430
9.1.4 General Characteristics of Clustering Algorithms 431
9.2 Prototype-Based Clustering 433
9.2.1 Fuzzy Clustering 433
9.2.2 Clustering Using Mixture Models 437
9.2.3 Self-Organizing Maps (SOM) 446
9.3 Density-Based Clustering 451
9.3.1 Grid-Based Clustering 451
9.3.2 Subspace Clustering 454
9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering 457
9.4 Graph-Based Clustering 460
9.4.1 Sparsification 461
9.4.2 Minimum Spanning Tree (MST) Clustering 462
9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS 463
9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling 464
9.4.5 Shared Nearest Neighbor Similarity 468
9.4.6 The Jarvis-Patrick Clustering Algorithm 471
9.4.7 SNN Density 472
9.4.8 SNN Density-Based Clustering 473
9.5 Scalable Clustering Algorithms 475
9.5.1 Scalability: General Issues and Approaches 476
9.5.2 BIRCH 477
9.5.3 CURE 479
9.6 Which Clustering Algorithm? 482
9.7 Bibliographic Notes 484
9.8 Exercises 488

10 Anomaly Detection 491
10.1 Preliminaries 492
10.1.1 Causes of Anomalies 492
10.1.2 Approaches to Anomaly Detection 493
10.1.3 The Use of Class Labels 494
10.1.4 Issues 495
10.2 Statistical Approaches 496
10.2.1 Detecting Outliers in a Univariate Normal Distribution 497
10.2.2 Outliers in a Multivariate Normal Distribution 499
10.2.3 A Mixture Model Approach for Anomaly Detection 500
10.2.4 Strengths and Weaknesses 502
10.3 Proximity-Based Outlier Detection 502
10.3.1 Strengths and Weaknesses 503
10.4 Density-Based Outlier Detection 504
10.4.1 Detection of Outliers Using Relative Density 505
10.4.2 Strengths and Weaknesses 506
10.5 Clustering-Based Techniques 506
10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster 507
10.5.2 Impact of Outliers on the Initial Clustering 509
10.5.3 The Number of Clusters to Use 509
10.5.4 Strengths and Weaknesses 509
10.6 Bibliographic Notes 510
10.7 Exercises 513

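Editorial recommendation and reviews of "Introduction to Data Mining (English Edition)":

“This is a brand-new data mining textbook that deserves a strong recommendation.”
-- Jiawei Han, Professor, University of Illinois
This book provides a comprehensive introduction to data mining, covering five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms. Readers thus gain a thorough understanding of the foundations of data mining while also learning about more of the important advanced topics.
The book is the textbook for the data mining courses at the University of Minnesota and Michigan State University. Because of its distinctive approach, it was adopted by Stanford University, the University of Texas at Austin, and many other leading universities even before its official publication.
Features of this book:
· Unlike many comparable books, this one focuses on how to use data mining knowledge to solve a variety of practical problems.
· It requires very little prerequisite knowledge: no database background, and only a modest background in statistics or mathematics.
· It contains a large number of figures, integrated examples, and exercises, and uses examples, concise descriptions of key algorithms, and exercises to focus as directly as possible on the main concepts of data mining.
· The supplementary teaching material is extensive: lecture slides, suggestions for student projects, data mining resources (such as data mining algorithms and data sets), and online tutorials that use real data sets and data analysis software to work through examples of some of the data mining techniques described in the book.
· Solutions to the exercises are available to instructors who adopt the book as a textbook.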

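About the authors of "Introduction to Data Mining (English Edition)":

Pang-Ning Tan is an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he mainly teaches courses such as data mining and database systems. Before that, he was a research associate at the Army High Performance Computing Research Center at the University of Minnesota (2002-2003).
Michael Steinbach is a researcher and Ph.D. candidate in the Department of Computer Science and Engineering at the University of Minnesota.
Vipin Kumar is head of the Department of Computer Science and Engineering at the University of Minnesota and a former director of the Army High Performance Computing Research Center. He holds a Ph.D. from the University of Maryland, is an international authority on data mining and high-performance computing, and is an IEEE Fellow.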