Data Compression for Data Mining Algorithms

1st Edition - May 3, 2026
Latest edition
Author: Xiaochun Wang
Language: English

Description

Data Compression for Data Mining Algorithms addresses the problem of designing more efficient data mining algorithms through data compression techniques. It provides the first systematic and comprehensive description of the relationships between data compression mechanisms and the computations performed by data mining algorithms. Data mining algorithms are powerful analytical techniques used across many disciplines, including business, engineering, and science. In the big data era, however, tasks such as association rule mining and classification often require multiple database scans, while clustering and outlier detection methods typically rely on Euclidean distance as a similarity measure. Both lead to high computational costs.

Data Compression for Data Mining Algorithms addresses these challenges by focusing on the scalarization of data mining algorithms: it uses data compression techniques to reduce dataset sizes and applies information theory principles to reduce the computational cost of tasks such as feature selection and similarity computation. The book presents the latest developments in both lossless and lossy data compression and gives a comprehensive exposition of data compression methods for data mining algorithm design from multiple points of view.

Key topics include Huffman coding, scalar and vector quantization, transform and subband coding, wavelet-based compression for scalable algorithms, and the role of neural networks, particularly deep learning, in feature selection and dimensionality reduction. The book balances theoretical analysis with real-world applications, and its chapters are organized to give a solid overview of data compression techniques for data mining. To give readers a more complete understanding of the material, projects and problems solved in Python are interspersed throughout the text.

Key features

Covers popular data compression methods and shows how they aid in the development and application of data mining algorithms
Includes projects and problems solved with Python to help readers create programs for both data compression and data mining problems
Focuses on the scalarization of data mining algorithms, leveraging data compression techniques to reduce dataset sizes and applying information theory principles to minimize computations
Simplifies the field of data compression by focusing on the topics that are most widely useful from a data mining perspective

Readership

Computer science, data science, and data analysis researchers in academia and industry. The primary audience also includes researchers and professionals in mathematics, AI, machine learning, and deep learning, as well as anyone who wants to strengthen their skills in data mining and analysis.

Part I: Foundation

1. Overview and Contributions

1.1 Overview

1.2 Introduction

1.3 Developments in Data Compression Techniques for Data Mining Algorithm Design

1.4 Overview of the Book

1.5 Contributions

1.6 Conclusions

2. Introduction to Data Mining Algorithms

2.1 Introduction

2.2 Association Rule Mining

2.2.1 Frequent Itemsets

2.2.2 Association Rules

2.3 Classification

2.3.1 Decision Tree

2.3.2 Support Vector Machine

2.4 Clustering

2.4.1 k-Means Algorithm

2.4.2 Single-Link Algorithm

2.4.3 DBSCAN Algorithm

2.4.4 Minimum Spanning Tree Algorithm

2.5 Outlier Detection

2.5.1 Probability Based Algorithm

2.5.2 Proximity Based Algorithm

2.5.3 Classification Based Algorithm

2.5.4 Clustering Based Algorithm

2.6 Mining Large Datasets

2.6.1 Overview

2.6.2 Issues and Challenges

2.7 Summary

2.8 Bibliographies

3. Introduction to Data Compression Methods

3.1 Feature Extraction and Data Representation

3.2 Lossless Data Compression Methods

3.2.1 Huffman Coding

3.2.2 Arithmetic Coding

3.2.3 Run-length Coding

3.3 Lossy Data Compression Methods

3.3.1 Quantization

3.3.2 Dictionary Techniques

3.3.3 Differential Encoding

3.3.4 Transform Coding, Subband Coding, and Wavelets

3.4 Data Compression for Data Preprocessing

3.4.1 Data Reduction and Transformation

3.4.2 Sampling

3.4.3 Dimensionality Reduction

3.5 Summary

3.6 Bibliographic Notes

Part II: Association Rule Mining

4. Huffman Coding for Association Rule Mining

4.1 Introduction

4.2 Frequent Itemset and Association Rule Mining

4.3 The Apriori Algorithm

4.4 The FP-tree Algorithm

4.5 The Proposed Huffman Coding for Frequent Itemset Mining

4.6 Experiments and Results

4.7 Conclusions

4.8 References

5. Arithmetic Coding for Maximal Frequent Itemsets Mining

5.1 Introduction

5.2 Maximal Frequent Itemsets Mining

5.3 Arithmetic Coding

5.4 The Proposed Arithmetic Coding for Maximal Frequent Itemset Mining

5.5 Experiments and Results

5.6 Conclusions

5.7 References

Part III: Classification

6. Feature Subset Selection for Decision Tree Construction

6.1 Introduction

6.2 Decision Tree for Classification

6.3 Feature Subset Selection

6.4 The Proposed Feature Subset Selection for Decision Tree Construction

6.5 Experiments and Results

6.6 Conclusions

6.7 References

7. Neural Networks for Decision Tree Construction

7.1 Introduction

7.2 Neural Networks

7.3 Deep Neural Networks

7.4 The Proposed NN-Based Feature Subset Selection for Decision Tree Construction

7.5 Experiments and Results

7.6 Conclusions

7.7 References

8. Principal Component Analysis for Decision Tree Construction

8.1 Introduction

8.2 Principal Component Analysis

8.3 The Proposed PCA-Based Decision Tree Construction

8.4 Experiments and Results

8.5 Conclusions

8.6 References

9. Dictionary Techniques for Support Vector Machine

9.1 Introduction

9.2 Support Vector Machine for Classification

9.3 Dictionary Techniques

9.4 The Proposed Dictionary Techniques for Support Vector Machine

9.5 Experiments and Results

9.6 Conclusions

9.7 References

10. Quantization for Support Vector Machine

10.1 Introduction

10.2 Scalar Quantization

10.3 Vector Quantization

10.4 The Proposed Quantization Method for Support Vector Machine

10.5 Experiments and Results

10.6 Conclusions

10.7 References

Part IV: Clustering and Outlier Detection

11. A Sparse Data Representation for Clustering

11.1 Introduction

11.2 Background

11.3 The Proposed Data Compression Method

11.4 Experiments and Results

11.5 Conclusions

11.6 References

12. Dictionary Coding Based Compression for Clustering

12.1 Introduction

12.2 Background

12.3 The Proposed Dictionary Coding Method for Efficient Clustering

12.4 Experiments and Results

12.5 Conclusions

12.6 References

13. Nearest Neighbor Based Compression for Outlier Detection

13.1 Introduction

13.2 Background

13.3 The Proposed Data Compression Method for Efficient Outlier Detection

13.4 Experiments and Results

13.5 Conclusions

13.6 References

14. Huffman Coding for Outlier Detection

14.1 Introduction

14.2 Background

14.3 The Proposed Multi-dimensional Data Compression by Huffman Coding

14.4 Experiments and Results

14.5 Conclusions

14.6 References

15. Arithmetic Coding for Outlier Detection

15.1 Introduction

15.2 Background

15.3 The Proposed Multi-dimensional Data Compression by Arithmetic Coding

15.4 Experiments and Results

15.5 Conclusions

15.6 References

Product details

Edition: 1
Published: May 7, 2026

About the author

Xiaochun Wang

Dr. Xiaochun Wang received her BS degree from Beijing University, and her MS degree in data compression and PhD degree in mobile robotics from the Department of Electrical Engineering and Computer Science at Vanderbilt University. She was an associate professor at the School of Software Engineering at Xi’an Jiaotong University, where she taught Database Management and Data Mining courses from 2010 to 2021. She currently works as a senior scientist at Xi’an Tuowei Hi-Tech Corporation. Her research interests include data mining, pattern recognition, signal processing, and computer vision.

Affiliations and expertise

Xi’an Tuowei Hi-Tech Corporation, Xi’an, China