Data Mining

Guide to the top data mining algorithms

In today’s ever-expanding technological environment, companies—for instance, in banking, retail, and social media—store large batches of data online and across many systems. Companies can make use of this data and benefit from it through data mining. This article explores how data mining algorithms work and how you can use them. It also looks at some of the top data mining algorithms available today.

To start, data mining is an important step in the larger process of knowledge discovery. It is the process of exploring and analyzing large data sets for patterns, relationships, and trends. Companies engage in data mining to gain useful business insights. For example, a company might use data mining to analyze a group’s buying habits, bank transactions, or medical history to predict the group’s future actions.

People sometimes confuse data mining with data harvesting. However, data harvesting is the process of extracting and analyzing data from online sources. Data mining does not involve “harvesting” data. Instead, it centers on examining data to produce new information.

To do so, data mining often uses a machine learning method called supervised learning. In supervised learning, an algorithm “learns” from example data for which the correct results are already known. Typically, a supervised learning algorithm views this data, applies conditions, and produces results that can be checked against the known answers. It then applies the same process and reasoning to new data.

What is an algorithm in data mining?

In general, algorithms employ a series of steps or rules to process data and produce a specific outcome, result, or prediction. Within data mining, algorithms perform functions such as analyzing, classifying, and forecasting data and monitoring data trends.

Many data mining algorithms are supervised learning algorithms. They draw on statistics, probability, and artificial intelligence to explore data and generate results that benefit companies, industries, and organizations all over the world.

Supervised learning algorithms and other supervised learning methods depend on labeled data. The labeled data includes the algorithm’s expected data output. A simple example is a picture of a dog labeled with the word “dog.” Labeled data helps the algorithm “learn” patterns in the data and later apply these patterns to unlabeled sets of data. In the above example, a labeled picture of a dog could help an algorithm recognize other images of dogs.
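The idea of learning from labeled data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: instead of pictures, each example is a pair of made-up measurements (weight and height), and the “learning” step is a simple nearest-centroid rule chosen for brevity. The data, the feature choice, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical labeled data: (weight_kg, height_cm) measurements plus a label.
labeled = [
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
]

def train_centroids(examples):
    """Average the feature vectors of each label (the 'learning' step)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def predict(centroids, point):
    """Assign the label whose centroid lies closest to the unlabeled point."""
    def sq_dist(c):
        return (c[0] - point[0]) ** 2 + (c[1] - point[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

centroids = train_centroids(labeled)
print(predict(centroids, (28.0, 58.0)))  # classify a new, unlabeled example
```

The labels are used only during training. Once the centroids exist, the function can classify points it has never seen, which mirrors how a labeled picture of a dog helps an algorithm recognize other dogs.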

Although they learn from labeled data, supervised learning algorithms can also predict or estimate unknown values for new, unlabeled data. This is possible as long as the new data follows the patterns found in the known data.

What is the role of the algorithm in data mining?

Data mining algorithms process large groups of data to produce certain statistical analyses or results for businesses, industries, or organizations. As such, they are a vital part of the data mining process.

A data mining algorithm’s role depends on the expectations of a user, creator, or investor. As we noted previously, many data mining algorithms conduct data analysis on large data sets. Across many fields, data mining algorithms can analyze audio, textual, and visual data according to demographic factors such as age, gender, and income.

For instance, a shoe company might develop a data mining algorithm to uncover the percentage of the company’s stock that women between the ages of twenty-five and thirty own. An organization within the medical field might use data mining algorithms to conduct research on certain diseases and their impacts on different groups of patients. A social media company might use a data mining algorithm to provide facial tagging suggestions.

All of these examples rely on data points, or criteria, that work with a data mining algorithm to produce the best or closest desired outcome.

Many types of data mining algorithms exist to analyze and interpret data and help achieve desired results. Examples include decision trees, support vector machines, k-nearest neighbors, and neural networks. We will discuss these in more depth later.

However, despite this variety in data mining algorithms, the basic underlying process for all of them is similar. Regardless of its specific purpose, a data mining algorithm’s process—taking data and producing a result—remains the same.

What are the main components of an algorithm for data mining?

At a basic level, data mining algorithms contain different elements that, in combination, lead to a result. The main components of data mining algorithms include data, conditions, and expectations (i.e., end goals).

As we discussed previously, a data mining algorithm relies on data to operate. This data usually comes in the form of large data sets that the algorithm reviews and breaks down into smaller data sets. The algorithm breaks down and analyzes data in relationship to variables. Examples of variables include age, gender, salary, and location.

Different types of variables produce different results. Three types of variables that algorithms use are discrete, continuous, and categorical.

Discrete variables take countable values. An example is the number of people who attended a concert. Continuous variables, in contrast, can take any value within a range, so they have an infinite number of possible values. Examples are the length of a piece of equipment and the date and time a company receives a payment. Categorical variables contain a finite number of categories or groups. These variables aren’t dependent on order. Examples include material type and payment method.

In an algorithm, multiple variables work together to create conditions. When the algorithm applies certain conditions, it produces specific results.

For example, say a company wants to see how many elderly customers buy a certain type of toothpaste. Conditions of the data mining algorithm would include variables such as customer age, toothpaste type, and purchase confirmation. When the algorithm applies these conditions, it can generate the company’s desired result.
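As a rough sketch, the toothpaste example can be written as a filter over purchase records. The records, the field names, and the cutoff of 65 for “elderly” are assumptions made only for illustration.

```python
# Hypothetical purchase records; field names are illustrative only.
purchases = [
    {"age": 72, "toothpaste": "whitening", "confirmed": True},
    {"age": 34, "toothpaste": "whitening", "confirmed": True},
    {"age": 68, "toothpaste": "sensitive", "confirmed": True},
    {"age": 81, "toothpaste": "whitening", "confirmed": False},
    {"age": 66, "toothpaste": "whitening", "confirmed": True},
]

def count_matching(records, min_age, toothpaste):
    """Count confirmed purchases of one toothpaste type by customers of at least min_age."""
    return sum(
        1
        for r in records
        if r["age"] >= min_age          # condition on customer age
        and r["toothpaste"] == toothpaste  # condition on toothpaste type
        and r["confirmed"]              # condition on purchase confirmation
    )

# Assumed definition of "elderly": 65 or older.
elderly_whitening = count_matching(purchases, min_age=65, toothpaste="whitening")
print(elderly_whitening)
```

Each condition corresponds to one variable from the example. Changing a condition, such as the age cutoff, changes the result without changing the data.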

Many data mining algorithms also use conditional probability to generate outcomes. Conditional probability is the probability that one event occurs given that another event has already occurred. We can look at a deck of cards to illustrate this. If the first card I draw from a full deck is an ace, then the probability that the second card is also an ace drops from 4 in 52 to 3 in 51. The algorithm “learns” how to understand and use this conditional “language” in order to seek out a specific outcome. Through such language, for example, an algorithm can identify the face of a specific person in a database.

Often, data mining algorithms incorporate Bayes’s theorem of conditional probability and predictive analysis into their data mining processes. Bayes’s eighteenth-century theorem describes how to update the probability of one event once you know that a related event has happened. Although the theorem is old, its if/then concepts and approaches remain useful to determine outcomes today. Likewise, predictive analytics offers another way to estimate future outcomes from large sets of data.
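A small worked calculation may make Bayes’s theorem more concrete. The theorem states that P(A|B) = P(B|A) × P(A) / P(B). The sketch below applies it to an invented spam example; the probabilities and the function name are assumptions chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A).

    P(B) is expanded as P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Invented numbers: 20% of all email is spam (prior), the word "prize"
# appears in 50% of spam (likelihood) and in 5% of legitimate email.
p_spam_given_prize = bayes_posterior(
    prior=0.2, likelihood=0.5, false_positive_rate=0.05
)
print(round(p_spam_given_prize, 3))
```

With these made-up inputs, seeing the word raises the estimated probability of spam from 20 percent to roughly 71 percent, which is the kind of update the theorem formalizes.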

How do you write an algorithm for data mining?

Data experts and programmers create data mining algorithms through careful thought, planning, and execution. By establishing input variables, conditions, and output variables, they create algorithms that produce models from data. These models can then predict future data outcomes based on past incidences.

It is important to keep in mind that data programmers write different types of algorithms to create data mining models with specific end goals in mind. Examples of data mining models include the following:

  • A classification model to label loan applicants as low, medium, or high credit risk
  • A decision tree to predict whether a particular consumer will like a product and describe how factors like age and gender will determine product popularity
  • A mathematical model to forecast product sales
  • A set of rules to explain the probabilities that a consumer will purchase a group of products together
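The last model in this list, rules about products purchased together, is often summarized with two measures: support (how often a group of items appears) and confidence (how often the rule holds when its first part is present). The sketch below computes both for a handful of invented shopping baskets; the data and function names are hypothetical.

```python
# Invented shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule under test: customers who buy bread also buy butter.
print(round(support({"bread", "butter"}, transactions), 2))
print(round(confidence({"bread"}, {"butter"}, transactions), 2))
```

Confidence here is an estimate of a conditional probability, so the rule connects directly to the probability concepts discussed earlier in the article.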

Classifications are ways of breaking down and comparing data points. For example, the solar system breaks down into classifications such as planets, moons, and stars. If an algorithm tried to label a specific object in our solar system, it would likely consider these different classifications and their connections to each other in its analytical process.

Decision trees resemble classification models. They start with a main idea that breaks down into several other related ideas when an algorithm applies certain factors. In turn, those ideas break down further as the algorithm applies more conditions. Eventually, these “branches” of ideas lead to an end result.

Both human and machine learning use decision trees as part of the decision-making process. Decision trees present data simply and hierarchically, so a reader can follow each decision from the starting point to the end result. For this reason, they represent a key approach to data mining.
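The branching structure of a decision tree can be sketched as nested conditions. The tree below is written by hand purely for illustration; a real decision tree algorithm would learn its questions and thresholds from data. The customer fields and the cutoff values are assumptions.

```python
# A hand-written tree: internal nodes ask a yes/no question about the
# customer, and leaves (plain strings) hold the final prediction.
tree = {
    "question": lambda c: c["age"] < 30,
    "yes": {
        "question": lambda c: c["previous_purchases"] >= 2,
        "yes": "likely to like",
        "no": "unsure",
    },
    "no": "unlikely to like",
}

def classify(node, customer):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if node["question"](customer) else "no"
        node = node[branch]
    return node

print(classify(tree, {"age": 25, "previous_purchases": 3}))
print(classify(tree, {"age": 45, "previous_purchases": 5}))
```

Every prediction can be explained by listing the questions answered along the path, which is one reason decision trees are easy to interpret.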

Mathematical processes are key to identifying correlations in large data sets and then creating predictions. Linear algebra and probability, for example, play an important role in some data mining models.

Rules are also important in data mining models. These rules tell a data mining algorithm where it should act first. Consider, for example, a situation in which you need to know the probability of event A to predict the likelihood that event B will happen because of event A. An important rule in the equation would instruct the algorithm to discern event A’s probability before proceeding to any other calculations.

Once operating, many data mining algorithms work independently, without human supervision. That’s what makes them part of the machine learning family. However, someone must first set up the algorithm, supply it with labeled training data, and make adjustments as necessary. The labeled data, which shows the algorithm what a correct result looks like, is why we categorize these data mining algorithms as supervised learning algorithms.

How to use data mining algorithms

Various industries use data mining algorithms for research, investigation, and analytical purposes. These algorithms produce useful insights from the large data sets that companies have at their disposal.

An example of a field employing data mining algorithms for research today is the medical field. Often, doctors and other medical professionals use different data mining algorithms to predict the prevalence of certain diseases, such as heart disease, among a population.

In contrast, law enforcement agencies and social media companies might use data mining algorithms for investigation and analytical purposes. Although for different reasons, both types of organizations might conduct facial recognition searches to confirm a person’s identity.

What should you look for in an algorithm for data mining?

It is important to choose a data mining algorithm that meets your specific needs and goals. As we have discussed above, data mining algorithms vary according to their purpose. If you are considering data mining, you want to ensure that you choose algorithms that fit with your intended purpose.

The ultimate goal of data mining is actionable insight. Finding patterns among large data sets alone might be interesting to an individual or company. But the true value of a data mining algorithm comes from the user’s ability to act on the new information that data mining produces. You should always keep this in mind when evaluating data mining algorithms.

What do you need to write an algorithm for data mining?

Before developers create a data mining algorithm, they must first know the purpose of the algorithm and what it will analyze in terms of both data type and data format. Will the algorithm examine handwriting? Will it examine cell phone photographs? Will it examine shopping tendencies?

In addition to knowing what an algorithm will examine, developers also need an appropriate set of data. Based on the application, data could vary from a collection of sample handwriting or cell phone photos to a large database, such as the history of transactions in a group of retail stores.

Finally, developers need to write an equation that enables the algorithm to test the data. This equation often includes probability and predictive analysis.

How do you measure the efficacy of an algorithm for data mining?

Different algorithms have different levels of efficacy. Testing efficacy sometimes means running data through multiple data mining algorithms in order to see which one produces the best results.

One study in the medical field compared different data mining algorithms’ ability to predict heart disease. When scientists ran data through various algorithms to test for heart disease prevalence, the algorithms produced different results. Some algorithms produced more accurate information and thus proved more useful than others.
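One common way to compare algorithms is to measure accuracy, the share of test examples each one labels correctly. The sketch below compares two deliberately simple rules on a tiny invented test set. The numbers are not clinical data, and the rules stand in for real algorithms only to show the comparison step.

```python
def accuracy(predict, examples):
    """Fraction of examples for which predict(features) equals the known label."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)

# Invented held-out examples: (resting_heart_rate, age) -> has_condition.
test_set = [
    ((88, 64), True),
    ((62, 30), False),
    ((95, 58), True),
    ((70, 45), False),
    ((79, 70), True),
]

# Two stand-in "algorithms", each a single hypothetical threshold rule.
def rule_heart_rate(features):
    return features[0] > 80

def rule_age(features):
    return features[1] > 50

scores = {
    "heart_rate_rule": accuracy(rule_heart_rate, test_set),
    "age_rule": accuracy(rule_age, test_set),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Accuracy is only one possible measure. Depending on the application, other measures, such as how many true cases an algorithm misses, may matter more.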

Some researchers recommend high-utility itemset mining as a very efficient data mining technique. In this type of data mining, an algorithm searches sets of data for items of high importance to the user. Highly important items might include, for example, specific business transactions, exact medical files, or personal security information.

The development of this type of data mining points to the advancing functionality and promising future of the field. As the world becomes more technologically reliant, more and more data become available. This creates more opportunity for data analysis solutions.

To stay up to date on the latest developments in data mining solutions, check out the IEEE Xplore digital library. Xplore is one of the world’s largest collections of technical literature in engineering, computer science, and related technologies, with five million documents now available in its vast repository. You can search through this library to find out more about ongoing advances in data mining.

Best algorithms for data mining

As mentioned earlier, data mining algorithms fit within the broader category of learning algorithms. Typically, learning algorithms depend on either classification or regression to produce results.

What are the most-used data mining algorithms?

Classification and regression algorithms remain the most-used data mining algorithms available today.

Classification algorithms take data and separate it into groups. Usually the groups correspond to answers to questions, such as “yes” or “no.” Spam filters in email provide a good example of a classification algorithm at work. As an email comes in, an algorithm analyzes its contents (such as sender, subject, and message). Then, the algorithm files the email into either a “yes spam” or “no spam” category.

Examples of classification algorithms are naive Bayes and k-nearest neighbors. (However, you can also use a k-nearest neighbors algorithm as part of a regression model.)

Naive Bayes algorithms use Bayes’s theorem of probability to review data and assign certain classifications to it. For instance, a naive Bayes algorithm might analyze a text to determine its main theme. It might determine, for example, that a text is discussing cats or dogs.
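The cats-or-dogs example can be sketched as a very small naive Bayes text classifier. This is a minimal illustration, not a production implementation: the training sentences are invented, words are split on spaces only, and add-one (Laplace) smoothing is an assumed choice for handling unseen words.

```python
import math
from collections import Counter, defaultdict

# Invented training texts with their known themes.
training = [
    ("the cat purred on the sofa", "cats"),
    ("my kitten chased the cat toy", "cats"),
    ("the dog barked at the mailman", "dogs"),
    ("our puppy and the dog fetched a ball", "dogs"),
]

def train(examples):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the prior probability of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(predict(model, "the puppy barked"))
print(predict(model, "a kitten purred"))
```

The classifier is called “naive” because it treats each word as independent of the others given the theme, an assumption that is rarely true but often works well enough in practice.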

K-nearest neighbors algorithms are some of the simplest and easiest-to-use data mining algorithms today. They have been around since the 1950s. Their main goal is to place a data point into a certain category based on the data points closest to it.

Systems that can use k-nearest neighbors algorithms include recommendation lists, such as those offered by streaming services like Netflix or Hulu. These lists take data points (such as movies or TV shows) and recommend similar or related content to users.
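A k-nearest neighbors classifier is short enough to sketch in full. The example below labels a viewer by the majority label among the k closest known viewers. The viewing-hours data, the choice of k = 3, and the use of straight-line distance are assumptions for illustration, and nothing here reflects how any particular streaming service works.

```python
import math
from collections import Counter

# Invented viewers: (weekly hours of comedy, weekly hours of drama) -> favorite genre.
viewers = [
    ((9, 1), "comedy"),
    ((8, 2), "comedy"),
    ((7, 3), "comedy"),
    ((2, 8), "drama"),
    ((1, 9), "drama"),
    ((3, 7), "drama"),
]

def knn_predict(examples, point, k=3):
    """Return the most common label among the k examples nearest to point."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(viewers, (8, 1)))
print(knn_predict(viewers, (2, 9)))
```

There is no separate training step: the algorithm simply stores the examples and compares each new point against them, which is part of what makes it simple to use.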

Regression algorithms, on the other hand, predict a numeric value rather than a category. Their goal is to discern a relationship between different data points, such as how one variable changes as another changes. For example, a regression algorithm might analyze past sales figures to forecast a product’s future sales.
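As a minimal sketch of regression, the code below fits a straight line to a few invented monthly sales figures using ordinary least squares and then extends the line one month ahead. The data is deliberately perfectly linear so the arithmetic is easy to check by hand; the figures and function name are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented sales (in thousands of units) for months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

Unlike a classifier, the output is a number on a continuous scale, so the quality of the model is judged by how far its predictions fall from the actual values rather than by how many labels it gets right.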

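A neural network is an example of an algorithm that can handle both regression and classification tasks. Neural networks loosely mimic the human brain’s neural paths. Thousands or millions of interconnected nodes, arranged in layers, form these complex computer systems. Each node computes a weighted combination of its inputs, much as a linear regression does, and the network adjusts these weights as it learns from data.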

Both regression and classification models get support from support vector machines (SVMs). An SVM is another type of algorithm. For classification, it looks for the boundary that separates the categories in the data with the widest possible margin. When the data has only two or three dimensions, you can plot this boundary on a graph, which lends a visual component to the algorithm. SVMs can also be adapted to regression problems.

What makes a data mining algorithm popular?

Companies use data mining algorithms to solve many different problems. Consequently, a wide variety of data mining algorithms exist today. You can fine-tune each algorithm to solve a particular problem.

Generally speaking, a data mining algorithm’s popularity hinges on its ability to provide detailed answers to questions concerning big data. These answers can help users predict an event or trend or, more broadly, the future of an industry. But they also help users with tasks such as avoiding spam in their email inboxes or choosing a nightly TV show.

How do algorithms vary from data mining project to project?

All in all, algorithms are versatile. Likewise, their use varies across many projects in different industries. Some projects call for specific types of algorithms. For example, one project might require an algorithm that can test for classification-based outcomes. Another might require an algorithm that can test for regression-based outcomes.

Additionally, some projects depend on multiple algorithms to work. For instance, the results from one algorithm might help produce results that are used by a second algorithm.

Top software packages for using data mining algorithms

Today, companies often choose to invest in software packages that make data mining easy and approachable. Many of these software packages offer the added bonus of providing data managing and storage services in addition to data mining algorithms and tools.

As we have stated above, data mining algorithms vary according to their intended purpose. As such, users should choose a data mining software package that fits with their specific needs.

What software is available for using data mining algorithms?

Software packages reduce the need to produce algorithms from scratch. Likewise, they provide different data analytics tools that aid algorithms and help the user get desired results. Examples of such tools include artificial intelligence and predictive analytics.

Popular software packages such as Alteryx Analytics, Orange, and KNIME contain data analytics tools like these. They also contain additional features that appeal to users. These include, for example, data visualization and display features and accessibility across multiple platforms.

What should you look for in software for using data mining algorithms?

You should keep your goals in mind when considering software options. When you choose software, you should make sure its offerings match your data mining vision. For example, you might want a system that creates visual displays, such as charts and graphs, from a data mining algorithm’s output information. In this case, you want to make sure the software you invest in includes data visualization among its features.

Likewise, you should consider the package’s accessibility options. For instance, can Mac and PC users access the software equally easily? Is there a cloud-based storage system or a Software-as-a-Service (SaaS) option? What does the package’s interface look like? How would the interface affect your ability to explore and utilize the software?

Furthermore, you might benefit from a software package that you can add paid or free features to over time. Some software packages allow users with a valid product license to freely download or purchase additional features. The future of data mining looks promising. Because of this, having the ability to add features might be especially important going forward.

What are the best free and paid options for data mining algorithm software?

Software packages and their offerings vary according to their monetary value. Often paid-for packages include more high-tech, innovative, and appealing elements. In contrast, free versions generally contain fewer features. However, quality free options do exist.

According to a conference paper on free software tools for data mining, the best free offerings include RapidMiner, Weka, R, KNIME, Orange, and scikit-learn. Many of the companies behind these free tools also offer data mining services.

Paid-for options include Sisense, Neural Designer, and Alteryx Analytics. These products focus on different data mining tools: business intelligence, machine learning, and analytics, respectively.

Ultimately, as technology continues to improve, the variety of data mining algorithms and software packages will likely continue to grow. So too will the importance and potential value of data mining as a field.

Conferences related to Data Mining

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

The conference program will consist of plenary lectures, symposia, workshops and invited sessions of the latest significant findings and developments in all the major fields of biomedical engineering. Submitted papers will be peer reviewed. Accepted high quality papers will be presented in oral and poster sessions, will appear in the Conference Proceedings and will be indexed in PubMed/MEDLINE.


IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium

All fields of satellite, airborne and ground remote sensing.


2020 59th IEEE Conference on Decision and Control (CDC)

The CDC is the premier conference dedicated to the advancement of the theory and practice of systems and control. The CDC annually brings together an international community of researchers and practitioners in the field of automatic control to discuss new research results, perspectives on future developments, and innovative applications relevant to decision making, automatic control, and related areas.


2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI)

The IEEE International Symposium on Biomedical Imaging (ISBI) is the premier forum for the presentation of technological advances in theoretical and applied biomedical imaging. ISBI 2019 will be the 16th meeting in this series. The previous meetings have played a leading role in facilitating interaction between researchers in medical and biological imaging. The 2019 meeting will continue this tradition of fostering cross fertilization among different imaging communities and contributing to an integrative approach to biomedical imaging across all scales of observation.


2019 20th IEEE International Conference on Mobile Data Management (MDM)

The MDM series of conferences, since its debut in 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The conference provides unique opportunities to bring researchers, engineers, and practitioners together to explore new ideas, techniques, and tools, and exchange experiences. Comprising both research and industry tracks, it serves as an important bridge between academic researchers and industry researchers. Along with the presentations of research publications, it also serves as a meeting place for technical demonstrations (demos), workshops, advanced seminars, and panel discussions, as well as an industrial forum to cater to industrial developers. The conference focuses on research contributions in data management in mobile, ubiquitous and pervasive computing.

  • 2004 IEEE International Conference on Mobile Data Management (MDM)

  • 2006 International Conference on Mobile Data Management (MDM)

  • 2007 International Conference on Mobile Data Management (MDM)

  • 2008 9th International Conference on Mobile Data Management (MDM)

  • 2010 11th International Conference on Mobile Data Management (MDM)

    The annual MDM conference is a leading international forum that focuses on data management for mobile, ubiquitous, and pervasive computing. It brings together a wide range of researchers, practitioners, and users to explore scientific and industrial challenges that arise in the areas of data management and mobile computing.

  • 2011 12th IEEE International Conference on Mobile Data Management (MDM)

    The MDM series of conferences, since its debut in December 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The term mobile in MDM has been used from the very beginning in a broad sense to encompass all aspects of mobility - aspects related to wireless, portable and tiny devices. The conference provides unique opportunities for researchers, engineers, practitioners, developers, and users to explore new ideas.

  • 2012 13th IEEE International Conference on Mobile Data Management (MDM)

    The MDM series of conferences, since its debut in December 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The term mobile in MDM has been used from the very beginning in a broad sense to encompass all aspects of mobility - aspects related to wireless, portable and tiny devices. The conference provides unique opportunities for researchers, engineers, practitioners, developers, and users to explore new ideas, techniques, and tools, and to exchange experiences.

  • 2013 14th IEEE International Conference on Mobile Data Management (MDM)

    The MDM series of conferences, since its debut in December 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The term mobile in MDM has been used from the very beginning in a broad sense to encompass all aspects of mobility - aspects related to wireless, portable and tiny devices. The conference provides unique opportunities for researchers, engineers, practitioners, developers, and users to explore new ideas, techniques, and tools, and to exchange experiences.

  • 2014 15th IEEE International Conference on Mobile Data Management (MDM)

    The MDM series of conferences, since its debut in December 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The term mobile in MDM has been used from the very beginning in a broad sense to encompass all aspects of mobility

  • 2015 16th IEEE International Conference on Mobile Data Management (MDM)

    The MDM series of conferences, since its debut in December 1999, has established itself as a prestigious forum for the exchange of innovative and significant research results in mobile data management. The term mobile in MDM has been used from the very beginning in a broad sense to encompass all aspects of mobility related to wireless, portable and tiny devices. The conference provides unique opportunities for researchers, engineers, practitioners, developers, and users to explore new ideas, techniques, and tools, and to exchange experiences.

  • 2016 17th IEEE International Conference on Mobile Data Management (MDM)

    The Mobile Data Management series of conferences first debuted in December 1999. Since inception, it has established itself as a prestigious forum to exchange innovative and significant research results in mobile data management. Comprising both research and industry tracks, it serves as an important bridge between academic researchers and industry researchers. Along with the presentations of research publications, it also serves as a meeting place for technical demonstrations (demos), workshops, and panel discussions, as well as a PhD forum and an industrial forum to cater to PhD students and industrial developers. The conference focuses on research contributions in data management in mobile, ubiquitous and pervasive computing.

  • 2017 18th IEEE International Conference on Mobile Data Management (MDM)

    Mobile computing and data management

  • 2018 19th IEEE International Conference on Mobile Data Management (MDM)

    The conference aims to attract original research contributions in the intersection of mobile computing and data management. Topics of interest include, but are not limited to:
    - Mobile Cloud Computing and Data Management
    - Data Management for Internet of Things (IoT) and Sensor Systems
    - Data Management for Augmented Reality Systems
    - Data Management for Intelligent Transportation Systems, Smart Spaces
    - Mobile Crowd-Sourcing and Crowd-Sensing
    - Mobile Data Analytics
    - Behavioural/Activity Sensing and Analytics
    - Mobile Location-Based Social Networks
    - Mobile Recommendation Systems
    - Context-aware Computing for Intelligent Mobile Services
    - Middleware and Tools for Mobile and Pervasive Computing
    - Theoretical Foundations of Data-intensive Mobile Computing
    - Data Stream Processing in Mobile/Sensor Networks
    - Indexing, Optimisation and Query Processing for Moving Objects/Users
    - Security and Privacy in Mobile Systems



Periodicals related to Data Mining

Communications, IEEE Transactions on

Telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation, including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; and communication theory. In addition to the above, ...


Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Specific topics of interest include, but are not limited to, sequence analysis, comparison and alignment methods; motif, gene and signal recognition; molecular evolution; phylogenetics and phylogenomics; determination or prediction of the structure of RNA and Protein in two and three dimensions; DNA twisting and folding; gene expression and gene regulatory networks; deduction of metabolic pathways; micro-array design and analysis; proteomics; ...


Computer

Computer, the flagship publication of the IEEE Computer Society, publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications. Computer is a resource that practitioners, researchers, and managers can rely on to provide timely information about current research developments, trends, best practices, and changes in the profession.


Electron Device Letters, IEEE

Publishes original and significant contributions relating to the theory, design, performance and reliability of electron devices, including optoelectronic devices, nanoscale devices, solid-state devices, integrated electronic devices, energy sources, power devices, displays, sensors, electro-mechanical devices, quantum devices and electron tubes.


Electron Devices, IEEE Transactions on

Publishes original and significant contributions relating to the theory, design, performance and reliability of electron devices, including optoelectronics devices, nanoscale devices, solid-state devices, integrated electronic devices, energy sources, power devices, displays, sensors, electro-mechanical devices, quantum devices and electron tubes.



Xplore Articles related to Data Mining

Spatial and Spatio-temporal Data Mining

2010 IEEE International Conference on Data Mining, 2010

Summary form only given. Recent advances in, and falling prices of, technologies for collecting spatial and spatio-temporal data, such as satellite images, cellular phones, sensor networks, and GPS devices, have facilitated the collection of data referenced in space and time. These huge collections of data often hide interesting information which conventional systems and classical data mining techniques are unable to discover. ...


Application of data mining in traffic management: Case of city of Isfahan

2010 2nd International Conference on Electronic Computer Technology, 2010

This paper describes work investigating the application of data mining tools to aid in the development of traffic signal timing plans. A case study was conducted to illustrate the use of hierarchical cluster analysis. This approach can be used for designing a TOD signal control system, since it automatically identifies time-of-day (TOD) intervals using the historical collected ...


Domain Driven Data Mining (D3M)

2008 IEEE International Conference on Data Mining Workshops, 2008

In deploying data mining into the real-world business, we have to cater for business scenarios, organizational factors, user preferences and business needs. However, the current data mining algorithms and tools often stop at the delivery of patterns satisfying expected technical interestingness. Business people are not informed about how and what to do to take over the technical deliverables. The gap ...


CAKE – Classifying, Associating and Knowledge DiscovEry - An Approach for Distributed Data Mining (DDM) Using PArallel Data Mining Agents (PADMAs)

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008

This paper accentuates an approach to implementing distributed data mining (DDM) using multi-agent system (MAS) technology, and proposes a data mining technique called “CAKE” (classifying, associating & knowledge discovery). The architecture is based on centralized parallel data mining agents (PADMAs). Data mining is part of a recently introduced field known as BI, or business intelligence. The need ...


Seventh IEEE International Conference on Data Mining Workshops - Title

Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), 2007

The following topics are dealt with: data mining in Web 2.0 environment; knowledge-discovery from multimedia data and multimedia applications; mining and management of biological data; data mining in medicine; optimization-based data mining techniques; high performance data mining; mining graphs and complex structures; data mining on uncertain data; data streaming mining and management; spatial and spatio-temporal data mining.



Educational Resources on Data Mining

IEEE-USA E-Books

  • Spatial and Spatio-temporal Data Mining

    Summary form only given. Recent advances in, and falling prices of, technologies for collecting spatial and spatio-temporal data, such as satellite images, cellular phones, sensor networks, and GPS devices, have facilitated the collection of data referenced in space and time. These huge collections of data often hide interesting information which conventional systems and classical data mining techniques are unable to discover. Spatial and spatio-temporal data are embedded in continuous space, whereas classical datasets (e.g. transactions) are often discrete. Spatial and spatio-temporal data require complex data preprocessing, transformation, data mining, and post-processing techniques to extract novel, useful, and understandable patterns. The importance of spatial and spatio-temporal data mining is growing with the increasing incidence and importance of large geo-spatial datasets such as maps, repositories of remote-sensing images, trajectories of moving objects generated by mobile devices, etc. Applications include the mobile-commerce industry (location-based services), climatological effects of El Nino, land-use classification and global change using satellite imagery, finding crime hot spots, local instability in traffic, migration of birds, fishing control, pedestrian behavior analysis, and so on. Thus, new methods are needed to analyze spatial and spatio-temporal data to extract interesting, useful, and non-trivial patterns. The main goal of this tutorial is to disseminate this research field, giving an overview of the current state of the art and the main methodologies and algorithms for spatial and spatio-temporal data mining. This tutorial is directed to researchers and practitioners, experts in data mining, analysts of spatial and spatio-temporal data, as well as knowledge engineers and domain experts from different application areas.

  • Application of data mining in traffic management: Case of city of Isfahan

    This paper describes work investigating the application of data mining tools to aid in the development of traffic signal timing plans. A case study was conducted to illustrate the use of hierarchical cluster analysis. This approach can be used for designing a TOD signal control system, since it automatically identifies time-of-day (TOD) intervals using the historical collected data. The cluster analysis approach is able to utilize a high-resolution system state definition that takes full advantage of the extensive set of sensors deployed in a traffic signal system, and cluster validation supports the hypotheses presented. The results of this research indicate that advanced data mining techniques hold high potential to provide automated signal control techniques.

  • Domain Driven Data Mining (D3M)

    In deploying data mining into the real-world business, we have to cater for business scenarios, organizational factors, user preferences and business needs. However, the current data mining algorithms and tools often stop at the delivery of patterns satisfying expected technical interestingness. Business people are not informed about how and what to do to take over the technical deliverables. The gap between academia and business has seriously affected the widespread employment of advanced data mining techniques in greatly promoting enterprise operational quality and productivity. To narrow down the gap, cater for real-world factors relevant to data mining, and make data mining workable in supporting decision-making actions in the real world, we propose the methodology of domain driven data mining (D3M for short). D3M aims to construct next-generation methodologies, techniques and tools for a possible paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge delivery. In this talk, we address the concept map of D3M, theoretical underpinnings, several general and flexible frameworks, research issues, possible directions, application areas etc. related to D3M. Real-world case studies in financial data mining and social security mining are demonstrated to show the effectiveness and applicability of D3M in both research and development of real-world challenging problems.

  • CAKE – Classifying, Associating and Knowledge DiscovEry - An Approach for Distributed Data Mining (DDM) Using PArallel Data Mining Agents (PADMAs)

    This paper accentuates an approach to implementing distributed data mining (DDM) using multi-agent system (MAS) technology, and proposes a data mining technique called “CAKE” (classifying, associating & knowledge discovery). The architecture is based on centralized parallel data mining agents (PADMAs). Data mining is part of a recently introduced field known as BI, or business intelligence. The need is to derive knowledge out of the abstract data. The process is difficult, complex, time-consuming and resource-hungry. These problems are addressed in the proposed model. The model architecture is distributed, uses a knowledge-driven mining technique, and is flexible enough to work on any data warehouse, which will help to overcome these problems. Good knowledge of the data, meta-data and business domain is required for defining rules for data mining. The model assumes that the data and data warehouse have already gone through the necessary processes and are ready for data mining.

  • Developing an Integrated Time-Series Data Mining Environment for Medical Data Mining

    In this paper, we present an integrated time-series data mining environment for medical data mining. Medical time-series data mining is one of the key issues in getting useful clinical knowledge from medical databases. However, users often face difficulties during such a medical time-series data mining process with data preprocessing method selection/construction, mining algorithm selection, and post-processing to refine the data mining process, as in other data mining processes. To get more valuable rules for medical experts from a time-series data mining process, we have designed an environment which integrates time-series pattern extraction methods, rule induction methods and rule evaluation methods with a visual human-system interface. After implementing this environment, we have done a case study to mine time-series rules from a blood/urine biochemical test database on chronic hepatitis patients. The result shows that valuable clinical course rules can be found based on time-series pattern extraction. Furthermore, we compared the time-series pattern extraction methods using objective rule evaluation results.

  • Bibliometric Analysis of Data Mining in the Chinese Social Science Circle

    In this paper, papers about data mining recorded by CSSCI (1998~2007) are collected and analyzed with statistical analysis and bibliometric analysis, covering year distribution, journal distribution, subject distribution, the core authors and the geographical distribution of the authors. From this we can identify the core authors, core journals, research institutes and the patterns of research on data mining in the Chinese social science circle, and review data mining study and its main themes in the Chinese social science circle. In conclusion, this paper indicates some problems and trends in the study of data mining.

  • i-Analyst: An Agent-Based Distributed Data Mining Platform

    User-friendliness and performance are important properties of data mining and analysis tools. In this demo, we introduce an agent-based distributed data mining platform that allows users to manage and share data-mining-related resources conveniently. Furthermore, the platform employs agents for workflow enactment, in which the performance is improved with agent abilities. We also present an example to illustrate how the platform works in a distributed environment. The performance is relatively competitive with the non-agent approach when data is highly distributed and large.

  • Application of Data Mining in Higher Secondary Directorate of Kerala

    In this paper, we discuss Data Mining and its application in the Higher Secondary Directorate of Kerala. The Data Mining process has a set of functionalities, among which classification has wide application in real-world data processing. We examine the Naïve Bayes classification techniques. In the third section, we explain the Naïve Bayes Theorem using an experiment. This experiment covers attributes like School Type, Candidate Type, Study Type, Districts, etc. We use these attribute values to analyse the results of the Higher Secondary First Year Improvement Examination held in September 2015. This will improve the performance and data processing speed of the Higher Education Directorate. This paper demonstrates the application of Data Mining to Higher Secondary Examination results. This will help further research and will improve the activities of the Higher Secondary Directorate.

  • Research of GIS-based Spatial Data Mining Model

    In this paper, the theories of spatial data mining and geographic information systems are first described, and the integration model of spatial data mining is researched and analyzed in depth. On this basis, a new GIS system structure based on spatial data mining is presented, which has the advantages of good universality, interaction and easy realization compared with other structures.
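
One of the abstracts above applies Naïve Bayes classification to categorical attributes such as School Type and Study Type. As a rough illustration of that general technique, and not of the paper's actual method or data, the following Python sketch trains a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. The attribute values, the pass/fail labels, and the function names are all hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, used for add-one smoothing.
    vocab = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the label with the highest smoothed log-posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the score.
            score += math.log((count + 1) / (n + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (school type, study type) -> exam outcome.
rows = [
    ("government", "regular"),
    ("government", "private"),
    ("aided", "regular"),
    ("aided", "regular"),
    ("government", "private"),
    ("aided", "private"),
]
labels = ["pass", "fail", "pass", "pass", "fail", "fail"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("aided", "regular")))
print(predict(model, ("government", "private")))
```

The sketch works in log space so that multiplying many small probabilities does not underflow; a real study would also hold out test data to estimate accuracy.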



Standards related to Data Mining

No standards are currently tagged "Data Mining"