Have you ever tried to look up papers on a specific topic, only to be overwhelmed by piles of loosely related or unrelated results? The key problem is that “we are drowning in information, but starved for knowledge.” Information is not always useful; what we really need is the knowledge behind it. We may have tons of information and still be unable to make a decision. The reason is simple: the information has become too large and complex for us to process, so we don’t know how to use it.

Fortunately, we have data mining! The mission of data mining is to discover knowledge from oceans of information. It gives us a data-driven way to survive in an era of information explosion. Basically, data mining is the computational process of discovering knowledge in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. As long as we feed a machine running data mining algorithms with sufficient data, it may learn knowledge from the data by itself.

One of the most attractive features of data mining is its ability to predict. It helps us learn from the past and predict the future. To build a prediction model, we first need a training dataset. Once the model has been trained, we assess its performance on a separate testing dataset. Finally, if everything goes well, the model can be deployed to predict unknown properties of future data, such as classes, similarities, or relationships. With data mining, we may not need to fully understand the mechanism of a flu before predicting how it spreads. We may not need to explicitly survey customers in order to recommend the books that fit them best. We may be able to detect bank fraud among billions of transactions. We could hardly have imagined doing such crazy things before we had data mining.
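
To make the train, test, and deploy steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two clusters of points are synthetic, the labels "A" and "B" are made up, and the nearest-centroid rule is just one very simple model among many that data mining offers. It is not meant as a recommended method, only as a picture of the workflow.

```python
import random


def make_dataset(n_per_class, rng):
    """Generate labelled 2-D points from two synthetic clusters (made-up data)."""
    centers = {"A": (0.0, 0.0), "B": (5.0, 5.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            data.append((point, label))
    return data


def train(examples):
    """Learn one centroid (mean point) per class from the training examples."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(model, point):
    """Return the class whose centroid is closest to the given point."""
    x, y = point
    return min(
        model,
        key=lambda label: (x - model[label][0]) ** 2 + (y - model[label][1]) ** 2,
    )


def accuracy(model, examples):
    """Fraction of examples whose predicted class matches the true label."""
    correct = sum(predict(model, point) == label for point, label in examples)
    return correct / len(examples)


# Step 1: split the available data into a training set and a testing set.
rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_dataset(100, rng)
rng.shuffle(data)
split = int(0.7 * len(data))
train_set, test_set = data[:split], data[split:]

# Step 2: train the model on the training set only.
model = train(train_set)

# Step 3: assess performance on the separate testing set.
test_accuracy = accuracy(model, test_set)
print(f"test accuracy: {test_accuracy:.2f}")

# Step 4: if the result looks good, use the model on new, unseen data.
print("prediction for a new point:", predict(model, (4.2, 5.1)))
```

The important design choice is that the testing set is never shown to train. If we measured accuracy on the same data the model learned from, we would only learn how well it memorized the past, not how well it might predict the future.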

In summary, data mining aims to discover knowledge from information, which is exactly what we badly need in this age.