Data Mining - Harder Than it Should Be

I’ve recently finished an MSc in Intelligent Systems at De Montfort University (I passed with distinction - thanks for asking). One module of the course was devoted to using Data Mining techniques to explore data sets. Data Mining differs from the more common statistical analysis of data. Commonly a data set may be analysed statistically, looking for the distribution of this or calculating the rate of change in that, whereas applying data mining techniques can expose otherwise obscure patterns in the data. Or to put it another way: with statistics you tell the data what to show you, but with Data Mining the data shows you what it holds.
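To make that distinction concrete, here is a minimal sketch in Python. The ages are made-up illustrative data, and `kmeans_1d` is a hypothetical helper written for this post, a toy version of k-means clustering rather than the algorithm of any particular product. It compares a single summary statistic with a simple clustering pass over the same numbers.

```python
from statistics import mean


def kmeans_1d(values, k=2, iterations=20):
    """Toy 1-D k-means: returns a list of k clusters (lists of values)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres evenly across the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        new_centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # Nothing moved, so the clustering has settled.
        centres = new_centres
    return clusters


# Hypothetical data: ages of visitors to an imaginary website.
ages = [18, 19, 21, 22, 20, 61, 63, 65, 62, 64]

# The statistical question: "what is the average age?"
print("mean age:", mean(ages))

# The data mining question: "what groups are in here?"
for cluster in kmeans_1d(ages, k=2):
    print("cluster:", sorted(cluster), "centre:", mean(cluster))
```

The mean comes out at 41.5, an age that nobody in the sample is anywhere near. The clustering pass, by contrast, separates the visitors into a group around 20 and a group around 63, which is the kind of pattern you only find if you let the data show you. Real data mining tools do this across many columns at once and help you choose the number of clusters; this sketch assumes you already know to look for two.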

As part of the course we used one of the major data mining products on the market. As a class of Masters Degree students, learning about the subtle intricacies of our chosen subject, the product served us well. We were able to tweak and twiddle with more switches than a 747. We had the luxury of time to run, re-run, rinse and repeat data processing over and over. I’m equally sure that for professional data mining consultants, hired by large wealthy companies to trawl vast private data repositories, these products are worth the thousands of pounds of licence fees they pay.

But what about everyone else? What about the campaigns working to hold government to account over public data? What about the new breed of data journalists exposing stories otherwise hidden in obscure facts and figures? What about the data evangelists enlightening and entertaining with wonderful visualisations? How accessible is data mining for these people? How affordable? How usable?

Not enough.

As more and more data is placed in the public domain, the need for tools to explore it grows too. Data Mining tools are out there, but they are nowhere near as accessible as the data that requires them.

It is with this in mind that I have begun to develop an accessible, usable and intuitive suite of tools to explore data sets. The first of these, Clusterbomb, will mine data sets to expose the hidden clusters of records buried within. I’m expecting to have an alpha release out early next year, but for now sign up to the mailing list to keep an eye on developments. Clusterbomb is also on Twitter and Facebook.

❝To be ahead of the rest, you need to see more than they do❞