Analytics: Signal Detection – A Lyrical Digression

Instead of going through another data mining exercise, which would make this rubric look somewhat monotonous, I thought I would offer a lyrical digression. (I’ll go soft on you this week, dear reader. Next week, I’m back in full force.)

Business analysts often claim that their job is to ‘find patterns in the data’. Unfortunately, real patterns (linear, geometric, exponential, etc.) are rather rare. Most of the time, projects are a string of mundane reconciliations, vlookups and crosswalks that require standardized reporting tools. A typical day in the life of a financial data analyst consists of looking at how data points deviate from the expected (already known) rules of the business. Say I have a report which shows a funky-looking percent distribution of customers across product lines. To spot that, you of course have to know what the ordinary, ‘healthy’ percent distribution looks like. Knowing that, you deduce by visual comparison that something is wrong with the numbers in front of you. So, effectively, you haven’t detected a new pattern; you have noticed that something deviated from a ‘pattern’ (e.g. a distribution) that was already known.
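To make the ‘compare against a known healthy distribution’ routine concrete, here is a minimal Python sketch. Everything specific in it is my assumption for illustration: the product line names, the baseline shares, the customer counts and the 5 percentage point tolerance are all made up, and a real report would bring its own.

```python
# Hypothetical baseline: the 'healthy' share of customers per product line.
# These names and numbers are invented for illustration only.
BASELINE = {"checking": 0.40, "savings": 0.30, "mortgage": 0.20, "cards": 0.10}


def percent_distribution(counts):
    """Turn raw customer counts per product line into shares of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("report contains no customers")
    return {line: n / total for line, n in counts.items()}


def flag_deviations(counts, baseline, tolerance=0.05):
    """Return product lines whose share differs from the baseline by more
    than `tolerance` (an assumed threshold), with the signed difference."""
    observed = percent_distribution(counts)
    flagged = {}
    # Look at every line that appears in either the report or the baseline.
    for line in sorted(set(baseline) | set(observed)):
        diff = observed.get(line, 0.0) - baseline.get(line, 0.0)
        if abs(diff) > tolerance:
            flagged[line] = round(diff, 4)
    return flagged


# Hypothetical report: customer counts per product line.
report = {"checking": 400, "savings": 150, "mortgage": 350, "cards": 100}
print(flag_deviations(report, BASELINE))
```

With these made-up numbers, the mortgage and savings lines get flagged and the other two do not. Note that the code only automates the visual comparison: it cannot tell you anything unless someone has already supplied the baseline, which is exactly the point of the paragraph above.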

But what happens when the goal is to discover completely new dependencies, totally unexpected and never thought of before? Unfortunately, this is not your typical analyst’s thing to do. I believe that with the automation of reporting (especially given the latest developments in business intelligence), there will be less and less need for these ordinary reconciliations, which require Excel/Access acrobatics along with a good dose of common sense. More energy will be dedicated to the task of discovering genuinely novel features in the data. This is what the impending ‘big data’ revolution is all about: actual discovery of hidden patterns, as opposed to summarizing trends, drilling down on dimensions and telling business users something they (most likely) already had a good idea about anyway.

The Internet is blooming with books, blogs, tutorials and advice from enthusiasts and professionals alike on how to perform analytics and data mining. At present, it seems there is no magic bullet for this task. No single algorithm will ‘figure out’ what the hidden dependency looks like unless it has been ‘coached’, i.e. a human user has preconceived what a pattern may look like and the algorithm is there only to confirm or invalidate the idea. The conception of an idea is the creative, value-adding moment; the technical implementation is a corollary. There are many well-accepted ways to organize business data, aggregate it and pretty it up for reporting purposes, but very few ways to discover genuinely unexpected rules when working with context-free data.

Of course, the entire premise of data mining rests on the belief that data does contain hidden patterns. Corporate data repositories are now hailed as silicon gold mines. Few people, it seems, doubt the existence of dependencies: for some reason, corporations and institutions have simply ‘accepted’ the notion of ‘hidden patterns’ in their data. How ‘hidden’ they really are, and how much of a ‘pattern’ they represent, are the two points of interest to me.

So, in order to take you, dear reader of this rubric, through a genuine exercise in data mining, I make sure that my datasets do contain truly hidden, mathematically robust patterns. A few posts earlier, I described an interview experience I had with a tech company, where I was asked to find such a pattern without even knowing it existed.

Next time in this rubric, I will dissect the last dataset I provided, assuming the signal was genuinely randomly distributed.

~ by Monsi.Terdex on February 8, 2013.

One Response to “Analytics: Signal Detection – A Lyrical Digression”

  1. Yfnf nf f xxjyjn pdpbfsl es ubfhr wqdfyguhhsatr.
