OPINION 23 August 2017

Why complex modelling is rubbish


In his second analytics blog, Ryan Howard argues that simpler models may be better than more complicated ones where insight is concerned.

I remember my first data model. I used LISREL. You start with a theoretical notion of how things ought to work and then you overlay data onto this blueprint. From here, you delve into the inter-relatedness of things, transported to a world where theory and data fit hand in glove.

Statisticians love Structural Equation Modelling (SEM). For most of us, it is where we fell in love with statistics. I’m certain that if there were an all-knowing omnipotent being, he, she or they would surely see the world as one big SEM. I wanted to build ever bigger and more nuanced models, for I was in awe of SEM’s purity, logic and elegance. It was a dream from which I was so rudely awoken.

Within the realms of Employee Engagement Research, clean data and a set of widely accepted and strong theoretical underpinnings make SEM an evergreen technique. Within wider market research, however, it has become nothing more than an academic pursuit, primarily for these two reasons:

  • survey data is unruly
  • SEM requires too much time, interpretation and creativity

Fitting a large model onto survey responses is unforgiving. Pieces do not fall into place. Plainly put, while some adjacent pieces might lock together, the entire puzzle often refuses to sit on the same table. Multicollinearity prevents this from happening: when survey measures are highly correlated with one another, the model cannot reliably separate their individual effects, so results are not repeatable. Wrapping the various elements of a model together therefore requires some ‘latitude’. That is, modelling requires a series of judgement calls. This means that two individuals with the same data run the risk of producing different solutions depending on their personal approach, experience and interpretation. This is problematic, if not commercially intolerable.
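To make the multicollinearity point concrete, here is a minimal sketch using simulated data rather than real survey responses. Everything in it is an assumption made for illustration: the two near-duplicate "drivers", the sample size, the noise levels and the helper names `fit_ols` and `vif`. It fits the same ordinary least squares regression on two halves of one sample. The individual coefficients can swing widely between the halves while their sum stays close to the true combined effect, which is one way the instability described above tends to show up.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the regression includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ols(X, y):
    """Ordinary least squares coefficients, intercept first."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from
    regressing that column on the remaining columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    fitted = add_intercept(others) @ fit_ols(others, target)
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 / (ss_res / ss_tot)  # equals 1 / (1 - R^2)


# Simulated data (an assumption, not real survey data):
# x2 is almost a copy of x1, and both truly have an effect of 1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

# Same model, same population, two different halves of the sample.
half_a = fit_ols(X[:100], y[:100])
half_b = fit_ols(X[100:], y[100:])

print("VIF of x1:", round(vif(X, 0), 1))
print("Half A coefficients (x1, x2):", np.round(half_a[1:], 2))
print("Half B coefficients (x1, x2):", np.round(half_b[1:], 2))
print("Sum of coefficients, A vs B:",
      round(half_a[1:].sum(), 2), round(half_b[1:].sum(), 2))
```

A common rule of thumb treats a variance inflation factor above roughly 5 to 10 as a warning sign. The threshold is a convention, not a law.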

When modelling to make predictions, it does not matter what flavour of arcane magic is conjured. The model just needs to perform reasonably well without over-fitting. Beyond this, no one cares (nor should they). However, modelling to extract customer insight is about explanation and sector expertise. Before taking decisions, the audience demands unquestionable trust in the story your model offers. This is a challenge because a model is nothing more than a summary of what the data might be alluding to, a step removed from the cold hard figures, which otherwise would be plain for all to see in cross tabulations.

Data modelling has impact when it mimics intuitive management-consulting frameworks such as those peddled by the big five. These frameworks might appear to be SEM-ish but with one exception: they are always popular. They explain behaviour in sharp and simple terms, resonate instantly without assumption or preamble, and are pasted into PowerPoint sans caveat. There is a lesson here.

Complexity does not sell ideas.

Should you accept this, go now. Run. Throw away your large, complex models – particularly when standalone regression or correlation will do. Separate your problem into its component parts. Demonstrate how each element works. The more your model appears to be a commodity, the more comfortable your audience is. Yes, it is a sad, bitter pill.
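As a rough illustration of what "standalone correlation" can look like in practice, the sketch below ranks each candidate driver by its plain Pearson correlation with an outcome. The driver names, the scores and the helper `driver_correlations` are all hypothetical, invented for this example. It is one simple way to show how each element works on its own, not a prescribed method.

```python
import numpy as np


def driver_correlations(drivers, outcome):
    """Pearson correlation of each driver with the outcome,
    returned as a dict ordered from strongest to weakest (by absolute value)."""
    outcome = np.asarray(outcome, dtype=float)
    results = {
        name: float(np.corrcoef(np.asarray(values, dtype=float), outcome)[0, 1])
        for name, values in drivers.items()
    }
    return dict(sorted(results.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Hypothetical 1-5 survey ratings from eight respondents (made up for illustration).
survey = {
    "value_for_money": [3, 4, 2, 5, 4, 1, 3, 4],
    "staff_helpfulness": [4, 4, 3, 5, 3, 2, 4, 4],
    "store_layout": [2, 5, 3, 3, 4, 4, 2, 3],
}
satisfaction = [3, 4, 2, 5, 4, 1, 3, 5]

# One line per driver: easy to read, easy to replicate next time.
for name, r in driver_correlations(survey, satisfaction).items():
    print(f"{name}: r = {r:+.2f}")
```

Each figure can be checked against a cross tabulation of the same two questions, which keeps the analysis close to the raw numbers.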

Fear not, for there are several upsides. Your recommendations will be accepted more readily and your workings will be less daunting to the casual observer. If required, your general approach will be easier to replicate the next time around. Consistency is good in itself and a massive time saver. But here is the real kicker, and it makes for a humbling yet unsurprising revelation: because SEM draws on correlation, the results of a big, complex model, once disentangled, are similar to those of the simpler analyses. From a bird’s-eye view, they are identical.

If you are hot on your Greek symbols, love equations that peel off the page and are willing to nosedive into the concrete pool of SEM, then this is a challenge I strongly recommend. Your stats skills will explode. However, if you find yourself in a commercial setting, transparency and simplicity will win every single time.

Ryan Howard is director advanced analytics at Simpson Carpenter