Stop focusing on Data: Why data is not the new oil!


Data is hugely important, but it’s too easy to say that the answer is simply ‘big data’ and that once we have enough of it, everything else will just work out.

Data is not oil.

People keep referring to data as the new oil, in phrases like ‘it’s the oil of the information economy’ and ‘data is the fuel of the future’.  Clive Humby, the UK mathematician and architect behind Tesco’s Clubcard, first used this analogy in 2006, and many prominent people have used it since.  For the most part it did what they needed: it emphasised the seismic shift in commerce that proliferating access to data is driving.

But in some organisations, this analogy also reinforces the notion that only people with technical knowledge and industrial-scale infrastructure can make any meaningful use of data.  Perhaps this is why data giants like Facebook, Google and Amazon can all profit from the data their consumers produce without any demand for remuneration.

I think there are several very important differences between data and oil:

Oil is oil.  Oil, by which I mean unprocessed, naturally occurring crude petroleum, has similar properties wherever it is found.  Whether it is drilled from the deserts of the Middle East or extracted from the tar sands of Alberta, oil is oil.  Data, on the other hand, is by its very nature varied in both form and function; it is created and consumed constantly, by everyone and everything.

Oil has a standard value.  Well, almost: there are several benchmark prices for oil, and producers, brokers and consumers trade against them.  Changes in the price are therefore driven by nothing more than the effects of market forces on supply and demand.  The latent value of data, by contrast, lies in the speculative or realised value it adds to decision making.  Data has no standard price and, in many cases, no clear ‘owner’, which challenges the very idea that it is a commodity to be traded.

Oil is a consumable commodity.  Oil has two major uses: to lubricate moving parts or to be burned to release its chemical energy.  In either case it cannot be replicated or used more than once or twice.  The value of oil is partly driven by the fact that it is a consumable in limited supply.  Data, on the other hand, is effectively infinite in supply; it can be replicated and shared, and its latent value can be realised repeatedly and (if it’s not lost or damaged) indefinitely.

The more oil you have, the less it is worth.  Political motivations aside, oil suppliers carefully balance how quickly or slowly they pump their wells so that they don’t oversupply the market.  When oil is plentiful, the price drops.  With some data, the opposite is true: aggregating large volumes increases its utility, and therefore its value.  This is the very principle behind ‘big data’.

Be a geologist, not an oil baron!

Part of the fundamental argument for viewing data like oil is the premise that data needs to be ‘refined’ into something usable.  This is akin to the refining of crude oil into petrol.

But viewing data like this risks falling into the trap of assuming that, once refined, all data will become useful.  This mindset means that vast amounts of data are often collected, processed and then disposed of without ever actually being used to provide insights or make decisions.

An alternative approach is to start with the insight you want.  Ask yourself what information you need, then match sensors to signatures so that you collect only the data required to test for it.  The irony here is that this is actually how oil companies find the black stuff.

Rather than gathering all the data we can, in huge storage tanks like some sort of data oil baron, we need to hunt for the information we need like an exploration geologist, looking only for the data we’re going to use.

Admittedly, this does often require lots of data, as there are things you can do at a large scale that you can’t do at a small scale.  But the decision to use large amounts of data should be deliberate.  Big data is about applying mathematics to vast quantities of data to infer insights from probabilities, correlations and clusters.  Large sample sizes also allow an increased tolerance for inaccuracy; we don’t, for example, measure an entire country’s GDP down to the nearest penny.
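The GDP point is really a statement about sampling: the more independent measurements you average, the smaller the error you need to tolerate from any one of them. A minimal sketch, using an entirely invented ‘true’ value and noise level purely for illustration:

```python
import random
import statistics

random.seed(42)

# Hypothetical example: the true value and noise level are invented.
TRUE_MEAN = 100.0
NOISE_SD = 15.0

def estimate(sample_size):
    """Average `sample_size` noisy readings of TRUE_MEAN."""
    samples = [random.gauss(TRUE_MEAN, NOISE_SD) for _ in range(sample_size)]
    return statistics.mean(samples)

# The error of the average shrinks as the sample grows.
for n in (10, 1_000, 100_000):
    error = abs(estimate(n) - TRUE_MEAN)
    print(f"n={n:>7}: error = {error:.3f}")
```

With a handful of readings the estimate can be several units off; with a hundred thousand it is typically within a fraction of a unit. That is why large samples buy tolerance for individually inaccurate measurements.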

But saying that large amounts of data can be useful is not the same as saying all data is useful when it is large.  Amazon, for example, can recommend products you like without needing to know why you like them, just by using correlations in its vast amounts of data.  But if you already had a specific product in mind, sifting through all of Amazon’s data without the ability to use search criteria to narrow it down would be a daunting task.
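Amazon’s actual systems are far more sophisticated, but the correlation principle above can be sketched as a toy ‘people who bought X also bought Y’ count. The purchase data and item names here are invented:

```python
from collections import defaultdict

# Invented purchase history: each set is one customer's basket.
purchases = [
    {"kettle", "teapot", "mugs"},
    {"kettle", "teapot"},
    {"kettle", "toaster"},
    {"teapot", "mugs"},
]

def recommend(item, history=purchases):
    """Rank other items by how often they co-occur with `item`."""
    counts = defaultdict(int)
    for basket in history:
        if item in basket:
            for other in basket - {item}:
                counts[other] += 1
    return sorted(counts, key=counts.get, reverse=True)

print(recommend("kettle"))  # 'teapot' ranks first: it co-occurs with 'kettle' most often
```

Note that the code never asks *why* teapots sell with kettles; the co-occurrence counts alone produce the recommendation, which is exactly the point about correlation substituting for explanation.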

Big data will change the world we live in and make many of the decisions we make today easier, if not remove them completely.  However, at the micro level, in the real physical world, there will always be a need for human decision making.

When we stop focusing on data and start focusing on the insights we want from it, we can ask the right questions. If we then combine the scientific process of deduction with the power of data analytics at huge scale, we can do amazing things.

If we just start with large amounts of data and expect it to provide insights, we are more than likely going to be overwhelmed and end up drowning in oil...

Err…I mean data.


Written by Gareth Tennant

Founder & Director, Decision Advantage