Sunday, 23 October 2016

Just how BIG is ‘big data’? - the answer is blowin’ in the wind

I read with interest the mixed views on Bob Dylan being awarded the 2016 Nobel Prize in Literature the other week. He was cited for having created new poetic expressions within the great American song tradition. For many he was a Marmite performer: people either liked him or they didn't. His music was part of my youth, and I was firmly in the camp that liked him. His words are elegantly and eloquently constructed, even if discovering the meaning behind them sometimes takes a bit of work. Over his career he has sold more than 100 million records. And that's a big number.

An equally big number is 3,000,000,000 (3bn). That's the number of base pairs in the human genome, and in 2003 the last of them was finally sequenced. It must have been an amazing moment for those involved. Fifty years earlier, little was known about how genetic factors contribute to human disease. It was in 1953 that James Watson and Francis Crick described the double helix structure of deoxyribonucleic acid, more commonly known to most of us as DNA. DNA is the compound that holds the genetic instructions for building, running and maintaining living organisms. It was the Human Genome Project that eventually catalogued the complete set of DNA in the human body.

The Human Genome Project provided researchers across the world with freely available data and, in so doing, opened up opportunities to better understand human diseases and how we might more effectively diagnose, treat and prevent them. To date, some 1,800 disease genes have been identified, and there are now more than 2,000 genetic tests for human conditions. The project would not have been completed (two years ahead of schedule and under budget) without the ability to harness 'big data'. The term was first used in a 1997 paper published by researchers at NASA. Big data is now both ubiquitous and increasingly accessible.

I am not a statistician, but I love numbers and what they can tell us. Having access to big data opens up a whole new world. Let me take you for a quick stroll through some of the numbers that make up this world. More data has been created in the last two years than in the entire previous history of the human race; about 1.7 megabytes of new data will be created every second for every human being on the planet (currently 7.4bn people); the digital universe is expected to grow to 44 zettabytes by 2020 (a zettabyte contains 1,000 exabytes, and a single exabyte could stream the entire Netflix catalogue more than 3,000 times); we perform 40,000 searches on Google every second (3.5bn per day, 1.2 trillion per year); Facebook users send 31.25 million messages and view 2.77 million videos every minute; there will be 6.1bn smartphone users globally by 2020; and around a third of all data passes through the cloud (Google uses up to 1,000 cloud-networked computers to answer a single query in no more than 0.2 seconds). And if you want more, have a look at this!
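For the sceptical reader, most of these scales can be sanity-checked with a little arithmetic. Here is a quick back-of-the-envelope sketch in Python, using only the figures quoted above (they are the cited claims, not independent measurements):

```python
# Back-of-the-envelope checks on the data-scale figures quoted above.
# These are the figures as cited in the text, not fresh measurements.

ZETTABYTE = 10**21  # bytes
EXABYTE = 10**18    # bytes

# A zettabyte contains 1,000 exabytes.
assert ZETTABYTE // EXABYTE == 1000

# 40,000 Google searches per second, scaled up:
searches_per_second = 40_000
per_day = searches_per_second * 60 * 60 * 24
per_year = per_day * 365

print(f"searches per day:  {per_day:,}")   # 3,456,000,000 — consistent with the quoted 3.5bn
print(f"searches per year: {per_year:,}")  # 1,261,440,000,000 — consistent with 1.2 trillion
```

Reassuringly, the per-second, per-day and per-year search figures all tell the same story.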

What is also interesting is that only about 0.5% of all this data is ever analysed or used. It is estimated that better use of big data could save the US health care system US$300bn a year. Of course, the problem is that most of us lack the ability to manage and interpret large data sets, and that is true for both organisations and individuals (Barack Obama has well over a million Facebook followers, for example).

The internationally respected management consultancy McKinsey notes that there is already a worldwide shortage of skilled data analysts, and the situation is unlikely to improve in the short term. I work for a University, and knowledge creation and knowledge exchange are our business. I think we have a responsibility to respond to this skills gap in the future workforce. In the CBI/Pearson Education and Skills Survey published last week, most employers reported being satisfied or very satisfied with their graduates' attitudes, relevant work experience and skills. Satisfaction with graduates' numeracy was 91%; with technical skills, 88%; and with literacy, 86%. Maintaining these high levels of satisfaction in a rapidly changing, technologically enhanced workplace will be crucial.

This was something the University leadership community discussed at length at our Planning Day last week. We also noted that, in a rapidly changing world, we need to use big data ourselves: to deliver a more intuitive learning experience for our students and to shape our relationships with our industry partners. As Socrates is said to have observed, 'the secret of change is to focus all your energy not on fighting the old, but on building the new'. As with the triumph of unlocking the secrets of DNA, understanding and using big data can help us see how best to do this. Without it, as Dylan nearly said, 'the answer[s] my friend, will be blowin' in the wind'.