What’s so big about big data anyway?

Roger Downing, Data Science Group Leader at the STFC Hartree Centre, shares his thoughts on the topic of big data, where it came from and where it’s going next.

The phrase “Big Data” first entered our vocabulary more than 20 years ago, in discussions around a paper in an IEEE journal. Until recently it remained relatively unknown to most people, even though it had been embraced by companies such as Google and Yahoo, on which millions of people relied daily. A technology known as MapReduce, invented by Google and re-implemented by Yahoo as Hadoop, allowed very large amounts of data to be processed as a whole for the first time. This coincided with a dramatic fall in the cost of storing data and the explosion of the Internet into people’s everyday lives.
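For readers curious what the MapReduce model actually looks like, here is a minimal single-machine sketch of its map, shuffle and reduce phases in Python, using a toy word-count task. The documents are illustrative assumptions; real MapReduce and Hadoop distribute each phase across many machines.

```python
# A minimal sketch of the MapReduce pattern on one machine.
# The toy documents below are illustrative assumptions.
from collections import defaultdict

documents = ["big data is big", "data is everywhere"]

# Map phase: each document independently emits (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine each key's values into a final result.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map and reduce call touches only its own slice of the data, the same program structure scales from one laptop to thousands of machines.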

We are now at a point where data is utterly pervasive – every choice we make and every purchase we complete generates data, and that data is stored somewhere. The problem now facing us is how to deal with this onslaught, the “data deluge”. As computing struggles to keep pace with the rate of data generation in the post-Moore’s Law age, we must find better and smarter ways to work with the vast amounts of information available to us. The ability to look at data holistically – to evaluate whole populations, to investigate entire timelines – gives us insight we have never had before. By processing data at this scale, patterns emerge from chaos and signals emerge from noise. These signals can be used to help online shops personalise product recommendations, for example, or to study the causes of health inequality. A view over all of the data gives the best perspective for understanding it.
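As a toy illustration of a pattern that only emerges when the whole dataset is examined, here is a minimal sketch of co-purchase counting, the kind of signal a shop might feed into product recommendations. The baskets are illustrative assumptions, not any real retailer’s data.

```python
# A minimal sketch: a co-purchase pattern is invisible in any
# single basket and only emerges across the whole dataset.
# The baskets below are illustrative assumptions.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"tea", "milk"},
]

# Count how often each pair of products is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The strongest co-occurrences surface only over all baskets.
print(pair_counts.most_common(2))
# [(('bread', 'butter'), 2), (('bread', 'jam'), 2)]
```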

What about AI?

Artificial Intelligence (AI) is viewed by the UK Government as critically important to the future of industry. The Industrial Strategy white paper released in November 2017 put AI front and centre of the charge towards the promise of Industry 4.0, with its agenda of digitalisation. AI relies upon big data: AIs are created by training machine learning systems on very large amounts of data. These AIs can then act as decision support systems or as expert assistants; the most advanced are even able to drive cars safely on public roads. All AIs, though, are specialised, and none can do anything beyond the role it was created for. For this reason, AIs should be viewed not as a threat to employment but as an assistive technology that makes us all more effective. AIs promise to boost productivity, not to drive down staff costs.
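By way of illustration, here is a minimal sketch of that train-then-assist pattern using scikit-learn. The synthetic dataset stands in for the “very large amounts of data” and is an illustrative assumption.

```python
# A minimal sketch of training a machine learning model on
# historical data so it can support decisions on new cases.
# The synthetic dataset is an illustrative assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for a large historical dataset of past outcomes.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)  # training is where the data volume matters

# The trained model now assists decisions on unseen cases.
print("held-out accuracy:", model.score(X_test, y_test))
```

Note that the resulting model can do only the one task it was trained for, which is the sense in which all such AIs are specialised.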

The Hartree Centre

The Hartree Centre is a department within the Science and Technology Facilities Council (STFC). The centre specialises in computing, and its mission is to accelerate the adoption of High Performance Computing, Big Data and Cognitive Computing by UK industry, providing competitive advantage in doing so.

Some examples of work the centre has done around Big Data and AI include:

  • Providing a more accurate model of train delays – accidents cannot be predicted, but their consequences can be. This model provides better estimates of future train journey times based on the state of the network
  • Predicting the location of archaeological remains – construction companies set aside project budget to offset the risk of uncovering archaeology; a meaningful probability of discovery can help to better estimate that amount, or even inform planning
  • Manufacturing plant optimisation by Pareto and root cause analysis – throughput on a production line was boosted by 10% through a better understanding of failure modes and how to mitigate them (see the sketch after this list)
  • Tackling bed blocking by predicting in-patient outcomes – knowledge of how an individual has interacted with local healthcare services can be used to predict which care pathway they will be discharged to. This forewarning enables better preparation in destination services and may reduce the time between end of treatment and discharge
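To illustrate the Pareto approach mentioned in the plant-optimisation example above, here is a minimal sketch that ranks failure modes and finds the “vital few” worth targeting first. The failure log is an illustrative assumption, not the centre’s actual project data.

```python
# A minimal sketch of Pareto analysis: rank failure modes by
# frequency and stop once ~80% of failures are accounted for.
# The failure log below is an illustrative assumption.
from collections import Counter

failure_log = ["jam", "sensor", "jam", "motor", "jam",
               "sensor", "jam", "seal", "jam", "sensor"]

counts = Counter(failure_log).most_common()
total = sum(n for _, n in counts)

# Walk down the ranked list until 80% of failures are covered.
cumulative = 0
for mode, n in counts:
    cumulative += n
    print(f"{mode}: {n} ({cumulative / total:.0%} cumulative)")
    if cumulative / total >= 0.8:
        break  # the modes printed so far are the ones to target
```

In this toy log, just two failure modes account for 80% of stoppages, which is exactly the kind of insight that directs mitigation effort where it pays off most.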

For more information please contact: