Steve Elliott, Ventyx, an ABB company, UK, establishes how process safety management can provide reliable and efficient offshore operations.
An oil spill in the Gulf of Mexico, a pipeline break in Alaska, a chemical processing plant explosion in Japan – one could be forgiven for thinking that the oil and gas industry is never far from the headlines for all the wrong reasons. According to the International Association of Oil & Gas Producers’ safety database, 40 of the 88 worker fatalities reported in 2012 were identified as being related to process safety events.
Not surprisingly, companies across the supply chain now face tremendous pressure to improve their process safety records and reduce risk, especially as the loss of access to easy reserves forces them to take on new challenges such as drilling in ultra-deepwater or the Arctic.
Figure 1. Swiss cheese model of accident causation.
Increasingly, it is also industry leadership that comes under the spotlight. In 2012, the Organisation for Economic Co-operation and Development (OECD) published new guidance on corporate governance. It reinforced the notion that process safety leadership from the very top of the organisation is essential for successful, sustainable business performance.
Offshore staff challenges
For oil and gas executives, the need for a proven path to process safety has never been greater. At the same time, increasingly complex operations, rising production costs, decreasing refining margins, heightened regulatory scrutiny, ageing infrastructure and a lack of skilled workers are generating more challenges than ever.
The oil and gas industry has been aware of the issues of staff turnover for some time. The cause is not just ageing ‘baby boomers,’ but a more fundamental change in working trends. On one hand, market forces are driving organisations to explore and produce under increasingly challenging conditions; on the other, the tacit, expert knowledge required to ensure safe and reliable operations is being lost as an ageing workforce is being replaced by a highly mobile younger generation.
To combat the loss of knowledge retained by people, intelligent systems are needed that are not just electronic filing cabinets, but capture knowledge in use and deliver the right information, to the right person, at the right time, as part of day-to-day operations. When such systems are used by both seasoned staff and new employees, operational knowledge is captured and preserved. In times of transition, abnormal conditions or opportunities for improvement, decades of information and tacit knowledge are readily available to inform the decisions of supervisors, operators and engineers. Such a system can be thought of as a living, breathing application, but it is not the stuff of sci-fi films either. Companies at the forefront are already making a step change in the right direction.
Cascading causes of process safety incidents
Numerous studies have concluded that the majority of safety incidents do not result from any one thing: a single failure, by a single individual or a single piece of equipment, at a single point in time. Often incidents occur during non-routine activities; past history and lessons learned are forgotten or ignored; near miss follow-up processes are missing or, at best, inadequate; or the impact of changes remains hidden for a long time. Indeed, often the people making changes are not the ones affected by them.
In simple terms, the root cause of the failure is the failure to maintain the design intent.
Operating discipline is typically at fault, since these errors spring primarily from defects and deficiencies in following operating and maintenance procedures and in applying necessary administrative controls to ensure competency, good communications, performance measurement and change management. For example, work instructions that are incomplete, inaccessible or not readable can lead to inconsistent execution of procedures; poor communication can cause incomplete worker-to-worker handoffs; stress and excessive fatigue can impair decision-making and contribute to procedural missteps; and poor human-machine interfacing can lead to operator confusion and delayed responses.
Many companies attempt to address this situation through the implementation of safeguards or barriers based on the so-called ‘Swiss cheese’ model of accident causation used in risk management. The idea is to reduce the likelihood of an incident occurring, or to reduce the impact if an incident does occur. The problem, however, is that gaps inevitably start to appear after these safeguards and barriers are introduced into operations, reducing their effectiveness.
There are three main contributors to this deterioration:
- The passing of time. Just because a safety incident has not occurred for some time does not mean that all is well. If assets are poorly maintained and operating processes are not regularly checked for safety effectiveness, they will eventually stop providing the level of risk reduction they were designed to provide.
- Covert risk. Risk has a propensity to emerge from the least expected places. The inability to visualise where the risks are and where the next incident is coming from is an open door to disaster.
- Complacency. “This is how we always do it.” – “Don’t fix it if it isn’t broken.” Such statements are commonplace. As days, months and years pass without incidents, and when you cannot see where incidents are in the making, it is all too easy to become complacent. That is when poor habits can infiltrate processes and procedures and over time become the new norm.
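The effect of these contributors can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that barriers fail independently, so the chance of a hazard passing every layer is the product of each layer's probability of failure, and that a neglected barrier's failure probability grows by a fixed factor each year. The function names, the starting probabilities and the 15% annual degradation rate are hypothetical and are not drawn from any particular facility or standard.

```python
from math import prod


def breach_probability(failure_probs):
    """Chance that a hazard passes every barrier, assuming independent barriers."""
    return prod(failure_probs)


def degrade(failure_probs, annual_growth, years):
    """Grow each barrier's failure probability by a fixed factor per year, capped at 1.0."""
    factor = (1.0 + annual_growth) ** years
    return [min(1.0, p * factor) for p in failure_probs]


# Hypothetical design-intent values for three barriers (e.g. alarm, trip, relief).
as_designed = [0.1, 0.01, 0.01]

for years in (0, 5, 10):
    current = degrade(as_designed, annual_growth=0.15, years=years)
    print(f"year {years:2d}: breach probability = {breach_probability(current):.2e}")
```

Under these assumed figures, the combined breach probability grows by more than an order of magnitude over ten years without any single barrier visibly failing, which is why an absence of incidents is a poor indicator of barrier health.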
All told, the Swiss cheese model as it stands is too porous and static an approach to achieve the next safety performance breakthrough. Instead, one must look to the connectedness of the industrial internet of things, people and services and address the process safety challenge as a holistic system integration problem in order to make meaningful progress.
Managing operational risk: static versus dynamic
Risk is not static. Safeguards can develop faults, or they can be down for maintenance. New and different activities may be taking place on the installation. Organisations, people, resources and logistics can easily and quickly change. In short, nothing should be taken for granted. Instead, organisations must bolster safeguards and barriers dynamically by monitoring leading indicators of increased vulnerability to process safety incidents on a day-to-day basis.
A daily review of cumulative risk enables organisations to digest new information coming from operations and understand its impact. Where are breaks in the plan that will defer planned work? What defective equipment has been found that cannot be repaired immediately? Who is unexpectedly off the job today? The accumulated information can be used to fuel operational risk assessments and safety critical risk assessments, leading to decisions on whether to shut down or to take compensating measures.
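A daily review of this kind can be sketched in a few lines of code. The example below is a simplified illustration, not a description of any vendor's product: the `Finding` record, the risk points assigned to each finding and the two decision thresholds are all hypothetical, and a real site would derive them from its own risk criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One item raised in the daily review (all fields are illustrative)."""
    description: str
    category: str      # "deferred_work", "defective_equipment" or "staff_absence"
    risk_points: int   # hypothetical score assigned by the reviewing team


# Hypothetical thresholds; a real site would set these from its own risk criteria.
COMPENSATE_AT = 10
SHUT_DOWN_AT = 20


def cumulative_risk(findings):
    """Total the risk points of all open findings."""
    return sum(f.risk_points for f in findings)


def recommend(findings):
    """Map the cumulative score to one of three illustrative decisions."""
    score = cumulative_risk(findings)
    if score >= SHUT_DOWN_AT:
        return "shut down"
    if score >= COMPENSATE_AT:
        return "take compensating measures"
    return "continue as planned"


todays_findings = [
    Finding("Proof test on level transmitter deferred", "deferred_work", 4),
    Finding("Firewater pump awaiting spare part", "defective_equipment", 6),
    Finding("Control room operator off sick", "staff_absence", 3),
]

print(cumulative_risk(todays_findings), recommend(todays_findings))
```

None of the three findings reaches a threshold on its own; it is only when they are added together that the review calls for compensating measures, which is the point of assessing risk cumulatively.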
Figure 2. Transforming data to actionable information.
While this provides a reliable basis for operational decision making and control, and ensures that levels of cumulative risk remain tolerable, the process is still highly reliant on human expertise to bring key issues to the surface. The key step towards real time visibility into conditions is to leverage the convergence of operational technology (OT) and information technology (IT) and get ahead of incidents through real time risk monitoring, analysis and advising.
Minimising operational risk: post-mortem versus proactive
Traditional approaches to process safety management have leaned heavily on post-mortem analysis. What went wrong? What can be learned from this? Although post-mortems are useful, to continue reducing the frequency and severity of incidents, organisations need to start asking ‘how can we be more proactive in addressing operational risk?’ The answer lies in the integration of operational and information technologies.
The clearest way to be more proactive is to integrate a production facility’s wealth of real time operational data (e.g. instrumentation and control data) with technology for dynamic risk analysis. When done comprehensively, this can result in a real time dynamic risk advisory capacity for monitoring and immediately alerting personnel to changing and developing risks, and for providing an optimal course of action to maintain the integrity of the facility.
Accessing and amalgamating all the required data is challenging because it resides everywhere, from operations, maintenance and automation systems to log books, operator rounds, mechanical inspections and lock out/tag out applications and databases. It is also being created and changed constantly. It needs to be pulled from disparate systems, validated, transformed, integrated and contextualised in order for it to drive actionable intelligence.
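The validate, transform, integrate and contextualise steps can be pictured with a small sketch. Everything in it is an assumption made for illustration: the record layout, the equipment tags, the source system names and the register of safety critical equipment are hypothetical, and real integrations would read from the actual systems rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from four separate sources.
raw_records = [
    {"source": "maintenance", "tag": "psv-101 ", "status": "overdue inspection"},
    {"source": "operator_rounds", "tag": "PSV-101", "status": "round not completed"},
    {"source": "automation", "tag": "LT-205", "status": "bad signal quality"},
    {"source": "log_book", "tag": "", "status": "note without equipment tag"},
]

# Hypothetical register of safety critical equipment, used to add context.
SAFETY_CRITICAL = {"PSV-101"}


def is_valid(record):
    """Validate: keep only records that identify a piece of equipment."""
    return bool(record.get("tag", "").strip())


def normalise(record):
    """Transform: bring tags from different systems into one format."""
    return {**record, "tag": record["tag"].strip().upper()}


def integrate(records):
    """Integrate and contextualise: group by equipment and flag criticality."""
    by_tag = defaultdict(list)
    for record in records:
        by_tag[record["tag"]].append(f'{record["source"]}: {record["status"]}')
    return {
        tag: {"safety_critical": tag in SAFETY_CRITICAL, "issues": issues}
        for tag, issues in by_tag.items()
    }


picture = integrate(normalise(r) for r in raw_records if is_valid(r))
for tag, info in picture.items():
    print(tag, info)
```

Two records that looked unrelated in their source systems end up against the same safety critical device once the tags are normalised, which is the kind of connection that is hard to see while the data stays in separate systems.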
Ventyx, an enterprise software provider for asset-intensive industries, and its parent company ABB have been developing technologies to carry out the heavy lifting required to integrate and analyse all of this information in real time, comparing assets’ operation and maintenance performance against the basis of safety and alerting personnel to any deviations from safety constraints.
The proactive identification of issues can be further strengthened through the use of visualisation technologies on top of the analytics. Analytic technology can calculate the change in risk dynamically (e.g., weighing the consequences of an uncompleted operator round or a non-implemented proof test on a critical device) and update a facility’s risk matrix in real time with the revised impact. Visualisation technology can then help focus people’s attention on the change in risk using geospatial representation and colour-coded graphics for impacted facilities, units and equipment.
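As a rough illustration of how such a dynamic update might work, the sketch below recalculates one unit's position on a risk matrix as events accumulate. It assumes a 5 x 5 matrix scored as likelihood multiplied by consequence; the colour bands, the likelihood penalties for each event and the baseline ratings are hypothetical values chosen for the example, not figures from any real facility or product.

```python
# Hypothetical 5 x 5 risk matrix: score = likelihood x consequence, each rated 1 to 5.
def colour(likelihood, consequence):
    """Colour band for a matrix cell; the band limits are illustrative."""
    score = likelihood * consequence
    if score >= 15:
        return "red"
    if score >= 8:
        return "amber"
    return "green"


# Hypothetical likelihood increments for the two events mentioned in the text.
LIKELIHOOD_PENALTY = {
    "operator_round_missed": 1,
    "proof_test_not_done": 2,
}


def updated_likelihood(baseline, open_events):
    """Raise the baseline likelihood for each open event, capped at 5."""
    return min(5, baseline + sum(LIKELIHOOD_PENALTY[e] for e in open_events))


# Illustrative unit: baseline likelihood 2, consequence 3.
baseline, consequence = 2, 3
scenarios = (
    [],
    ["operator_round_missed"],
    ["operator_round_missed", "proof_test_not_done"],
)
for events in scenarios:
    likelihood = updated_likelihood(baseline, events)
    print(events, likelihood, colour(likelihood, consequence))
```

With these assumed ratings the unit moves from green to amber when an operator round is missed, and to red when a proof test on a critical device is also outstanding. A visualisation layer would then apply the resulting colour to the unit on a plot plan or dashboard.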
Single window of operational integrity
The new imperative for offshore operators is knowing with certainty where the next incident will come from, and being able to avoid it, or at least minimise its effects. Regulatory agencies, shareholders and employees and their families demand this.
It will require the ability to:
- Reduce operational risk – by unlocking data for real time insights and enterprise level visibility.
- Take the long view – by using enterprise-level benchmarking and trend analyses to guide more effective process safety strategies and keep incidents at bay.
- Trust the data – by leveraging validated data, whether it is mobile data, or data created automatically by instrumentation and systems.
- Receive instantaneous alerts on developing issues – through the deployment of visualisation and alarm/alerting technologies on top of real time analytic applications.
- Push data out and pull it in from the point of work or the process edge – through the use of mobile and cloud technologies.
- Detect and respond automatically to abnormal conditions in real time – be they partial process impairment or emergency shutdown events.
- Have a single window into operational integrity – powered by dynamic risk analysis and leveraging existing risk data, automation systems, maintenance and operations procedures, and business systems for planning, scheduling and cost control.
When all these abilities fall into place, a company will be able to capitalise in new ways on existing infrastructure and partake in the long-needed breakthrough in process safety performance. A paradigm shift in process safety performance is achieved through integration of real time information, best operating practices and closed loop control.
A glimpse into the future
While risk can never be eliminated, it can be better managed and reduced. There is no denying that putting operational integrity systems in place industry-wide will require real focus and investment. Safety, operations and other disciplines will need to come together to obtain results that will increase productivity and improve safety. To reap real benefits, these systems cannot be confined to small, tactical projects – they need to encompass all areas of the operation.
One thing is certain: these systems will become increasingly mainstream as more early adopter companies demonstrate their real benefits, not simply as an ‘insurance policy’ against incidents, but to the bottom line. To be sure, such intelligent knowledge systems are ripe with potential for enabling tangible improvements in throughput by optimising maintenance and operations activities. As further observation cements this as a universal industry truth, it is those companies that took the leap first that will reap the most benefits.
Adapted by David Bizley