CompTIA Security+ Chapter 5: Risk Management
Table of contents
Sections
5.1 Organisational Security Policies, Plans and Procedures
5.2 Concepts from a Business Impact Analysis
5.3 Risk Management Processes and Concepts
5.4 Incident Response
5.5 Digital Forensics
5.6 Disaster Recovery and Continuity of Operations
5.7 Control Types
5.8 Data Security and Privacy Practices
5.1 Organisational Security Policies, Plans and Procedures
As we bring more people into our workforce, it becomes clear that we need a specified set of procedures, a way of doing things, that heads off the mountain of questions that new employees and clients will have. These procedures should smooth out the routine business functions that must happen before the more creative work can begin. For example, when we are designing new chips, the people who work in the factory always need protective clothing (procedure) to comply with manufacturing standards (policy). The procedure ensures that when auditors come round to inspect our factory, everyone is always in protective gear, as we wouldn't let employees in otherwise, and so it demonstrates our fulfilment of their standard.
These underlying procedures are called Standard Operating Procedures (SOPs), and they encompass sets of steps that are in line with company policies (internal policies) and regulatory compliance (external policies). For example, an SOP might require that documentation is completed whenever a new hire's account is created.
A whole business shouldn't be entangled in a web of procedures; they should only cover the groundwork one must complete before engaging in new creative activities. Yet time and again you'll see businesses become too corporate and die out.
Agreements
Now then, aside from the relationships imposed on organisations and on employees by organisations, let's talk about a mutual imposition of procedures: agreements. There are many reasons an organisation might turn to a third party for assistance; most commonly, the other party provides a service that would be of great help. It might help create slicker SOPs, assist in research and development, or improve the products and services we already have. Most businesses are a bundle of third parties; these are the creative upholstery that establish business functions in the first place. Without something like Windows or Linux, could you imagine every company having to build its own OS? That would be madness. Linux in particular lends itself to tailored business solutions, as it provides all the essentials someone may need, which makes it very attractive to tinkerers; organisations inclined to make their own OSs, like security agencies and car manufacturers, rely on such tools.
In most cases, data is being transferred back and forth between the two organisations. What should come with the mutual agreement (before we even sign with the third party in the first place) is knowledge of what we are communicating, what they store about us, how they manage and protect our data, and what controls are in place during our use of the product.
Agreement Types
Between the two parties, if either is providing a service, as is often the case between two businesses, then a Service Level Agreement (SLA) should be among the top-level terms and conditions, stating the level of service to be provided. In the case of web hosting, depending on the agreement reached, the SLA would define things like uptime, response times and maximum storage. A very simple analogy for an SLA is the difference between Windows Home, Windows Pro and Windows Enterprise: you get different levels of service depending on your choice.
Now onto something called a Business Partners Agreement (BPA). This is where the terms between two corporations are defined, corporations we assume have an interest in improving the state of themselves as well as each other, because of some sort of dependency on the other. The BPA should outline the roles (procedures) each side will carry out and specify their responsibilities, which could include which organisation manages which services. The example usually isn't two businesses in the exact same market, as they have a natural rivalry and competitiveness; you more often see this between one business heavily tied to another's market and vice versa. Take Google and one of the many companies they have bought out, like Fitbit. Fitbit is interested in partnering with Google and integrating Google's products into its wearables, and Google is interested in Fitbit's technology for how it can adapt it and, essentially, make money. A BPA between these two should cover at least the following:
- Contributions to the partnership. What APIs will Fitbit create to help Google integrate their products, and what will Google bring to the table? Perhaps Google assists with manufacturing and the software, but then this tilts the power in the agreement. In this case Google did buy Fitbit, so they are perfectly happy to execute the majority of the contributions.
- Allocation of profits and losses. After a massive sale, who gets the share? Does it go directly to Google, or does Fitbit retain some financial independence?
- Decision making. Which departments have power over which issues, and how much control does one party have over the other?
This will most likely trip you up in the exam, as it would have done to me had it not come up in a practice test. You might see a question along these lines:
Company A and Company B decide to merge funds and bid for a contract together; however, both agree that the proprietary data they disclose to each other should never be leaked by either party. Now that second part flips the script. All along I'm reading questions like these and thinking "BPA!", but in fact BPAs deal with internal business structure and continuity, not with outside matters and the disclosure of information. The agreement that fulfils this scenario, the BEST option remember, is a Non-Disclosure Agreement (NDA). NDAs protect proprietary information from being disclosed outside a specified environment, which may be the workplace or even a single meeting. They contractually bind anyone who signs to not release data, trade secrets or what have you, which is just what we need in a situation like this.
Next up is a more security-oriented agreement: the Interconnection Security Agreement (ISA). It is mostly employed by governments and by corporations that juggle highly sensitive data. When such entities wish to communicate with another corporation or government, procedure states that at every connection between the two there should be a degree of security controls in place. An example would be running a folder through antivirus before sending it and attaching a hash of the contents, so that the receiver can compute the same hash and check that the two strings match. An ISA can specify how a connection is established and what encryption methods it expects to see, like TLS, along with authentication protocols, firewalls and so on.
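The hash-and-compare control described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any real ISA: the helper names are made up, and SHA-256 is just a typical algorithm choice; the agreement itself would specify which one to use.

```python
import hashlib
import hmac

def sha256_of_file(path: str) -> str:
    """Hash the file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected_hex: str) -> bool:
    """Receiver-side check: recompute the hash and compare the two strings."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex)
```

The sender ships the hex digest alongside the file; the receiver runs `verify` and rejects the transfer if it returns `False`. `hmac.compare_digest` is used for the comparison to avoid timing side channels.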
Next up are the Memorandum of Understanding (MOU) and the Memorandum of Agreement (MOA). An MOU is a document that highlights where two businesses stand in terms of their goals, plans and understandings regarding a business prospect or a particular project. It is meant to clarify where the two are at, but it is in no way legally binding: the positions the parties hold in the document can freely change, and neither party is forced to sign it. That is what differentiates it from an MOA, in which designations of responsibility are assigned to each party and the means of accomplishing the goal start to be worked out. An MOA is a formal agreement, signed by both parties, that expects those involved to meet the minimum standard of cooperation it sets out. Don't go overboard with this though, as it doesn't have the weight of a full contract; it only begins to map out the complexity of the problem, much like assigning children rooms in a house without assembling the furniture inside. A full contract would go into far more detail on the procedures needed to complete the project.
Personnel Management Policies
When we're looking to hire new staff for a position with at least some responsibility, the employer will probably conduct a background check on every candidate. This is so that anything in their past which might make them a bad choice for the company is known, and there is no risk of, say, hiring someone into HSBC who was once arrested for stealing money. Common background checks include credit reports and law-enforcement inquiries; later, at the interview stage, there will also be references from former employers and the like to gauge a candidate's character. Once we find the right person for the job, it's time to bring them on board, and this involves creating their account, filing paperwork, getting them a lanyard, and bringing them up to speed on our SOPs, policies and the standards of compliance the business must meet. This period of integration may include a formal training programme.
Once an employee is with the team, we need to run what is formally known as role-based security awareness training. This is just a fancy way of saying that whatever level of responsibility the employee has been given, we give a proportionate amount of security training. In a typical corporation, or even focusing on a single system, you have a generic hierarchy of responsibility:
- The Data Owner. This person is ultimately responsible for the security of the data. He or she doesn't actively manage the data itself or run checks on the system that houses it, but if there were a breach, the Data Owner would get the blame. This is usually a CTO or even CEO role.
- System administrator. This person enables the use of the system by the outside world, allowing it to collect data in the first place; but as they open up the pipeline, they had best be prepared to manage and close it. It is the role of a system admin to oversee the current state of the system and change configurations where necessary to ensure things run smoothly.
- System owner. This person oversees the system administrator, ensuring the system complies with the Data Owner's requirements and any international standards.
- Outside the roles intimately connected to the system, there are the users: people who perform business functions with the data in mind. This could be the marketing team, or a data scientist running regressions; it could be privileged users, who have a degree of power and can make changes to the database and assist in its upkeep. Then there are the executive users, akin to the managers of departments (the Head of Sales or Marketing, the CEO, CTO, CFO and so on).
Part of this training is getting the employee to understand our Acceptable Use Policies (AUPs), which are essentially impositions on resources, systems and corporate etiquette. The AUP outlines the ways the business network and systems should be used, hence "Acceptable Use". Part of the AUP will probably contain some security policies, like locking your office drawers and not leaving keys or personal devices out on the desk. Beyond that are things like whether you're allowed to view social media at work or access your personal email; this all depends on the intensity of the corporation and its responsibilities, e.g. you probably don't want government officials accessing their personal email on site. I'm looking at you, Hillary.
Aside from security policies and acceptable use, there are many general personnel management policies in place as part of the set of rules that help secure our organisation. The clean-desk policy directs personnel to keep their areas organised and free of papers. This helps reduce the threat of data theft or the inadvertent disclosure of information: don't leave information about the system, or personal identifiers like lanyards, lying out. Going back a chapter, where I talked about password expiration and history: when the time limit on a password runs out and the history settings prevent you from reusing it, people tend to make notes of the new passwords they are using. But this is great for an insider, who can just walk by and pounce on your sticky notes!
Aside from this, and depending on the nature of the company, employees might do job rotations every six months. Rotating the workforce helps identify weaknesses and nuances in a process that a) prevent beginners from understanding it, or b) are inefficient, where a fresh pair of eyes sees the problem; and c) it lets people apply skills from other sectors to make new and exciting products. Job rotation only really works between departments with many self-similarities: a design agency may have many teams working on different projects, and a group that typically specialises in one style may hop over and experiment in another. A side effect of rotating is that it pulls out all the cobwebs people leave behind in their own department; if someone is up to something nefarious like fraud, the next group that comes in will notice and, hopefully, report it. Another control that helps prevent fraud is the separation of duties, i.e. the need for two employees to enter their keys at the same time to unlock the safe, or for a second person's signature on someone's legal documents. Separation of duties isn't only an insurance policy though; it spreads accountability over the whole group. You will often see a question asking how to stop one employee from having too much power, and you should look for the answer that keeps that employee involved but requires one or more other members of staff to complete the operation, this being done because of the sensitivity of the task and the room for insider exploitation. Job rotation isn't the right answer there: while it's similar, we want to keep that employee on the task but have their job functions split. And depending on how long a rotation is, a group rotating onto a sensitive duty could itself begin to abuse the power, signing cheques or pushing invoices through.
On the topic of insider exploitation, if you're in the financial industry you may have heard of mandatory vacations. Now you might be thinking, "Oh, the company cares about and respects me!" Well, not really. They force you to take a string of five days off so they can see how the business functions without you, check whether you're conducting any fraud, and have time to search your office and run a user audit. If it turns out you're not involved in any embezzlement, then great! If not, you may come back to the office having to answer to a fat folder of evidence. Knowing that you will be forced to take leave is an effective deterrent, I suppose, but it won't prevent all fraud. Following the "defence in depth" methodology, it is best to have many different layers and processes that a wily snake would have to weave through.
No policy is complete without some consequence, it seems, and adverse actions come right along with the policy. If you end up breaking the agreement you signed, saying you would follow the AUP, then you might be fired, suspended or sent for retraining. This documentation makes an employee liable, and so makes it less likely that they will breach their agreements.
Above all though, it is important for every employee, no matter their position, to continue their education. At the very least the corporation wants you to remember the policies you must uphold and to become more skilled; but beyond that, it is great for you to learn a new skill, as its symbiosis with all your others makes you sharper and a more efficient worker, and the bonus is you might get to finish work early!
And if we get to the point where we have to say goodbye to an employee, a good practice is the exit interview. This is where an employee shouldn't feel the need to hold back: they can voice their concerns about current corporate practices, any moral or cultural problems they found themselves in or observed, and what they thought the company did well or poorly. Questions that may arise in an exit interview include:
- What did you like most / least about the job?
- Did you find the training for your position adequate? Would you have wanted more?
- Do you think changes need to be made to any policies or procedures?
These interviews are typically conducted by the HR department, and they might remind you of any NDAs you will still have to uphold. This is especially relevant if you plan to continue working in the same industry, because if you leak trade secrets to competitors you can still be taken to court.
Side note: if you have gotten this far and are still a little confused about the difference between a Standard Operating Procedure and an Acceptable Use Policy, allow me to explain. An AUP is sort of a little rule that governs a particular thing: a clean-desk policy or a work-clothing policy each target a specific aspect, which may then be incorporated into SOPs. SOPs describe a business function; AUPs can be inserted into, and hang over, those functions.
A collection of policies creates guidelines. For example, the Coronavirus guidelines encompass policies on cleaning tables, wearing masks, getting people to use the Coronavirus app and so on.
General Security Policies
These are policies that will most likely be enacted by every company, things that all employees, regardless of industry, should keep note of.
The first is the social media policy: what you can and can't post. I don't just mean things covered by an NDA, but any company information which has been classified as private or higher; this is why labelling your data and training your employees to adopt the company's mindset is crucial. Leaking personal details can also leave you vulnerable to phishing, spear phishing, social engineering, the list goes on. We can leak things like our graduation dates, birthdays and pet names, all of which make it much easier for someone to build a password list to crack our accounts.
Alongside communication with peers, there are peer-to-peer (P2P) applications where users can download and share their own content and data. It doesn't take a genius to see how unbelievably insecure this can be. Peers become servers, and we put all our trust in their hands: goodness knows what kind of malware we're getting, or indeed what kind of data we end up sharing. If these applications are installed on company computers, corporate data can be leaked and then sold on. Luckily, this sort of software is usually swiftly blocked at the firewall level.
Another threat besides rogue websites and our own incompetence is the malvertisement. Here an attacker may have used a pharming attack to redirect our request from a trusted website to a malicious one, similar in appearance, whose ads are loaded with bogus JavaScript that could take us to yet another site, this one with the goal of having us download malware. You can see these sorts of bogus ads on social media, or even on legitimate websites that were either hacked or tricked by an "up-and-coming, totally legitimate" advertising agency.
5.2 Concepts from a Business Impact Analysis, and Business Continuity Planning
Business continuity planning (BCP) is where a business constructs a plan for potential outages of critical services and functions.
A BIA is simply an analysis done to identify the components that are vital to the organisation's success, i.e. which components, systems and functions make the most impact. It should be done before any disaster happens, as we want to know which systems to prioritise and which need the most attention; a Continuity of Operations plan, on the other hand, lays out the procedures to follow after a disaster has occurred. A BIA should identify the tolerable downtime for those systems, the various scenarios that can impact them, and the potential losses involved.
The Business Impact Analysis is part of the BCP: it identifies all vital resources and functions, and the BCP then uses this knowledge to work out how best to survive outages, fires, tornadoes and the like. The BCP includes the actual implementations, the plan itself and the steps we will execute; the BIA is the scaffolding for those plans. Another component of the BCP is the Disaster Recovery Plan (DRP), which deals with the steps to reinstate systems after a disaster: hot sites, cold sites, and how best to restore backups, e.g. do we use differential or incremental backups?
How we prioritise these threats should be based on how great a danger each poses to our business and everyday functions. If we lived in Japan, for example, which sits at the intersection of three tectonic plates, then we could expect far more earthquakes than the average nation.
A BIA comprises several different functions and roles, which look at a particular disruption or threat and identify the key business functions that would be impacted by it. We then rank them and decide how much of our budget to put towards each one. It may also be that where the business is located, particular services cost more or less, and this too will shape the BIA.
Identification of Critical Systems
A BIA will hope to address these questions:
- What are the critical systems and functions? A BIA can also establish which processes and functions are vulnerable and may be liable to attack in the future. The BIA gathers an awareness of the resources in each department, and of the security controls for each. If it shows there is no video surveillance in the server room, that suggests attackers could enter and begin exfiltration without us being able to identify who did it; we would be really weakened by such an event.
- What do these critical systems and functions depend upon? How easy is it to make these assets fault-tolerant? If we have web server hardware made by some obscure company and only they make the spare parts, it is going to be a nightmare getting that baby up and running again, never mind the cost of the parts themselves.
- Looking at everyday business activity, what is the maximum tolerable downtime of these critical systems and functions?
- What scenarios are these critical systems most likely to be defeated by? A server-room door shouldn't be outwitted by a lockpick, and security guards shouldn't be fooled by fake lanyards and high-visibility uniforms. Once we establish the scenarios that may impact systems and functions, we can begin to hypothesise solutions.
- What is the potential loss from one of those given scenarios?
Once we have identified these mission-critical functions, we need to ask to what degree we need them to be available. Should they be accessible during working hours, like a department's computers, or available at all times, as the company website usually must be? Companies are built on ideas, and how those ideas are instantiated will most likely spawn systems which are the crux of the business itself; those systems may demand a high degree of availability, say "five nines", which simply means the system should be available 99.999% of the year.
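To put "five nines" in perspective, here is a quick back-of-the-envelope calculation (a small sketch; the function name is made up):

```python
# Allowed downtime per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def max_downtime_minutes(availability_pct: float) -> float:
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(round(max_downtime_minutes(99.999), 2))  # five nines: ~5.26 minutes a year
print(round(max_downtime_minutes(99.9), 1))    # three nines: ~525.6 minutes, nearly 9 hours
```

That difference, minutes versus hours, is why each extra nine gets dramatically more expensive to engineer.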
Within our departments, which are segregations of business-critical systems (performing critical functions), there are the Subject Matter Experts (SMEs), who make us more aware of the intricacies of a system, what makes it stand, what makes it a quality system, and who moreover assist us in our impact analysis.
Impacts of Critical Systems going down
After we have identified and combed through the topology as far down as we can, we need to take this blueprint and tally the resources that go into each part. As we assess the impacts associated with the loss of such systems, it's important we consider the impacts to the following:
- Life
- Property: offices , equipment
- Employee safety
- Financial
- Reputation
Reputation is pretty much ubiquitous for every layer, but life is the most precious, as it is the most irreplaceable.
If you think of the business and its departments, offices and systems as the sort of map a war general would use, then you can think of the next stage in the BIA report to include the soldiers, weaponry , food etc that go into each layer, if that section were to be destroyed what exactly do we lose?
Say a hospital was running a BIA. Obviously we expect things like drip-feeders and heart monitors to stay available as long as possible, and things like generators shelter them from power outages, thus allowing life to be preserved in the more intensive care cases.
Privacy Threshold Assessment and Privacy Impact Assessment
These two come at the start of the BIA process. A Privacy Threshold Assessment (PTA) is conducted to determine what kinds of information a system is collecting, in order to determine whether a Privacy Impact Assessment (PIA) is required. Obviously not every business needs to run assessments on data collection if they don't do any, much like this blog; but a business like Google is an entirely different universe, and it goes way beyond simply assessing data collection, owing to the content of the information they record. "Threshold" refers to the level at which a company must assure quality regulation and treatment of user data: to have the resources to stop breaches, and the trained staff to look after the trillions of bits that encode our lives. A PIA is an impact assessment that considers, in the event of a breach, perhaps due to the threshold being exceeded and standards slipping, or due to a zero-day bug, how much PII (Personally Identifiable Information) would be compromised.
A PIA is designed to evaluate the protections a system already has, to see if privacy risks can be mitigated.
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
Going back to our warzone analogy, this is akin to losing troops: how quickly can we get back to normal? The difference between the two is that the RPO is the most recent point in time to which our data can be restored, i.e. the last point at which our backups were good (the last point we had men on the ground). If the losses are fresh and our backups were taken only hours ago, then that last good point is very close to the present, mitigating the loss of availability and data. The RTO, however, is the time in the future by which our systems will be back up and able to accept jobs. It is a duration we also aim to keep as short as possible, so we can say the disruption is no longer affecting us.
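As a concrete sketch, checking an incident against RPO and RTO targets is simple arithmetic on timestamps (the targets and helper names here are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical targets: lose at most 1 hour of data (RPO),
# restore service within 4 hours (RTO).
RPO = timedelta(hours=1)
RTO = timedelta(hours=4)

def meets_rpo(last_good_backup: datetime, failure: datetime) -> bool:
    """Data lost is the gap between the last good backup and the failure."""
    return failure - last_good_backup <= RPO

def meets_rto(failure: datetime, service_restored: datetime) -> bool:
    """Outage length is the gap between the failure and service coming back."""
    return service_restored - failure <= RTO
```

So a backup taken 30 minutes before the crash satisfies a one-hour RPO, while a nightly backup would not; the same reasoning applies to the RTO on the recovery side.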
Mean Time To Fail (MTTF), Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR)
When we are evaluating the lifetimes of our systems, the health of our soldiers, it's imperative to understand the metrics the BIA will use for each component of a system. The servers, for example, may be old, so the MTTF would be short; and if the parts are in short supply, the MTTR would be high, unless we mitigate this by buying spare parts in bulk. The MTBF measures the stretches of serviceable functionality we get from the system, which may be elongated by including new parts and other upgrades with each repair.
Formally, MTBF is the measurement of time between a device being repaired and the next time it fails.
MTTR is the average time it takes to repair a failed component or device.
MTTF is the average time a non-repairable component or device will run before it fails for good; once it fails, a new product has to be installed.
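These metrics fall straight out of an incident log. A toy sketch in Python (made-up data and helper names; times are hours since commissioning):

```python
# Toy incident log: (failure_time, repair_done_time) pairs, in hours.
incidents = [(100, 104), (250, 252), (400, 410)]

def mttr(log):
    """Average repair duration."""
    return sum(done - fail for fail, done in log) / len(log)

def mtbf(log):
    """Average uptime between one repair completing and the next failure."""
    gaps = [log[i + 1][0] - log[i][1] for i in range(len(log) - 1)]
    return sum(gaps) / len(gaps)

print(mttr(incidents))  # (4 + 2 + 10) / 3 ≈ 5.33 hours
print(mtbf(incidents))  # (146 + 148) / 2 = 147 hours
```

Note how MTBF is measured repair-to-next-failure, matching the formal definition above, not failure-to-failure.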
Single Points of Failure
Anyone will tell you that a SPoF is a major obstacle to high availability, because if that single router or single web server goes down, then all the connections subsequently die and cannot be re-established. In the BIA it would rank among the highest priorities to get up and running again.
If a particular web server writes to only one disk, then the whole thing will crash if that drive slows down or bricks, so this is one place we would wish to provide a degree of redundancy and fault tolerance. We could go a step further and say that having only one web server may be the real problem: with more, we can distribute the load, provide better service, and hopefully let the drives each server uses last longer. With this reasoning in mind we may decide to invest in a failover cluster as a means of providing fault tolerance.
Going back to the hospital example, it is in natural disasters and the like that a hospital is needed most. There will always be a need for a constant power supply, so backup generators are essential, and an Uninterruptible Power Supply (UPS) bridges the gap until they spin up, keeping systems running through the cutover (clue's in the name!).
Using the above solutions of keeping redundant copies, backup routers and "hot" web servers may be kept running so that the time the systems are down is far shorter.
5.3 Risk Management Processes and Concepts
Risk management is the practice of identifying, monitoring and limiting risks to a manageable level. It doesn't eliminate risks, but instead identifies methods to limit or mitigate them; like the typical manager, it doesn't run through the solution with its own bare hands, but creates blueprints for others to execute.
The amount of risk that remains after running through risk management is called residual risk, and the executives take responsibility for it. After all security controls and assessments have been applied, there may still be a risk to doing the job, say working on an oil rig, where equipment could break or you could slip; that remaining risk is residual.
Threats are potential dangers. Remember from Chapter 1 when we talked about threat actors, people either inside or outside the organisation who pose a potential threat? Same idea here. The most common threats in this case are:
- Malicious human threats. These are all the threat actors, so script kiddies, insiders, competitors and even nation states.
- Accidental human threats. An employee unintentionally deleting files or corrupting hard drives, a new employee getting lost and setting off alarms, or an admin making a change to fix one problem but creating another. How likely these are depends on the training we give employees, the kind of industry we're in, and so on.
- Environmental threats. This includes long term power failure due to fallen cable lines, floods, hurricanes, tornadoes etc.
In risk management we want to identify as many threats as possible and determine whether they are imminent, inherent, negligible or something else. Such identification and categorisation is done in a threat assessment. Categorising is important, as we only have a limited budget, so we need to know what ranks as most dangerous and costly to the organisation.
Risk Response Techniques
There are a few ways a company can deal with these threats, one being simple acceptance of the risk, usually because the security controls needed to mitigate it are too expensive. We may choose to accept all residual risk, as there is nothing we could feasibly do about it without spending gargantuan amounts of money. The dark truth is that if an employee has an accident, which is probabilistically unlikely, we will accept paying the insurance costs rather than redesigning the whole oil rig to secure every member of staff.
Another scenario: if mitigating a given risk would cost £100,000 a year but the Annual Loss Expectancy (ALE) is only £50,000 a year, then we would probably just take the risk head on. But if the event endangered human life then we wouldn't have the luxury of letting it hit us; we would have to eat the loss here…
Transferring risk is a risk management strategy that pushes risk from one company to another i.e when we purchase an insurance policy.
Risk avoidance is the act of eliminating elements that expose our organisation, so we might avoid purchasing a set of offices and lease them instead.
Risk mitigation is the process of taking steps to reduce the adverse effects of a particular risk. This could be patching our systems to ensure that vulnerabilities are not present.
Quantitative and Qualitative Risk Assessments
When calculating risk we usually do it in the following manner: over a year, how many times do we expect an accident, a disaster or a breach event to cost our organisation? Let's say the event is a webserver failing on us, which might happen 0.1 times a year, meaning it only happens once a decade; however, when it does fail it is going to cost us around £15,000. This cost is the Single Loss Expectancy (SLE): how much we expect to lose from a single loss. Next we want the ARO, or Annual Rate of Occurrence, which we said was 0.1. Multiplying the two together gives the Annual Loss Expectancy (ALE), which is £1,500. Obviously we're not paying this out every year, which is where this system starts to look a bit silly, but when the ARO is > 1 it becomes quite useful for yearly business budgets.
The SLE is composed of the asset value of a given thing, say the worth of the webserver, multiplied by the exposure factor, i.e. how much of that device or system was actually lost. Using our same example, the value of the server (the hardware) is £15,000, and if it ends up completely broken - complete exposure to a single threat - we would say the exposure factor was 1. Likewise, if it was completely fine (which is most cases) the exposure factor would be 0, as no threat was felt. The Security+ examples usually assume hardware becomes completely unusable, so you rarely need to include the exposure factor in your calculations: you just take the value of the thing, and that is how much the business loses, as that is what it must spend to get back to where it was - itself a small fallacy, as the price of hardware changes, but never mind.
Say we plan to insure our webserver for £1,200 a year: that premium sits below our £1,500 ALE, so it beats covering the failures ourselves - let's do it. That's one of the benefits of running the calculations first: we know at what price an insurance plan becomes beneficial.
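The SLE/ALE arithmetic above can be sketched in a few lines. This is a minimal illustration using the chapter's example figures (a £15,000 webserver, ARO of 0.1), and it treats the £1,200 premium as an annual figure so it can be compared against the ALE directly:

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float = 1.0) -> float:
    """SLE = asset value x exposure factor (Security+ usually assumes EF = 1)."""
    return asset_value * exposure_factor

def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO (annual rate of occurrence)."""
    return sle * aro

sle = single_loss_expectancy(15_000)    # webserver worth £15,000, total loss
ale = annual_loss_expectancy(sle, 0.1)  # fails roughly once a decade
print(sle)  # 15000.0
print(ale)  # 1500.0

# Insurance only pays off when the annual premium is below the ALE:
annual_premium = 1_200
print(annual_premium < ale)  # True -> worth insuring
```

The same comparison works for any proposed control: if the yearly cost of the control exceeds the ALE it mitigates, accepting the risk is usually the rational choice.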
All this is what we call a quantitative risk assessment: we have cold hard figures for price and rate of failure, which makes it objective and leaves little room for dispute.
With a qualitative risk assessment, by contrast, we plan based on the likelihood of an occurrence and its impact. It's more to do with threats (which are supposedly uncontrollable - and hence random - events), so if we had a public-facing web server selling products online we would say the probability of attack is high, and the impact to the business is high. We then gather data to come to a conclusion - although a consensus won't always arise - and plan accordingly.
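A qualitative assessment is often scored on a simple ordinal matrix. The scale below is a hedged sketch, not a standard: organisations pick their own levels and weightings, and the point is only that likelihood and impact combine into a ranking for prioritisation:

```python
# Ordinal scales agreed on by the assessment team (illustrative values).
LEVELS = {"low": 1, "medium": 2, "high": 3}

def risk_score(likelihood: str, impact: str) -> int:
    """Combine likelihood and impact into a single rank for prioritisation."""
    return LEVELS[likelihood] * LEVELS[impact]

# The public-facing web server from the text: high likelihood, high impact.
print(risk_score("high", "high"))   # 9 -> top of the priority list
print(risk_score("low", "medium"))  # 2 -> probably accepted
```

Unlike the quantitative figures, these scores have no units; they only order risks relative to each other, which is why qualitative assessments can spark disagreement.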
5.4 Incident Response
Incident response is the organised response which hopes to address and manage the aftermath of a security breach or cyber attack. This is the main goal, just to limit the damage and reduce recovery time and the cost to the organisation.
An organisation that has an IR plan and one that doesn't have different mindsets: the former is planning for when an attack will arise, whereas the latter is still in the make-believe land of if. There will be an attack, and it's only a matter of time.
IR Planning - Categorising Incidents
Much like threat assessments, the first step of incident response is to prioritise and categorise the different possible incidents. The number of incidents and their nature depend on the infrastructure of the business: what it does, what sort of equipment it has, how much of it is in the public domain. That sort of thing goes under threat assessment; the types of incident are then the ways those components could be compromised by a cyber security attack or breach (a man-made attack).
There are some categories with which IR teams could respond:
- External / removable media being stolen
- Attrition. An attack that employs brute force methods to compromise, degrade, or destroy systems, networks, or services
- Web-based attacks . This goes all the way back to chapter one.
- Email , so phishing , spear-phishing etc
- Improper usage. This is deliberately vague as this pretty much defines hacking, an extension of the functionality of a given component or system which could be any part of the corporation’s digital and physical infrastructure.
- Loss or Theft of Equipment. A stolen laptop or server that needs its data wiped, for example.
- Other…
IR Planning - Roles and Responsibilities
Much like the war zone blueprint and cut-out soldiers analogy, we are going to war, only this time we have been attacked by a man-made threat. As always the first step is the resources, the people on the ground, and this is the incident response team itself. They are trained and tested for incident response, should have a plan in place for every category we mentioned above, and their job is to try to uphold our RPO/RTO - indicating that the attack has been thwarted and business can go on as usual. IR team members go through practice exercises and training courses to determine the when, why and who to engage during particular incidents. It's pretty much identical to a firefighter trying to stop a house fire. The Cyber Incident Response Team (CIRT) are SMEs (Subject Matter Experts) that we can draft in from outside the organisation: a predefined group of highly skilled individuals that can assist in incident handling.
Security Management. The management team should help lead the response team during the planning phase and with carrying out IR procedures. Management is the overseeing aspect, they cobble together reports and update other teams on the status of the response team. They also help drive corporate support towards them.
Compliance officers. Having detailed knowledge of compliance rules and procedures, compliance officers should be engaged to determine any steps the business must take to maintain compliance with a given standard. Compliance officers therefore watch over IR teams to see that they are executing their tasks in line with SOPs.
Other technical staff can also assist, i.e. technicians and network admins who can provide logs and backups - they all help to gather more details of the cyber incident.
Now that we have the team, we need to formalise an Incident Response Plan
First we need to define all the things that a team would respond to. This goes back to our categorisation of incidents: a malware infection or data breach is more than sufficient reason to respond, but a power outage or system crash may not require more than a technician.
Now that we have all the events we classify as important, we need to direct a CIRT (Cyber Incident Response Team) towards them; this consists of subject matter experts and IT staff with extensive penetration testing abilities and experience.
For each member we assign a function that they will try to carry out over the course of the incident response. So a network engineer may team up with one of the CIRT members to work on the problem itself (to contain the matter), whilst the team leader works on providing updates to subordinates and writing reports for senior management.
After we've identified an incident, seen that it is cause for concern and got our team (who understand their own roles and responsibilities) ready to go, we need to escalate it and alert our head of department or whoever is appropriate, and the response begins!
Now we know that some things aren't worth the CIRT going after, but there are also some incidents that aren't worth the time of the executives. A data breach, however, could affect reputation and may have litigious consequences, so we should report it upwards, as we may have to notify customers…
And lastly are exercises. All IR team members should regularly meet and take part in exercises to practise responding to incidents. The main goal is to understand team dynamics, establish a plan and - basically - know what to do in case of an incident. During the incident is not the right time to ask, “So, what do you want me to do?”. We should have sharp knowledge of the tools we use, not be faffing around with shiny new ones in a crisis; tabletop exercises are where we should test out new kit. Tabletop exercises are where the team discusses the steps they would take during a given scenario, and after the game has been played and the incident supposedly stopped, there is a run-through at the end to look over the game and the documentation and find ways to improve going forward.
Just to recap , the plan will include:
- Categorising incidents
- Assembling CIRT members
- Assigning roles and responsibilities to each member
- Escalation if we find an incident we need to respond to
- Assigning priority levels to which we report to executives
- Exercises.
IR - The process
- Preparation. How can we prepare, change our methods to respond to incidents faster?
- Identification. What category does this incident fall into?
- Containment. How can we contain the incident to just the point it arose?
- Eradication. How can we nullify the incident?
- Recovery. Are there any backups, spare parts that we can use for this system?
- Lessons learned. What did we learn from this lesson? At this stage we will write up a final report, documenting the incident, the process and the damage. We should also include a root cause analysis, so we know at what point in our security , network or physical infrastructure there was a problem.
And repeat.
5.5 Digital Forensics
Digital forensics is a branch of computer science that overlaps with forensic science: the recovery and investigation of material found in digital devices. Forensic operations run under the assumption that the data collected will be used in court, so there need to be processes in place to avoid any tampering with the data or the analysed computers.
Now, to investigate or recover data, the data must be available in the first place. Volatility refers to the ease with which data within a certain memory bank (CPU registers, RAM) is lost. There is a hierarchy of volatility, and it is crucial that we keep some kind of log - not of everything, but of crucial processes, and of whatever drives or addresses such programs use - so that we can go back and check for malicious behaviour.
Data stored in the CPU cache and in RAM should be collected first, as this data is overwritten quickly, the cache being small and used for pretty much everything - apart from things like the HSM, which as we know is a Hardware Security Module that signs, decrypts and generates keys itself without the OS or host knowing (not to be confused with the TPM, the Trusted Platform Module).
Most volatile to least volatile:
- CPU cache and registers
- Remaining data stored in RAM: running processes, system statistics, the ARP cache etc.
- Temporary file systems
- Files written to disk
- Remote monitoring and logging data from that system
- Archived data
When we see that a particular server, say the FTPS server, has been quite volatile, we should be looking at things like the partition table data, seeing if swap space is full, and asking which partitions have ballooned in size and how we can manage this. It is also worth consulting the temporary directories, like /tmp and all its subdirectories, so we can understand what processes are chewing through all our memory.
Collecting evidence
Once the incident is contained by the CIRT, we can begin to conduct an investigation on the compromised computers, and on any damage done to the network or even physical assets. We need to make sure that we collect data without harming the system being harvested. For example, if we manage to keep a victim's computer running, so the data in RAM isn't lost, we don't want to start running clunky tools which overwrite this space. Maintaining quality is essential, and we can assure ourselves that nothing changed by hashing the victim's key folders or files, or even the whole drive, and then checking the hash again after the collection is finished.
- Capturing system images. This is where we want a replica of the drive in the victim's computer without modifying any bits or visiting any files - which might change the last-accessed date - we just pull the bits out. It differs slightly from the disk images you may remember from chapter 3: a forensic image isn't presupposed to be secure (we know it isn't, since it got compromised!), whereas disk images are loaded with secure configurations and help ensure a new system starts in a secure state. After capturing the state of the system, we preserve the original and keep it for evidence; we do our snooping around in the copies we make of the forensic image.
- Network traffic and logs. If the trojan or whatever was trying to make outbound requests, then we should have some MAC and IP addresses to work off of, and by piecing together all the logs in our SIEM we can build a picture of what was going on in the moments before and during the attack.
- Capturing video, so CCTV, screen recording…
- Screenshots
- Witness interviews
We should also capture hashes of the system to be able to prove that it has remained unaltered during our investigation.
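The hash-before-and-after check described above can be sketched with SHA-256 from Python's standard library. The stand-in file here is illustrative; a real capture would be a raw bit-for-bit image of the victim's drive:

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstration on a stand-in "image" file:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"forensic image contents")
    image = f.name

before = sha256_of(image)  # digest taken at collection time
# ... the investigation happens on a *copy* of the image ...
after = sha256_of(image)   # digest taken after the investigation
assert before == after     # matching digests show the original was untouched
os.remove(image)
```

Both digests go into the case documentation; a mismatch at any point means the evidence can no longer be shown to be pristine.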
Chain of custody
As evidence is collected, we need to maintain its integrity, as data from things like registers is likely to be incredibly sensitive. Without effective controls in place, the evidence could end up in reports, real systems could be tampered with, and material could get leaked. The most effective method is to document everyone who comes into contact with this information, which is known as a chain of custody, or order of responsibility.
Beyond recording who had the evidence, an established chain of custody document should tell us where the data has been kept, for how long, and its classification level. We want to preserve all the data we collect, as it is important for our current investigation. There will be a lot of data, and we need to keep it all.
This may also help in a future investigation (or if new evidence becomes available and we need to revisit the current incident).
Legal Hold
Courts will rule evidence inadmissible if there is a lack of documentation demonstrating adequate control.
In some cases, we may be asked to hold and preserve this evidence while awaiting future court dates. Organisations should have a documentable process in place for this ongoing preservation, or legal hold.
During this legal hold, special procedures may be employed that override the normal preservation processes. A great example would be GDPR, where an organisation holds onto a piece of data because of its significance in legal proceedings, despite that retention falling outside one of the guidelines.
Recovery
Strategic intelligence is when we scour through our screenshots, our log files and our system images (dumps of current system state) and find something enlightening - something that shows how someone slipped through our system and how they broke our services to do what they wanted. This is all great information that can lead to policies being updated; maybe the way we handle data is poor and this leak showed us what we need to cover. Where else in the company is a similar attitude employed, and how can we rectify that system? If some of our procedures are proven inadequate, finding out can only make us stronger.
Counterintelligence is where our findings can be used to go on the counter-attack. What did we learn about the attacker, any IP addresses, emails or websites that links us to him/her ? Were there any distinguishable features about the attack that we should be on the lookout for?
Active logging. If we truly are capturing every move an attacker makes, then we should be able to get a foothold on the way he/she made it out. If we want to go after this person, a good logging system is the very beginning of the hunt, once we find links then we get searching. A good attacker will want to use burner machines, be in different parts of the country to change IPs, this is one reason why free travel is so powerful in Europe, launch an attack from Belgium, get on the train and relax in France.
5.6 Disaster Recovery and Continuity of Operations
This chapter is going to focus on how we can recover from disaster as quickly as possible, depending on the type of disaster that is, and how we can continue to provide the same level of service as we did before the incident.
Disaster recovery is part of security planning , in which the goal is to minimise the effects of any significant events. Disaster recovery involves setting policies and procedures that enable the recovery of vital systems during a disaster. So, we could have a procedure in place that puts backup hard drives offsite in a remote facility and so if our server breaks down it isn’t a problem.
Continuity of operations means taking the measures we have put in place for disaster recovery and applying them so that business operations continue. The US government has mandated that agencies continue to provide services even during times of crisis, so uptime should be high.
Disaster recovery is part of the plan for keeping a high degree of availability, but it isn't about optimising current procedures or changing the shape and style of our network; it is the insurance policy that sits alongside those high-availability measures. Disaster recovery goes hand in hand with RPO and RTO.
Recovery Sites
To get operations to continue, we need to replace our old, now broken, services with a replacement that can become the new server and acceptor of connections. The differences between the three levels of recovery sites are:
- Hot sites. This is a production environment that mirrors the live environment in real time, allowing a seamless transition from one server to the other. Of course this means paying hosting fees for a second server that is mostly unnecessary, which may be costly. That said, if your server had a low MTTF and a clunky part you couldn't replace at the time, the second server would be ready and in use a lot of the time, so it's not a total waste of money, as the business can keep going.
- Warm sites. This is where parts of the server are already up, configurations already entered, databases are active but we still may need to do things like start the server itself, and they usually need data to be restored before operations can resume (compiling configuration into memory etc). We still have all the necessary hardware and the network connections established.
- Cold sites. This is where we have all the equipment, but none of it is installed, operational or established. We need to turn the server on, connect it to the network and the DB, and run the configurations that specify ports, redirects etc. It offers considerable cost savings, but obviously it takes much longer to transition to this system and get the service going.
Backup Types
- Full backup. This makes an archived copy of every file selected for backup, which is usually the entire drive unless you specify files that should be left out.
Now, it doesn't make sense to back up the whole drive every Sunday, as not every file changes; each week we would only want to keep track of what has changed since the last backup and archive that. This can be done in two ways:
- Incremental backups. Every week (after the first full backup) we make a small, separate archive holding just that week's changes, so if we did this for 52 weeks of the year we would have 51 increments, some larger than others depending on the week. To recover fully from a disaster we would need the full backup plus every increment since.
- Differential backups. Every week (after the first full backup) we archive everything that has changed since that full backup, so each week's differential gets larger and larger. To recover we only need the full backup plus the latest differential, so the number of recovery drives to manage is fewer, though what we gain in convenience we pay for in storage space.
- Snapshots are backups of virtual machines, programs/applications and similar hard drive data; they capture the configuration of such things at a point in time, allowing us to roll back to a particular version and thus rebuild them much more quickly.
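The restore-time difference between incremental and differential schemes can be made concrete with a small sketch. The media names are hypothetical; the point is which set of archives each scheme needs when disaster strikes in week N:

```python
def restore_set_incremental(week: int) -> list:
    """Incremental: the full backup plus every weekly increment since it."""
    return ["full-wk0"] + [f"inc-wk{w}" for w in range(1, week + 1)]

def restore_set_differential(week: int) -> list:
    """Differential: the full backup plus only the latest differential."""
    return ["full-wk0", f"diff-wk{week}"] if week else ["full-wk0"]

print(restore_set_incremental(3))   # ['full-wk0', 'inc-wk1', 'inc-wk2', 'inc-wk3']
print(restore_set_differential(3))  # ['full-wk0', 'diff-wk3']
```

Incremental restores need every piece of media in order (losing one increment breaks the chain), while differential restores need only two pieces but each differential holds more data.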
Continuity of Operations Planning
During planning is when we want to identify and communicate our alternate processing sites and business plans - essentially trying to answer the question: “So how do we bounce back from disaster as quickly as possible?” During a disaster, everyone should know what to expect and what steps should be taken to ensure continuity of operations. Like IR, we want to imagine and play out different scenarios so we don't have to learn on the spot; we can exercise our plans through tabletop exercises and afterwards document and report what we discovered (any blindingly obvious errors that need tuning) so that we may improve our procedures.
When planning, we should follow similar practices to those of IR, which were:
- Preparation. How can we prepare, change our methods to respond to incidents faster?
- Identification. What category does this incident fall into?
- Containment. How can we contain the incident to just the point it arose?
- Eradication. How can we nullify the incident?
- Recovery. Are there any backups, spare parts that we can use for this system?
- Lessons learned. What did we learn from this lesson?
Geographic Considerations
When selecting an alternative site, maybe for storage in the case of a cold site, or a completely operational mirror (hot site) we need to consider what problems we may face when setting this up in a different geographical jurisdiction.
- For off-site storage we need to check how much of our data the provider will be accessing (read the T's and C's) and what their policies are in case of theft at or damage to their facilities. Compliance mandates like the Federal Information Security Management Act (FISMA), and the Health Insurance Portability and Accountability Act (HIPAA) for PII pertaining to health, may also dictate how and where the data can be stored.
- Distance. How far will we go just to pay less? It's a battle of accessibility versus recovery, as we may need to visit the site itself to check on our systems. The systems should sit outside the scope of a disaster: if I were a Japanese business owner on the coast, I would not keep my backups on site where they could be destroyed by flood, tsunami or earthquake; it would be safer to move them into the centre of Japan, or better, into another country. But then they become harder to access, and if your web servers are in the office you would have to get the backups flown over - goodness knows how long that would take - before you could receive requests again. This is why AWS is so powerful: the backups live alongside your web infrastructure and can be used by your systems interchangeably. Storage providers add niceties with electronic transfer of data, at which point bandwidth suddenly becomes a factor you have to think about.
- Location. There may be different legal procedures in different states and countries, and the data we store in a country is subject to that country's data laws. Business regulations differ between countries and states, so what we can record and work with may be more or less restricted. The staff that travel to this location also need passports and the right paperwork, as otherwise we have effectively thrown our data away; legal documents should accompany the travel paperwork too.
- Data sovereignty. Is the data itself subject to monitoring, would it be easy to move data out of that country if we were to change our minds? Depending on the data set, whether it is health information, emails etc there may be stricter procedures, and some countries might not allow you to work with them if you won’t hash data.
Failover clusters
I briefly mentioned this in 5.2 but I think it’s important to go over in more detail, as there are different ways in which we can manage our resources in times of crisis. One way of using failover clusters is to have one active server in your cluster configuration, and the other sits there and will come into action when that node fails. This is called an active - passive failover configuration.
These two nodes intermittently send health checks to each other to identify whether one needs to be switched out. The passive node obviously needs to be powered on for this, and if one of the checks comes back as “failure” - or something along those lines - the cluster software installed on both nodes can deactivate the failed node and activate the other.
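The promotion logic can be sketched as a toy active-passive loop. `check_health` and `failover` here are hypothetical stand-ins for whatever the real cluster software does (which would involve network heartbeats, timeouts and quorum, not a boolean flag):

```python
class Node:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.active = False

def check_health(node: Node) -> bool:
    # In reality: a heartbeat over the network with a timeout.
    return node.healthy

def failover(active: Node, passive: Node) -> Node:
    """Return whichever node should be serving traffic."""
    if check_health(active):
        active.active, passive.active = True, False
        return active
    # The active node failed its health check: promote the passive node.
    active.active, passive.active = False, True
    return passive

a, b = Node("node-a"), Node("node-b")
print(failover(a, b).name)  # node-a while it stays healthy
a.healthy = False
print(failover(a, b).name)  # node-b after node-a fails
```

Real implementations also have to guard against split-brain, where both nodes believe they should be active; that is why production clusters use quorum or fencing rather than a single pairwise check.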
5.7 Control Types
Control types are about setting the right type of control (a measure to prevent, minimise or discourage loss due to a known security risk) on a given system, application, server or office.
The lowest control type “level” , and by level I mean closest to the physical are the physical controls - jeez I didn’t think you needed to define that one… but you’ll see where I am going with this. This level encompasses things like cages, fencing , locks , electrical wiring , key codes, security cameras, security guards, corporate ID badges - all of this is supposed to deter, detect and respond to any physical threats.
The layer that sits above the non-technological (or at least the very low-level technology like RFID and NFC) is the technical controls. Now we are in the realm of software and the technicalities of hardware. We include things like encryption algorithms, access management devices, auditing software and accountability records. These should aim to make it more difficult for an attacker to mess with hardware once obtained, make remote network attacks harder to get away with, and so on. Technical controls involve things like installing antivirus, firewalls and programs which will prevent compromise.
The highest layer of control, the most ethereal I mean, is the realm of ideas, policies and procedures, which is just the particular ordering of people, software and hardware in a desirable configuration - much like software is the orientation and rhythm of electrical signals, and physical controls are the orientation of materials and structures into a form that fits our objective. Administrative controls define things like risk assessments, AUPs and SOPs (the difference between the latter two being that an SOP is a procedure defining basic business functions, whereas policies sit on top and govern what kind of procedures we can enact: for example, the SOP would be needing to access and work with banking software as part of a normal business day, and the AUP would be how we make that safe, acceptable and compliant with external regulations).
Control Objectives : Deterrent , Preventive , Detective , Corrective and Compensating
Deterrent - Here the objective is to dissuade a person from doing something , we should impose some sort of consequence, like physical pain (sharp tips on fencing) , or the possibility of arrest for breaking and entering. We are not directly preventing access, though we want to get into their minds. A nice piece of information to remember is that a login banner is a deterrent security control , as we’re saying - “Look you can try and access this system at your own risk, and we will be monitoring your actions”. The login page, which has the username and password fields, is a preventative security control as it aims to limit and control access, much like a firewall , door lock etc.
Preventive - Here the objective is to stop access: if someone gets into a part of the building where we have preventive measures in place, we can assume this person isn't supposed to be there. You know the standard line in movies - “I need to see your badge, some sort of ID…”. Preventive measures are things like door locks, security guards and firewalls. Their imposing nature and the difficulty of breaking them should also deter people, but they don't belong in the deterrent group because they make access almost impossible rather than merely discouraging it.
Detective measures. Again, these don't prevent access - none of them do apart from preventive (funny that). The point of detective measures is to keep an eye on resources, record any intrusion attempts, and make it so we can identify who or what has breached our property. This could be security cameras, motion sensors or an IDS, which all fall under detective; an IPS, meanwhile, also does the job of a firewall (detective and preventive).
With corrective controls, we want to fix the damage and essentially return to the state we were in before the incident, as best as possible. Say some attackers wiped our systems: if we have backups then we can return to normal pretty quickly, assuming we backed up within a close time frame to the incident. Good corrective controls restore us to the last frame before the attack happened, like the theoretically perfect RPO. They are similar to compensating controls, but those are where we make do with a substitute, as the system itself can't necessarily be corrected. So a compensating control might be having to use a backup generator or a UPS as the main power has stopped working. See how the use of backups can fall into two different control types: it all depends on whether they aid the main system or create a substitute for it.
These are the mechanisms we put in place to satisfy security requirements: some measure that compensates for an actual fix. Compensating controls are put in place when it would cost too much, or be too impractical, to fix the real problem. The best example is probably an old bank having its core business functionality written in FORTRAN, a language from the 1950s with hardly any programmers left, but which still works fine if we keep applying patches. It would cost billions to rewrite, test and stand up another version of the core software, so just patching and watching over it constantly seems to compensate (not).
5.8 Data Security and Privacy Practices
All data is not created equal. Handling and security requirements change depending on the sensitivity of the data type.
- Public. There are no restrictions on who can access the data; this is the least sensitive data and shouldn't be usable in compromising the business. Typically we would think of things businesses wish other people knew, with the most sensitive strand of public data probably being employee emails, phone numbers and names.
- Private. Now, an employee may have both public and private information, but public is somewhat replicable - it doesn't really identify a specific individual - whereas private information (the stuff we do need to protect) could identify the employee, as it contains social security numbers, health information and other far more sensitive details. Such information could belong to employees or customers, but either way we need to make sure such records are encrypted, monitored, and the processes which operate on them tested. Personally Identifiable Information (PII) is any data that could single out an individual, whereas Protected Health Information (PHI) is any data about health status, provision of health care, or payment for health care treatment that an individual is undergoing. Hackers want medical records because they contain a treasure trove of information: PHI can include social security numbers, bank details and email addresses, and if hackers don't feel like exploiting these themselves, they can sell them on via the dark web and cash in that way.
- Proprietary. Information that is the property of the organisation, typically trade secrets pertaining to its operations. It may also be corporate information about employees and managers that really should be kept private to them, or at most shared with their group or superior(s).
- Confidential. Information requiring restricted access even within the organisation; employees may not be granted viewership unless they sign an NDA (Non-Disclosure Agreement).
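The four levels above can be sketched as a simple classification scheme. This is a minimal, illustrative model only; the handling rules below are invented examples, not a standard:

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Illustrative data-classification levels, least to most restricted."""
    PUBLIC = 0        # no access restrictions
    PRIVATE = 1       # PII/PHI: encrypt and monitor
    PROPRIETARY = 2   # trade secrets, internal corporate data
    CONFIDENTIAL = 3  # restricted even internally; NDA required

# Hypothetical handling rules keyed by level.
HANDLING = {
    Sensitivity.PUBLIC:       {"encrypt_at_rest": False, "nda_required": False},
    Sensitivity.PRIVATE:      {"encrypt_at_rest": True,  "nda_required": False},
    Sensitivity.PROPRIETARY:  {"encrypt_at_rest": True,  "nda_required": False},
    Sensitivity.CONFIDENTIAL: {"encrypt_at_rest": True,  "nda_required": True},
}

def must_encrypt(level: Sensitivity) -> bool:
    """Look up whether data at this level must be encrypted at rest."""
    return HANDLING[level]["encrypt_at_rest"]
```

Using an ordered enum means a policy check like "is this at least Proprietary?" becomes a simple comparison.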
Data Sensitivity and Labelling
So we may choose to categorise data with the labels we see above, and we need to make sure we classify everything. Backup tapes need labels showing what kinds of data they archive; file servers should have a directory structure that reflects the permissions groups need to access them, turning away anyone who lacks the required rights. Aside from stringent separation, data can also be tagged with metadata showing its sensitivity level, author/owner and even usage. Metadata labelling is done by someone called the Data Steward, who we will discuss in more detail later.
Good data labelling means users are aware of what data they are handling and processing. If we hire a new content creator for our company, they may have access to private company information, but if we tag it as "Proprietary" and/or "Confidential", and assuming we have trained them sufficiently in our culture, there shouldn't be an issue. Without that organisation, however, private information may end up included in posts, articles or on social media.
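As a sketch of the idea, a metadata label like the ones a Data Steward might assign can be modelled as a small record attached to each asset; the field names and groups here are illustrative assumptions:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataLabel:
    """Illustrative metadata a Data Steward might attach to an asset."""
    classification: str                 # e.g. "Proprietary", "Confidential"
    owner: str                          # the accountable data owner
    created: date
    allowed_groups: list = field(default_factory=list)

def can_access(label: DataLabel, user_groups: set) -> bool:
    """Deny by default; grant only if the user shares an allowed group."""
    return bool(set(label.allowed_groups) & user_groups)

# A hypothetical confidential sales report, restricted to one group.
report = DataLabel("Confidential", "VP of Sales", date(2024, 1, 5),
                   allowed_groups=["sales-leads"])
```

The point of the label is exactly what the text describes: the access decision is driven by the steward's metadata, not by whoever happens to hold the file.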
Data Roles
Now let’s go back a bit to the part where we described general roles within each system of an organisation:
- The Data Owner. This person is ultimately responsible for the security of the data. They don't actively manage the data itself or run checks on the system that houses it, but if there were a breach, the Data Owner would get the blame. This is usually a CTO or even CEO role.
- System admin. This person enables the system's use by the outside world, so it can collect data in the first place, but having opened the pipeline they had best be prepared to manage and close it. It is the system admin's role to oversee the current state of the system and change configurations if need be to ensure things run smoothly.
- System owner. This person oversees the system administrator, ensuring the system complies with the Data Owner's requirements and any international standards.
- Outside of the roles intimately connected to the system, there are the users: people who perform business functions with the data. This could be the marketing team or a data scientist running regressions; it could be privileged users, who have a degree of power and can make changes to the database and assist in its upkeep. Then there are the executive users, akin to the managers of departments (Head of Sales/Marketing, CEO, CTO, CFO, etc.).
Data roles in a company aren't really technical roles that come with their own certifications and "must-know programming languages"; they are designations made by the organisation based on responsibility, not purely technical in nature.
Now I'm going to talk again briefly about the Data Owner, who is ultimately responsible for the quality and integrity of the data but delegates its actual maintenance to more technical staff. The VP of Sales, for example, would probably be responsible for overseeing the swathes of data pertaining to customer relationships, and the CFO would hold all the business' financial information. Data Owners appoint Data Custodians to look after the physical and technical aspects of data upkeep: actually backing up databases, fixing database issues, and configuring databases (whether we log, how extensive the logging is, whether there is IPv6 support, etc.); moreover, they might establish network connections between clients and databases.
So we could say that Data Custodians bear the technical accountability for data assets, while Data Stewards bear the corporate accountability for them. What I mean by this is that a steward will talk with shareholders, privacy officers, data owners and the like to develop a set of controls that the database should abide by, and the data within it should conform to those rules (what tables should look like, which fields are encrypted, the value types themselves). Data Stewards may enter metadata values for certain tables, as many internal systems (like payroll and HR) reference them. The difference between the two roles means they view the data slightly differently, and their requests reflect that.
| Data Custodian | Data Steward |
|---|---|
| Definition: a person who bears the technical accountability for a set of data assets. | A person who bears the business accountability for a set of data assets. |
| Data security: must implement the right security controls, the ones defined by the Data Steward, regarding things like accuracy and privacy. | Data quality. |
| Controls the access rights to the data, as specified by the metadata labels the Steward defined. | Data accuracy. |
| Backup and restoration. | Data acquisition and entry. |
| Implements the technical standards and all policies regarding technical upkeep, which may be specified by the privacy officer. | Proper labelling and metadata management of data, showing the segments different groups should work with. |
| Creates the audit trail; as they manage the backups, they can piece together a user's history. | Being mindful of governance and compliance when making these policies and labels. |
So to end: a Data Custodian looks after the actual structure of the data as bits, while a Data Steward looks after data as entries, the values they represent, what they mean to different people, and the accuracy of the data and tables themselves, recording what is demanded by stakeholders. A Privacy Officer, like the Data Owner, has a far-reaching responsibility: their job is to implement policies and procedures that carry out privacy controls on the organisation's data, making it as secure as possible.
Here’s a nice chain of blame that would ensue in case of a disaster:
Data Owner -> Privacy Officer -> Data Custodian -> Data Steward
Data Retention
Data retention policies identify how long we keep archived data that we have used in the past. Do we keep email archives for one year, employee logs for two years, and executive emails indefinitely? These are the sorts of questions that system owners, data owners and privacy officers will all have to tackle, and the answers may affect future legal proceedings: courts can't demand to see corporate emails from three years ago if the policy means we simply don't have them. On the other hand, if the court sees there is no binding data retention policy in place, it may demand emails from ten years ago or more, and admins will scramble to find as much as they can. One more tangent: if there is a data retention policy but admins do find older data, they must produce it, as destroying it at that point would look like a cover-up.
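A minimal sketch of how such a policy could be checked in code; the record types and retention periods here are invented examples from the paragraph above, not guidance:

```python
from datetime import date, timedelta

# Hypothetical retention periods per record type; None means keep indefinitely.
RETENTION = {
    "email": timedelta(days=365),         # one year
    "employee_log": timedelta(days=730),  # two years
    "executive_email": None,              # retained indefinitely
}

def is_expired(record_type: str, created: date, today: date) -> bool:
    """True when a record has passed its retention window and may be purged."""
    period = RETENTION[record_type]
    if period is None:
        return False  # no expiry: the policy says keep it forever
    return today - created > period
```

The useful property is that the policy is explicit: when a court asks for three-year-old emails, the organisation can point at the table above rather than at an admin's judgement call.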
Companies will make standard backups for three different things, outside of consumer data:
- The first is the simplest and most natural, and what gets reused the most: version control. Often, when the organisation is developing applications or managing a lot of virtual machines or servers, we will have many different versions, stages in their structural lifetime, which may be important to preserve so we can make modifications to particular snapshots as and when we need. In application development there may be a development version slightly ahead of the more stable one, so we can experiment with new features without damaging the product itself.
- Recovery from cyber attacks. Talking more broadly about organisational data: if the company is subject to a breach at a time we can't pinpoint exactly, having backups is essential so that, after the IR team has dealt with the issue, we can revert to a point of tranquillity. We do need to have applied the necessary patches and whatnot first, though; otherwise we are leaning into the same punch.
- Legal/regulatory compliance. Many industries must maintain logs of their business practices for legal accountability, and records must be kept in accordance with such standards.
Each of these data retention principles has different storage requirements, e.g. encrypted versus non-encrypted storage, and how long to store it.
Protecting PII and PHI
One way attackers can gather personal information on an employee is to stroll up to their PC and insert a USB stick that runs a script harvesting everything in the home folder. A lot of things would need to lead up to this from the outside, but for an insider it would be easy! The solution is to add a rule to the OS denying USB mass-storage connections, and to block unused interfaces.
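On Windows, one common way to do this is to disable the USB mass-storage driver by setting the `Start` value of the `USBSTOR` service to 4 (disabled) in the registry. The sketch below assumes a Windows host and administrative rights; the platform guard keeps it from touching anything elsewhere:

```python
import sys

# USBSTOR service Start values: 3 = driver loads on demand, 4 = disabled.
USBSTOR_KEY = r"SYSTEM\CurrentControlSet\Services\USBSTOR"
START_DISABLED = 4

def disable_usb_storage() -> None:
    """Disable USB mass storage on Windows (requires admin rights)."""
    if sys.platform != "win32":
        raise RuntimeError("This sketch only applies to Windows hosts")
    import winreg  # standard library, available only on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, USBSTOR_KEY,
                        0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, START_DISABLED)
```

In practice an organisation would push this via Group Policy rather than a script, but the underlying setting is the same.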
Data Destruction and Media Sanitisation Techniques
- Shredding. A simple method of destruction that makes it incredibly difficult for even the keen dumpster diver to put all the pieces back together, especially if they're soiled with other rubbish. Dumpsters should still have a lock on them, though, to prevent the shreds being accessed.
- Burning. If you want to go a step further, you can burn documents you don't want anyone to see, such as a hospital's collection of health records for a deceased person that it no longer needs to keep. Other PII that has expired, due to contractual or legal obligations, may be destroyed this way as well.
- Pulping. This is where we get a large tank of water, wash the documents through to get the ink out, and then the paper can be recycled.
- Pulverising. A pulveriser is a type of shredder used to destroy hard drives, disks and the like. It is meant to completely mangle the storage beyond any means of repair and any chance of access. You could also just take a drill or hammer to the drives so the platters aren't readable by conventional means, though sectors may still be retrievable if the work is poor.
- Degaussing is the ultimate destruction for magnetic media: a powerful magnetic field randomises the alignment of the magnetic domains on the drive's platters, so the field patterns that encoded the data are destroyed. Some pockets may remain aligned, but a strong enough degausser makes the data practically impossible to access.
- Wiping. This is where we might not want to destroy the medium but instead sanitise it (just wipe it) so the storage medium can be reused. It can be as simple as overwriting the whole drive, or just a single sector.
- Purging. Also to do with sanitisation, but it doesn't have to mean the irretrievable wiping of data. The data is removed nonetheless, but it may just be backed up on a separate drive or moved to another database.
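The wiping step above can be sketched at the level of a single file: overwrite the contents with zeros before deleting, so the original bytes are no longer on the medium. This is a simplification, as file-level overwrites can be defeated by SSD wear-levelling, which is why whole-device sanitisation tools exist:

```python
import os

def zero_wipe(path: str, passes: int = 1) -> None:
    """Overwrite a file's bytes with zeros in place, then remove it.

    Simplified sketch: on SSDs, wear-levelling may leave copies of the
    original blocks elsewhere, so prefer whole-device sanitisation there.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(b"\x00" * size)
            f.flush()
            os.fsync(f.fileno())  # force the overwrite out to the device
    os.remove(path)
```

Deleting a file without the overwrite only removes the directory entry; the data blocks stay on disk until reused, which is exactly what forensic recovery tools exploit.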
Now, many companies don't actually do their own destruction; they most often get a third party to do it, but if you're like me you can instantly see the problem with this. The worst came to pass in 2013, when the NHS sent old drives of patient records to be destroyed by a third party, and even received their "certificate of destruction" to say the work had been done, when in fact the drives were sold on eBay. Best practice is to drill a hole through the drives first, so they aren't directly accessible, and then let the third party finish the job by throwing them into their massive pulveriser.