This is by no means a complete list, nor will it fit everyone's situation. I'm jotting down some things that I ran across in my experience as a security professional that, I think, would make it easier to handle the scenarios when the proverbial "shit hits the fan". I have merely started scratching the surface of this, and I intend to update it on a semi-regular basis. If you have something to add, please do post it in the comments.
First things first
The first and foremost thing is to know what you must protect on your network.
Once you know, you can start gathering the threat data that is relevant to your business. Remember, you are protecting the business, so you have to talk to the business and make sure that you are not creating an unworkable environment for them. After all, they have to generate the profit that allows you to function.
Everyone has to adhere to them. The trick is to be able to use them to your advantage. Yet, keep in mind that the auditors will rip you apart for not following them to the letter. Remember what was said in the previous paragraph about the business... you cannot forget to let your users breathe.
Patience and attention to detail
Securing an environment is a continuous process: it takes time, it requires adjustments, and it requires tons of dedication and attention to detail.
Design the perimeters with the knowledge of the activities and processes involved. Knowing the types of activities gives you a tactical advantage. Find commonalities, and use them to decide what kind of defenses are needed for these areas. Not all areas require equal protection and equal supervision, and the business areas will often push back because the protections might be too aggressive. Don't segment too much. It is like a database that was normalized too much: the key is finding the right balance.
Completeness and preparing for the worst case scenario
Once you are done with the segmentation or compartmentalization, make sure that all your OSI layers, and then some, are covered by appropriate monitoring, blocking, and mitigation methods. Ensure that the defenses you've put in place are granular and allow you to quickly and efficiently isolate compromised devices or applications. Make sure that you have accounted for all the ingress and egress points in your environment, and that you have your taps there. Enumerate your weak spots, and determine how you can provide additional protection in those areas. Make sure that you know your security devices inside out; that knowledge might give you an edge against the intruders.
Real time software/hardware inventory, internal software repository, and hashes
You need to have detailed knowledge of all the software in your environment. Even more, you need to know all the version and revision data. This is an essential, yet mundane, task. There is hope if you can have a group host an internal repository of all the software used; you can then gather the hashes of all the known good files in your environment. This in turn allows you to do ad hoc integrity testing against known good.
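As a rough sketch of that kind of ad hoc integrity test, assuming the known good hashes are kept as a simple mapping of relative path to SHA-256 digest (the function names and this layout are hypothetical, not taken from any particular product):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root: Path) -> dict[str, str]:
    """Hash every regular file under root; keys are paths relative to root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_baseline(root: Path, known_good: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were modified, are missing, or are not in the baseline."""
    current = build_baseline(root)
    return {
        "modified": sorted(
            k for k in current if k in known_good and current[k] != known_good[k]
        ),
        "missing": sorted(k for k in known_good if k not in current),
        "unknown": sorted(k for k in current if k not in known_good),
    }
```

The baseline would normally be built from the internal repository rather than from the live host, so that a compromised machine cannot vouch for itself.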
Configuration changes are best presented in diff output form when it comes to teal-colored networking equipment and other devices that support textual configurations. Changes should happen inside change management windows, unless your SIEM is integrated with a ticketing system that can correlate the requested devices with the requested change windows. Keep in mind that verifying what changed is also important.
Programs crash for various reasons. Knowing what crashed, where, when, and why will prove beneficial on that Friday afternoon when you get hit by semi-pros.
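A minimal sketch of both ideas, assuming configurations are available as plain text and approved change windows as (device, start, end) tuples. The names `config_diff` and `in_change_window` are made up for illustration; how you pull configurations and tickets depends entirely on your own tooling:

```python
import difflib
from datetime import datetime


def config_diff(old: str, new: str, device: str) -> str:
    """Return a unified diff between two textual configurations ('' if identical)."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{device}/previous",
        tofile=f"{device}/current",
        lineterm="",
    )
    return "\n".join(lines)


def in_change_window(device: str, changed_at: datetime, windows) -> bool:
    """True if the change falls inside an approved window for that device."""
    return any(
        w_device == device and start <= changed_at <= end
        for w_device, start, end in windows
    )
```

A non-empty diff observed outside any approved window for the device is the case worth a ticket.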
Trust but verify
People tend to get sloppy. Whatever your process, make sure that you’ve done your due diligence. Automating testing and verification goes a long way; it reduces the time wasted and the number of people annoyed. If you can streamline some process, it is usually worth it. Make sure you QA everything. In many cases, when something goes wrong, people will try to blame it on your devices, agents, software, you name it.
Baselining and usage patterns are among the low-tech techniques you can leverage to search for anomalies. Sometimes you can go as far as using statistical methods such as the empirical rule to define what is normal and what is not. These methods can give you the pieces that don't fit the puzzle for free. On top of that, you'd be able to find design problems or bottlenecks in your environment. Your engineers will thank you later.
Make sure that you harden the devices before they go into production, or your hair will go grey early. It’s good to have some notification measures built in, e.g. when an admin account is used, send a notification to your system. It usually requires lots of hand holding and testing to get it right. The goal is to work closely with the admins to make sure that the solution actually works and has minimal performance impact. With the hardening step, it is absolutely essential to set up all your devices to perform NTP synchronization (with keys), or else you’ll have problems with temporal chaos that will kill your SIEM’s database.
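To make the empirical rule concrete: for roughly normally distributed data, about 99.7% of values fall within three standard deviations of the mean, so anything outside that band deserves a look. A small sketch, assuming you already have a baseline of some per-interval count (the function name and the sample numbers are illustrative only):

```python
from statistics import mean, stdev


def outliers(baseline, observed, k=3.0):
    """Return observed values more than k standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return [x for x in observed if x != mu]
    return [x for x in observed if abs(x - mu) > k * sigma]


# Hypothetical hourly login counts used as the baseline (mean 100, stdev 2).
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(outliers(baseline, [101, 95, 107, 250]))  # [107, 250]
```

Real traffic is rarely that well behaved, so treat the threshold `k` as something to tune rather than a fixed rule.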
IP pool affinity and user mobility
Investigating devices that often change IPs is hard. With rigorous inventory management, it would be possible to assign IPs based on MAC, or sometimes based on userid. The longer the leases, the better, unless you have some kind of smart user tracking system, a nice NAC system, or identity management. Sometimes, you’d be able to harvest access card swipe data and determine the user’s locality. It is just another trail of crumbs you can use to build some smart rules based on event sequencing.
Obviously, tracking user login/logoff times with the addition of location awareness is another very good place to pick up anomalies; it adds some more metadata to your equations. How could someone be in two places at the same time?
The fact of life is that you have encrypted traffic traversing your perimeter. There are a few things you can sometimes do about it. Sometimes you’d encounter people trying to corkscrew ssh sessions through your authenticated web proxies, or even use web shells. Sometimes, you won’t be able to do much about it. In the worst case, knowing the source, destination, combined size of the transferred data, and frequency will allow you to draw some conclusions.
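When leases are short, the question during an investigation becomes "which device held this IP at that moment?". A minimal sketch, assuming you can export lease history as records with a start and end time (the `Lease` type and `who_had_ip` function are hypothetical, not part of any DHCP server's API):

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Lease(NamedTuple):
    ip: str
    mac: str
    start: datetime
    end: datetime


def who_had_ip(leases, ip: str, at: datetime) -> Optional[str]:
    """Return the MAC that held `ip` at time `at`, or None if no lease matches."""
    for lease in leases:
        if lease.ip == ip and lease.start <= at < lease.end:
            return lease.mac
    return None
```

Joining the MAC against the hardware inventory then gets you from an IP in an alert to a device and, with luck, an owner.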
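The "two places at the same time" check can be sketched as follows, assuming login or badge events have already been normalized into (user, location, timestamp) tuples. The 30-minute default is an arbitrary placeholder; the right value depends on how far apart your locations really are:

```python
from datetime import datetime, timedelta


def conflicting_logins(events, min_gap=timedelta(minutes=30)):
    """Return (user, earlier, later) for users seen at two locations too close in time.

    events: iterable of (user, location, timestamp) tuples, in any order.
    """
    last_seen = {}
    conflicts = []
    for user, location, when in sorted(events, key=lambda e: e[2]):
        previous = last_seen.get(user)
        if previous and previous[0] != location and when - previous[1] < min_gap:
            conflicts.append((user, previous, (location, when)))
        last_seen[user] = (location, when)
    return conflicts
```

This only compares each event with the user's previous one, which is enough to surface the obvious cases without keeping the whole history in memory.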
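Even without seeing inside the traffic, that metadata can be rolled up per source and destination pair. A small sketch, assuming flow records reduced to (src, dst, byte_count) tuples, which is a simplification of what netflow or proxy logs actually contain (the function names are illustrative):

```python
from collections import defaultdict


def summarize_flows(flows):
    """Aggregate flow records into per-(source, destination) totals.

    flows: iterable of (src, dst, byte_count) tuples.
    """
    summary = defaultdict(lambda: {"bytes": 0, "connections": 0})
    for src, dst, byte_count in flows:
        entry = summary[(src, dst)]
        entry["bytes"] += byte_count
        entry["connections"] += 1
    return dict(summary)


def top_talkers(summary, n=5):
    """Return the n (src, dst) pairs that moved the most data, largest first."""
    return sorted(summary.items(), key=lambda item: item[1]["bytes"], reverse=True)[:n]
```

A pair with many small connections, or one with a single very large transfer, stands out quickly in this view even when the payload is opaque.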
There are some extra smart firewalls nowadays that have evolved beyond the industry standards. See who’s the biggest visionary, and why. Why use the tried and true when you can get so much more for your money? Don't forget to set up something to diff the rules every 10 minutes and open a tracking ticket with the diffs.
They allow you to AV scan content going in and out. If you can use ICAP, you can even set up a Flash/PDF/ZIP/Java vault for collecting, scanning, “defusing”, and out-of-band delivering the defused content to the users. On top of that, proxies can be useful in picking up the species of malware that are not proxy aware or that can’t handle authenticated proxies.
Compare the threat data against the coverage your vendors provide, and try to fill in the gaps with custom signatures. Review the signature stats, remove the noisy and irrelevant signatures, tune the parameters, update to reduce false positives, wash, rinse, and repeat.
Got to have them all, no matter what they are, then have them analyzed, parsed, categorized, correlated, workflow/use-case sequenced, and one-offs reviewed.
So, you spent some substantial part of your budget on this shiny new SIEM. You got your firewalls, netflow data, IDS/IPS devices, multiple AVs, web proxies, NAC, mail gateways, you name it, you have them all. Then you realize that all you get is a headache from data overload, false positives coming from diverse sources, and the database in your SIEM is blowing up from the data overload.
Binding it all together
Great, what do I do about all that data? Bah! You need to organize it to make any sense out of it. Divide and conquer is one of the approaches you can use. You also need to use the knowledge of your environment to tune out the massive amounts of events that are worthless or not applicable.
So, you need to use metadata, that is, data about the data you are collecting.
To bring some order to the chaos, you need to establish usage patterns, and apply the knowledge you have about the business processes and other events that can cause the anomalies. You want to have all your data available in a single place. Why? Because you'd be able to derive some conclusions from it later on, and because you'd need to compare it to the threat data you're constantly collecting.
I have yet to see a SIEM system that allows you to feed it your threat intelligence data, whether CVEs, Bugtraq IDs, or all sorts of advisories from various vendors, and combine that with all the data derived from your different compartments: IDS/IPS signatures, HIDS signatures, AV updates, patch deployment details, and any other inventory related data. With all that at your disposal, you should be able to greatly improve the signal to noise ratio in your SIEM. On another level, all this data allows you to identify any deficiencies in your defenses, as well as produce reports that provide meaningful and measurable metrics to the upper management.
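One way that combination could improve the signal to noise ratio is sketched below, under the assumption that you can map IDS signatures to CVE IDs and that patch deployment data tells you which CVEs are still open on each host. The data shapes, the `prioritize` function, and the CVE IDs in the comment are all placeholders for illustration, not any real SIEM's interface:

```python
def prioritize(alerts, signature_cves, host_unpatched):
    """Split alerts into those aimed at a host still exposed to the CVE and the rest.

    alerts:          list of dicts with "signature" and "target" keys
    signature_cves:  signature id -> set of CVE ids the signature detects
    host_unpatched:  host name -> set of CVE ids not yet patched on that host
                     (placeholder ids such as "CVE-0000-0001" in examples)
    """
    relevant, lower_priority = [], []
    for alert in alerts:
        cves = signature_cves.get(alert["signature"], set())
        exposed = cves & host_unpatched.get(alert["target"], set())
        (relevant if exposed else lower_priority).append(alert)
    return relevant, lower_priority
```

Alerts that land in the second list are not necessarily harmless; signatures without a CVE mapping end up there too, so they still need a periodic review.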