What is it and why should I care?
Today’s topic covers two of the weakest areas in application security: data collection and sharing. We do a pretty terrible job as an industry in both areas, though there have been some marked improvements in the last couple of years that bring hope.
While there is no confusion around what data collection and sharing mean in general, there is a lot of disagreement about both topics in specific areas. Let’s briefly define both here for clarity:
Data Collection: The gathering and storage (collection) of specific points of interest (data).
Data Sharing: The distribution (sharing) of collected points of interest (data) with interested parties.
Both of these definitions are intentionally broad. IMHO, the debates about what constitutes data vs. information (collection) and who gets the data (sharing) are fruitless when you consider how very little data we’re talking about to start with. Once we have broad data collection and sharing (a different, better problem), we can then address the needs of the community by standardizing what to share and with whom to share it.
The lack of data collection and sharing in any industry essentially means that you are unaware of what others are seeing and doing. That can be particularly challenging in security, as we all share a common resource (the network) and we are all using a relatively small subset of tooling to perform very similar tasks. In many cases, data that comes from one organization could be put to good use by another. This applies to our industry much more than to industries with wider variance in tools and processes. The reverse is also true: sharing benefits our industry more quickly and to a greater degree precisely because there is so much commonality.
What should I do about it?
My basic hope is that you look for ways to share your data. We all benefit from it. We are theoretically a science- and engineering-based field, but we have a rough track record of sharing actual data to support our hypotheses. However, we do have some shining examples, and that should give us both hope and motivation to get better. On the collection side, we can look at folks like Etsy. They decided that data collection would be a central part of their DNA and invested engineering resources into building tools for data collection and monitoring, a great success story. For data sharing, there are a few popular examples, like the Veracode State of Software Security Report, the Verizon Data Breach Investigations Report (DBIR) and the WhiteHat Website Security Statistics Report. (Full disclosure: I currently work for WhiteHat.) All of these are great examples of organizations sharing the data they see for the benefit of the community. One other great example is that of Security Ninja sharing 3 years’ worth of data that he’s collected. He also makes a pointed statement in that article: “If you have the data don’t hide it”. Note: He’s speaking of sharing with internal teams, an extremely valuable (and nearly as uncommon) form of sharing in addition to sharing publicly.
As I mentioned above, we have some bright spots to give us hope that we can do better. Now let’s touch on a few points you should consider before sharing to make sure you’re doing your due diligence.
Stay Legal and Compliant
Certain organizations can’t share certain data. That’s a legal and regulatory reality. In general, security practitioners have been fairly conservative about what data they share because it tends to be sensitive. However, sharing is becoming more commonplace, and that seems to be helping us all get a handle on what is actually happening in practice. Just because you may have some restrictions on what or how you can share doesn’t mean those restrictions are completely limiting. A good example is the FS-ISAC, which shares data within the financial services industry, allowing similar organizations with similar concerns to share their data in a controlled environment. In short, make sure you are allowed to share something before you share it.
The data you share may or may not have privacy-related issues. If it does, you have to make sure you anonymize the data. No one’s private information should ever be shared publicly, especially when the data that is worth sharing doesn’t depend on the private information at all. Make sure you take care of your users and customers and don’t share anything you shouldn’t. There are lots of tools that can help with this process if you need them.
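To make the anonymization step a bit more concrete, here is a minimal sketch in Python of one way to scrub records before sharing them. Everything specific in it is an assumption for illustration: the field names (`email`, `full_name`, `user_id`, `source_ip`, `vuln_class`), the sample record, and the key are all made up, so adapt them to your own data. Also note that replacing values with a keyed hash is strictly pseudonymization rather than full anonymization. It hides the original values while keeping records for the same user linkable, but it does not protect against re-identification through the remaining fields, so treat it as a starting point and not a guarantee.

```python
import hashlib
import hmac

# Hypothetical field names for illustration; adjust to your own schema.
DROP_FIELDS = {"email", "full_name"}            # removed entirely
PSEUDONYMIZE_FIELDS = {"user_id", "source_ip"}  # replaced with a keyed hash


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed HMAC-SHA256 digest of value, truncated to 16 hex chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def scrub_record(record: dict, key: bytes) -> dict:
    """Return a copy of record with private fields dropped or pseudonymized."""
    cleaned = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # never include direct identifiers in shared data
        if field in PSEUDONYMIZE_FIELDS:
            cleaned[field] = pseudonymize(str(value), key)
        else:
            cleaned[field] = value  # the data actually worth sharing
    return cleaned


if __name__ == "__main__":
    # The key must stay private and must never ship with the shared dataset.
    secret_key = b"example-key-keep-this-out-of-the-dataset"
    event = {
        "user_id": "1842",
        "email": "someone@example.com",
        "full_name": "Example Person",
        "source_ip": "203.0.113.7",
        "vuln_class": "XSS",
    }
    print(scrub_record(event, secret_key))
```

The keyed hash (instead of a plain hash) is a deliberate choice: low-entropy values such as IP addresses or numeric IDs are easy to recover from an unkeyed hash by simply hashing every candidate value.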
Try to Share Raw Data
As much as possible, you should try to stick to sharing raw data (anonymized, of course), keeping in mind that some inference is difficult to avoid, even before collection. That way, others can analyze your data and evaluate (support or contradict) your conclusions. For instance, my recent post about password storage referenced a great spreadsheet considering the cost tradeoffs to attackers and defenders for password protection schemes. This raw (even generated) data gives others a way to do their own analysis and makes us all more aware of the actual data.
In conclusion, we discussed a weak spot for the application security industry: data collection and sharing. While we have historically been pretty bad at this, there are some bright spots and we’re starting to see both collection and sharing happening on a broader scale, which is encouraging. By following a few simple due diligence tasks, we can make sure that we’re sharing safely and can help the industry as a whole move forward.