Downloading and analyzing NVD CVE feed

In the previous post, “New National Vulnerability Database visualizations and feeds”, I mentioned the NVD JSON feed.

Let’s see what data it contains and how to download and analyse it. First of all, we need to download all the CVE files from the NVD database and save them to a directory.

Unfortunately, there is no way to download all the content at once; only per-year archives are available. We need to get the URLs first. A URL looks like this: https://static.nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-2017.json.zip. Then we will download them all.

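import os
import re
import requests

# The target directory must exist before we can write into it
if not os.path.isdir("nvd"):
    os.makedirs("nvd")

r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')
# Raw string with escaped dots, so that "." matches only a literal dot
for filename in re.findall(r"nvdcve-1\.0-[0-9]{4}\.json\.zip", r.text):
    print(filename)
    r_file = requests.get("https://static.nvd.nist.gov/feeds/json/cve/1.0/" + filename, stream=True)
    # Write the archive to disk chunk by chunk
    with open("nvd/" + filename, 'wb') as f:
        for chunk in r_file:
            f.write(chunk)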

Output:

nvdcve-1.0-2017.json.zip
nvdcve-1.0-2016.json.zip
nvdcve-1.0-2015.json.zip
...
nvdcve-1.0-2004.json.zip
nvdcve-1.0-2003.json.zip
nvdcve-1.0-2002.json.zip

OK, now that we have the files in the nvd/ directory, we can easily parse and analyse them.

from os import listdir
from os.path import isfile, join
import zipfile
import json

files = [f for f in listdir("nvd/") if isfile(join("nvd/", f))]
files.sort()
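for filename in files:
    # Read the first file in the archive; the with blocks close both handles
    with zipfile.ZipFile(join("nvd/", filename), 'r') as archive:
        with archive.open(archive.namelist()[0]) as jsonfile:
            cve_dict = json.loads(jsonfile.read())
    # cve_dict now holds the data of one yearly archive; process it inside the loop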

All the necessary content will be in cve_dict, and if we run print(cve_dict.keys()), we get:

[u'CVE_data_timestamp', u'CVE_data_version', u'CVE_Items', u'CVE_data_format', u'CVE_data_numberOfCVEs', u'CVE_data_type']

The CVE data is in the cve_dict['CVE_Items'] list; the other parameters are for information only:

print("CVE_data_timestamp: " + str(cve_dict['CVE_data_timestamp']))
print("CVE_data_version: " + str(cve_dict['CVE_data_version']))
print("CVE_data_format: " + str(cve_dict['CVE_data_format']))
print("CVE_data_numberOfCVEs: " + str(cve_dict['CVE_data_numberOfCVEs']))
print("CVE_data_type: " + str(cve_dict['CVE_data_type']))

Output for nvdcve-1.0-2017.json.zip:

CVE_data_timestamp: 2017-09-30T07:02Z
CVE_data_version: 4.0
CVE_data_format: MITRE
CVE_data_numberOfCVEs: 7583
CVE_data_type: CVE

OK, now let’s see what a CVE item looks like, using:
print(json.dumps(cve_dict['CVE_Items'][0], sort_keys=True, indent=4, separators=(',', ': ')))

{
    "configurations": {
        "CVE_data_version": "4.0",
        "nodes": [
            {
                "cpe": [
                    {
                        "cpe23Uri": "cpe:2.3:a:microsoft:word:2016:*:*:*:*:*:*:*",
                        "cpeMatchString": "cpe:/a:microsoft:word:2016",
                        "vulnerable": true
                    }
                ],
                "operator": "OR"
            }
        ]
    },
    "cve": {
        "CVE_data_meta": {
            "ID": "CVE-2017-0019"
        },
        "affects": {
            "vendor": {
                "vendor_data": [
                    {
                        "product": {
                            "product_data": [
                                {
                                    "product_name": "word",
                                    "version": {
                                        "version_data": [
                                            {
                                                "version_value": "2016"
                                            }
                                        ]
                                    }
                                }
                            ]
                        },
                        "vendor_name": "microsoft"
                    }
                ]
            }
        },
        "data_format": "MITRE",
        "data_type": "CVE",
        "data_version": "4.0",
        "description": {
            "description_data": [
                {
                    "lang": "en",
                    "value": "Microsoft Word 2016 allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via a crafted document, aka \"Microsoft Office Memory Corruption Vulnerability.\" This vulnerability is different from those described in CVE-2017-0006, CVE-2017-0020, CVE-2017-0030, CVE-2017-0031, CVE-2017-0052, and CVE-2017-0053."
                }
            ]
        },
        "problemtype": {
            "problemtype_data": [
                {
                    "description": [
                        {
                            "lang": "en",
                            "value": "CWE-119"
                        }
                    ]
                }
            ]
        },
        "references": {
            "reference_data": [
                {
                    "url": "http://www.securityfocus.com/bid/96042"
                },
                {
                    "url": "http://www.securitytracker.com/id/1038010"
                },
                {
                    "url": "https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2017-0019"
                }
            ]
        }
    },
    "impact": {
        "baseMetricV2": {
            "cvssV2": {
                "accessComplexity": "MEDIUM",
                "accessVector": "NETWORK",
                "authentication": "NONE",
                "availabilityImpact": "COMPLETE",
                "baseScore": 9.3,
                "confidentialityImpact": "COMPLETE",
                "integrityImpact": "COMPLETE",
                "vectorString": "(AV:N/AC:M/Au:N/C:C/I:C/A:C)"
            },
            "exploitabilityScore": 8.6,
            "impactScore": 10.0,
            "obtainAllPrivilege": false,
            "obtainOtherPrivilege": false,
            "obtainUserPrivilege": false,
            "severity": "HIGH",
            "userInteractionRequired": true
        },
        "baseMetricV3": {
            "cvssV3": {
                "attackComplexity": "LOW",
                "attackVector": "LOCAL",
                "availabilityImpact": "HIGH",
                "baseScore": 7.8,
                "baseSeverity": "HIGH",
                "confidentialityImpact": "HIGH",
                "integrityImpact": "HIGH",
                "privilegesRequired": "NONE",
                "scope": "UNCHANGED",
                "userInteraction": "REQUIRED",
                "vectorString": "AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"
            },
            "exploitabilityScore": 1.8,
            "impactScore": 5.9
        }
    },
    "lastModifiedDate": "2017-07-12T01:29Z",
    "publishedDate": "2017-03-17T00:59Z"
}

Well, I am interested in the formalized data about vulnerable software products (CPEs) and the criticality description (CVSS).
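To make the structure above easier to work with, here is a minimal sketch of a helper that pulls the ID, the English description and the CVSS base scores out of one item. The function name summarize_item and the trimmed-down sample_item are my own illustration, not part of the feed; the key names are the ones visible in the dump above.

```python
def summarize_item(item):
    """Return the CVE ID, English description and CVSS base scores of one item."""
    cve = item.get("cve", {})
    descriptions = cve.get("description", {}).get("description_data", [])
    english = [d.get("value", "") for d in descriptions if d.get("lang") == "en"]
    impact = item.get("impact", {})
    return {
        "id": cve.get("CVE_data_meta", {}).get("ID"),
        "description": english[0] if english else "",
        # The chained .get() calls yield None when a CVSS block is missing
        "cvss_v2": impact.get("baseMetricV2", {}).get("cvssV2", {}).get("baseScore"),
        "cvss_v3": impact.get("baseMetricV3", {}).get("cvssV3", {}).get("baseScore"),
    }


# Hypothetical, heavily abbreviated copy of the item shown above
sample_item = {
    "cve": {
        "CVE_data_meta": {"ID": "CVE-2017-0019"},
        "description": {
            "description_data": [
                {"lang": "en", "value": "Microsoft Word 2016 allows remote attackers to execute arbitrary code ..."}
            ]
        },
    },
    "impact": {
        "baseMetricV2": {"cvssV2": {"baseScore": 9.3}},
        "baseMetricV3": {"cvssV3": {"baseScore": 7.8}},
    },
}

summary = summarize_item(sample_item)
print(summary["id"], summary["cvss_v2"], summary["cvss_v3"])
```

With real data you would call it as summarize_item(cve_dict['CVE_Items'][0]).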

Talking about vulnerable software products, you can see that this information exists both in “configurations” and in “cve->affects”. Probably “configurations” is more like detection criteria, and “cve->affects” is the list of vulnerable software.

Anyway, this is a good example of why you can’t use this data for vulnerability detection in many cases. Let’s say cpe:/a:microsoft:word:2016 is vulnerable. A patched version of MS Word will also have the same CPE ID, cpe:/a:microsoft:word:2016. It’s good to know which software may be affected by the CVE, but for detection it simply won’t be enough.

But sometimes it’s pretty good, like this one for a Skype vulnerability:

[{u'operator': u'OR', u'cpe': [{u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.2:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.2', u'vulnerable': True}, {u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.35:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.35', u'vulnerable': True}, {u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.36:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.36', u'vulnerable': True}]}]
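A list like this is easy to flatten into plain CPE strings. Below is a rough sketch, assuming the node layout shown above (“operator” plus a “cpe” list). The function name collect_cpes is hypothetical, and the handling of nested nodes under a “children” key is my assumption; it is not visible in the examples in this post.

```python
def collect_cpes(nodes):
    """Recursively gather the cpe23Uri of every entry marked as vulnerable."""
    uris = []
    for node in nodes:
        for cpe in node.get("cpe", []):
            if cpe.get("vulnerable"):
                uris.append(cpe["cpe23Uri"])
        # Assumption: nested nodes, if there are any, live under a "children" key
        uris.extend(collect_cpes(node.get("children", [])))
    return uris


# The Skype configuration from above, rewritten as a plain Python literal
skype_nodes = [{
    "operator": "OR",
    "cpe": [
        {"cpe23Uri": "cpe:2.3:a:microsoft:skype:7.2:*:*:*:*:*:*:*",
         "cpeMatchString": "cpe:/a:microsoft:skype:7.2", "vulnerable": True},
        {"cpe23Uri": "cpe:2.3:a:microsoft:skype:7.35:*:*:*:*:*:*:*",
         "cpeMatchString": "cpe:/a:microsoft:skype:7.35", "vulnerable": True},
        {"cpe23Uri": "cpe:2.3:a:microsoft:skype:7.36:*:*:*:*:*:*:*",
         "cpeMatchString": "cpe:/a:microsoft:skype:7.36", "vulnerable": True},
    ],
}]

skype_cpes = collect_cpes(skype_nodes)
for uri in skype_cpes:
    print(uri)
```

For a real item the call would be collect_cpes(item['configurations']['nodes']).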

Let’s see how many CVEs have information about products (filtering out CVEs with “** REJECT **” in the description):

CVEs in NVD with and without CPE data

year,with_cpe,without_cpe
2002,6540,127
2003,1496,3
2004,2632,10
2005,4613,1
2006,6983,0
2007,6442,0
2008,6988,0
2009,4858,0
2010,4928,2
2011,4382,1
2012,5134,1
2013,5616,1
2014,7649,7
2015,7026,59
2016,8027,4
2017,7315,191

As you can see, there are no CPEs for some very old vulnerabilities and for those that are currently being processed.

Figure: RECEIVED (not yet processed) CVEs

But for the majority of vulnerabilities, CPE data is present in some form.
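Counts like the ones in the table above could be produced roughly as follows. This is only a sketch: I assume here that “has CPE data” means a non-empty “nodes” list in “configurations”, which may differ from the exact criterion behind the numbers in the table. The function names and the tiny sample_items list are made up for illustration.

```python
def is_rejected(item):
    """True if any description of the item contains the REJECT marker."""
    descriptions = item.get("cve", {}).get("description", {}).get("description_data", [])
    return any("** REJECT **" in d.get("value", "") for d in descriptions)


def has_cpe(item):
    # Assumption: "has CPE data" means a non-empty configurations -> nodes list
    return bool(item.get("configurations", {}).get("nodes"))


def count_cpe_coverage(items):
    """Return (with_cpe, without_cpe), skipping rejected CVEs."""
    with_cpe = without_cpe = 0
    for item in items:
        if is_rejected(item):
            continue
        if has_cpe(item):
            with_cpe += 1
        else:
            without_cpe += 1
    return with_cpe, without_cpe


def describe(text):
    """Build the minimal 'cve' part of a hypothetical item."""
    return {"description": {"description_data": [{"lang": "en", "value": text}]}}


# Three hypothetical items: one with CPEs, one without, one rejected
sample_items = [
    {"cve": describe("Some vulnerability"),
     "configurations": {"nodes": [{"operator": "OR", "cpe": []}]}},
    {"cve": describe("Not processed yet"), "configurations": {"nodes": []}},
    {"cve": describe("** REJECT ** Do not use this candidate number."),
     "configurations": {"nodes": []}},
]

coverage = count_cpe_coverage(sample_items)
print("with_cpe=%d, without_cpe=%d" % coverage)
```

To get per-year numbers, call count_cpe_coverage(cve_dict['CVE_Items']) inside the loop over the archives.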

We can also look at the situation with CVSS: how many CVEs have only CVSS v2, how many have both CVSS v2 and v3, and how many have no CVSS data at all. This is based on the presence of “baseMetricV2” and “baseMetricV3” in item['impact']:

CVEs in NVD with CVSS v2 and v3

year,CVSS v2 only,CVSS v2 and v3,no CVSS
2002,6665,2,0
2003,1498,1,0
2004,2641,1,0
2005,4613,1,0
2006,6981,2,0
2007,6436,6,0
2008,6986,2,0
2009,4853,5,0
2010,4913,15,2
2011,4370,12,1
2012,5110,24,1
2013,5575,42,0
2014,7275,374,7
2015,5434,1592,59
2016,206,7821,4
2017,0,7315,191

As you can see, the switch to CVSS v3 is going well.
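The classification behind the CVSS table can be sketched in a few lines. The function name cvss_category and the sample items are hypothetical. The sketch only checks for the “baseMetricV2” and “baseMetricV3” keys in item['impact'] and does not filter out rejected CVEs, so that step would have to be added to reproduce the table. The “v3 only” category is there for completeness; the table has no such column.

```python
from collections import Counter


def cvss_category(item):
    """Classify an item by which CVSS blocks are present in item['impact']."""
    impact = item.get("impact", {})
    has_v2 = "baseMetricV2" in impact
    has_v3 = "baseMetricV3" in impact
    if has_v2 and has_v3:
        return "v2 and v3"
    if has_v2:
        return "v2 only"
    if has_v3:
        return "v3 only"
    return "no CVSS"


# Hypothetical items that contain nothing but the keys we look at
sample_items = [
    {"impact": {"baseMetricV2": {}, "baseMetricV3": {}}},
    {"impact": {"baseMetricV2": {}}},
    {"impact": {}},
    {"impact": {"baseMetricV2": {}, "baseMetricV3": {}}},
]

cvss_counts = Counter(cvss_category(item) for item in sample_items)
for category in ("v2 only", "v2 and v3", "v3 only", "no CVSS"):
    print(category, cvss_counts[category])
```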

It may also be interesting to have a look at the site references in CVE items and to classify these sites. But I will probably do that next time.
