Downloading and analyzing NVD CVE feed

In a previous post, “New National Vulnerability Database visualizations and feeds”, I mentioned the JSON NVD feed.

Let’s see what data it contains and how to download and analyse it. First of all, we need to download all the files with CVEs from the NVD database and save them to a directory.

Unfortunately, there is no way to download all the content at once; only per-year archives are available. We need to get the URLs first. A URL looks like this: https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2017.json.zip. Then we will download them all.

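import os
import re
import requests

# Make sure the target directory exists before writing into it
if not os.path.isdir("nvd"):
    os.makedirs("nvd")

r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')
# Raw string with escaped dots, so that "." only matches a literal dot
for filename in re.findall(r"nvdcve-1\.1-[0-9]*\.json\.zip", r.text):
    print(filename)
    url = "https://nvd.nist.gov/feeds/json/cve/1.1/" + filename
    print(url)
    # Stream the archive to disk chunk by chunk
    r_file = requests.get(url, stream=True)
    with open("nvd/" + filename, 'wb') as f:
        for chunk in r_file:
            f.write(chunk)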

upd. 11.07.2022 Updated the code for feed version 1.1
upd. 19.02.2022 Fixed the url

Output:

nvdcve-1.1-2023.json.zip
https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2023.json.zip
nvdcve-1.1-2022.json.zip
https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2022.json.zip
nvdcve-1.1-2021.json.zip
https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2021.json.zip
...
nvdcve-1.1-2002.json.zip
https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2002.json.zip

OK, now that we have the files in the nvd/ directory, we can easily parse and analyse them.

from os import listdir
from os.path import isfile, join
import zipfile
import json

files = [f for f in listdir("nvd/") if isfile(join("nvd/", f))]
files.sort()
for file in files:
    # Each archive contains a single JSON file
    with zipfile.ZipFile(join("nvd/", file), 'r') as archive:
        with archive.open(archive.namelist()[0]) as jsonfile:
            # cve_dict is overwritten on every iteration,
            # so any per-file processing belongs inside this loop
            cve_dict = json.loads(jsonfile.read())

All the necessary content is now in cve_dict. Calling print(cve_dict.keys()) gives:

[u'CVE_data_timestamp', u'CVE_data_version', u'CVE_Items', u'CVE_data_format', u'CVE_data_numberOfCVEs', u'CVE_data_type']

The CVE data itself is in the cve_dict['CVE_Items'] list; the other keys are for information only:

print("CVE_data_timestamp: " + str(cve_dict['CVE_data_timestamp']))
print("CVE_data_version: " + str(cve_dict['CVE_data_version']))
print("CVE_data_format: " + str(cve_dict['CVE_data_format']))
print("CVE_data_numberOfCVEs: " + str(cve_dict['CVE_data_numberOfCVEs']))
print("CVE_data_type: " + str(cve_dict['CVE_data_type']))

Output for nvdcve-1.0-2017.json.zip:

CVE_data_timestamp: 2017-09-30T07:02Z
CVE_data_version: 4.0
CVE_data_format: MITRE
CVE_data_numberOfCVEs: 7583
CVE_data_type: CVE
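
One quick sanity check on these informational fields is to compare the declared number of CVEs with the actual length of the CVE_Items list. This is a minimal sketch, assuming cve_dict has the structure shown above; the helper name check_cve_count and the miniature sample dict are hypothetical and exist only for illustration.

```python
def check_cve_count(cve_dict):
    """Return (declared, actual) CVE counts for one parsed feed file."""
    # int() copes with the count being stored either as a string or a number
    declared = int(cve_dict['CVE_data_numberOfCVEs'])
    actual = len(cve_dict['CVE_Items'])
    return declared, actual

# Hypothetical miniature feed, only to show the call
sample = {'CVE_data_numberOfCVEs': '2', 'CVE_Items': [{}, {}]}
declared, actual = check_cve_count(sample)
print(declared == actual)
```

If the two numbers differ, the archive was probably truncated during download and is worth fetching again.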

OK, now let’s see what a CVE item looks like, using:
print(json.dumps(cve_dict['CVE_Items'][0], sort_keys=True, indent=4, separators=(',', ': ')))

{
    "configurations": {
        "CVE_data_version": "4.0",
        "nodes": [
            {
                "cpe": [
                    {
                        "cpe23Uri": "cpe:2.3:a:microsoft:word:2016:*:*:*:*:*:*:*",
                        "cpeMatchString": "cpe:/a:microsoft:word:2016",
                        "vulnerable": true
                    }
                ],
                "operator": "OR"
            }
        ]
    },
    "cve": {
        "CVE_data_meta": {
            "ID": "CVE-2017-0019"
        },
        "affects": {
            "vendor": {
                "vendor_data": [
                    {
                        "product": {
                            "product_data": [
                                {
                                    "product_name": "word",
                                    "version": {
                                        "version_data": [
                                            {
                                                "version_value": "2016"
                                            }
                                        ]
                                    }
                                }
                            ]
                        },
                        "vendor_name": "microsoft"
                    }
                ]
            }
        },
        "data_format": "MITRE",
        "data_type": "CVE",
        "data_version": "4.0",
        "description": {
            "description_data": [
                {
                    "lang": "en",
                    "value": "Microsoft Word 2016 allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via a crafted document, aka \"Microsoft Office Memory Corruption Vulnerability.\" This vulnerability is different from those described in CVE-2017-0006, CVE-2017-0020, CVE-2017-0030, CVE-2017-0031, CVE-2017-0052, and CVE-2017-0053."
                }
            ]
        },
        "problemtype": {
            "problemtype_data": [
                {
                    "description": [
                        {
                            "lang": "en",
                            "value": "CWE-119"
                        }
                    ]
                }
            ]
        },
        "references": {
            "reference_data": [
                {
                    "url": "http://www.securityfocus.com/bid/96042"
                },
                {
                    "url": "http://www.securitytracker.com/id/1038010"
                },
                {
                    "url": "https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2017-0019"
                }
            ]
        }
    },
    "impact": {
        "baseMetricV2": {
            "cvssV2": {
                "accessComplexity": "MEDIUM",
                "accessVector": "NETWORK",
                "authentication": "NONE",
                "availabilityImpact": "COMPLETE",
                "baseScore": 9.3,
                "confidentialityImpact": "COMPLETE",
                "integrityImpact": "COMPLETE",
                "vectorString": "(AV:N/AC:M/Au:N/C:C/I:C/A:C)"
            },
            "exploitabilityScore": 8.6,
            "impactScore": 10.0,
            "obtainAllPrivilege": false,
            "obtainOtherPrivilege": false,
            "obtainUserPrivilege": false,
            "severity": "HIGH",
            "userInteractionRequired": true
        },
        "baseMetricV3": {
            "cvssV3": {
                "attackComplexity": "LOW",
                "attackVector": "LOCAL",
                "availabilityImpact": "HIGH",
                "baseScore": 7.8,
                "baseSeverity": "HIGH",
                "confidentialityImpact": "HIGH",
                "integrityImpact": "HIGH",
                "privilegesRequired": "NONE",
                "scope": "UNCHANGED",
                "userInteraction": "REQUIRED",
                "vectorString": "AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"
            },
            "exploitabilityScore": 1.8,
            "impactScore": 5.9
        }
    },
    "lastModifiedDate": "2017-07-12T01:29Z",
    "publishedDate": "2017-03-17T00:59Z"
}

Well, I am interested in formalized data about vulnerable software products (CPEs) and criticality description (CVSS).
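
Pulling just these fields out of an item could look roughly like this. It is a sketch under assumptions, not a complete parser: the function name summarize_item is hypothetical, the "cpe_match" fallback is my assumption about newer feed versions (the dump above uses "cpe"), and nested child nodes are not walked.

```python
def summarize_item(item):
    """Pull the ID, CVSS base scores and CPE strings out of one CVE item."""
    impact = item.get('impact', {})
    v2 = impact.get('baseMetricV2', {}).get('cvssV2', {})
    v3 = impact.get('baseMetricV3', {}).get('cvssV3', {})
    cpes = []
    for node in item.get('configurations', {}).get('nodes', []):
        # The dump above keeps matches under "cpe"; the "cpe_match"
        # fallback is an assumption about newer feed versions.
        for match in node.get('cpe', node.get('cpe_match', [])):
            cpes.append(match.get('cpe23Uri'))
    return {
        'id': item['cve']['CVE_data_meta']['ID'],
        'cvss_v2': v2.get('baseScore'),
        'cvss_v3': v3.get('baseScore'),
        'cpes': cpes,
    }

# Trimmed copy of the CVE-2017-0019 item shown above
item = {
    'cve': {'CVE_data_meta': {'ID': 'CVE-2017-0019'}},
    'configurations': {'nodes': [{
        'operator': 'OR',
        'cpe': [{'cpe23Uri': 'cpe:2.3:a:microsoft:word:2016:*:*:*:*:*:*:*',
                 'vulnerable': True}],
    }]},
    'impact': {
        'baseMetricV2': {'cvssV2': {'baseScore': 9.3}},
        'baseMetricV3': {'cvssV3': {'baseScore': 7.8}},
    },
}
summary = summarize_item(item)
print(summary['id'], summary['cvss_v2'], summary['cvss_v3'])
print(summary['cpes'])
```

Missing sections simply come back as None or an empty list, so the same function can be run over items that have no CVSS or CPE data yet.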

Talking about vulnerable software products, you can see that this information exists both in “configurations” and in “cve->affects”. Probably “configurations” is more like detection criteria, and “cve->affects” is the list of vulnerable software.

Anyway, this is a good example of why you can’t use this data for vulnerability detection in many cases. Let’s say cpe:/a:microsoft:word:2016 is vulnerable. A patched version of MS Word will have the same CPE ID, cpe:/a:microsoft:word:2016. It’s good to know which software may be affected by the CVE, but for detection this simply won’t be enough.

But sometimes the data is pretty good, like this one for a Skype vulnerability:

[{u'operator': u'OR', u'cpe': [{u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.2:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.2', u'vulnerable': True}, {u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.35:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.35', u'vulnerable': True}, {u'cpe23Uri': u'cpe:2.3:a:microsoft:skype:7.36:*:*:*:*:*:*:*', u'cpeMatchString': u'cpe:/a:microsoft:skype:7.36', u'vulnerable': True}]}]
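
To illustrate the difference, here is a rough sketch that collects the versions listed as vulnerable for a given product. The function name listed_versions is hypothetical, and the code assumes the "cpe" key layout shown above and a plain split on colons, which ignores escaped colons inside CPE fields.

```python
def listed_versions(nodes, vendor, product):
    """Collect versions of vendor:product marked vulnerable in CPE nodes."""
    versions = set()
    for node in nodes:
        for match in node.get('cpe', []):
            # Layout: cpe:2.3:<part>:<vendor>:<product>:<version>:...
            # A plain split ignores escaped colons; good enough for a sketch.
            fields = match['cpe23Uri'].split(':')
            if (match.get('vulnerable') and fields[3] == vendor
                    and fields[4] == product):
                versions.add(fields[5])
    return versions

# The Skype nodes from the example above
skype_nodes = [{'operator': 'OR', 'cpe': [
    {'cpe23Uri': 'cpe:2.3:a:microsoft:skype:7.2:*:*:*:*:*:*:*', 'vulnerable': True},
    {'cpe23Uri': 'cpe:2.3:a:microsoft:skype:7.35:*:*:*:*:*:*:*', 'vulnerable': True},
    {'cpe23Uri': 'cpe:2.3:a:microsoft:skype:7.36:*:*:*:*:*:*:*', 'vulnerable': True},
]}]
skype = listed_versions(skype_nodes, 'microsoft', 'skype')
print('7.36' in skype)  # True
print('7.37' in skype)  # False
```

For the Word item the same function would return only {'2016'}, which both a patched and an unpatched installation match, so the result says nothing about the patch level.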

Let’s see how many CVEs have information about products (filtering out CVEs with “** REJECT **” in the description):

CVEs in NVD with and without CPE data
year,with_cpe,without_cpe
2002,6540,127
2003,1496,3
2004,2632,10
2005,4613,1
2006,6983,0
2007,6442,0
2008,6988,0
2009,4858,0
2010,4928,2
2011,4382,1
2012,5134,1
2013,5616,1
2014,7649,7
2015,7026,59
2016,8027,4
2017,7315,191

As you can see, there are no CPEs for some very old vulnerabilities and for those that are still being processed (CVEs in the RECEIVED state).

But for the majority of vulnerabilities, CPE data is present in some form.
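
The counting itself could be done roughly like this. It is a sketch, not the exact script behind the table: the function name count_cpe_presence is hypothetical, and treating a non-empty "nodes" list as "has CPE data" is my assumption about the criterion.

```python
def count_cpe_presence(items):
    """Count non-rejected CVE items with and without CPE configuration data."""
    with_cpe = without_cpe = 0
    for item in items:
        descriptions = item['cve']['description']['description_data']
        # Skip entries that were rejected
        if any('** REJECT **' in d.get('value', '') for d in descriptions):
            continue
        # Assumption: a non-empty "nodes" list means CPE data is present
        nodes = item.get('configurations', {}).get('nodes', [])
        if nodes:
            with_cpe += 1
        else:
            without_cpe += 1
    return with_cpe, without_cpe

def make_item(text, nodes):
    """Build a miniature CVE item (hypothetical helper for the demo)."""
    return {'cve': {'description': {'description_data': [
                {'lang': 'en', 'value': text}]}},
            'configurations': {'nodes': nodes}}

items = [
    make_item('Some flaw.', [{'operator': 'OR', 'cpe': []}]),
    make_item('Another flaw.', []),
    make_item('** REJECT ** Do not use this candidate number.', []),
]
print(count_cpe_presence(items))  # (1, 1)
```

Calling it on cve_dict['CVE_Items'] inside the per-file loop gives one row of the table per yearly archive.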

We can also look at the situation with CVSS: how many CVEs have only CVSS v2, how many have both CVSS v2 and v3, and how many have no CVSS data at all. This is determined by looking for “baseMetricV2” and “baseMetricV3” in item['impact']:

CVEs in NVD with CVSS v2 and v3
year,CVSS v2 only,CVSS v2 and v3,no CVSS
2002,6665,2,0
2003,1498,1,0
2004,2641,1,0
2005,4613,1,0
2006,6981,2,0
2007,6436,6,0
2008,6986,2,0
2009,4853,5,0
2010,4913,15,2
2011,4370,12,1
2012,5110,24,1
2013,5575,42,0
2014,7275,374,7
2015,5434,1592,59
2016,206,7821,4
2017,0,7315,191

As you can see, the switch to CVSS v3 is going well.
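
A sketch of how such a breakdown can be computed from the “baseMetricV2” and “baseMetricV3” keys. The function name count_cvss is hypothetical, the extra v3_only bucket is not part of the table and is kept only so that no item is silently dropped, and the “** REJECT **” filter is left out for brevity.

```python
def count_cvss(items):
    """Classify CVE items by which CVSS metric versions they carry."""
    counts = {'v2_only': 0, 'v2_and_v3': 0, 'no_cvss': 0, 'v3_only': 0}
    for item in items:
        impact = item.get('impact', {})
        has_v2 = 'baseMetricV2' in impact
        has_v3 = 'baseMetricV3' in impact
        if has_v2 and has_v3:
            counts['v2_and_v3'] += 1
        elif has_v2:
            counts['v2_only'] += 1
        elif has_v3:
            counts['v3_only'] += 1
        else:
            counts['no_cvss'] += 1
    return counts

# Miniature items, only to show the call
items = [
    {'impact': {'baseMetricV2': {}}},
    {'impact': {'baseMetricV2': {}, 'baseMetricV3': {}}},
    {'impact': {}},
]
print(count_cvss(items))
```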

It may also be interesting to look at the site references in CVE items and to classify these sites. But I will probably do that next time.

12 thoughts on “Downloading and analyzing NVD CVE feed”

  2. Mike

    Is there a possibility to convert json to a csv or xlsx file with a format like cve mitre?:
    -> Name, ID, Description,…,
    -> Name,ID, Description,…

  4. rinku

    Hello, the content above is very good, but how can I use it for scanning source code, if I have a zipped source code file on my local system and I want to check how vulnerable the source code is?
    Let’s say I downloaded a source code archive from GitHub for some open-source software such as Notepad++; how can I scan the folders and have the result shown to me?

    Please help, thanks in advance.

  5. Ramansh

    Hi, when you said that “Probably the “configurations” is more like detection criteria and “cve->affects” are the lists of vulnerable software.”, did you deduce this information or is it an official statement by NIST or NVD guys?

    1. Michele

      I mean, like ID, CVE-2019-xxxxx,
      description, the description,

      Or something showing key/value?

      I know this is an old post, but hopefully you’re still looking.

      Thanks!

      1. abha

        print(cve_dict['CVE_Items'][0]['cve']['description']['description_data'][0]['value'])
        print(cve_dict['CVE_Items'][0]['cve']['CVE_data_meta']['ID'])
