Merge ~eslerm/ubuntu-cve-tracker:nvd-api-client into ubuntu-cve-tracker:master

Proposed by Mark Esler
Status: Merged
Merged at revision: 6ac40f852bfe5131090cacc156ce6af015c2be35
Proposed branch: ~eslerm/ubuntu-cve-tracker:nvd-api-client
Merge into: ubuntu-cve-tracker:master
Diff against target: 417 lines (+411/-0)
1 file modified
scripts/nvd_api_client.py (+411/-0)
Reviewer        Review Type   Date Requested   Status
Seth Arnold                                    Approve
Steve Beattie                                  Pending
Alex Murray                                    Pending
Review via email: mp+448538@code.launchpad.net

Commit message

nvd-api-client init

Alex Murray (alexmurray) wrote :

A few high-level comments (I haven't actually run the code yet, but will try that soon):

1. You should be able to use the configobj package to parse the configuration file rather than hand-parsing this (see cve_lib.py for some historical code for this)

2. Would it be possible to make the script as automagic as possible? I.e., when it is run, it looks for existing JSON; if none exists, it does --init automatically. If it does exist, it uses the timestamp of that JSON to infer the --since date. You can keep both the --init and --since parameters, as I can imagine they may be useful, but in general, when we can infer and do the right thing, I think we should.

3. It might be useful to show a progress bar or similar, and perhaps some indication when sleeping, since the current implementation looks like it will sleep for 6 seconds after each request, which will take a long time. It would be good to give the user some indication of how long this is expected to take, or at least show some progress along the way so they don't think the script has hung. See the use of progressbar in sis-changes or check-cves for inspiration.
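
To make point 3 concrete, here is a minimal sketch of a progress estimate for a paged download that sleeps between requests. The function names are hypothetical and not part of the proposed script; the page size of 2000 and the 6 second delay are taken from the diff, and the estimate ignores network time.

```python
import math


def remaining_seconds(
    total_results: int,
    next_index: int,
    results_per_page: int = 2000,
    delay: float = 6.0,
) -> float:
    """Estimate the sleep time still ahead (network time is ignored)."""
    remaining_results = max(total_results - next_index, 0)
    remaining_pages = math.ceil(remaining_results / results_per_page)
    return remaining_pages * delay


def progress_line(total_results: int, next_index: int) -> str:
    """Format a one-line status message suitable for stderr."""
    done = min(next_index, total_results)
    eta = remaining_seconds(total_results, next_index)
    return f"{done}/{total_results} results saved, about {eta:.0f}s of sleeping left"
```

A caller could print `progress_line(total_results, start_index)` after each page is saved, or feed the same numbers to a progress bar library.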

Mark Esler (eslerm) wrote :

Thank you Alex!

1. Can do.

Could we add `[DEFAULT]` to the top of our team's ~/.ubuntu-cve-tracker.conf files?

Then the config becomes a valid INI file for Python's built-in configparser: https://wiki.python.org/moin/ConfigParserExamples
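
As a rough illustration of the idea, the section header can also be prepended in memory, so the file on disk stays untouched. This is only a sketch: `read_nvd_path` is a hypothetical helper, it assumes the file contains nothing but `key=value` lines and `#` comments, and stripping surrounding quotes is an assumption about how shell-style values might be written.

```python
import configparser


def read_nvd_path(text: str) -> str:
    """Parse shell-style key=value text by prepending a [DEFAULT] header."""
    # interpolation=None keeps literal '%' characters in values intact
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("[DEFAULT]\n" + text)
    # raises KeyError if nvd_path is not defined
    value = parser["DEFAULT"]["nvd_path"]
    # shell-style files may quote values; drop surrounding quotes
    return value.strip("'\"")
```

For example, `read_nvd_path(Path.home().joinpath(".ubuntu-cve-tracker.conf").read_text())` would return the configured mirror path if the file fits those assumptions.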

2. automagic --since is doable, but I have a concern about --init

The maintenance function searches a time span of modified CVE records. NVD adds metrics after CVE record creation, so --since needs to be set to the most recent lastModified value from the local dataset. Just finding the most recently modified local file is good enough: even if the local dataset contains an older lastModified value, the overlap between them is small.
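
A minimal sketch of picking that value from already-loaded records follows. `latest_last_modified` is a hypothetical name, and the sketch assumes every `lastModified` string uses the same fixed-width ISO-8601 format and timezone, so that comparing strings gives chronological order. The sample timestamps in the usage note are made up.

```python
from typing import Iterable, Optional


def latest_last_modified(records: Iterable[dict]) -> Optional[str]:
    """Return the newest lastModified string, or None if there is none."""
    stamps = [rec["lastModified"] for rec in records if "lastModified" in rec]
    # same-format ISO-8601 strings sort chronologically as plain strings
    return max(stamps) if stamps else None
```

Called with `[{"lastModified": "2023-07-01T10:15:00.000"}, {"lastModified": "2023-08-07T22:01:32.263"}]`, it returns the second timestamp.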

Their API doesn't document this, but searching with a lastModified date more than 6 months old returns a 404. I should handle that.
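
One way to handle it is to reject an out-of-range start date before any request is sent. The sketch below is an assumption about how that check could look, not the committed code: `is_searchable` is a hypothetical helper and the permitted age is left as a parameter because the exact limit is not established here.

```python
from datetime import datetime, timedelta, timezone


def is_searchable(since: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if `since` is not in the future and not older than max_age."""
    if since > now:
        return False
    return now - since <= max_age


# usage: max_age here is a placeholder value, not a documented limit
now = datetime(2023, 8, 7, tzinfo=timezone.utc)
print(is_searchable(datetime(2023, 7, 1, tzinfo=timezone.utc), now, timedelta(days=180)))
```

A caller can turn a False result into a clear error message instead of surfacing a bare 404.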

(Note that these API searches download unpublished CVEs. This is acceptable/desired, since the CVE List is the primary source of CVE data which should drive triage. NVD data is supplemental to CVE List data.)

With an automated --init, a misconfigured path could cause 1.3G of API strain to NVD. I added a prompt in the init function to slow down users. Is it okay to keep that? Item X might change the context of --init.

3. That would be helpful :D

I reworked --debug; with my uncommitted changes the output looks like:

```
./scripts/nvd_api_client.py --since 2023-07-01 --debug
DEBUG: searching for modified NVD CVEs between 2023-07-01T00:00:00.000001%2B00:00 and 2023-08-07T22:01:32.263899%2B00:00
DEBUG: local NVD mirror path is "/home/eslerm/mirrors/nvd"
DEBUG: saved results 0 through 2000 of 4834
DEBUG: saved results 2000 through 4000 of 4834
DEBUG: saved results 4000 through 4834 of 4834
DEBUG: NVD sync complete \o/
```

How does that look?
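
For reference, the `%2B` in the debug output above is the percent-encoded `+` of the UTC offset. A small sketch of producing the same query string with the standard library is below; `build_query` is a hypothetical helper, while the parameter names `lastModStartDate` and `lastModEndDate` come from the proposed script.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def build_query(start: datetime, end: datetime) -> str:
    """Build the lastModified range query with '+' encoded as %2B."""
    # safe=":" keeps colons readable while still encoding the '+' sign
    start_text = quote(start.isoformat(), safe=":")
    end_text = quote(end.isoformat(), safe=":")
    return f"lastModStartDate={start_text}&lastModEndDate={end_text}"
```

Using `quote` rather than a manual string replacement would also cover any other reserved characters that might appear later.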

X. Ideally we should maintain NVD data from a central source to prevent discrepancies. Seth suggested that we may want to explore Canonistack.

Y. I'm hoping this work will also benefit https://github.com/olbat/nvdcve/issues/7

Mark Esler (eslerm) wrote :

Alex, I've added automation and believe this is ready for review.

Alex Murray (alexmurray) wrote :

Thanks Mark - apologies for the delay in this review.

The only other thing that would be great to see is some tests. Whilst we haven't traditionally had a lot of tests for our different internal tools, I think we should always aim to improve things going forward, so would you be able to add some tests for this as well?

Mark Esler (eslerm) wrote :

Hi Alex, thanks for the review.

I can add tests. Test suggestions are very welcome.
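
As one possible starting point, date parsing is easy to test without touching the network. The sketch below is self-contained and hypothetical: `parse_since` is a stand-in helper, not the script's own function, and unlike the proposed code it converts an explicit UTC offset to UTC rather than overwriting it.

```python
from datetime import datetime, timezone
import unittest


def parse_since(text: str) -> datetime:
    """Parse YYYY-MM-DD or an ISO-8601 datetime into an aware UTC datetime."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # naive input is assumed to already be in UTC
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


class ParseSinceTest(unittest.TestCase):
    def test_date_only(self):
        expected = datetime(2023, 7, 1, tzinfo=timezone.utc)
        self.assertEqual(parse_since("2023-07-01"), expected)

    def test_offset_is_converted(self):
        expected = datetime(2023, 8, 1, 0, 0, tzinfo=timezone.utc)
        self.assertEqual(parse_since("2023-08-01T02:00:00+02:00"), expected)

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            parse_since("not-a-date")
```

Saved to a file, this can be run with `python3 -m unittest`. Tests for paging and saving would need the HTTP call mocked out.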

Mark Esler (eslerm) wrote :

This now natively works with ~/.ubuntu-cve-tracker.conf

I did not want to implement reading this file like the rest of UCT, since that requires a non-built-in library. Ultimately, I believe we should use the INI standard so we can use configparser. (This could just be adding `[DEFAULT]` to the top of the config, and updating UCT...)

Seth Arnold (seth-arnold) wrote :

Thanks for tackling this! It'd be nice to have some test cases where that makes sense, and I'm afraid that having two different configuration files for this will cause problems in the long run. I'd suggest trimming out the new ~/.config/nvd-api-client.conf path and just using the same file as all our other tools. If we ever switch to an ini-format we can move the file just to have a clean break between old and new.

Thanks

review: Approve

Steve Beattie (sbeattie) wrote :

On Tue, Oct 17, 2023 at 09:12:46PM -0000, Mark Esler wrote:
> This now natively works with ~/.ubuntu-cve-tracker.conf
>
> I did not want to implement reading this file like the rest of UCT, since that requires a non-built-in library. Ultimately, I believe we should use the INI standard so we can use configparser. (This could just be adding `[DEFAULT]` to the top of the config, And updating UCT...)

The reason to not use an INI style format is because shell scripts also
use this config file via '.' / 'source' (see e.g. packages-mirror).

We could fix that by having a python script that converts the [insert
flame war over config file format here] config file to
something that shell scripts could evaluate dynamically; and in fact
it'd be nice to have a cve_lib.sh that shell scripts could source that
did this work for them and also accumulate common UCT shell
functionality.
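
A minimal sketch of such a converter, assuming an INI file whose settings all live in `[DEFAULT]` and whose keys are already valid shell variable names. `ini_to_shell` is a hypothetical helper, not existing UCT code.

```python
import configparser
import shlex


def ini_to_shell(text: str) -> str:
    """Render the [DEFAULT] section of INI text as shell assignments."""
    parser = configparser.ConfigParser(interpolation=None)
    # keep key case as written; shell variable names are case-sensitive
    parser.optionxform = str
    parser.read_string(text)
    # shlex.quote makes each value safe for the shell to evaluate
    lines = [f"{key}={shlex.quote(value)}" for key, value in parser.defaults().items()]
    return "\n".join(lines)
```

A shell script could then evaluate the converter's output instead of sourcing the config file directly.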

--
Steve Beattie
<email address hidden>

Preview Diff

diff --git a/scripts/nvd_api_client.py b/scripts/nvd_api_client.py
new file mode 100755
index 0000000..493b3ad
--- /dev/null
+++ b/scripts/nvd_api_client.py
@@ -0,0 +1,411 @@
+#!/usr/bin/env python3
+
+"""
+nvd-api-client: download and maintain NVD's CVE dataset
+
+
+Configure path to local NVD mirror by creating an INI file located in
+~/.config/nvd-api-client.conf similar to:
+
+    [DEFAULT]
+    nvd_path=/home/eslerm/mirrors/nvd/
+
+Alternatively, the non-INI ~/.ubuntu-cve-tracker.conf can be used with the same
+key.
+
+Make sure to create this directory!
+
+
+nvd-api-client has three primary modes:
+
+  --init
+
+    To initialize the mirror by downloading NVD's CVE dataset, run:
+      ./scripts/nvd_api_client --init
+    and follow the prompt.
+
+  --maintain-since
+
+    To maintain your NVD CVE dataset mirror, run the following command with the
+    date set to the last time maintenance was ran:
+      ./scripts/nvd_api_client --maintain-since 2022-12-25
+    The above command will download all CVEs since December 25th 2022 UCT until
+    now.
+
+    ISO-8601 datetime is also allowed as maintenance input:
+      ./scripts/nvd_api_client --maintain-since 2023-08-01T00:00:00
+      ./scripts/nvd_api_client --maintain-since 2023-08-01T00:00:00.000001+00:00
+
+    The --maintain-since value must be within 120 days of today. (This is an
+    undocumented API restriction.)
+
+  --auto
+
+    To automatically maintain your dataset (without needing to know when
+    maintenance was last ran) run:
+      ./scripts/nvd_api_client --auto
+
+All modes accept --debug or --verbose which print information in stderr.
+  nb: use these options to monitor update progress
+"""
+
+
+__author__ = "Mark Esler"
+__copyright__ = "Copyright (C) 2023 Canonical Ltd."
+__license__ = "BSD-3-Clause"
+__version__ = "1.0"
+
+
+import argparse
+import configparser
+from datetime import datetime, timezone
+import json
+from pathlib import Path
+import sys
+import time
+from typing import Optional
+import requests
+
+
+# API Client Headers
+HEADERS = {"Accept-Language": "en-US", "User-Agent": "nvd-api-client"}
+
+
+# NVD_API_KEY not implemented
+NVD_API_KEY = None
+
+
+# seconds to wait after a request
+# maximally efficient timing isn't critical
+# NVD's public rate limit is 5 requests in a rolling 30 second window
+# public default based on 5 / 30 * 2 = 12, round down to 10 requests a minute
+# sleeping 6.0 seconds aligns with NVD's Best Practices
+if NVD_API_KEY:
+    # 50 requests in a rolling 30 second window
+    RATE_LIMIT = 0.60
+else:
+    RATE_LIMIT = 6.0
+
+
+# requests timeout
+TIMEOUT = 30.0
+
+
+def debug(msg: str) -> None:
+    """print to stderr"""
+    print("DEBUG: " + msg, file=sys.stderr)
+
+
+def find_conf() -> Path:
+    """find configuration file"""
+    for filename in [".ubuntu-cve-tracker.conf", ".config/nvd-api-client.conf"]:
+        path = Path.home() / filename
+        if path.is_file():
+            return path
+    raise ValueError(
+        """
+No configuration file.
+Create ~/.ubuntu-cve-tracker.conf or ~/.config/nvd-api-client.conf"""
+    )
+
+
+def load_path(conf: Path) -> Path:
+    """
+    read configuration file for path to local NVD mirror
+
+    UCT does not use an INI style configuration file. Code in this section
+    is a little messy to accomadate this. The rest of UCT requires a
+    non-built-in Python package, which this seeks to avoid.
+    """
+    config = configparser.ConfigParser()
+    try:
+        # nb: encoding is unset
+        with open(conf) as file:
+            try:
+                config.read_file(file)
+                try:
+                    path = Path(config["DEFAULT"]["nvd_path"])
+                except KeyError as exc:
+                    raise KeyError(
+                        "nvd_path not defined in configuration file"
+                    ) from exc
+            # this is for uct not using an INI file
+            except configparser.MissingSectionHeaderError:
+                for line in file:
+                    if line.startswith("nvd_path="):
+                        path = Path(line.split("=")[1][:-1])
+    except OSError as exc:
+        msg = f"error reading {conf}"
+        raise OSError(msg) from exc
+    if DEBUG:
+        debug(f"local NVD mirror path is {path}")
+    return path
+
+
+def verify_dirs() -> Path:
+    """create directory structure if needed and return local NVD mirror path"""
+    config_path = find_conf()
+    nvd_path = load_path(config_path)
+
+    nvd_path.mkdir(parents=True, exist_ok=True)
+
+    current_year = int(time.strftime("%Y", time.gmtime()))
+    for i in range(1999, current_year + 1):
+        Path(nvd_path / str(i)).mkdir(parents=True, exist_ok=True)
+
+    return nvd_path
+
+
+def get_url(url: str) -> requests.models.Response:
+    """
+    return a url response after sleeping
+
+    NOTE: could be modified for https://github.com/tomasbasham/ratelimit
+    """
+    if VERBOSE:
+        debug(f"requesting {url}")
+    response = requests.get(url, timeout=TIMEOUT, headers=HEADERS)
+
+    if response.status_code != 200:
+        msg = f"API response: {response.status_code}"
+        raise Exception(msg)
+
+    time.sleep(RATE_LIMIT)
+
+    return response
+
+
+def save_cve(page_json: dict, nvd_path: Path) -> None:
+    """save all json files from a page"""
+    for i in page_json["vulnerabilities"]:
+        cve = i["cve"]
+        year = cve["id"][4:8]
+        file_path = Path(f'{nvd_path / year / cve["id"]}.json')
+        if VERBOSE:
+            debug(f'saving {cve["id"]}')
+        with open(file_path, "w", encoding="utf-8") as file:
+            json.dump(cve, file)
+
+
+def save_pages(query: Optional[str] = None) -> None:
+    """
+    get all pages of CVE results and save them
+
+    see https://nvd.nist.gov/developers/vulnerabilities for parameters
+    """
+
+    nvd_path = verify_dirs()
+
+    base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
+    start_index = 0
+    results_per_page = 2000
+    total_results = results_per_page + 1
+
+    while start_index < total_results:
+        if query:
+            url = (
+                f"{base_url}?{query}&"
+                + f"resultsPerPage={results_per_page}&startIndex={start_index}"
+            )
+        else:
+            url = (
+                f"{base_url}?"
+                + f"resultsPerPage={results_per_page}&startIndex={start_index}"
+            )
+
+        page = get_url(url)
+        page_json = page.json()
+        page.close()
+
+        save_cve(page_json, nvd_path)
+
+        total_results = page_json["totalResults"]
+
+        if DEBUG:
+            if total_results == 0:
+                debug("no new updates from NVD")
+            elif (start_index + results_per_page) >= total_results:
+                debug(
+                    f"saved results {start_index} through {total_results}"
+                    + f" of {total_results}"
+                )
+            else:
+                debug(
+                    f"saved results {start_index} through {start_index + results_per_page}"
+                    + f" of {total_results}"
+                )
+
+        start_index += results_per_page
+
+
+def nvd_init() -> None:
+    """
+    create initial NVD dataset
+
+    NVD's Best Practices for Initial Data Population state:
+      - Users should start by calling the API beginning with a startIndex of 0
+      - Iterative requests should increment the startIndex by the value of
+        resultsPerPage until the response's startIndex has exceeded the value
+        in totalResults
+    NVD text accessed Aug 1st 2023
+      - https://nvd.nist.gov/developers/start-here
+    """
+    res = input(
+        'Are you certain that you want to download all NVD data? Enter "Yes" to agree: '
+    )
+    if res == "Yes":
+        save_pages()
+
+
+def nvd_maintain(since: datetime) -> None:
+    """
+    maintain NVD dataset
+
+    set the since datetime to the time that NVD dataset was last maintained
+
+    it is not recommended to run this function more than once every two hours
+
+    large organizations should use a single requester
+
+    see https://nvd.nist.gov/developers/vulnerabilities for parameters
+
+    NVD's Best Practices for Maintaining Data state:
+      - After initial data population has occurred, the last modified date
+        parameters provide an efficient way to update a user's local
+        repository and stay within the API rate limits. No more than once
+        every two hours, automated requests should include a range where
+        lastModStartDate equals the time of the last CVE or CPE received and
+        lastModEndDate equals the current time.
+      - It is recommended that users "sleep" their scripts for six seconds
+        between requests.
+      - It is recommended to use the default resultsPerPage value as this value
+        has been optimized for the API response.
+      - Enterprise scale development should enforce these practices through a
+        single requestor to ensure all users are in sync and have the latest
+        CVE, Change History, CPE, and CPE match criteria information.
+    NVD text accessed Aug 1st 2023
+      - https://nvd.nist.gov/developers/start-here
+    """
+    start_date = since.isoformat()
+    end_date = datetime.now(timezone.utc).isoformat()
+
+    if DEBUG:
+        debug(f"searching for modified NVD CVEs between {start_date} and {end_date}")
+
+    query = f"lastModStartDate={start_date}&lastModEndDate={end_date}".replace(
+        "+", "%2B"
+    )
+
+    save_pages(query)
+
+
+def check_last_modified(last_modified: datetime) -> None:
+    """raise error if an unallowed lastModified date is requested"""
+    delta = last_modified - datetime.now(timezone.utc)
+    if delta.days < -120:
+        msg = "NVD API does not allow searching lastModified dates greater than 120 days ago"
+        raise argparse.ArgumentTypeError(msg)
+
+
+# https://stackoverflow.com/questions/25470844/specify-date-format-for-python-
+# argparse-input-arguments
+def format_date(date_str: str) -> datetime:
+    """
+    verify and format a date string into a datetime for NVD's API
+
+    always returns UTC
+
+    note that converting the datetime to a string requires .replace("+", "%2B")
+    before running get_url()
+    """
+    try:
+        # API requires microseconds
+        date = datetime.strptime(date_str, "%Y-%m-%d").replace(
+            tzinfo=timezone.utc, microsecond=1
+        )
+    except ValueError:
+        try:
+            date = datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
+        except ValueError as exc:
+            msg = f"not a valid date: {date_str}"
+            raise argparse.ArgumentTypeError(msg) from exc
+    return date
+
+
+def nvd_last_modified_file() -> datetime:
+    """
+    search local dataset for most recent lastModified value
+
+    inefficiency is fine if user does not know when maintenance was last ran
+    """
+    nvd_path = verify_dirs()
+    if DEBUG:
+        debug("searching NVD dataset for most recent lastModified value")
+    # compare strings instead of datetimes
+    last_modified_string = "0"
+    for path in nvd_path.rglob("*"):
+        if path.is_dir():
+            continue
+        try:
+            # nb: encoding is unset
+            with open(path) as file:
+                data = json.load(file)
+        except OSError as exc:
+            msg = f"error reading {path}"
+            raise OSError(msg) from exc
+        if data["lastModified"] > last_modified_string:
+            last_modified_string = data["lastModified"]
+    if DEBUG:
+        debug(f"most recent lastModified value is: {last_modified_string}")
+    last_modified = format_date(last_modified_string)
+    check_last_modified(last_modified)
+    return last_modified
+
+
+def nvd_auto() -> None:
+    """run nvd_maintain with most recent lastModified value in dataset"""
+    last_modified = nvd_last_modified_file()
+    check_last_modified(last_modified)
+    nvd_maintain(last_modified)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="NVD API Client")
+    parser.add_argument(
+        "--init",
+        help="initialize mirror of NVD dataset",
+        action="store_true",
+    )
+    parser.add_argument(
+        "-s",
+        "--maintain-since",
+        help="maintain NVD dataset since YY-MM-DD or ISO-8601 datetime",
+        type=format_date,
+    )
+    parser.add_argument("--auto", help="automated maintenance", action="store_true")
+    parser.add_argument("--debug", help="add debug info", action="store_true")
+    parser.add_argument("--verbose", help="add verbose debug info", action="store_true")
+
+    args = parser.parse_args()
+
+    if args.verbose:
+        VERBOSE = True
+        DEBUG = True
+    elif args.debug:
+        VERBOSE = False
+        DEBUG = True
+    else:
+        VERBOSE = False
+        DEBUG = False
+
+    if args.init:
+        nvd_init()
+    elif args.auto:
+        nvd_auto()
+    elif args.maintain_since:
+        nvd_maintain(args.maintain_since)
+    else:
+        raise ValueError("an argument is needed, see --help")
+
+    if DEBUG:
+        debug("NVD sync complete \\o/")
