Merge ~eslerm/ubuntu-cve-tracker:nvd-api-client into ubuntu-cve-tracker:master

Proposed by Mark Esler
Status: Merged
Merged at revision: 6ac40f852bfe5131090cacc156ce6af015c2be35
Proposed branch: ~eslerm/ubuntu-cve-tracker:nvd-api-client
Merge into: ubuntu-cve-tracker:master
Diff against target: 417 lines (+411/-0)
1 file modified
scripts/nvd_api_client.py (+411/-0)
Reviewer / Status
Seth Arnold: Approve
Steve Beattie: Pending
Alex Murray: Pending
Review via email: mp+448538@code.launchpad.net

Commit message

nvd-api-client init

Alex Murray (alexmurray) wrote :

A few high-level comments (I haven't actually run the code yet, but will try that soon):

1. You should be able to use the configobj package to parse the configuration file rather than hand-parsing it (see cve_lib.py for some historical code that does this).

2. Would it be possible to make the script as automagic as possible? i.e. when it is run, it looks for existing JSON and, if none exists, does --init automatically; if it does exist, it uses the timestamp of that JSON to infer the --since date. You can keep both the --init and --since parameters, as I can imagine they may be useful, but in general, when we can infer and do the right thing, I think we should.

3. It might be useful to show a progress bar or similar, and perhaps some indication when sleeping. The current implementation looks like it will sleep for 6 seconds after each request, which will take a long time, so it would be good to give the user some indication of how long this is expected to take, or at least show some progress along the way so they don't think the script has hung. See the use of progressbar in sis-changes or check-cves for inspiration.
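Item 3 could also be handled without the third-party progressbar package. A stdlib-only sketch, assuming the script's defaults of 2000 results per page and a 6 second sleep after every request; `estimate_seconds` and `sleep_with_countdown` are hypothetical helpers, and an estimate is only possible once the first response has reported totalResults:

```python
import math
import sys
import time
from typing import Optional, TextIO


def estimate_seconds(total_results: int, results_per_page: int = 2000,
                     rate_limit: float = 6.0) -> float:
    """Lower bound on runtime: every page costs one rate-limit sleep."""
    pages = math.ceil(total_results / results_per_page)
    return pages * rate_limit


def sleep_with_countdown(seconds: float, stream: Optional[TextIO] = None) -> None:
    """Sleep for `seconds`, redrawing a countdown on a single line."""
    if stream is None:
        stream = sys.stderr
    remaining = seconds
    while remaining > 0:
        # "\r" returns to the start of the line so the text is overwritten
        stream.write(f"\rwaiting {remaining:4.1f}s for NVD rate limit")
        stream.flush()
        step = min(1.0, remaining)
        time.sleep(step)
        remaining -= step
    # blank out the countdown so later output starts on a clean line
    stream.write("\r" + " " * 40 + "\r")
    stream.flush()
```

For example, 4834 results need 3 pages, so at least 18 seconds; `sleep_with_countdown(RATE_LIMIT)` could stand in for the plain `time.sleep(RATE_LIMIT)` after each request.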

Mark Esler (eslerm) wrote :

Thank you Alex!

1. Can do.

Could we add `[DEFAULT]` to the top of the ~/.ubuntu-cve-tracker.conf files in our team's environments?

Then the config becomes a valid INI file for Python's built-in configparser: https://wiki.python.org/moin/ConfigParserExamples
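A variant that avoids editing the shared file on disk is to prepend the `[DEFAULT]` header in memory before handing the text to configparser. A minimal sketch; `read_sectionless` is a hypothetical helper. configparser does not understand shell syntax, so quoted values keep their quotes and `$VAR` references are not expanded; `interpolation=None` stops a `%` in a value from being treated as an interpolation marker:

```python
import configparser


def read_sectionless(text: str) -> configparser.SectionProxy:
    """Parse key=value text that has no section header.

    A synthetic [DEFAULT] header is prepended in memory so configparser
    sees a valid INI document; the file on disk is left untouched.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.read_string("[DEFAULT]\n" + text)
    return config["DEFAULT"]
```

For example, `read_sectionless(Path.home().joinpath(".ubuntu-cve-tracker.conf").read_text())["nvd_path"]` would return the configured mirror path, as long as the value is not shell-quoted.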

2. Automagic --since is doable, but I have a concern about --init.

The maintenance function searches a time span of modified CVE records. NVD adds metrics after a CVE record is created, so --since needs to be set to the most recent lastModified value in the local dataset. Just finding the most recently modified local file is good enough: even if that file's lastModified is older than the newest one in the local dataset, the overlap between them is small.

Their API doesn't document this, but searching for a lastModified date more than 6 months old returns a 404. I should handle that.

(Note that these API searches also download unpublished CVEs. This is acceptable, even desired, since the CVE List is the primary source of CVE data and should drive triage. NVD data is supplemental to CVE List data.)

With an automated --init, a misconfigured path could cause 1.3G of API strain on NVD. I added a prompt to the init function to slow users down. Is it okay to keep that? Item X might change the context of --init.
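The --since heuristic in item 2 can be sketched as: stat every local CVE file, parse only the most recently written one, and use its lastModified. This assumes the one-JSON-file-per-CVE layout that the script's save_cve() writes; `last_modified_of_newest_file` is a hypothetical name:

```python
import json
from pathlib import Path
from typing import Optional


def last_modified_of_newest_file(nvd_path: Path) -> Optional[str]:
    """Return lastModified from the most recently written local CVE file.

    Every file is stat()ed, but only one is opened and parsed. Returns
    None for an empty mirror, which a caller could treat as the cue to
    run --init instead.
    """
    newest = max(
        nvd_path.rglob("*.json"),
        key=lambda p: p.stat().st_mtime,
        default=None,
    )
    if newest is None:
        return None
    with open(newest, encoding="utf-8") as file:
        return json.load(file)["lastModified"]
```

Files within one sync are written in API result order, not lastModified order, so the value returned can be a little older than the dataset's true maximum; that only re-downloads a small overlap, which matches the trade-off described above.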

3. That would be helpful :D

I reworked --debug, and with the uncommitted changes the output looks like:

```
./scripts/nvd_api_client.py --since 2023-07-01 --debug
DEBUG: searching for modified NVD CVEs between 2023-07-01T00:00:00.000001%2B00:00 and 2023-08-07T22:01:32.263899%2B00:00
DEBUG: local NVD mirror path is "/home/eslerm/mirrors/nvd"
DEBUG: saved results 0 through 2000 of 4834
DEBUG: saved results 2000 through 4000 of 4834
DEBUG: saved results 4000 through 4834 of 4834
DEBUG: NVD sync complete \o/
```

How does that look?

X. Ideally we should maintain NVD data from a central source to prevent discrepancies. Seth suggested that we may want to explore Canonistack.

Y. I'm hoping this work will also benefit https://github.com/olbat/nvdcve/issues/7

Mark Esler (eslerm) wrote :

Alex, I've added automation and believe this is ready for review.

Alex Murray (alexmurray) wrote :

Thanks Mark - apologies for the delay in this review.

The only other thing that would be great to see is some tests. Whilst we haven't traditionally had many tests for our different internal tools, I think we should always aim to improve things going forward, so would you be able to add some tests for this as well?

Mark Esler (eslerm) wrote :

Hi Alex, thanks for the review.

I can add tests. Test suggestions are very welcome.
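Some candidate tests, sketched with unittest against a simplified local copy of format_date() so the example runs on its own; a real test module would import scripts/nvd_api_client.py instead, which needs requests installed. The third case pins down the `+` to `%2B` escaping that nvd_maintain() relies on. Functions that touch disk, such as save_cve(), could be covered the same way using tempfile directories:

```python
import argparse
import unittest
from datetime import datetime, timezone


# Simplified local copy of format_date() from scripts/nvd_api_client.py
def format_date(date_str: str) -> datetime:
    try:
        return datetime.strptime(date_str, "%Y-%m-%d").replace(
            tzinfo=timezone.utc, microsecond=1
        )
    except ValueError:
        try:
            return datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
        except ValueError as exc:
            raise argparse.ArgumentTypeError(f"not a valid date: {date_str}") from exc


class FormatDateTests(unittest.TestCase):
    def test_plain_date_gets_utc_and_microsecond(self):
        self.assertEqual(
            format_date("2023-07-01"),
            datetime(2023, 7, 1, 0, 0, 0, 1, tzinfo=timezone.utc),
        )

    def test_naive_iso_datetime_is_utc(self):
        self.assertEqual(
            format_date("2023-08-01T00:00:00"),
            datetime(2023, 8, 1, tzinfo=timezone.utc),
        )

    def test_query_string_encoding(self):
        # mirrors the .replace("+", "%2B") done in nvd_maintain()
        iso = format_date("2023-07-01").isoformat().replace("+", "%2B")
        self.assertEqual(iso, "2023-07-01T00:00:00.000001%2B00:00")

    def test_garbage_is_rejected(self):
        with self.assertRaises(argparse.ArgumentTypeError):
            format_date("not-a-date")


if __name__ == "__main__":
    unittest.main()
```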

Mark Esler (eslerm) wrote :

This now natively works with ~/.ubuntu-cve-tracker.conf

I did not want to implement reading this file the way the rest of UCT does, since that requires a non-built-in library. Ultimately, I believe we should use the INI standard so we can use configparser. (This could just be adding `[DEFAULT]` to the top of the config and updating UCT...)

Seth Arnold (seth-arnold) wrote :

Thanks for tackling this! It'd be nice to have some test cases where that makes sense, and I'm afraid that having two different configuration files for this will cause problems in the long run. I'd suggest trimming out the new ~/.config/nvd-api-client.conf path and just using the same file as all our other tools. If we ever switch to an INI format, we can move the file just to have a clean break between old and new.

Thanks

review: Approve
Steve Beattie (sbeattie) wrote :

On Tue, Oct 17, 2023 at 09:12:46PM -0000, Mark Esler wrote:
> This now natively works with ~/.ubuntu-cve-tracker.conf
>
> I did not want to implement reading this file like the rest of UCT, since that requires a non-built-in library. Ultimately, I believe we should use the INI standard so we can use configparser. (This could just be adding `[DEFAULT]` to the top of the config, And updating UCT...)

The reason to not use an INI style format is because shell scripts also
use this config file via '.' / 'source' (see e.g. packages-mirror).

We could fix that by having a python script that converts the [insert
flame war over config file format flame war here] config file to
something that shell scripts could evaluate dynamically; and in fact
it'd be nice to have a cve_lib.sh that shell scripts could source that
did this work for them and also accumulate common UCT shell
functionality.

--
Steve Beattie
<email address hidden>
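One way the conversion Steve describes could look, assuming the config moved to an INI file with its keys under `[DEFAULT]`: a Python helper renders each key as a shell assignment, quoting values with shlex.quote so a shell script can safely `eval` the output. `ini_to_shell` is hypothetical and not part of UCT:

```python
import configparser
import re
import shlex

# shell variable names: letters, digits, underscore; no leading digit
_SHELL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ini_to_shell(text: str) -> str:
    """Render the [DEFAULT] keys of an INI document as shell assignments."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key case; shell variables are case-sensitive
    config.read_string(text)
    lines = []
    for key, value in config["DEFAULT"].items():
        if not _SHELL_NAME.match(key):
            raise ValueError(f"{key!r} is not a valid shell variable name")
        # shlex.quote leaves safe values bare and single-quotes the rest
        lines.append(f"{key}={shlex.quote(value)}")
    return "\n".join(lines)
```

A cve_lib.sh could then run something like `eval "$(python3 path/to/converter.py)"`, where the converter path is a placeholder.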

Preview Diff

diff --git a/scripts/nvd_api_client.py b/scripts/nvd_api_client.py
new file mode 100755
index 0000000..493b3ad
--- /dev/null
+++ b/scripts/nvd_api_client.py
@@ -0,0 +1,411 @@
```python
#!/usr/bin/env python3

"""
nvd-api-client: download and maintain NVD's CVE dataset


Configure the path to the local NVD mirror by creating an INI file located at
~/.config/nvd-api-client.conf similar to:

    [DEFAULT]
    nvd_path=/home/eslerm/mirrors/nvd/

Alternatively, the non-INI ~/.ubuntu-cve-tracker.conf can be used with the same
key.

Make sure to create this directory!


nvd-api-client has three primary modes:

    --init

    To initialize the mirror by downloading NVD's CVE dataset, run:
        ./scripts/nvd_api_client.py --init
    and follow the prompt.

    --maintain-since

    To maintain your NVD CVE dataset mirror, run the following command with the
    date set to the last time maintenance was run:
        ./scripts/nvd_api_client.py --maintain-since 2022-12-25
    The above command will download all CVEs modified since December 25th 2022
    UTC until now.

    ISO-8601 datetime is also allowed as maintenance input:
        ./scripts/nvd_api_client.py --maintain-since 2023-08-01T00:00:00
        ./scripts/nvd_api_client.py --maintain-since 2023-08-01T00:00:00.000001+00:00

    The --maintain-since value must be within 120 days of today. (This is an
    undocumented API restriction.)

    --auto

    To automatically maintain your dataset (without needing to know when
    maintenance was last run) run:
        ./scripts/nvd_api_client.py --auto

All modes accept --debug or --verbose, which print information to stderr.
    nb: use these options to monitor update progress
"""


__author__ = "Mark Esler"
__copyright__ = "Copyright (C) 2023 Canonical Ltd."
__license__ = "BSD-3-Clause"
__version__ = "1.0"


import argparse
import configparser
from datetime import datetime, timezone
import json
from pathlib import Path
import sys
import time
from typing import Optional
import requests


# API Client Headers
HEADERS = {"Accept-Language": "en-US", "User-Agent": "nvd-api-client"}


# NVD_API_KEY not implemented
NVD_API_KEY = None


# seconds to wait after a request
# maximally efficient timing isn't critical
# NVD's public rate limit is 5 requests in a rolling 30 second window, i.e.
# 10 requests a minute or one request every 6 seconds
# sleeping 6.0 seconds also aligns with NVD's Best Practices
if NVD_API_KEY:
    # 50 requests in a rolling 30 second window
    RATE_LIMIT = 0.60
else:
    RATE_LIMIT = 6.0


# requests timeout
TIMEOUT = 30.0


# debug output defaults so the module can be imported (e.g. by tests);
# overridden by --debug/--verbose when run as a script
DEBUG = False
VERBOSE = False


def debug(msg: str) -> None:
    """print to stderr"""
    print("DEBUG: " + msg, file=sys.stderr)


def find_conf() -> Path:
    """find configuration file"""
    for filename in [".ubuntu-cve-tracker.conf", ".config/nvd-api-client.conf"]:
        path = Path.home() / filename
        if path.is_file():
            return path
    raise ValueError(
        """
No configuration file.
Create ~/.ubuntu-cve-tracker.conf or ~/.config/nvd-api-client.conf"""
    )


def load_path(conf: Path) -> Path:
    """
    read configuration file for path to local NVD mirror

    UCT does not use an INI-style configuration file, so this section is a
    little messy in order to accommodate both formats. The rest of UCT
    requires a non-built-in Python package to read it, which this seeks to
    avoid.
    """
    config = configparser.ConfigParser()
    path = None
    try:
        # nb: encoding is unset
        with open(conf) as file:
            try:
                config.read_file(file)
                try:
                    path = Path(config["DEFAULT"]["nvd_path"])
                except KeyError as exc:
                    raise KeyError(
                        "nvd_path not defined in configuration file"
                    ) from exc
            # this is for UCT not using an INI file
            except configparser.MissingSectionHeaderError:
                # configparser has already consumed the first key=value
                # line, so rewind before scanning the file by hand
                file.seek(0)
                for line in file:
                    if line.startswith("nvd_path="):
                        # drop the newline and any shell-style quotes
                        path = Path(line.split("=", 1)[1].strip().strip("\"'"))
    except OSError as exc:
        msg = f"error reading {conf}"
        raise OSError(msg) from exc
    if path is None:
        raise KeyError("nvd_path not defined in configuration file")
    if DEBUG:
        debug(f"local NVD mirror path is {path}")
    return path


def verify_dirs() -> Path:
    """create directory structure if needed and return local NVD mirror path"""
    config_path = find_conf()
    nvd_path = load_path(config_path)

    nvd_path.mkdir(parents=True, exist_ok=True)

    current_year = int(time.strftime("%Y", time.gmtime()))
    for i in range(1999, current_year + 1):
        Path(nvd_path / str(i)).mkdir(parents=True, exist_ok=True)

    return nvd_path


def get_url(url: str) -> requests.models.Response:
    """
    return a URL response, sleeping RATE_LIMIT seconds after the request

    NOTE: could be modified for https://github.com/tomasbasham/ratelimit
    """
    if VERBOSE:
        debug(f"requesting {url}")
    response = requests.get(url, timeout=TIMEOUT, headers=HEADERS)

    if response.status_code != 200:
        msg = f"API response: {response.status_code}"
        raise Exception(msg)

    time.sleep(RATE_LIMIT)

    return response


def save_cve(page_json: dict, nvd_path: Path) -> None:
    """save all json files from a page"""
    for i in page_json["vulnerabilities"]:
        cve = i["cve"]
        year = cve["id"][4:8]
        file_path = Path(f'{nvd_path / year / cve["id"]}.json')
        if VERBOSE:
            debug(f'saving {cve["id"]}')
        with open(file_path, "w", encoding="utf-8") as file:
            json.dump(cve, file)


def save_pages(query: Optional[str] = None) -> None:
    """
    get all pages of CVE results and save them

    see https://nvd.nist.gov/developers/vulnerabilities for parameters
    """

    nvd_path = verify_dirs()

    base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    start_index = 0
    results_per_page = 2000
    total_results = results_per_page + 1

    while start_index < total_results:
        if query:
            url = (
                f"{base_url}?{query}&"
                + f"resultsPerPage={results_per_page}&startIndex={start_index}"
            )
        else:
            url = (
                f"{base_url}?"
                + f"resultsPerPage={results_per_page}&startIndex={start_index}"
            )

        page = get_url(url)
        page_json = page.json()
        page.close()

        save_cve(page_json, nvd_path)

        total_results = page_json["totalResults"]

        if DEBUG:
            if total_results == 0:
                debug("no new updates from NVD")
            elif (start_index + results_per_page) >= total_results:
                debug(
                    f"saved results {start_index} through {total_results}"
                    + f" of {total_results}"
                )
            else:
                debug(
                    f"saved results {start_index} through {start_index + results_per_page}"
                    + f" of {total_results}"
                )

        start_index += results_per_page


def nvd_init() -> None:
    """
    create initial NVD dataset

    NVD's Best Practices for Initial Data Population state:
    - Users should start by calling the API beginning with a startIndex of 0
    - Iterative requests should increment the startIndex by the value of
      resultsPerPage until the response's startIndex has exceeded the value
      in totalResults
    NVD text accessed Aug 1st 2023
    - https://nvd.nist.gov/developers/start-here
    """
    res = input(
        'Are you certain that you want to download all NVD data? Enter "Yes" to agree: '
    )
    if res == "Yes":
        save_pages()


def nvd_maintain(since: datetime) -> None:
    """
    maintain NVD dataset

    set the since datetime to the time that NVD dataset was last maintained

    it is not recommended to run this function more than once every two hours

    large organizations should use a single requester

    see https://nvd.nist.gov/developers/vulnerabilities for parameters

    NVD's Best Practices for Maintaining Data state:
    - After initial data population has occurred, the last modified date
      parameters provide an efficient way to update a user's local
      repository and stay within the API rate limits. No more than once
      every two hours, automated requests should include a range where
      lastModStartDate equals the time of the last CVE or CPE received and
      lastModEndDate equals the current time.
    - It is recommended that users "sleep" their scripts for six seconds
      between requests.
    - It is recommended to use the default resultsPerPage value as this value
      has been optimized for the API response.
    - Enterprise scale development should enforce these practices through a
      single requestor to ensure all users are in sync and have the latest
      CVE, Change History, CPE, and CPE match criteria information.
    NVD text accessed Aug 1st 2023
    - https://nvd.nist.gov/developers/start-here
    """
    start_date = since.isoformat()
    end_date = datetime.now(timezone.utc).isoformat()

    if DEBUG:
        debug(f"searching for modified NVD CVEs between {start_date} and {end_date}")

    query = f"lastModStartDate={start_date}&lastModEndDate={end_date}".replace(
        "+", "%2B"
    )

    save_pages(query)


def check_last_modified(last_modified: datetime) -> None:
    """raise an error if a disallowed lastModified date is requested"""
    delta = last_modified - datetime.now(timezone.utc)
    if delta.days < -120:
        msg = "NVD API does not allow searching lastModified dates more than 120 days ago"
        raise argparse.ArgumentTypeError(msg)


# https://stackoverflow.com/questions/25470844/specify-date-format-for-python-
# argparse-input-arguments
def format_date(date_str: str) -> datetime:
    """
    verify and format a date string into a datetime for NVD's API

    always returns UTC

    note that converting the datetime to a string requires .replace("+", "%2B")
    before running get_url()
    """
    try:
        # API requires microseconds
        date = datetime.strptime(date_str, "%Y-%m-%d").replace(
            tzinfo=timezone.utc, microsecond=1
        )
    except ValueError:
        try:
            date = datetime.fromisoformat(date_str)
        except ValueError as exc:
            msg = f"not a valid date: {date_str}"
            raise argparse.ArgumentTypeError(msg) from exc
        if date.tzinfo is None:
            # naive input is taken to be UTC
            date = date.replace(tzinfo=timezone.utc)
        else:
            # convert offset-aware input instead of relabelling it as UTC
            date = date.astimezone(timezone.utc)
    return date


def nvd_last_modified_file() -> datetime:
    """
    search local dataset for most recent lastModified value

    inefficiency is fine if user does not know when maintenance was last run
    """
    nvd_path = verify_dirs()
    if DEBUG:
        debug("searching NVD dataset for most recent lastModified value")
    # compare strings instead of datetimes
    last_modified_string = "0"
    for path in nvd_path.rglob("*"):
        if path.is_dir():
            continue
        try:
            # nb: encoding is unset
            with open(path) as file:
                data = json.load(file)
        except OSError as exc:
            msg = f"error reading {path}"
            raise OSError(msg) from exc
        if data["lastModified"] > last_modified_string:
            last_modified_string = data["lastModified"]
    if DEBUG:
        debug(f"most recent lastModified value is: {last_modified_string}")
    last_modified = format_date(last_modified_string)
    check_last_modified(last_modified)
    return last_modified


def nvd_auto() -> None:
    """run nvd_maintain with most recent lastModified value in dataset"""
    last_modified = nvd_last_modified_file()
    check_last_modified(last_modified)
    nvd_maintain(last_modified)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="NVD API Client")
    parser.add_argument(
        "--init",
        help="initialize mirror of NVD dataset",
        action="store_true",
    )
    parser.add_argument(
        "-s",
        "--maintain-since",
        help="maintain NVD dataset since YYYY-MM-DD or ISO-8601 datetime",
        type=format_date,
    )
    parser.add_argument("--auto", help="automated maintenance", action="store_true")
    parser.add_argument("--debug", help="add debug info", action="store_true")
    parser.add_argument("--verbose", help="add verbose debug info", action="store_true")

    args = parser.parse_args()

    if args.verbose:
        VERBOSE = True
        DEBUG = True
    elif args.debug:
        VERBOSE = False
        DEBUG = True
    else:
        VERBOSE = False
        DEBUG = False

    if args.init:
        nvd_init()
    elif args.auto:
        nvd_auto()
    elif args.maintain_since:
        nvd_maintain(args.maintain_since)
    else:
        raise ValueError("an argument is needed, see --help")

    if DEBUG:
        debug("NVD sync complete \\o/")
```
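nvd_maintain() and save_pages() build the query string by hand and patch `+` to `%2B`. The standard library's urllib.parse.urlencode percent-encodes reserved characters in values itself (requests does the same for a `params=` dict). A sketch with `build_url` as a hypothetical helper; it assumes the NVD server decodes `%3A` back to `:`, which is standard percent-decoding but untested here:

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_url(start_index: int, results_per_page: int = 2000,
              since: Optional[datetime] = None,
              until: Optional[datetime] = None) -> str:
    """Hypothetical helper: URL for one page of CVE API 2.0 results."""
    params: dict = {}
    if since is not None:
        if until is None:
            until = datetime.now(timezone.utc)
        params["lastModStartDate"] = since.isoformat()
        params["lastModEndDate"] = until.isoformat()
    params["resultsPerPage"] = results_per_page
    params["startIndex"] = start_index
    # urlencode() quotes values with quote_plus(): "+" -> %2B, ":" -> %3A
    return f"{BASE_URL}?{urlencode(params)}"
```

This keeps the encoding rule in one place instead of relying on every caller to remember the `.replace("+", "%2B")` noted in format_date()'s docstring.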
