Merge ~alexmurray/ubuntu-cve-tracker:mistriage-as-cve-triage into ubuntu-cve-tracker:master
- Git
- lp:~alexmurray/ubuntu-cve-tracker
- mistriage-as-cve-triage
- Merge into master
Status: | Merged |
---|---|
Merge reported by: | Alex Murray |
Merged at revision: | ee53fabe0ed758e347b9f503f4b0a9adf287411b |
Proposed branch: | ~alexmurray/ubuntu-cve-tracker:mistriage-as-cve-triage |
Merge into: | ubuntu-cve-tracker:master |
Diff against target: |
255 lines (+104/-18) 2 files modified
scripts/check-cves (+93/-17) scripts/process_cves (+11/-1) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Alex Murray | Approve | ||
Review via email: mp+387806@code.launchpad.net |
Commit message
Description of the change
Mike Salvatore (mikesalvatore) wrote : | # |
Seth Arnold (seth-arnold) wrote : | # |
I'm a bit concerned about the runtime of the thing, since the standalone
script takes a while on my system:
$ time ./scripts/
Loading /home/sarnold/
CVE-2020-8034 ['php-horde-
[.. many lines snipped ..]
CVE-2002-0392 ['apache2'] # apache2, fixed pre-Ubuntu
CVE-2002-0391 ['acm', 'glibc', 'dietlibc', 'krb5', 'openafs'] # krb5, fixed pre-Ubuntu
real 8m45.907s
user 4m29.767s
sys 4m15.820s
Is the probabilistic cve selection (neat idea) going to do a bunch of work
and then discard some of it?
Thanks
Alex Murray (alexmurray) wrote : | # |
Most of the runtime of the standalone script is in loading the Debian CVEs - since we already do that for check-cves anyway that shouldn't be a problem. The probabalistic calculations are pretty simple - doing a quick benchmark to run that code 100000 times and using a year of 2004 (which means the test will be almost always False) shows it takes 0.06s - so even if we are looking at 100000 CVEs it will only add less than a tenth of a second - so I don't think that should be a problem either :)
#!/usr/bin/env python3
import datetime
import math
import random
import timeit
def benchmark():
i = 0
while i < 100000:
year = 2004
now = datetime.
prob = 1.0 / math.pow(2, now - year)
rand = random.random()
if rand > prob:
pass
i = i + 1
print(timeit.
- 15ae1b6... by Alex Murray
-
scripts/check-cves: Clarify --mistriaged docstring
- d3d003b... by Alex Murray
-
scripts/check-cves: Collect all mistriaged CVEs and whittle down
This cleans up the logic to address feedback from mikesalvatore in
https://code.launchpad .net/~alexmurra y/ubuntu- cve-tracker/ +git/ubuntu- cve-tracker- 1/+merge/ 387806 - 186021d... by Alex Murray
-
scripts/check-cves: Minor cleanups to doc string placement and style
Alex Murray (alexmurray) wrote : | # |
See updated commits.
- ee53fab... by Alex Murray
-
scripts/check-cves: Fix up comment to match code
Mike Salvatore (mikesalvatore) wrote : | # |
I like the probability idea. There is the potential, however, that it causes the script to behave differently than inspected. Consider the contrived case where there are 0 2020 mistriaged CVEs, 10 2019 mistriaged CVEs, and no earlier mistriaged CVEs. If I pass `--mistriaged 10` to the script, roughly 5 mistriaged CVEs will be presented to me since they will be selected with probability 1/2. Even though there are 10 mistriaged CVEs and I asked for 10, I only get to evaluate 5.
I don't think this should block the merge; it's just an edge case to consider.
Alex Murray (alexmurray) wrote : | # |
Mike I agree that this is something that should be done to improve this later down the track. Since there have been no other objections, merging this as is. Will add improvements in further merge requests.
Preview Diff
1 | diff --git a/scripts/check-cves b/scripts/check-cves |
2 | index 5855daf..4696327 100755 |
3 | --- a/scripts/check-cves |
4 | +++ b/scripts/check-cves |
5 | @@ -17,9 +17,11 @@ |
6 | |
7 | import datetime |
8 | import json |
9 | +import math |
10 | import optparse |
11 | import os |
12 | import os.path |
13 | +import random |
14 | import re |
15 | import shutil |
16 | import subprocess |
17 | @@ -55,6 +57,9 @@ parser.add_option("--import-missing-debian", help="Process missing Debian CVEs", |
18 | parser.add_option("--debug", help="Report debugging information", action="store_true") |
19 | parser.add_option("--cve", help="Check only the listed comma-separated CVEs and ignore others", action="store", |
20 | default="") |
21 | +parser.add_option("--mistriaged", help="Process the specified number of possible mistriaged CVEs compared to Debian\n" |
22 | + "Implies --import-missing-debian", |
23 | + action="store", type=int, default=0) |
24 | (opt, args) = parser.parse_args() |
25 | |
26 | experimental = False |
27 | @@ -71,6 +76,8 @@ destdir = "." |
28 | # Skip stuff older than 2005 |
29 | cve_limit = 2004 |
30 | |
31 | +mistriaged_hint = 'Previously triaged as ignored in Ubuntu\n\n' |
32 | + |
33 | ignore_strings = [ |
34 | "** REJECT **", "Internet Explorer", "Microsoft Edge", "Windows 98", |
35 | "Windows 2000", "Windows XP", "Windows Server 2003", "Windows NT", |
36 | @@ -323,6 +330,24 @@ def import_debian(handler): |
37 | today = datetime.date.today() |
38 | known = set(CVEKnownList + CVEIgnoreList) |
39 | |
40 | + def mistriaged(cve): |
41 | + if cve in CVEIgnoreNotForUsList and \ |
42 | + cve not in CVEIgnoreMistriagedList and \ |
43 | + handler.debian[cve]['state'] == 'FOUND': |
44 | + # check that at least one of the assigned packages exist |
45 | + # in Ubuntu |
46 | + pkgs = {} |
47 | + for pkg in handler.debian[cve]['pkgs'].keys(): |
48 | + # ignore package if debian says is not affected |
49 | + if handler.debian[cve]['pkgs'][pkg]['state'] == '<not-affected>': |
50 | + continue |
51 | + answer = source_map.madison(source, pkg) |
52 | + if len(answer) > 0: |
53 | + pkgs[pkg] = answer |
54 | + if len(pkgs) > 0: |
55 | + return True |
56 | + return False |
57 | + |
58 | # pull in CVEs from data/DSA/list |
59 | dsas = cve_lib.load_debian_dsas(cve_lib.config['secure_testing_path'] + '/data/DSA/list', opt.verbose) |
60 | for dsa in dsas: |
61 | @@ -336,9 +361,14 @@ def import_debian(handler): |
62 | if year < cve_limit: |
63 | continue |
64 | |
65 | - # If we already know about the CVE, skip it |
66 | + # If we already know about the CVE, skip it unless is |
67 | + # mistriaged |
68 | if cve in known: |
69 | - continue |
70 | + if mistriaged(cve): |
71 | + # add a note about how this was originally classified |
72 | + dsas[dsa]['desc'] = mistriaged_hint + dsas[dsa]['desc'] |
73 | + else: |
74 | + continue |
75 | |
76 | cves[cve] = dict() |
77 | cves[cve]['subject'] = escape(dsas[dsa]['desc']) |
78 | @@ -368,11 +398,15 @@ def import_debian(handler): |
79 | print("Skipping %s, year %d predates %d" % (cve, year, cve_limit), file=sys.stderr) |
80 | continue |
81 | |
82 | - # If we already know about the CVE, skip it |
83 | + # If we already know about the CVE, skip it unless is mistriaged |
84 | if cve in known: |
85 | - if opt.verbose: |
86 | - print("Skipping %s, already known" % cve, file=sys.stderr) |
87 | - continue |
88 | + if mistriaged(cve): |
89 | + # add a note about how this was originally classified |
90 | + handler.debian[cve]['desc'] = mistriaged_hint + handler.debian[cve]['desc'] |
91 | + else: |
92 | + if opt.verbose: |
93 | + print("Skipping %s, already known" % cve, file=sys.stderr) |
94 | + continue |
95 | |
96 | if handler.debian[cve]['desc'] or handler.debian[cve]['state'] == 'FOUND': |
97 | cves[cve] = dict() |
98 | @@ -760,8 +794,8 @@ class CVEHandler(xml.sax.handler.ContentHandler): |
99 | self.curr_refs += [(self.curr_source, self.curr_chars, self.curr_url)] |
100 | |
101 | def handle_cve(self): |
102 | - # Skip CVEs we know about already |
103 | - if self.curr_cve in self.cve_ignore: |
104 | + # Skip CVEs we know about already unless this is a mistriaged CVE |
105 | + if self.curr_cve in self.cve_ignore and mistriaged_hint not in self.curr_desc: |
106 | return |
107 | |
108 | limit = cve_limit |
109 | @@ -1261,6 +1295,13 @@ class CVEHandler(xml.sax.handler.ContentHandler): |
110 | print('%s %s %s\n%s' % (action, cve, data, desc)) |
111 | |
112 | def add_cve(self, cve, packages, priority=None): |
113 | + # remove from not-for-us.txt if adding and ensure we remove any |
114 | + # mistriaged_hint from the description |
115 | + if cve in CVEIgnoreNotForUsList: |
116 | + cmd = ['sed', '-i', '/^%s #.*$/d' % cve, './ignored/not-for-us.txt'] |
117 | + subprocess.call(cmd) |
118 | + self.cve_data[cve]['desc'] = self.cve_data[cve]['desc'].replace(mistriaged_hint, '') |
119 | + |
120 | # Build up list of reference urls |
121 | ref_urls = [] |
122 | if self.debian and \ |
123 | @@ -1365,8 +1406,10 @@ class CVEHandler(xml.sax.handler.ContentHandler): |
124 | self.num_added += 1 |
125 | |
126 | def ignore_cve(self, cve, reason): |
127 | - # Append to ignore list |
128 | - with open('%s/ignored/not-for-us.txt' % (destdir), 'a') as f: |
129 | + # Append to ignore list unless is already in CVEIgnoreList and then |
130 | + # append to the ignored/ignore-mistriaged.txt |
131 | + txtfile = 'ignore-mistriaged.txt' if cve in CVEIgnoreNotForUsList else 'not-for-us.txt' |
132 | + with open('%s/ignored/%s' % (destdir, txtfile), 'a') as f: |
133 | f.write('%s # %s\n' % (cve, reason)) |
134 | |
135 | self.num_ignored += 1 |
136 | @@ -1459,19 +1502,32 @@ class CheckCVETest(unittest.TestCase): |
137 | self.assertEqual("Exquisite Ultimate Newspaper theme for WordPress", h.get_ignore_suggestion('''The Exquisite Ultimate Newspaper theme 1.3.3 for WordPress has XSS via the anchor identifier to assets/js/jquery.foundation.plugins.js.''')) |
138 | |
139 | |
140 | +ignored_notforus_path = 'ignored/not-for-us.txt' |
141 | +if destdir != './' and destdir != '.': |
142 | + ignored_notforus_path = os.path.join(destdir, ignored_notforus_path) |
143 | +# CVEIgnoreNotForUsList is a list of all CVEs that we have previously |
144 | +# chosen to ignore since they don't apply to software in Ubuntu |
145 | +CVEIgnoreNotForUsList = cve_lib.parse_CVEs_from_uri(ignored_notforus_path) |
146 | + |
147 | +ignored_mistriaged_path = 'ignored/ignore-mistriaged.txt' |
148 | +if destdir != './' and destdir != '.': |
149 | + ignored_mistriaged_path = os.path.join(destdir, ignored_mistriaged_path) |
150 | +# CVEIgnoreMistriagedList is a list of all CVEs that we want to definitely |
151 | +# ignore when doing mistriaged CVE detection - they should exist in both |
152 | +# CVEIgnoreNotForUsList and CVEIgnoreMistriagedList |
153 | +CVEIgnoreMistriagedList = cve_lib.parse_CVEs_from_uri(ignored_mistriaged_path) |
154 | + |
155 | # CVEIgnoreList is a list of all CVEs we know about already. These will be |
156 | # ignored when checking MITRE for new CVEs |
157 | -ignored_path = 'ignored/not-for-us.txt' |
158 | -if destdir != './' and destdir != '.': |
159 | - ignored_path = os.path.join(destdir, ignored_path) |
160 | -CVEIgnoreList = cve_lib.parse_CVEs_from_uri(ignored_path) |
161 | +CVEIgnoreList = list(CVEIgnoreNotForUsList) |
162 | + |
163 | CVEKnownList = [] |
164 | CVEKnownList += [cve for cve in os.listdir(destdir + "/ignored/") if cve.startswith('CVE-')] |
165 | CVEKnownList += [cve for cve in os.listdir(destdir + "/retired/") if cve.startswith('CVE-')] |
166 | (ActiveList, EmbargoList) = cve_lib.get_cve_list() |
167 | CVEKnownList += [cve for cve in ActiveList if cve not in EmbargoList] |
168 | |
169 | -if not opt.refresh: |
170 | +if not opt.refresh and not opt.mistriaged: |
171 | CVEIgnoreList += CVEKnownList |
172 | |
173 | if opt.known: |
174 | @@ -1520,7 +1576,7 @@ if opt.rhel8oval: |
175 | args.append(untriaged_json) |
176 | |
177 | debian_import_json = "" |
178 | -if opt.import_missing_debian and handler.debian is not None: |
179 | +if (opt.import_missing_debian or opt.mistriaged) and handler.debian is not None: |
180 | debian_import_json = import_debian(handler) |
181 | args.append(debian_import_json) |
182 | |
183 | @@ -1630,6 +1686,26 @@ for cve in new_cves: |
184 | if len(specific_cves) > 0 and cve not in specific_cves: |
185 | # ignore this cve |
186 | continue |
187 | + # if this got marked as mistriaged, probablistically choose it for |
188 | + # processing |
189 | + if mistriaged_hint in handler.cve_data[cve]['desc']: |
190 | + if opt.mistriaged == 0: |
191 | + # ignore this one |
192 | + continue |
193 | + else: |
194 | + # choose CVEs at random but pick most recent first - if year is |
195 | + # now then we want to pick it, if year is 1 year ago we want to |
196 | + # pick half as often, if year is 2 years ago we want to pick |
197 | + # 1/4 as often etc - so do 1/2^(diff) |
198 | + year = int(re.split('-', cve)[1]) |
199 | + now = datetime.datetime.utcnow().year |
200 | + prob = 1.0 / math.pow(2, now - year) |
201 | + rand = random.random() |
202 | + if rand > prob: |
203 | + continue |
204 | + # selected! |
205 | + opt.mistriaged = opt.mistriaged - 1 |
206 | + |
207 | count += 1 |
208 | |
209 | if opt.report: |
210 | @@ -1667,7 +1743,7 @@ if experimental: |
211 | if total > 0: |
212 | _spawn_editor(fout.name) |
213 | else: |
214 | - prompt_user('Not spawning editor when as no CVEs to process') |
215 | + prompt_user('Not spawning editor as no CVEs to process') |
216 | fout.seek(0) |
217 | try: |
218 | handler.process_command_file(fout, preprocess=True) |
219 | diff --git a/scripts/process_cves b/scripts/process_cves |
220 | index 5bcb619..b4db9bf 100755 |
221 | --- a/scripts/process_cves |
222 | +++ b/scripts/process_cves |
223 | @@ -126,6 +126,11 @@ process_missing_debian() { |
224 | ./scripts/check-cves --import-missing-debian |
225 | } |
226 | |
227 | +mistriage() { |
228 | + echo "check-cves --mistriage 10" |
229 | + ./scripts/check-cves --mistriage 10 |
230 | +} |
231 | + |
232 | download_missing_rhsa() { |
233 | tmpdir="$1" |
234 | archive="$2.txt" |
235 | @@ -236,7 +241,8 @@ case "$action" in |
236 | echo "Skipping merge from kernel team" |
237 | echo "check-cves nvdcve-*.json" |
238 | ./scripts/check-cves nvdcve-*.json |
239 | - process_missing_debian |
240 | + # this will also do missing debian |
241 | + mistriage |
242 | |
243 | check_syntax |
244 | nag_about_mailing_lists |
245 | @@ -270,6 +276,10 @@ case "$action" in |
246 | update_debian |
247 | process_missing_debian |
248 | ;; |
249 | + mistriage) |
250 | + update_debian |
251 | + mistriage |
252 | + ;; |
253 | update) |
254 | # Do no triage |
255 | update_uct |
The ignore- mistriaged. txt list gives me pause. I'd like to have some kind of process/ policy/ strategy in place to try to avoid needing to do a mis-mistriaged feature in 5 years. I think it's the most reasonable solution, but we need to avoid becoming trigger-happy with the "ignore" button when processing mistriaged CVEs.