Merge ~alexmurray/ubuntu-cve-tracker:mistriage-as-cve-triage into ubuntu-cve-tracker:master

Proposed by Alex Murray
Status: Merged
Merge reported by: Alex Murray
Merged at revision: ee53fabe0ed758e347b9f503f4b0a9adf287411b
Proposed branch: ~alexmurray/ubuntu-cve-tracker:mistriage-as-cve-triage
Merge into: ubuntu-cve-tracker:master
Diff against target: 255 lines (+104/-18)
2 files modified
scripts/check-cves (+93/-17)
scripts/process_cves (+11/-1)
Reviewer Review Type Date Requested Status
Alex Murray Approve
Review via email: mp+387806@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Mike Salvatore (mikesalvatore) wrote :

The ignore-mistriaged.txt list gives me pause. I'd like to have some kind of process/policy/strategy in place to try to avoid needing to do a mis-mistriaged feature in 5 years. I think it's the most reasonable solution, but we need to avoid becoming trigger-happy with the "ignore" button when processing mistriaged CVEs.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

I'm a bit concerned about the runtime of the thing, since the standalone
script takes a while on my system:

$ time ./scripts/report-mistriaged-cves.py
Loading /home/sarnold/Canonical/debian-secure-testing/data/CVE/list ...
CVE-2020-8034 ['php-horde-gollem'] # Horde Groupware Webmail
[.. many lines snipped ..]
CVE-2002-0392 ['apache2'] # apache2, fixed pre-Ubuntu
CVE-2002-0391 ['acm', 'glibc', 'dietlibc', 'krb5', 'openafs'] # krb5, fixed pre-Ubuntu

real 8m45.907s
user 4m29.767s
sys 4m15.820s

Is the probabilistic cve selection (neat idea) going to do a bunch of work
and then discard some of it?

Thanks

Revision history for this message
Alex Murray (alexmurray) wrote :

Most of the runtime of the standalone script is in loading the Debian CVEs - since we already do that for check-cves anyway that shouldn't be a problem. The probabalistic calculations are pretty simple - doing a quick benchmark to run that code 100000 times and using a year of 2004 (which means the test will be almost always False) shows it takes 0.06s - so even if we are looking at 100000 CVEs it will only add less than a tenth of a second - so I don't think that should be a problem either :)

#!/usr/bin/env python3
import datetime
import math
import random
import timeit

def benchmark():
   i = 0
   while i < 100000:
     year = 2004
     now = datetime.datetime.utcnow().year
     prob = 1.0 / math.pow(2, now - year)
     rand = random.random()
     if rand > prob:
       pass
     i = i + 1

print(timeit.timeit("benchmark()", setup="from __main__ import benchmark", number=10)/10)

15ae1b6... by Alex Murray

scripts/check-cves: Clarify --mistriaged docstring

d3d003b... by Alex Murray

scripts/check-cves: Collect all mistriaged CVEs and whittle down

This cleans up the logic to address feedback from mikesalvatore in
https://code.launchpad.net/~alexmurray/ubuntu-cve-tracker/+git/ubuntu-cve-tracker-1/+merge/387806

186021d... by Alex Murray

scripts/check-cves: Minor cleanups to doc string placement and style

Revision history for this message
Alex Murray (alexmurray) wrote :

See updated commits.

ee53fab... by Alex Murray

scripts/check-cves: Fix up comment to match code

Revision history for this message
Mike Salvatore (mikesalvatore) wrote :

I like the probability idea. There is the potential, however, that it causes the script to behave differently than inspected. Consider the contrived case where there are 0 2020 mistriaged CVEs, 10 2019 mistriaged CVEs, and no earlier mistriaged CVEs. If I pass `--mistriaged 10` to the script, roughly 5 mistriaged CVEs will be presented to me since they will be selected with probability 1/2. Even though there are 10 mistriaged CVEs and I asked for 10, I only get to evaluate 5.

I don't think this should block the merge; it's just an edge case to consider.

Revision history for this message
Alex Murray (alexmurray) wrote :

Mike I agree that this is something that should be done to improve this later down the track. Since there have been no other objections, merging this as is. Will add improvements in further merge requests.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/scripts/check-cves b/scripts/check-cves
2index 5855daf..4696327 100755
3--- a/scripts/check-cves
4+++ b/scripts/check-cves
5@@ -17,9 +17,11 @@
6
7 import datetime
8 import json
9+import math
10 import optparse
11 import os
12 import os.path
13+import random
14 import re
15 import shutil
16 import subprocess
17@@ -55,6 +57,9 @@ parser.add_option("--import-missing-debian", help="Process missing Debian CVEs",
18 parser.add_option("--debug", help="Report debugging information", action="store_true")
19 parser.add_option("--cve", help="Check only the listed comma-separated CVEs and ignore others", action="store",
20 default="")
21+parser.add_option("--mistriaged", help="Process the specified number of possible mistriaged CVEs compared to Debian\n"
22+ "Implies --import-missing-debian",
23+ action="store", type=int, default=0)
24 (opt, args) = parser.parse_args()
25
26 experimental = False
27@@ -71,6 +76,8 @@ destdir = "."
28 # Skip stuff older than 2005
29 cve_limit = 2004
30
31+mistriaged_hint = 'Previously triaged as ignored in Ubuntu\n\n'
32+
33 ignore_strings = [
34 "** REJECT **", "Internet Explorer", "Microsoft Edge", "Windows 98",
35 "Windows 2000", "Windows XP", "Windows Server 2003", "Windows NT",
36@@ -323,6 +330,24 @@ def import_debian(handler):
37 today = datetime.date.today()
38 known = set(CVEKnownList + CVEIgnoreList)
39
40+ def mistriaged(cve):
41+ if cve in CVEIgnoreNotForUsList and \
42+ cve not in CVEIgnoreMistriagedList and \
43+ handler.debian[cve]['state'] == 'FOUND':
44+ # check that at least one of the assigned packages exist
45+ # in Ubuntu
46+ pkgs = {}
47+ for pkg in handler.debian[cve]['pkgs'].keys():
48+ # ignore package if debian says is not affected
49+ if handler.debian[cve]['pkgs'][pkg]['state'] == '<not-affected>':
50+ continue
51+ answer = source_map.madison(source, pkg)
52+ if len(answer) > 0:
53+ pkgs[pkg] = answer
54+ if len(pkgs) > 0:
55+ return True
56+ return False
57+
58 # pull in CVEs from data/DSA/list
59 dsas = cve_lib.load_debian_dsas(cve_lib.config['secure_testing_path'] + '/data/DSA/list', opt.verbose)
60 for dsa in dsas:
61@@ -336,9 +361,14 @@ def import_debian(handler):
62 if year < cve_limit:
63 continue
64
65- # If we already know about the CVE, skip it
66+ # If we already know about the CVE, skip it unless is
67+ # mistriaged
68 if cve in known:
69- continue
70+ if mistriaged(cve):
71+ # add a note about how this was originally classified
72+ dsas[dsa]['desc'] = mistriaged_hint + dsas[dsa]['desc']
73+ else:
74+ continue
75
76 cves[cve] = dict()
77 cves[cve]['subject'] = escape(dsas[dsa]['desc'])
78@@ -368,11 +398,15 @@ def import_debian(handler):
79 print("Skipping %s, year %d predates %d" % (cve, year, cve_limit), file=sys.stderr)
80 continue
81
82- # If we already know about the CVE, skip it
83+ # If we already know about the CVE, skip it unless is mistriaged
84 if cve in known:
85- if opt.verbose:
86- print("Skipping %s, already known" % cve, file=sys.stderr)
87- continue
88+ if mistriaged(cve):
89+ # add a note about how this was originally classified
90+ handler.debian[cve]['desc'] = mistriaged_hint + handler.debian[cve]['desc']
91+ else:
92+ if opt.verbose:
93+ print("Skipping %s, already known" % cve, file=sys.stderr)
94+ continue
95
96 if handler.debian[cve]['desc'] or handler.debian[cve]['state'] == 'FOUND':
97 cves[cve] = dict()
98@@ -760,8 +794,8 @@ class CVEHandler(xml.sax.handler.ContentHandler):
99 self.curr_refs += [(self.curr_source, self.curr_chars, self.curr_url)]
100
101 def handle_cve(self):
102- # Skip CVEs we know about already
103- if self.curr_cve in self.cve_ignore:
104+ # Skip CVEs we know about already unless this is a mistriaged CVE
105+ if self.curr_cve in self.cve_ignore and mistriaged_hint not in self.curr_desc:
106 return
107
108 limit = cve_limit
109@@ -1261,6 +1295,13 @@ class CVEHandler(xml.sax.handler.ContentHandler):
110 print('%s %s %s\n%s' % (action, cve, data, desc))
111
112 def add_cve(self, cve, packages, priority=None):
113+ # remove from not-for-us.txt if adding and ensure we remove any
114+ # mistriaged_hint from the description
115+ if cve in CVEIgnoreNotForUsList:
116+ cmd = ['sed', '-i', '/^%s #.*$/d' % cve, './ignored/not-for-us.txt']
117+ subprocess.call(cmd)
118+ self.cve_data[cve]['desc'] = self.cve_data[cve]['desc'].replace(mistriaged_hint, '')
119+
120 # Build up list of reference urls
121 ref_urls = []
122 if self.debian and \
123@@ -1365,8 +1406,10 @@ class CVEHandler(xml.sax.handler.ContentHandler):
124 self.num_added += 1
125
126 def ignore_cve(self, cve, reason):
127- # Append to ignore list
128- with open('%s/ignored/not-for-us.txt' % (destdir), 'a') as f:
129+ # Append to ignore list unless is already in CVEIgnoreList and then
130+ # append to the ignored/ignore-mistriaged.txt
131+ txtfile = 'ignore-mistriaged.txt' if cve in CVEIgnoreNotForUsList else 'not-for-us.txt'
132+ with open('%s/ignored/%s' % (destdir, txtfile), 'a') as f:
133 f.write('%s # %s\n' % (cve, reason))
134
135 self.num_ignored += 1
136@@ -1459,19 +1502,32 @@ class CheckCVETest(unittest.TestCase):
137 self.assertEqual("Exquisite Ultimate Newspaper theme for WordPress", h.get_ignore_suggestion('''The Exquisite Ultimate Newspaper theme 1.3.3 for WordPress has XSS via the anchor identifier to assets/js/jquery.foundation.plugins.js.'''))
138
139
140+ignored_notforus_path = 'ignored/not-for-us.txt'
141+if destdir != './' and destdir != '.':
142+ ignored_notforus_path = os.path.join(destdir, ignored_notforus_path)
143+# CVEIgnoreNotForUsList is a list of all CVEs that we have previously
144+# chosen to ignore since they don't apply to software in Ubuntu
145+CVEIgnoreNotForUsList = cve_lib.parse_CVEs_from_uri(ignored_notforus_path)
146+
147+ignored_mistriaged_path = 'ignored/ignore-mistriaged.txt'
148+if destdir != './' and destdir != '.':
149+ ignored_mistriaged_path = os.path.join(destdir, ignored_mistriaged_path)
150+# CVEIgnoreMistriagedList is a list of all CVEs that we want to definitely
151+# ignore when doing mistriaged CVE detection - they should exist in both
152+# CVEIgnoreNotForUsList and CVEIgnoreMistriagedList
153+CVEIgnoreMistriagedList = cve_lib.parse_CVEs_from_uri(ignored_mistriaged_path)
154+
155 # CVEIgnoreList is a list of all CVEs we know about already. These will be
156 # ignored when checking MITRE for new CVEs
157-ignored_path = 'ignored/not-for-us.txt'
158-if destdir != './' and destdir != '.':
159- ignored_path = os.path.join(destdir, ignored_path)
160-CVEIgnoreList = cve_lib.parse_CVEs_from_uri(ignored_path)
161+CVEIgnoreList = list(CVEIgnoreNotForUsList)
162+
163 CVEKnownList = []
164 CVEKnownList += [cve for cve in os.listdir(destdir + "/ignored/") if cve.startswith('CVE-')]
165 CVEKnownList += [cve for cve in os.listdir(destdir + "/retired/") if cve.startswith('CVE-')]
166 (ActiveList, EmbargoList) = cve_lib.get_cve_list()
167 CVEKnownList += [cve for cve in ActiveList if cve not in EmbargoList]
168
169-if not opt.refresh:
170+if not opt.refresh and not opt.mistriaged:
171 CVEIgnoreList += CVEKnownList
172
173 if opt.known:
174@@ -1520,7 +1576,7 @@ if opt.rhel8oval:
175 args.append(untriaged_json)
176
177 debian_import_json = ""
178-if opt.import_missing_debian and handler.debian is not None:
179+if (opt.import_missing_debian or opt.mistriaged) and handler.debian is not None:
180 debian_import_json = import_debian(handler)
181 args.append(debian_import_json)
182
183@@ -1630,6 +1686,26 @@ for cve in new_cves:
184 if len(specific_cves) > 0 and cve not in specific_cves:
185 # ignore this cve
186 continue
187+ # if this got marked as mistriaged, probablistically choose it for
188+ # processing
189+ if mistriaged_hint in handler.cve_data[cve]['desc']:
190+ if opt.mistriaged == 0:
191+ # ignore this one
192+ continue
193+ else:
194+ # choose CVEs at random but pick most recent first - if year is
195+ # now then we want to pick it, if year is 1 year ago we want to
196+ # pick half as often, if year is 2 years ago we want to pick
197+ # 1/4 as often etc - so do 1/2^(diff)
198+ year = int(re.split('-', cve)[1])
199+ now = datetime.datetime.utcnow().year
200+ prob = 1.0 / math.pow(2, now - year)
201+ rand = random.random()
202+ if rand > prob:
203+ continue
204+ # selected!
205+ opt.mistriaged = opt.mistriaged - 1
206+
207 count += 1
208
209 if opt.report:
210@@ -1667,7 +1743,7 @@ if experimental:
211 if total > 0:
212 _spawn_editor(fout.name)
213 else:
214- prompt_user('Not spawning editor when as no CVEs to process')
215+ prompt_user('Not spawning editor as no CVEs to process')
216 fout.seek(0)
217 try:
218 handler.process_command_file(fout, preprocess=True)
219diff --git a/scripts/process_cves b/scripts/process_cves
220index 5bcb619..b4db9bf 100755
221--- a/scripts/process_cves
222+++ b/scripts/process_cves
223@@ -126,6 +126,11 @@ process_missing_debian() {
224 ./scripts/check-cves --import-missing-debian
225 }
226
227+mistriage() {
228+ echo "check-cves --mistriage 10"
229+ ./scripts/check-cves --mistriage 10
230+}
231+
232 download_missing_rhsa() {
233 tmpdir="$1"
234 archive="$2.txt"
235@@ -236,7 +241,8 @@ case "$action" in
236 echo "Skipping merge from kernel team"
237 echo "check-cves nvdcve-*.json"
238 ./scripts/check-cves nvdcve-*.json
239- process_missing_debian
240+ # this will also do missing debian
241+ mistriage
242
243 check_syntax
244 nag_about_mailing_lists
245@@ -270,6 +276,10 @@ case "$action" in
246 update_debian
247 process_missing_debian
248 ;;
249+ mistriage)
250+ update_debian
251+ mistriage
252+ ;;
253 update)
254 # Do no triage
255 update_uct

Subscribers

People subscribed via source and target branches