Merge into 2.1 : csrf-injective : Code : GNU Mailman

Status:	Merged
Merged at revision:	1759
Proposed branch:	lp:~ralfjung-e/mailman/csrf-injective
Merge into:	lp:mailman/2.1
Diff against target:	33 lines (+6/-6) 2 files modified Mailman/Cgi/listinfo.py (+3/-3) Mailman/Cgi/subscribe.py (+3/-3)
To merge this branch:	bzr merge lp:~ralfjung-e/mailman/csrf-injective
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
Mark Sapiro		2018-06-03	Approve on 2018-06-03
Review via email: mp+347340@code.launchpad.net

Commit message

Separate data in CSRF token by colon to avoid collisions.

Description of the change

This makes the data-to-token function injective. Previously, for example, the
list called "list1" and the IP "10.0.0.0" would have the same hash as the list
called "list" and the IP "110.0.0.0", as the strings were just concatenated.

Revision history for this message

Jim Popovitch (jimpop) wrote on 2018-06-03:

#

A couple thoughts:

1) The original code in listinfo.py uses 'now' twice in the token. I wonder if that's necessary.

2) Should use something besides a colon as colons are part of IPv6 addresses?

Revision history for this message

Mark Sapiro (msapiro) wrote on 2018-06-03:

#

Thank you for your contribution. It is appreciated and I will merge it.

Regarding @jimpop thoughts, 'now' is used twice because it is part of the data that's hashed in order to make the hash more random, but it also has to be in plain text in the token (which maybe means putting it in the hash doesn't really add anything) because it is required both for the FORM_LIFETIME and SUBSCRIBE_FORM_MIN_TIME checks and to compute the hash to check against. Granted, the time doesn't need to be in the hash, but the original implementation did it like that, and I see no need to change it.

And, the colon is OK as a separator because even though an IPv6 address can begin with a colon, the list name can't end with one, and it doesn't matter anyway. We are only throwing in an extra character to avoid the list name's ending in a digit creating the same hash as a list name without the digit and an IP address with the added digit.

review: Approve

Revision history for this message

Ralf Jung (ralfjung-e) wrote on 2018-06-04:

#

> 1) The original code in listinfo.py uses 'now' twice in the token. I wonder if that's necessary.

It's not really twice in the (hashed) token, it's one in clear-text and then once more in the hashed token. It needs to be in clear-text as otherwise subscribe.py can't re-compute the hash, and it's in the token to make tokens expire. You don't want an attacker to grab a CSRF token once and then use that same token forever, the timestamp (together with the FORM_LIFETIME check) prevents that.

> it is required both for the FORM_LIFETIME and SUBSCRIBE_FORM_MIN_TIME checks and to compute the hash to check against. Granted, the time doesn't need to be in the hash, but the original implementation did it like that, and I see no need to change it.

If it wasn't in the hash, the MIN_LIFETIME and FORM_LIFETIME would not actually be enforced: An attacker could just always pick a timestamp from 1min ago. Hashing the timestamp prevents the attacker from forging it.

> 2) Should use something besides a colon as colons are part of IPv6 addresses?

The same thought occurred to me this morning. However, assuming the FORM_SECRET does not contain a colon, one can still take the first three colons as separators and everything after the third colon as the IP address. The important property is that it is possible, given the string that's put into the hash, to parse out the field values -- IOW, if any of the field values differ, the hash must be different. (This is assuming list names cannot contain colons, is that correct? Essentially, if only the last field can contain our separator, we are good.)

One easy fix to avoid even having to think about this would be to use ":::" as the separator, which is not valid even in IPv6 addresses.

The "right" way to do this would be to use some kind of serialization (i.e., to pickle the data that goes into the hashed token). Since serialization is guaranteed to be invertible by deserialization, that works without any assumptions about which fields can contain which characters. I can change the patch to do that, if you want.

> 1) The original code in listinfo.py uses 'now' twice in the token. I wonder if that's necessary.

It's not really twice in the (hashed) token, it's one in clear-text and then once more in the hashed token. It needs to be in clear-text as otherwise subscribe.py can't re-compute the hash, and it's in the token to make tokens expire.  You don't want an attacker to grab a CSRF token once and then use that same token forever, the timestamp (together with the FORM_LIFETIME check) prevents that.

> it is required both for the FORM_LIFETIME and SUBSCRIBE_FORM_MIN_TIME checks and to compute the hash to check against. Granted, the time doesn't need to be in the hash, but the original implementation did it like that, and I see no need to change it.

If it wasn't in the hash, the MIN_LIFETIME and FORM_LIFETIME would not actually be enforced: An attacker could just always pick a timestamp from 1min ago.  Hashing the timestamp prevents the attacker from forging it.

> 2) Should use something besides a colon as colons are part of IPv6 addresses?

The same thought occurred to me this morning.  However, assuming the FORM_SECRET does not contain a colon, one can still take the first three colons as separators and everything after the third colon as the IP address.  The important property is that it is possible, given the string that's put into the hash, to parse out the field values -- IOW, if any of the field values differ, the hash must be different. (This is assuming list names cannot contain colons, is that correct? Essentially, if only the last field can contain our separator, we are good.)

One easy fix to avoid even having to think about this would be to use ":::" as the separator, which is not valid even in IPv6 addresses.

The "right" way to do this would be to use some kind of serialization (i.e., to pickle the data that goes into the hashed token).  Since serialization is guaranteed to be invertible by deserialization, that works without any assumptions about which fields can contain which characters.  I can change the patch to do that, if you want.

Revision history for this message

Ralf Jung (ralfjung-e) wrote on 2018-06-04:

#

> I can change the patch to do that, if you want.

Never mind, I only now saw it is already merged. Thanks!

1	=== modified file 'Mailman/Cgi/listinfo.py'
2	--- Mailman/Cgi/listinfo.py 2018-05-26 16:22:35 +0000
3	+++ Mailman/Cgi/listinfo.py 2018-06-03 20:42:57 +0000
4	@@ -218,9 +218,9 @@
5	remote = remote.rsplit(':', 1)[0]
6	replacements['<mm-subscribe-form-start>'] += (
7	'<input type="hidden" name="sub_form_token" value="%s:%s">\n'
8	- % (now, Utils.sha_new(mm_cfg.SUBSCRIBE_FORM_SECRET +
9	- now +
10	- mlist.internal_name() +
11	+ % (now, Utils.sha_new(mm_cfg.SUBSCRIBE_FORM_SECRET + ":" +
12	+ now + ":" +
13	+ mlist.internal_name() + ":" +
14	remote
15	).hexdigest()
16	)
17
18	=== modified file 'Mailman/Cgi/subscribe.py'
19	--- Mailman/Cgi/subscribe.py 2018-04-11 09:36:40 +0000
20	+++ Mailman/Cgi/subscribe.py 2018-06-03 20:42:57 +0000
21	@@ -173,9 +173,9 @@
22	except ValueError:
23	ftime = fhash = ''
24	then = 0
25	- token = Utils.sha_new(mm_cfg.SUBSCRIBE_FORM_SECRET +
26	- ftime +
27	- mlist.internal_name() +
28	+ token = Utils.sha_new(mm_cfg.SUBSCRIBE_FORM_SECRET + ":" +
29	+ ftime + ":" +
30	+ mlist.internal_name() + ":" +
31	remote1).hexdigest()
32	if ftime and now - then > mm_cfg.FORM_LIFETIME:
33	results.append(_('The form is too old. Please GET it again.'))

GNU Mailman

Merge lp:~ralfjung-e/mailman/csrf-injective into lp:mailman/2.1

Commit message

Description of the change

Preview Diff

Subscribers