Merge lp:~andrea.corbellini/beeseek/sniffer into lp:beeseek

Proposed by Andrea Corbellini
Status: Merged
Merged at revision: 30
Proposed branch: lp:~andrea.corbellini/beeseek/sniffer
Merge into: lp:beeseek
Diff against target: 581 lines (+528/-0)
10 files modified
.bzrignore (+1/-0)
sniffer/Makefile (+15/-0)
sniffer/include/handler.h (+22/-0)
sniffer/include/parser.h (+14/-0)
sniffer/include/sender.h (+16/-0)
sniffer/include/sniffer.h (+8/-0)
sniffer/src/app.c (+68/-0)
sniffer/src/handler.c (+98/-0)
sniffer/src/parser.c (+162/-0)
sniffer/src/sender.c (+124/-0)
To merge this branch: bzr merge lp:~andrea.corbellini/beeseek/sniffer
Reviewer              Status
Lorenzo Allegrucci    Approve
BeeSeek Team          Pending
Review via email: mp+28393@code.launchpad.net

Description of the change

This branch adds the TCP/IP packet sniffer. It is written entirely in C, is asynchronous, and is optimized for low memory usage and light CPU load; it also respects user privacy.

First, a quick overview of what it does. Every sniffed packet is checked to see whether it looks like an HTTP request; if it does, the sniffer sends the most important information about the request (its Host and URI) to the analyzer. Here's how you can test it:

* Build it with `make` or `make DEBUG=1`.
* Launch a script that emulates the analyzer: http://paste.ubuntu.com/454410/
* Launch `beeseek-sniffer` with two arguments: the Ethernet device (most likely eth0 or wlan0) and the IP address or hostname of the analyzer (here, localhost).
* Visit some pages with a web browser.
* The script that emulates the analyzer should now display all the visited pages (plus CSS, JavaScript and images, of course).
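As an illustration of what the emulator script has to do, here is a minimal sketch inferred from src/sender.c (it is not the linked script): read the `PUT /api/sniffer-interface HTTP/1.1` handshake up to its empty line, answer with a 2XX status line (BfSender_Connect only checks byte 9 of the first 12 bytes it receives), then copy every `http://HOST/URL` line to an output stream. The function name `emu_serve` is hypothetical, and it assumes an already-connected socket; listening on port 7222 (BfSender_DEFAULT_PORT) and accepting the connection are left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical analyzer emulator for one connected socket `fd`.
   Returns the number of page lines copied to `out`, or -1 on error. */
int
emu_serve(int fd, FILE *out)
{
    static const char reply[] = "HTTP/1.1 200 OK\r\n\r\n";
    char tail[4] = {0};   /* last four bytes received */
    char c;
    ssize_t n;
    int lines = 0;

    /* 1. Consume the handshake request up to its empty line ("\r\n\r\n"). */
    while (memcmp(tail, "\r\n\r\n", 4) != 0) {
        if (recv(fd, &c, 1, 0) != 1)
            return -1;
        memmove(tail, tail + 1, 3);
        tail[3] = c;
    }

    /* 2. Accept it; the sniffer only checks that byte 9 of the reply is '2'. */
    if (send(fd, reply, sizeof reply - 1, 0) != (ssize_t)(sizeof reply - 1))
        return -1;

    /* 3. Every following line is one visited page: "http://HOST/URL\n". */
    while ((n = recv(fd, &c, 1, 0)) == 1) {
        fputc(c, out);
        if (c == '\n')
            lines++;
    }
    return n == 0 ? lines : -1;   /* 0 means the sniffer closed the socket */
}
```

With a listening TCP socket bound to port 7222, a caller could run `emu_serve(accept(listener, NULL, NULL), stdout)`. Reading one byte at a time keeps the sketch short; a real emulator would read larger chunks.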

Here is a detailed description of how it works:

* Only packets with destination port 80 (HTTP) are sniffed. [include/handler.h: Bf_PCAP_FILTER_EXP]
* When a packet is caught, a parser checks whether its payload starts with "GET" or "HEAD" (other methods, such as POST or PUT, are ignored). [src/parser.c: BfHTTPRequest_ReadRequestLine]
* On a match, it assumes that the packet contains an HTTP request and extracts the URI.
* It then scans the headers looking for 'Host'. [src/parser.c: BfHTTPRequest_ReadHeader]
* If both the URI and the Host have been found, it sends them to the analyzer. [src/sender.c: BfSender_SendReq]
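The steps above can be condensed into a single function. This is a minimal sketch, not the code in src/parser.c: it handles GET only, reads one request at the start of the payload, matches the Host header case-insensitively without modifying the captured buffer (the branch lowercases header names in place), and writes the same `http://%s%s\n` line that BfSender_SendReq sends. The name `sketch_parse_get` and the caller-supplied output buffer are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Parse a GET request at the start of `data` (`len` bytes, not
   NUL-terminated) and write "http://HOST/URL\n" into `msg`.
   Returns 0 on success, -1 if no usable GET request with a Host is found. */
int
sketch_parse_get(const char *data, size_t len, char *msg, size_t msg_size)
{
    const char *end = data + len;
    const char *p, *eol, *sp;
    const char *url, *host = NULL;
    size_t url_len, host_len = 0;
    int n;

    /* Request line: "GET <URI> HTTP/1.x\r\n"; other methods are ignored. */
    if (len < 4 || memcmp(data, "GET ", 4) != 0)
        return -1;
    p = data + 4;
    if ((eol = memchr(p, '\n', (size_t)(end - p))) == NULL)
        return -1;
    if ((sp = memchr(p, ' ', (size_t)(eol - p))) == NULL || sp == p)
        return -1;
    url = p;
    url_len = (size_t)(sp - p);

    /* Headers: stop at the empty line; header names are case-insensitive. */
    for (p = eol + 1; p < end; p = eol + 1) {
        eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL || eol - p <= 1)   /* truncated line, or "\r\n" */
            break;
        if (eol - p < 5 || strncasecmp(p, "host:", 5) != 0)
            continue;
        const char *v = p + 5, *ve = eol;
        while (v < ve && (*v == ' ' || *v == '\t'))
            v++;                           /* skip leading blanks */
        while (ve > v && (ve[-1] == '\r' || ve[-1] == ' ' || ve[-1] == '\t'))
            ve--;                          /* drop trailing blanks and CR */
        host = v;
        host_len = (size_t)(ve - v);
        break;
    }
    if (host == NULL || host_len == 0)
        return -1;

    /* Same format as BfSender_MSG_PAGE_TMPL in include/sender.h. */
    n = snprintf(msg, msg_size, "http://%.*s%.*s\n",
                 (int)host_len, host, (int)url_len, url);
    return (n < 0 || (size_t)n >= msg_size) ? -1 : 0;
}
```

For example, on `GET /a HTTP/1.1\r\nHost: example.org\r\n\r\n` the sketch writes `http://example.org/a\n`; a message that would not fit in `msg` is rejected rather than truncated.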

However, this implementation has some limitations:

* The parser assumes that every HTTP request fits in a single packet. If, for example, the request line and the headers arrive in two different packets, the request is lost.
* The parser also assumes that each packet carries at most one request. If a packet contains two or more requests, only the first one is considered.
* Finally, the parser assumes that every HTTP request starts at the beginning of a packet's payload.
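To make the second and third limitations concrete, the following sketch (a hypothetical helper, not part of this branch) walks a payload from one end-of-headers marker (`\r\n\r\n`) to the next and counts the request heads that start with `GET `. For two pipelined requests it finds both, whereas a parser that only inspects the first bytes of the payload sees one; a payload that begins in the middle of a request is skipped up to the next head. It assumes GET requests carry no body, and it still cannot recover a request split across packets, which would need per-connection buffering.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the heads starting with "GET " in one TCP
   payload, moving from one "\r\n\r\n" (end of headers) to the next.
   A trailing head that is not yet complete is still counted. */
size_t
count_get_heads(const char *data, size_t len)
{
    const char *p = data, *end = data + len;
    size_t count = 0;

    while (end - p >= 4) {
        const char *q = p;

        if (memcmp(p, "GET ", 4) == 0)
            count++;
        /* Find the end of the current head; a payload that starts in the
           middle of a request is simply skipped up to the next head. */
        while (end - q >= 4 && memcmp(q, "\r\n\r\n", 4) != 0)
            q++;
        if (end - q < 4)
            break;        /* the head continues in a later packet */
        p = q + 4;
    }
    return count;
}
```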

These problems may seem critical, but in practice they matter little: every web browser I've used sends each request in a single packet. Moreover, fixing them would slow down the application and slightly increase memory usage.

Although the sniffer is complete and usable, it needs some tuning. In include/parser.h, the two constants BfHTTPRequest_URL_SIZE and BfHTTPRequest_HOST_SIZE should be set to reasonable sizes (currently, URL_SIZE is too small). Unit tests are also missing; I'll work on them as soon as this branch is approved.

58. By Andrea Corbellini

Ignore HEAD requests.

Lorenzo Allegrucci (l-allegrucci) wrote:

The sniffer seems to work here; just increase the hostname and URL buffers to 64 and 2048 bytes, as the current sizes are too restrictive.

review: Approve
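The suggested sizes can be checked against the sender's buffer. BfSender_MSG_PAGE_SIZE is defined as URL_SIZE + HOST_SIZE + 8, and the worst case for the `http://%s%s\n` template is 7 bytes of `http://`, a host of at most HOST_SIZE bytes (longer ones are rejected), a URL of at most URL_SIZE - 1 bytes (ReadRequestLine rejects `url_len >= URL_SIZE`), the newline and the NUL: 7 + HOST_SIZE + (URL_SIZE - 1) + 1 + 1 = URL_SIZE + HOST_SIZE + 8. The sketch below plugs in 2048 and 64 and formats that worst case; the local macro names are assumptions standing in for the ones in include/parser.h and include/sender.h.

```c
#include <stdio.h>
#include <string.h>

/* Assumed stand-ins for BfHTTPRequest_URL_SIZE / BfHTTPRequest_HOST_SIZE,
   set to the values suggested in the review. */
#define URL_SIZE  2048
#define HOST_SIZE 64
/* Same formula as BfSender_MSG_PAGE_SIZE:
   7 ("http://") + HOST_SIZE + (URL_SIZE - 1) + 1 ("\n") + 1 (NUL). */
#define MSG_PAGE_SIZE (URL_SIZE + HOST_SIZE + 8)

/* Format the longest message the parser can accept and return its length
   (excluding the NUL), as reported by snprintf. */
int
worst_case_message_len(void)
{
    static char host[HOST_SIZE + 1];   /* longest host the parser keeps */
    static char url[URL_SIZE];         /* longest URL: URL_SIZE - 1 bytes */
    char msg[MSG_PAGE_SIZE];

    memset(host, 'h', HOST_SIZE);
    host[HOST_SIZE] = '\0';
    memset(url, 'u', URL_SIZE - 1);
    url[URL_SIZE - 1] = '\0';
    return snprintf(msg, sizeof msg, "http://%s%s\n", host, url);
}
```

A result of MSG_PAGE_SIZE - 1 means the worst case fits exactly, with no slack. Since the formula holds for any pair of sizes, only the two constants in include/parser.h need to change.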

Preview Diff

=== modified file '.bzrignore'
--- .bzrignore 2010-03-21 12:19:12 +0000
+++ .bzrignore 2010-06-26 14:49:23 +0000
@@ -1,1 +1,2 @@
 beeseek/_version_info.py
+sniffer/beeseek-sniffer
=== added directory 'sniffer'
=== added file 'sniffer/Makefile'
--- sniffer/Makefile 1970-01-01 00:00:00 +0000
+++ sniffer/Makefile 2010-06-26 14:49:23 +0000
@@ -0,0 +1,15 @@
+# Copyright 2010 BeeSeek Developers. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+ifdef DEBUG
+    GCC := gcc -g -DBfDebug
+else
+    GCC := gcc
+endif
+
+build: beeseek-sniffer
+beeseek-sniffer: src/*.c include/*.h
+	$(GCC) $(GCC_OPTS) -Wall -Iinclude/ -o beeseek-sniffer src/*.c -lpcap
+
+clean:
+	rm -f beeseek-sniffer
=== added directory 'sniffer/include'
=== added file 'sniffer/include/handler.h'
--- sniffer/include/handler.h 1970-01-01 00:00:00 +0000
+++ sniffer/include/handler.h 2010-06-26 14:49:23 +0000
@@ -0,0 +1,22 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#include <pcap.h>
+
+typedef struct {
+    u_int32_t client_addr;
+    u_int16_t client_port;
+    u_int32_t server_addr;
+    u_int16_t server_port;
+    char *data;
+    unsigned int data_len;
+} BfPacket;
+
+#define Bf_PCAP_FILTER_EXP "tcp and ip and dst port 80"
+#define Bf_PCAP_BUF_SIZE 8192
+
+BfPacket *BfPacket_Init(BfPacket *packet, const u_char *packet_data);
+char *BfPacket_Repr(BfPacket *packet);
+void BfSniffer_HandlePacket(u_char *args, const struct pcap_pkthdr *header,
+                            const u_char *packet_data);
+int BfSniffer_SniffDevice(const char *device_name);
=== added file 'sniffer/include/parser.h'
--- sniffer/include/parser.h 1970-01-01 00:00:00 +0000
+++ sniffer/include/parser.h 2010-06-26 14:49:23 +0000
@@ -0,0 +1,14 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#define BfHTTPRequest_URL_SIZE 200
+#define BfHTTPRequest_HOST_SIZE 20
+
+typedef struct {
+    char url[BfHTTPRequest_URL_SIZE+1];
+    char host[BfHTTPRequest_HOST_SIZE+1];
+} BfHTTPRequest;
+
+int BfHTTPRequest_ParsePacket(BfPacket *packet);
+int BfHTTPRequest_ReadRequestLine(BfHTTPRequest *request, BfPacket *packet);
+int BfHTTPRequest_ReadHeader(BfHTTPRequest *request, BfPacket *packet);
=== added file 'sniffer/include/sender.h'
--- sniffer/include/sender.h 1970-01-01 00:00:00 +0000
+++ sniffer/include/sender.h 2010-06-26 14:49:23 +0000
@@ -0,0 +1,16 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+extern int BfSender_Socket;
+
+#define BfSender_DEFAULT_PORT 7222
+
+int BfSender_Connect(char *address, int port);
+int BfSender_SendRaw(char *data);
+int BfSender_SendReq(BfHTTPRequest *request);
+void BfSender_Close(void);
+
+#define BfSender_MSG_CONNECT "PUT /api/sniffer-interface HTTP/1.1\r\n\r\n"
+#define BfSender_MSG_PAGE_TMPL "http://%s%s\n"
+#define BfSender_MSG_PAGE_SIZE \
+    (BfHTTPRequest_URL_SIZE + BfHTTPRequest_HOST_SIZE + 8)
=== added file 'sniffer/include/sniffer.h'
--- sniffer/include/sniffer.h 1970-01-01 00:00:00 +0000
+++ sniffer/include/sniffer.h 2010-06-26 14:49:23 +0000
@@ -0,0 +1,8 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+extern char *Bf_ProgramName;
+
+#include "handler.h"
+#include "parser.h"
+#include "sender.h"
=== added directory 'sniffer/src'
=== added file 'sniffer/src/app.c'
--- sniffer/src/app.c 1970-01-01 00:00:00 +0000
+++ sniffer/src/app.c 2010-06-26 14:49:23 +0000
@@ -0,0 +1,68 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "sniffer.h"
+
+/* The command used to launch the application (i.e. `argv[0]`). */
+char *Bf_ProgramName;
+
+/* Display a help message. */
+static void
+Bf_PrintUsage(void)
+{
+    printf("Usage: %s DEVICE HOST\n\n", Bf_ProgramName);
+    printf("Options:\n");
+    printf("  -h, --help  show this help message and exit\n");
+}
+
+/* Handle SIGINT/SIGQUIT. */
+static void
+Bf_HandleQuit(int sig)
+{
+    BfSender_Close();
+    printf("%s: sniffer stopped\n", Bf_ProgramName);
+    exit(0);
+}
+
+/* Application entry point. */
+int
+main(int argc, char **argv)
+{
+    /* Read and check the command line arguments. */
+    Bf_ProgramName = argv[0];
+    if (argc == 1) {
+        fprintf(stderr, "%s: error: missing interface name\n", Bf_ProgramName);
+        Bf_PrintUsage();
+        return 2;
+    }
+    else if (argc == 2) {
+        if (strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "--help") == 0) {
+            Bf_PrintUsage();
+            return 0;
+        }
+        else {
+            fprintf(stderr, "%s: error: no destination host\n", Bf_ProgramName);
+            Bf_PrintUsage();
+            return 2;
+        }
+    }
+    else if (argc > 3) {
+        fprintf(stderr, "%s: error: too many arguments\n", Bf_ProgramName);
+        Bf_PrintUsage();
+        return 2;
+    }
+
+    /* Initialize the sender. */
+    if (BfSender_Connect(argv[2], BfSender_DEFAULT_PORT) < 0)
+        return 1;
+    /* Connect signals. */
+    signal(SIGINT, Bf_HandleQuit);
+    signal(SIGQUIT, Bf_HandleQuit);
+
+    /* Start sniffing. */
+    return BfSniffer_SniffDevice(argv[1]);
+}
=== added file 'sniffer/src/handler.c'
--- sniffer/src/handler.c 1970-01-01 00:00:00 +0000
+++ sniffer/src/handler.c 2010-06-26 14:49:23 +0000
@@ -0,0 +1,98 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#include <netinet/ip.h>
+#include <netinet/tcp.h>
+#include <pcap.h>
+#include <stdlib.h>
+#include "sniffer.h"
+
+
+/* Return a BfPacket initialized with the information contained in a packet.
+ * If the packet doesn't contain data to be parsed, this function returns NULL.
+ */
+BfPacket *
+BfPacket_Init(BfPacket *packet, const u_char *packet_data)
+{
+    const struct iphdr *ip;
+    const struct tcphdr *tcp;
+
+    /* Consume the Ethernet header (we don't need it). */
+    packet_data += 14;
+    /* Read the IP and TCP headers. */
+    ip = (struct iphdr *)packet_data;
+    packet_data += ip->ihl * 4;
+    tcp = (struct tcphdr *)packet_data;
+    packet_data += tcp->doff * 4;
+
+    if (tcp->psh == 0)
+        /* This packet has no data. */
+        return NULL;
+
+    /* Populate the BfPacket. */
+    packet->client_addr = ip->saddr;
+    packet->client_port = tcp->source;
+    packet->server_addr = ip->daddr;
+    packet->server_port = tcp->dest;
+    packet->data = (char *)packet_data;
+    packet->data_len = ntohs(ip->tot_len) - ip->ihl * 4 - tcp->doff * 4;
+
+    return packet;
+}
+
+/* Return a string representing the sender and the receiver of a packet in a
+   human-readable form. */
+char *
+BfPacket_Repr(BfPacket *packet)
+{
+    static char repr[48] = "\0";
+    sprintf(repr, "%d.%d.%d.%d:%d->%d.%d.%d.%d:%d",
+            packet->client_addr & 0xFF, (packet->client_addr >> 8) & 0xFF,
+            (packet->client_addr >> 16) & 0xFF, packet->client_addr >> 24,
+            ntohs(packet->client_port),
+            packet->server_addr & 0xFF, (packet->server_addr >> 8) & 0xFF,
+            (packet->server_addr >> 16) & 0xFF, packet->server_addr >> 24,
+            ntohs(packet->server_port));
+    return repr;
+}
+
+/* Handle a packet. */
+void
+BfSniffer_HandlePacket(u_char *args, const struct pcap_pkthdr *header,
+                       const u_char *packet_data)
+{
+    BfPacket packet;
+    if (BfPacket_Init(&packet, packet_data) == NULL)
+        return;
+#ifdef BfDebug
+    printf("%s: debug: [%s] new packet with data received\n", Bf_ProgramName,
+           BfPacket_Repr(&packet));
+#endif
+    if (BfHTTPRequest_ParsePacket(&packet) < 0)
+        exit(1);
+}
+
+/* Start sniffing packets from the device running the PCAP loop. */
+int
+BfSniffer_SniffDevice(const char *device_name)
+{
+    pcap_t *handler;
+    char errbuf[PCAP_ERRBUF_SIZE];
+    struct bpf_program filter_program;
+
+    /* Open the Ethernet device. */
+    handler = pcap_open_live(device_name, Bf_PCAP_BUF_SIZE, 1, 1000, errbuf);
+    if (handler == NULL) {
+        fprintf(stderr, "%s: error: cannot open device: %s\n",
+                Bf_ProgramName, errbuf);
+        return 1;
+    }
+
+    /* Apply the filter expression. */
+    pcap_compile(handler, &filter_program, Bf_PCAP_FILTER_EXP, 0, 0);
+    pcap_setfilter(handler, &filter_program);
+
+    /* Handle the packets. */
+    printf("%s: sniffing\n", Bf_ProgramName);
+    return pcap_loop(handler, -1, BfSniffer_HandlePacket, NULL);
+}
=== added file 'sniffer/src/parser.c'
--- sniffer/src/parser.c 1970-01-01 00:00:00 +0000
+++ sniffer/src/parser.c 2010-06-26 14:49:23 +0000
@@ -0,0 +1,162 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#include <ctype.h>
+#include <stdio.h>
+#include <string.h>
+#include "sniffer.h"
+
+/* Parse a BfPacket and send a BfHTTPRequest to the analyzer. */
+int
+BfHTTPRequest_ParsePacket(BfPacket *packet)
+{
+    /* Here we assume that requests are always at the beginning of a packet.
+       Although this is not always true, most of the browsers do this, so we
+       should be able to catch almost every request. */
+
+    int status;
+    BfHTTPRequest request;
+
+    if (BfHTTPRequest_ReadRequestLine(&request, packet) < 0)
+        return 0;
+
+    while (packet->data_len > 0) {
+        status = BfHTTPRequest_ReadHeader(&request, packet);
+        if (status == -1)
+            return 0;
+        else if (status == 0)
+            continue;
+        return BfSender_SendReq(&request);
+    }
+    return 0;
+}
+
+/* Read the request line and put the parsed data into the given `request`. */
+int
+BfHTTPRequest_ReadRequestLine(BfHTTPRequest *request, BfPacket *packet)
+{
+    int url_len;
+    int line_len;
+
+    if (packet->data_len < 6)
+        /* This line is too short to be a request line. */
+        return -1;
+
+    /* Check the request method. */
+    if (strncmp(packet->data, "GET ", 4) != 0)
+        /* We don't care about methods other than GET. */
+        return -1;
+    packet->data += 4;
+    packet->data_len -= 4;
+
+    /* Get the end of the line. */
+    line_len = memchr(packet->data, '\n', packet->data_len) -
+               (void *)packet->data + 1;
+    if (line_len <= 0)
+        return -1;
+
+    /* Get the URL size. */
+    url_len = memchr(packet->data, ' ', line_len) - (void *)packet->data;
+    if (url_len <= 0)
+        return -1;
+    else if (url_len >= BfHTTPRequest_URL_SIZE) {
+        fprintf(stderr,
+                "%s: error: [%s] URL too long (%d bytes), request discarded\n",
+                Bf_ProgramName, BfPacket_Repr(packet), url_len);
+        return -1;
+    }
+
+    /* Put the URL into the request, adding the NUL terminator. */
+    memcpy(request->url, packet->data, url_len);
+    request->url[url_len] = '\0';
+#ifdef BfDebug
+    printf("%s: debug: [%s] caught request: GET %s\n", Bf_ProgramName,
+           BfPacket_Repr(packet), request->url);
+#endif
+    /* Consume the bytes used. */
+    packet->data += line_len;
+    packet->data_len -= line_len;
+    return 0;
+}
+
+/* Read a header and, if it's Host, put it into the request. */
+int
+BfHTTPRequest_ReadHeader(BfHTTPRequest *request, BfPacket *packet)
+{
+    char *buf;
+    char *line_end;
+    char *host;
+    char *host_end;
+    int host_len;
+
+    /* TODO Skip lines starting with a space. */
+
+    if (packet->data_len < 7)
+        /* This line is too short to be a Host header (probably headers are
+           finished). */
+        return -1;
+
+    /* Get the end of the line. */
+    line_end = memchr(packet->data, '\n', packet->data_len);
+    if (line_end == NULL)
+        return -1;
+    line_end--;
+
+    /* Look for the header name-value separator, making the header name lower
+       case. */
+    for (buf = packet->data; buf < line_end; buf++) {
+        if (buf[0] == ':')
+            break;
+        buf[0] = tolower(buf[0]);
+    }
+    if (buf[0] != ':')
+        /* This is not a header. */
+        return -1;
+
+    if (buf - packet->data != 4 || strncmp(packet->data, "host", 4) != 0) {
+        /* This is not Host, but another header. Consume the bytes of the
+           line and return. */
+        packet->data_len -= line_end + 2 - packet->data;
+        packet->data = line_end + 2;
+        return 0;
+    }
+
+    /* Get the beginning of the host name, skipping any blank space. */
+    for (buf++; buf < line_end; buf++)
+        if (buf[0] != ' ' && buf[0] != '\t' && buf[0] != '\r') {
+            host = buf;
+            break;
+        }
+    if (buf == line_end)
+        /* The value is empty or continues on the next line: give up. */
+        return -1;
+
+    /* Get the end of the host name, skipping the blank spaces. */
+    for (buf = line_end; buf >= host; buf--)
+        if (buf[0] != ' ' && buf[0] != '\t' && buf[0] != '\r') {
+            host_end = buf + 1;
+            break;
+        }
+
+    /* Check the host name length. */
+    host_len = host_end - host;
+    if (host_len > BfHTTPRequest_HOST_SIZE) {
+        fprintf(stderr,
+                "%s: error: [%s] host name too long (%d bytes), "
+                "request discarded\n", Bf_ProgramName, BfPacket_Repr(packet),
+                host_len);
+        return -1;
+    }
+
+    /* Put the host name into the request, adding the NUL terminator. */
+    memcpy(request->host, host, host_len);
+    request->host[host_len] = '\0';
+#ifdef BfDebug
+    printf("%s: debug: [%s] found host: %s\n", Bf_ProgramName,
+           BfPacket_Repr(packet), request->host);
+#endif
+    /* Consume the bytes used. */
+    packet->data_len -= line_end + 2 - packet->data;
+    packet->data = line_end + 2;
+    return 1;
+}
=== added file 'sniffer/src/sender.c'
--- sniffer/src/sender.c 1970-01-01 00:00:00 +0000
+++ sniffer/src/sender.c 2010-06-26 14:49:23 +0000
@@ -0,0 +1,124 @@
+/* Copyright 2010 BeeSeek Developers. This software is licensed under the
+   GNU Affero General Public License version 3 (see the file LICENSE). */
+
+#include <errno.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include "sniffer.h"
+
+/* Socket used to send data to the BeeSeek server. */
+int BfSender_Socket;
+
+
+/* Connect to the BeeSeek server. */
+int
+BfSender_Connect(char *address, int port)
+{
+    struct hostent *host;
+    struct sockaddr_in server_addr;
+
+#ifdef BfDebug
+    printf("%s: debug: connecting to server at %s:%d\n", Bf_ProgramName,
+           address, port);
+#endif
+
+    /* Create the socket. */
+    BfSender_Socket = socket(AF_INET, SOCK_STREAM, 0);
+    if (BfSender_Socket < 0) {
+        fprintf(stderr, "%s: error: cannot create socket: %s\n",
+                Bf_ProgramName, strerror(errno));
+        return -1;
+    }
+
+    /* Get the destination host. */
+    host = gethostbyname(address);
+    if (host == NULL) {
+        fprintf(stderr, "%s: error: cannot resolve server address: %s\n",
+                Bf_ProgramName, hstrerror(h_errno));
+        return -1;
+    }
+
+    /* Insert the address information. */
+    memset(&server_addr, 0, sizeof(server_addr));
+    server_addr.sin_family = AF_INET;
+    server_addr.sin_addr.s_addr = ((struct in_addr *)(host->h_addr))->s_addr;
+    server_addr.sin_port = htons(port);
+
+    /* Connect to the server. */
+    if (connect(BfSender_Socket, (struct sockaddr *) &server_addr,
+                sizeof(server_addr)) < 0) {
+        fprintf(stderr, "%s: error: cannot connect to server: %s\n",
+                Bf_ProgramName, strerror(errno));
+        return -1;
+    }
+
+    /* Initialize the HTTP communication with the server. */
+    if (BfSender_SendRaw(BfSender_MSG_CONNECT) < 0)
+        return -1;
+
+    char data[12];
+    int bytes_recvd;
+    int data_len = 0;
+
+    /* Receive the response from the server. */
+    while (data_len < 12) {
+        bytes_recvd = recv(BfSender_Socket, data + data_len, 12 - data_len, 0);
+        if (bytes_recvd < 0) {
+            fprintf(stderr, "%s: error: cannot get response from server: %s\n",
+                    Bf_ProgramName, strerror(errno));
+            return -1;
+        }
+        else if (bytes_recvd == 0) {
+            fprintf(stderr, "%s: error: server dropped the connection\n",
+                    Bf_ProgramName);
+            return -1;
+        }
+        data_len += bytes_recvd;
+    }
+    /* The status code of the response should be 2XX (e.g. "HTTP/1.1 200 OK").
+     */
+    if (data[9] != '2') {
+        fprintf(stderr, "%s: error: bad response from server: %c%c%c\n",
+                Bf_ProgramName, data[9], data[10], data[11]);
+        return -1;
+    }
+    return 0;
+}
+
+int
+BfSender_SendRaw(char *data)
+{
+    int bytes_sent;
+    while (strlen(data) > 0) {
+        bytes_sent = send(BfSender_Socket, data, strlen(data), MSG_NOSIGNAL);
+        if (bytes_sent < 0) {
+            fprintf(stderr, "%s: error: cannot send message to server: %s\n",
+                    Bf_ProgramName, strerror(errno));
+            return -1;
+        }
+        data += bytes_sent;
+    }
+    return 0;
+}
+
+int
+BfSender_SendReq(BfHTTPRequest *request)
+{
+    char data[BfSender_MSG_PAGE_SIZE];
+    sprintf(data, BfSender_MSG_PAGE_TMPL, request->host, request->url);
+    return BfSender_SendRaw(data);
+}
+
+void
+BfSender_Close(void)
+{
+    shutdown(BfSender_Socket, SHUT_RDWR);
+    close(BfSender_Socket);
+#ifdef BfDebug
+    printf("%s: debug: connection to the server closed\n", Bf_ProgramName);
+#endif
+}
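The header arithmetic in BfPacket_Init (src/handler.c) can be exercised without a live capture. The sketch below is hypothetical (the FrameInfo type and the frame_info function are not part of this branch): it reads the standard IPv4 and TCP header fields directly from a raw frame, assuming an untagged 14-byte Ethernet header, and checks the captured length before every access, which the merged code does not do. In the IPv4 header the source address occupies bytes 12 to 15 and the destination address bytes 16 to 19.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: addresses and ports in host byte order. */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    size_t payload_off, payload_len;
} FrameInfo;

/* Parse an untagged Ethernet + IPv4 + TCP frame of `caplen` captured bytes.
   Returns 0 on success, -1 if the frame is too short or malformed. */
int
frame_info(const uint8_t *f, size_t caplen, FrameInfo *fi)
{
    const size_t eth = 14;                        /* Ethernet header */
    if (caplen < eth + 20)
        return -1;

    const uint8_t *ip = f + eth;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;      /* like ip->ihl * 4 */
    size_t tot_len = (size_t)ip[2] << 8 | ip[3];  /* like ntohs(ip->tot_len) */
    if (ihl < 20 || caplen < eth + ihl + 20)
        return -1;

    const uint8_t *tcp = ip + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;     /* like tcp->doff * 4 */
    if (doff < 20 || tot_len < ihl + doff || caplen < eth + ihl + doff)
        return -1;

    fi->src_ip = (uint32_t)ip[12] << 24 | (uint32_t)ip[13] << 16 |
                 (uint32_t)ip[14] << 8 | ip[15];
    fi->dst_ip = (uint32_t)ip[16] << 24 | (uint32_t)ip[17] << 16 |
                 (uint32_t)ip[18] << 8 | ip[19];
    fi->src_port = (uint16_t)(tcp[0] << 8 | tcp[1]);
    fi->dst_port = (uint16_t)(tcp[2] << 8 | tcp[3]);

    /* Payload = IP total length minus both headers, clamped to what the
       capture actually holds (the snapshot length may cut it short). */
    fi->payload_off = eth + ihl + doff;
    fi->payload_len = tot_len - ihl - doff;
    if (fi->payload_len > caplen - fi->payload_off)
        fi->payload_len = caplen - fi->payload_off;
    return 0;
}
```

In a pcap callback, `caplen` would come from `header->caplen` of the `struct pcap_pkthdr` passed to BfSniffer_HandlePacket.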
