
Merge lp:~mitya57/ubuntu/precise/virtualbox/4.1.12-dfsg-2ubuntu0.3 into lp:ubuntu/precise/virtualbox

Proposed by Dmitry Shachnev on 2013-03-14
Status: Work in progress
Proposed branch: lp:~mitya57/ubuntu/precise/virtualbox/4.1.12-dfsg-2ubuntu0.3
Merge into: lp:ubuntu/precise/virtualbox
Diff against target: 21532 lines (+21066/-102) 26 files modified
To merge this branch: bzr merge lp:~mitya57/ubuntu/precise/virtualbox/4.1.12-dfsg-2ubuntu0.3
Reviewer           Review Type    Date Requested   Status
James Page                                         Approve on 2013-03-25
Ubuntu branches                   2013-03-14       Pending
Review via email: mp+153346@code.launchpad.net

Description of the Change

This fixes a bug where virtualbox was not installable on precise systems running the 3.5 kernel. I have verified that the package builds, installs and runs correctly.

This is based on Felix's previous upload, which was rejected because of missing SRU information and a bad changelog entry; both issues are now fixed.
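Roughly, the verification amounts to something like the following on a precise machine (a sketch only; it assumes the lts-quantal backport packages provide the 3.5 kernel, and the exact package names are illustrative rather than part of this upload):

  # Install the 3.5 backport kernel and headers, then boot into it
  # (assumption: the precise hardware-enablement "lts-quantal" stack).
  sudo apt-get install linux-image-generic-lts-quantal linux-headers-generic-lts-quantal
  sudo reboot

  # Install the proposed package from precise-proposed; DKMS rebuilds the
  # virtualbox modules against the running 3.5 headers, which is where
  # LP: #1081307 previously failed.
  sudo apt-get install virtualbox virtualbox-dkms
  dkms status virtualbox            # should report the module as installed for 3.5
  sudo modprobe vboxdrv && lsmod | grep vbox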

virtualbox (4.1.12-dfsg-2ubuntu0.3) precise-proposed; urgency=low

  [ Felix Geyer ]
  * Fix build errors with kernel 3.5. (LP: #1081307)
    - Add 39-kernel-35.patch
  * Fix crash when running 64-bit guests on a 32-bit host system.
    (LP: #1071344)
    - Add 40-fix-crash-64bit-guests.patch

  [ Dmitry Shachnev ]
  * Fix the changelog, refresh patches and re-upload to precise-proposed.

 -- Dmitry Shachnev <email address hidden> Thu, 14 Mar 2013 16:37:24 +0400
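The "refresh patches" step above is the usual quilt workflow in the unpacked source tree; a sketch, assuming QUILT_PATCHES points at debian/patches (as recorded by the .pc/.quilt_patches file that appears in the diff below):

  export QUILT_PATCHES=debian/patches
  quilt push -a     # apply every patch in debian/patches/series, flagging fuzz/offsets
  quilt refresh     # rewrite the topmost patch with the updated context
  quilt pop -a      # unapply the series again before building the source package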

James Page (james-page) wrote:

Uploaded to precise-proposed for SRU team review.

review: Approve

Unmerged revisions

17. By Dmitry Shachnev on 2013-03-14

Releasing version 4.1.12-dfsg-2ubuntu0.3

16. By Dmitry Shachnev on 2013-03-13

[ Felix Geyer ]
* Fix build errors with kernel 3.5. (LP: #1081307)
  - Add 39-kernel-35.patch
* Fix crash when running 64-bit guests on a 32-bit host system.
  (LP: #1071344)
  - Add 40-fix-crash-64bit-guests.patch

[ Dmitry Shachnev ]
* Fix the changelog, refresh patches and re-upload to precise-proposed.

15. By Felix Geyer on 2012-10-26

* SECURITY UPDATE: Missing privilege check for task gate switches
  (LP: #1044634)
  - debian/patches/cve-2012-3221.patch: patch from upstream
  - CVE-2012-3221

Preview Diff

1=== added file '.pc/.quilt_patches'
2--- .pc/.quilt_patches 1970-01-01 00:00:00 +0000
3+++ .pc/.quilt_patches 2013-03-14 12:42:23 +0000
4@@ -0,0 +1,1 @@
5+debian/patches
6
7=== added file '.pc/.quilt_series'
8--- .pc/.quilt_series 1970-01-01 00:00:00 +0000
9+++ .pc/.quilt_series 2013-03-14 12:42:23 +0000
10@@ -0,0 +1,1 @@
11+series
12
13=== added directory '.pc/37-fix-unregister-netdevice.patch'
14=== added directory '.pc/37-fix-unregister-netdevice.patch/src'
15=== added directory '.pc/37-fix-unregister-netdevice.patch/src/VBox'
16=== added directory '.pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers'
17=== added directory '.pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt'
18=== added file '.pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/VBoxNetFltInternal.h'
19--- .pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/VBoxNetFltInternal.h 1970-01-01 00:00:00 +0000
20+++ .pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/VBoxNetFltInternal.h 2013-03-14 12:42:23 +0000
21@@ -0,0 +1,468 @@
22+/* $Id: VBoxNetFltInternal.h $ */
23+/** @file
24+ * VBoxNetFlt - Network Filter Driver (Host), Internal Header.
25+ */
26+
27+/*
28+ * Copyright (C) 2008 Oracle Corporation
29+ *
30+ * This file is part of VirtualBox Open Source Edition (OSE), as
31+ * available from http://www.virtualbox.org. This file is free software;
32+ * you can redistribute it and/or modify it under the terms of the GNU
33+ * General Public License (GPL) as published by the Free Software
34+ * Foundation, in version 2 as it comes in the "COPYING" file of the
35+ * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
36+ * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
37+ */
38+
39+#ifndef ___VBoxNetFltInternal_h___
40+#define ___VBoxNetFltInternal_h___
41+
42+#include <VBox/sup.h>
43+#include <VBox/intnet.h>
44+#include <iprt/semaphore.h>
45+#include <iprt/assert.h>
46+
47+
48+RT_C_DECLS_BEGIN
49+
50+/** Pointer to the globals. */
51+typedef struct VBOXNETFLTGLOBALS *PVBOXNETFLTGLOBALS;
52+
53+
54+/**
55+ * The state of a filter driver instance.
56+ *
57+ * The state machine differs a bit between the platforms because of
58+ * the way we hook into the stack. On some hosts we can dynamically
59+ * attach when required (on CreateInstance) and on others we will
60+ * have to connect when the network stack is bound up. These modes
61+ * are called static and dynamic config and governed at compile time
62+ * by the VBOXNETFLT_STATIC_CONFIG define.
63+ *
64+ * See sec_netflt_msc for more details on locking and synchronization.
65+ */
66+typedef enum VBOXNETFTLINSSTATE
67+{
68+ /** The usual invalid state. */
69+ kVBoxNetFltInsState_Invalid = 0,
70+ /** Initialization.
71+ * We've reserved the interface name but need to attach to the actual
72+ * network interface outside the lock to avoid deadlocks.
73+ * In the dynamic case this happens during a Create(Instance) call.
74+ * In the static case it happens during driver initialization. */
75+ kVBoxNetFltInsState_Initializing,
76+#ifdef VBOXNETFLT_STATIC_CONFIG
77+ /** Unconnected, not hooked up to a switch (static only).
78+ * The filter driver instance has been instantiated and hooked up,
79+ * waiting to be connected to an internal network. */
80+ kVBoxNetFltInsState_Unconnected,
81+#endif
82+ /** Connected to an internal network. */
83+ kVBoxNetFltInsState_Connected,
84+ /** Disconnecting from the internal network and possibly the host network interface.
85+ * Partly for reasons of deadlock avoidance again. */
86+ kVBoxNetFltInsState_Disconnecting,
87+ /** The instance has been disconnected from both the host and the internal network. */
88+ kVBoxNetFltInsState_Destroyed,
89+
90+ /** The habitual 32-bit enum hack. */
91+ kVBoxNetFltInsState_32BitHack = 0x7fffffff
92+} VBOXNETFTLINSSTATE;
93+
94+
95+/**
96+ * The per-instance data of the VBox filter driver.
97+ *
98+ * This is data associated with a network interface / NIC / wossname which
99+ * the filter driver has been or may be attached to. When possible it is
100+ * attached dynamically, but this may not be possible on all OSes so we have
101+ * to be flexible about things.
102+ *
103+ * A network interface / NIC / wossname can only have one filter driver
104+ * instance attached to it. So, attempts at connecting an internal network
105+ * to an interface that's already in use (connected to another internal network)
106+ * will result in a VERR_SHARING_VIOLATION.
107+ *
108+ * Only one internal network can connect to a filter driver instance.
109+ */
110+typedef struct VBOXNETFLTINS
111+{
112+ /** Pointer to the next interface in the list. (VBOXNETFLTGLOBAL::pInstanceHead) */
113+ struct VBOXNETFLTINS *pNext;
114+ /** Our RJ-45 port.
115+ * This is what the internal network plugs into. */
116+ INTNETTRUNKIFPORT MyPort;
117+ /** The RJ-45 port on the INTNET "switch".
118+ * This is what we're connected to. */
119+ PINTNETTRUNKSWPORT pSwitchPort;
120+ /** Pointer to the globals. */
121+ PVBOXNETFLTGLOBALS pGlobals;
122+
123+ /** The spinlock protecting the state variables and host interface handle. */
124+ RTSPINLOCK hSpinlock;
125+ /** The current interface state. */
126+ VBOXNETFTLINSSTATE volatile enmState;
127+ /** The trunk state. */
128+ INTNETTRUNKIFSTATE volatile enmTrunkState;
129+ bool volatile fActive;
130+ /** Disconnected from the host network interface. */
131+ bool volatile fDisconnectedFromHost;
132+ /** Rediscovery is pending.
133+ * cBusy will never reach zero during rediscovery, so which
134+ * takes care of serializing rediscovery and disconnecting. */
135+ bool volatile fRediscoveryPending;
136+ /** Whether we should not attempt to set promiscuous mode at all. */
137+ bool fDisablePromiscuous;
138+#if (ARCH_BITS == 32) && defined(__GNUC__)
139+#if 0
140+ uint32_t u32Padding; /**< Alignment padding, will assert in ASMAtomicUoWriteU64 otherwise. */
141+#endif
142+#endif
143+ /** The timestamp of the last rediscovery. */
144+ uint64_t volatile NanoTSLastRediscovery;
145+ /** Reference count. */
146+ uint32_t volatile cRefs;
147+ /** The busy count.
148+ * This counts the number of current callers and pending packet. */
149+ uint32_t volatile cBusy;
150+ /** The event that is signaled when we go idle and that pfnWaitForIdle blocks on. */
151+ RTSEMEVENT hEventIdle;
152+
153+ /** @todo move MacAddr out of this structure! */
154+ union
155+ {
156+#ifdef VBOXNETFLT_OS_SPECFIC
157+ struct
158+ {
159+# if defined(RT_OS_DARWIN)
160+ /** @name Darwin instance data.
161+ * @{ */
162+ /** Pointer to the darwin network interface we're attached to.
163+ * This is treated as highly volatile and should only be read and retained
164+ * while owning hSpinlock. Releasing references to this should not be done
165+ * while owning it though as we might end up destroying it in some paths. */
166+ ifnet_t volatile pIfNet;
167+ /** The interface filter handle.
168+ * Same access rules as with pIfNet. */
169+ interface_filter_t volatile pIfFilter;
170+ /** Whether we've need to set promiscuous mode when the interface comes up. */
171+ bool volatile fNeedSetPromiscuous;
172+ /** Whether we've successfully put the interface into to promiscuous mode.
173+ * This is for dealing with the ENETDOWN case. */
174+ bool volatile fSetPromiscuous;
175+ /** The MAC address of the interface. */
176+ RTMAC MacAddr;
177+ /** @} */
178+# elif defined(RT_OS_LINUX)
179+ /** @name Linux instance data
180+ * @{ */
181+ /** Pointer to the device. */
182+ struct net_device * volatile pDev;
183+ /** Whether we've successfully put the interface into to promiscuous mode.
184+ * This is for dealing with the ENETDOWN case. */
185+ bool volatile fPromiscuousSet;
186+ /** Whether device exists and physically attached. */
187+ bool volatile fRegistered;
188+ /** The MAC address of the interface. */
189+ RTMAC MacAddr;
190+ struct notifier_block Notifier;
191+ struct packet_type PacketType;
192+# ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
193+ struct sk_buff_head XmitQueue;
194+ struct work_struct XmitTask;
195+# endif
196+ /** @} */
197+# elif defined(RT_OS_SOLARIS)
198+ /** @name Solaris instance data.
199+ * @{ */
200+# ifdef VBOX_WITH_NETFLT_CROSSBOW
201+ /** Whether the underlying interface is a VNIC or not. */
202+ bool fIsVNIC;
203+ /** Whether the underlying interface is a VNIC template or not. */
204+ bool fIsVNICTemplate;
205+ /** Handle to list of created VNICs. */
206+ list_t hVNICs;
207+ /** The MAC address of the host interface. */
208+ RTMAC MacAddr;
209+ /** Handle of this interface (lower MAC). */
210+ mac_handle_t hInterface;
211+ /** Handle to link state notifier. */
212+ mac_notify_handle_t hNotify;
213+# else
214+ /** Pointer to the bound IPv4 stream. */
215+ struct vboxnetflt_stream_t * volatile pIp4Stream;
216+ /** Pointer to the bound IPv6 stream. */
217+ struct vboxnetflt_stream_t * volatile pIp6Stream;
218+ /** Pointer to the bound ARP stream. */
219+ struct vboxnetflt_stream_t * volatile pArpStream;
220+ /** Pointer to the unbound promiscuous stream. */
221+ struct vboxnetflt_promisc_stream_t * volatile pPromiscStream;
222+ /** Whether we are attaching to IPv6 stream dynamically now. */
223+ bool volatile fAttaching;
224+ /** Whether this is a VLAN interface or not. */
225+ bool volatile fVLAN;
226+ /** Layered device handle to the interface. */
227+ ldi_handle_t hIface;
228+ /** The MAC address of the interface. */
229+ RTMAC MacAddr;
230+ /** Mutex protection used for loopback. */
231+ kmutex_t hMtx;
232+ /** Mutex protection used for dynamic IPv6 attaches. */
233+ RTSEMFASTMUTEX hPollMtx;
234+# endif
235+ /** @} */
236+# elif defined(RT_OS_FREEBSD)
237+ /** @name FreeBSD instance data.
238+ * @{ */
239+ /** Interface handle */
240+ struct ifnet *ifp;
241+ /** Netgraph node handle */
242+ node_p node;
243+ /** Input hook */
244+ hook_p input;
245+ /** Output hook */
246+ hook_p output;
247+ /** Original interface flags */
248+ unsigned int flags;
249+ /** Input queue */
250+ struct ifqueue inq;
251+ /** Output queue */
252+ struct ifqueue outq;
253+ /** Input task */
254+ struct task tskin;
255+ /** Output task */
256+ struct task tskout;
257+ /** The MAC address of the interface. */
258+ RTMAC MacAddr;
259+ /** @} */
260+# elif defined(RT_OS_WINDOWS)
261+ /** @name Windows instance data.
262+ * @{ */
263+ /** Filter driver device context. */
264+ VBOXNETFLTWIN WinIf;
265+
266+ volatile uint32_t cModeNetFltRefs;
267+ volatile uint32_t cModePassThruRefs;
268+#ifndef VBOXNETFLT_NO_PACKET_QUEUE
269+ /** Packet worker thread info */
270+ PACKET_QUEUE_WORKER PacketQueueWorker;
271+#endif
272+ /** The MAC address of the interface. Caching MAC for performance reasons. */
273+ RTMAC MacAddr;
274+ /** mutex used to synchronize WinIf init/deinit */
275+ RTSEMMUTEX hWinIfMutex;
276+ /** @} */
277+# else
278+# error "PORTME"
279+# endif
280+ } s;
281+#endif
282+ /** Padding. */
283+#if defined(RT_OS_WINDOWS)
284+# if defined(VBOX_NETFLT_ONDEMAND_BIND)
285+ uint8_t abPadding[192];
286+# elif defined(VBOXNETADP)
287+ uint8_t abPadding[256];
288+# else
289+ uint8_t abPadding[1024];
290+# endif
291+#elif defined(RT_OS_LINUX)
292+ uint8_t abPadding[320];
293+#elif defined(RT_OS_FREEBSD)
294+ uint8_t abPadding[320];
295+#else
296+ uint8_t abPadding[128];
297+#endif
298+ } u;
299+
300+ /** The interface name. */
301+ char szName[1];
302+} VBOXNETFLTINS;
303+/** Pointer to the instance data of a host network filter driver. */
304+typedef struct VBOXNETFLTINS *PVBOXNETFLTINS;
305+
306+AssertCompileMemberAlignment(VBOXNETFLTINS, NanoTSLastRediscovery, 8);
307+#ifdef VBOXNETFLT_OS_SPECFIC
308+AssertCompile(RT_SIZEOFMEMB(VBOXNETFLTINS, u.s) <= RT_SIZEOFMEMB(VBOXNETFLTINS, u.abPadding));
309+#endif
310+
311+
312+/**
313+ * The global data of the VBox filter driver.
314+ *
315+ * This contains the bit required for communicating with support driver, VBoxDrv
316+ * (start out as SupDrv).
317+ */
318+typedef struct VBOXNETFLTGLOBALS
319+{
320+ /** Mutex protecting the list of instances and state changes. */
321+ RTSEMFASTMUTEX hFastMtx;
322+ /** Pointer to a list of instance data. */
323+ PVBOXNETFLTINS pInstanceHead;
324+
325+ /** The INTNET trunk network interface factory. */
326+ INTNETTRUNKFACTORY TrunkFactory;
327+ /** The SUPDRV component factory registration. */
328+ SUPDRVFACTORY SupDrvFactory;
329+ /** The number of current factory references. */
330+ int32_t volatile cFactoryRefs;
331+ /** Whether the IDC connection is open or not.
332+ * This is only for cleaning up correctly after the separate IDC init on Windows. */
333+ bool fIDCOpen;
334+ /** The SUPDRV IDC handle (opaque struct). */
335+ SUPDRVIDCHANDLE SupDrvIDC;
336+} VBOXNETFLTGLOBALS;
337+
338+
339+DECLHIDDEN(int) vboxNetFltInitGlobalsAndIdc(PVBOXNETFLTGLOBALS pGlobals);
340+DECLHIDDEN(int) vboxNetFltInitGlobals(PVBOXNETFLTGLOBALS pGlobals);
341+DECLHIDDEN(int) vboxNetFltInitIdc(PVBOXNETFLTGLOBALS pGlobals);
342+DECLHIDDEN(int) vboxNetFltTryDeleteIdcAndGlobals(PVBOXNETFLTGLOBALS pGlobals);
343+DECLHIDDEN(void) vboxNetFltDeleteGlobals(PVBOXNETFLTGLOBALS pGlobals);
344+DECLHIDDEN(int) vboxNetFltTryDeleteIdc(PVBOXNETFLTGLOBALS pGlobals);
345+
346+DECLHIDDEN(bool) vboxNetFltCanUnload(PVBOXNETFLTGLOBALS pGlobals);
347+DECLHIDDEN(PVBOXNETFLTINS) vboxNetFltFindInstance(PVBOXNETFLTGLOBALS pGlobals, const char *pszName);
348+
349+DECLHIDDEN(void) vboxNetFltRetain(PVBOXNETFLTINS pThis, bool fBusy);
350+DECLHIDDEN(bool) vboxNetFltTryRetainBusyActive(PVBOXNETFLTINS pThis);
351+DECLHIDDEN(bool) vboxNetFltTryRetainBusyNotDisconnected(PVBOXNETFLTINS pThis);
352+DECLHIDDEN(void) vboxNetFltRelease(PVBOXNETFLTINS pThis, bool fBusy);
353+
354+#ifdef VBOXNETFLT_STATIC_CONFIG
355+DECLHIDDEN(int) vboxNetFltSearchCreateInstance(PVBOXNETFLTGLOBALS pGlobals, const char *pszName, PVBOXNETFLTINS *ppInstance, void * pContext);
356+#endif
357+
358+
359+
360+/** @name The OS specific interface.
361+ * @{ */
362+/**
363+ * Try rediscover the host interface.
364+ *
365+ * This is called periodically from the transmit path if we're marked as
366+ * disconnected from the host. There is no chance of a race here.
367+ *
368+ * @returns true if the interface was successfully rediscovered and reattach,
369+ * otherwise false.
370+ * @param pThis The new instance.
371+ */
372+DECLHIDDEN(bool) vboxNetFltOsMaybeRediscovered(PVBOXNETFLTINS pThis);
373+
374+/**
375+ * Transmits a frame.
376+ *
377+ * @return IPRT status code.
378+ * @param pThis The new instance.
379+ * @param pvIfData Pointer to the host-private interface data.
380+ * @param pSG The (scatter/)gather list.
381+ * @param fDst The destination mask. At least one bit will be set.
382+ *
383+ * @remarks Owns the out-bound trunk port semaphore.
384+ */
385+DECLHIDDEN(int) vboxNetFltPortOsXmit(PVBOXNETFLTINS pThis, void *pvIfData, PINTNETSG pSG, uint32_t fDst);
386+
387+/**
388+ * This is called when activating or suspending the instance.
389+ *
390+ * Use this method to enable and disable promiscuous mode on
391+ * the interface to prevent unnecessary interrupt load.
392+ *
393+ * It is only called when the state changes.
394+ *
395+ * @param pThis The instance.
396+ *
397+ * @remarks Owns the lock for the out-bound trunk port.
398+ */
399+DECLHIDDEN(void) vboxNetFltPortOsSetActive(PVBOXNETFLTINS pThis, bool fActive);
400+
401+/**
402+ * This is called when a network interface has obtained a new MAC address.
403+ *
404+ * @param pThis The instance.
405+ * @param pvIfData Pointer to the private interface data.
406+ * @param pMac Pointer to the new MAC address.
407+ */
408+DECLHIDDEN(void) vboxNetFltPortOsNotifyMacAddress(PVBOXNETFLTINS pThis, void *pvIfData, PCRTMAC pMac);
409+
410+/**
411+ * This is called when an interface is connected to the network.
412+ *
413+ * @return IPRT status code.
414+ * @param pThis The instance.
415+ * @param pvIf Pointer to the interface.
416+ * @param ppvIfData Where to store the private interface data.
417+ */
418+DECLHIDDEN(int) vboxNetFltPortOsConnectInterface(PVBOXNETFLTINS pThis, void *pvIf, void **ppvIfData);
419+
420+/**
421+ * This is called when a VM host disconnects from the network.
422+ *
423+ * @param pThis The instance.
424+ * @param pvIfData Pointer to the private interface data.
425+ */
426+DECLHIDDEN(int) vboxNetFltPortOsDisconnectInterface(PVBOXNETFLTINS pThis, void *pvIfData);
427+
428+/**
429+ * This is called to when disconnecting from a network.
430+ *
431+ * @return IPRT status code.
432+ * @param pThis The new instance.
433+ *
434+ * @remarks May own the semaphores for the global list, the network lock and the out-bound trunk port.
435+ */
436+DECLHIDDEN(int) vboxNetFltOsDisconnectIt(PVBOXNETFLTINS pThis);
437+
438+/**
439+ * This is called to when connecting to a network.
440+ *
441+ * @return IPRT status code.
442+ * @param pThis The new instance.
443+ *
444+ * @remarks Owns the semaphores for the global list, the network lock and the out-bound trunk port.
445+ */
446+DECLHIDDEN(int) vboxNetFltOsConnectIt(PVBOXNETFLTINS pThis);
447+
448+/**
449+ * Counter part to vboxNetFltOsInitInstance().
450+ *
451+ * @return IPRT status code.
452+ * @param pThis The new instance.
453+ *
454+ * @remarks May own the semaphores for the global list, the network lock and the out-bound trunk port.
455+ */
456+DECLHIDDEN(void) vboxNetFltOsDeleteInstance(PVBOXNETFLTINS pThis);
457+
458+/**
459+ * This is called to attach to the actual host interface
460+ * after linking the instance into the list.
461+ *
462+ * The MAC address as well promiscuousness and GSO capabilities should be
463+ * reported by this function.
464+ *
465+ * @return IPRT status code.
466+ * @param pThis The new instance.
467+ * @param pvContext The user supplied context in the static config only.
468+ * NULL in the dynamic config.
469+ *
470+ * @remarks Owns no locks.
471+ */
472+DECLHIDDEN(int) vboxNetFltOsInitInstance(PVBOXNETFLTINS pThis, void *pvContext);
473+
474+/**
475+ * This is called to perform structure initializations.
476+ *
477+ * @return IPRT status code.
478+ * @param pThis The new instance.
479+ *
480+ * @remarks Owns no locks.
481+ */
482+DECLHIDDEN(int) vboxNetFltOsPreInitInstance(PVBOXNETFLTINS pThis);
483+/** @} */
484+
485+
486+RT_C_DECLS_END
487+
488+#endif
489+
490
491=== added directory '.pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/linux'
492=== added file '.pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/linux/VBoxNetFlt-linux.c'
493--- .pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/linux/VBoxNetFlt-linux.c 1970-01-01 00:00:00 +0000
494+++ .pc/37-fix-unregister-netdevice.patch/src/VBox/HostDrivers/VBoxNetFlt/linux/VBoxNetFlt-linux.c 2013-03-14 12:42:23 +0000
495@@ -0,0 +1,2555 @@
496+/* $Id: VBoxNetFlt-linux.c $ */
497+/** @file
498+ * VBoxNetFlt - Network Filter Driver (Host), Linux Specific Code.
499+ */
500+
501+/*
502+ * Copyright (C) 2006-2008 Oracle Corporation
503+ *
504+ * This file is part of VirtualBox Open Source Edition (OSE), as
505+ * available from http://www.virtualbox.org. This file is free software;
506+ * you can redistribute it and/or modify it under the terms of the GNU
507+ * General Public License (GPL) as published by the Free Software
508+ * Foundation, in version 2 as it comes in the "COPYING" file of the
509+ * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
510+ * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
511+ */
512+
513+/*******************************************************************************
514+* Header Files *
515+*******************************************************************************/
516+#define LOG_GROUP LOG_GROUP_NET_FLT_DRV
517+#define VBOXNETFLT_LINUX_NO_XMIT_QUEUE
518+#include "the-linux-kernel.h"
519+#include "version-generated.h"
520+#include "product-generated.h"
521+#include <linux/netdevice.h>
522+#include <linux/etherdevice.h>
523+#include <linux/rtnetlink.h>
524+#include <linux/miscdevice.h>
525+#include <linux/ip.h>
526+
527+#include <VBox/log.h>
528+#include <VBox/err.h>
529+#include <VBox/intnetinline.h>
530+#include <VBox/vmm/pdmnetinline.h>
531+#include <VBox/param.h>
532+#include <iprt/alloca.h>
533+#include <iprt/assert.h>
534+#include <iprt/spinlock.h>
535+#include <iprt/semaphore.h>
536+#include <iprt/initterm.h>
537+#include <iprt/process.h>
538+#include <iprt/mem.h>
539+#include <iprt/net.h>
540+#include <iprt/log.h>
541+#include <iprt/mp.h>
542+#include <iprt/mem.h>
543+#include <iprt/time.h>
544+
545+#define VBOXNETFLT_OS_SPECFIC 1
546+#include "../VBoxNetFltInternal.h"
547+
548+#define VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
549+#ifdef CONFIG_NET_SCHED
550+/*# define VBOXNETFLT_WITH_QDISC Comment this out to disable qdisc support */
551+# ifdef VBOXNETFLT_WITH_QDISC
552+# include <net/pkt_sched.h>
553+# endif /* VBOXNETFLT_WITH_QDISC */
554+#endif
555+
556+
557+/*******************************************************************************
558+* Defined Constants And Macros *
559+*******************************************************************************/
560+#define VBOX_FLT_NB_TO_INST(pNB) RT_FROM_MEMBER(pNB, VBOXNETFLTINS, u.s.Notifier)
561+#define VBOX_FLT_PT_TO_INST(pPT) RT_FROM_MEMBER(pPT, VBOXNETFLTINS, u.s.PacketType)
562+#ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
563+# define VBOX_FLT_XT_TO_INST(pXT) RT_FROM_MEMBER(pXT, VBOXNETFLTINS, u.s.XmitTask)
564+#endif
565+
566+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 22)
567+# define VBOX_SKB_RESET_NETWORK_HDR(skb) skb_reset_network_header(skb)
568+# define VBOX_SKB_RESET_MAC_HDR(skb) skb_reset_mac_header(skb)
569+#else
570+# define VBOX_SKB_RESET_NETWORK_HDR(skb) skb->nh.raw = skb->data
571+# define VBOX_SKB_RESET_MAC_HDR(skb) skb->mac.raw = skb->data
572+#endif
573+
574+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 19)
575+# define VBOX_SKB_CHECKSUM_HELP(skb) skb_checksum_help(skb)
576+#else
577+# define CHECKSUM_PARTIAL CHECKSUM_HW
578+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 10)
579+# define VBOX_SKB_CHECKSUM_HELP(skb) skb_checksum_help(skb, 0)
580+# else
581+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 7)
582+# define VBOX_SKB_CHECKSUM_HELP(skb) skb_checksum_help(&skb, 0)
583+# else
584+# define VBOX_SKB_CHECKSUM_HELP(skb) (!skb_checksum_help(skb))
585+# endif
586+/* Versions prior 2.6.10 use stats for both bstats and qstats */
587+# define bstats stats
588+# define qstats stats
589+# endif
590+#endif
591+
592+#ifdef VBOXNETFLT_WITH_QDISC
593+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 13)
594+static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
595+{
596+ kfree_skb(skb);
597+ sch->stats.drops++;
598+
599+ return NET_XMIT_DROP;
600+}
601+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 13) */
602+#endif /* VBOXNETFLT_WITH_QDISC */
603+
604+#ifndef NET_IP_ALIGN
605+# define NET_IP_ALIGN 2
606+#endif
607+
608+#if 0
609+/** Create scatter / gather segments for fragments. When not used, we will
610+ * linearize the socket buffer before creating the internal networking SG. */
611+# define VBOXNETFLT_SG_SUPPORT 1
612+#endif
613+
614+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 18)
615+/** Indicates that the linux kernel may send us GSO frames. */
616+# define VBOXNETFLT_WITH_GSO 1
617+
618+/** This enables or disables the transmitting of GSO frame from the internal
619+ * network and to the host. */
620+# define VBOXNETFLT_WITH_GSO_XMIT_HOST 1
621+
622+# if 0 /** @todo This is currently disable because it causes performance loss of 5-10%. */
623+/** This enables or disables the transmitting of GSO frame from the internal
624+ * network and to the wire. */
625+# define VBOXNETFLT_WITH_GSO_XMIT_WIRE 1
626+# endif
627+
628+/** This enables or disables the forwarding/flooding of GSO frame from the host
629+ * to the internal network. */
630+# define VBOXNETFLT_WITH_GSO_RECV 1
631+
632+#endif
633+
634+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29)
635+/** This enables or disables handling of GSO frames coming from the wire (GRO). */
636+# define VBOXNETFLT_WITH_GRO 1
637+#endif
638+/*
639+ * GRO support was backported to RHEL 5.4
640+ */
641+#ifdef RHEL_RELEASE_CODE
642+# if RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(5, 4)
643+# define VBOXNETFLT_WITH_GRO 1
644+# endif
645+#endif
646+
647+/*******************************************************************************
648+* Internal Functions *
649+*******************************************************************************/
650+static int VBoxNetFltLinuxInit(void);
651+static void VBoxNetFltLinuxUnload(void);
652+static void vboxNetFltLinuxForwardToIntNet(PVBOXNETFLTINS pThis, struct sk_buff *pBuf);
653+
654+
655+/*******************************************************************************
656+* Global Variables *
657+*******************************************************************************/
658+/**
659+ * The (common) global data.
660+ */
661+static VBOXNETFLTGLOBALS g_VBoxNetFltGlobals;
662+
663+module_init(VBoxNetFltLinuxInit);
664+module_exit(VBoxNetFltLinuxUnload);
665+
666+MODULE_AUTHOR(VBOX_VENDOR);
667+MODULE_DESCRIPTION(VBOX_PRODUCT " Network Filter Driver");
668+MODULE_LICENSE("GPL");
669+#ifdef MODULE_VERSION
670+MODULE_VERSION(VBOX_VERSION_STRING " (" RT_XSTR(INTNETTRUNKIFPORT_VERSION) ")");
671+#endif
672+
673+
674+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 12) && defined(LOG_ENABLED)
675+unsigned dev_get_flags(const struct net_device *dev)
676+{
677+ unsigned flags;
678+
679+ flags = (dev->flags & ~(IFF_PROMISC |
680+ IFF_ALLMULTI |
681+ IFF_RUNNING)) |
682+ (dev->gflags & (IFF_PROMISC |
683+ IFF_ALLMULTI));
684+
685+ if (netif_running(dev) && netif_carrier_ok(dev))
686+ flags |= IFF_RUNNING;
687+
688+ return flags;
689+}
690+#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 12) */
691+
692+
693+#ifdef VBOXNETFLT_WITH_QDISC
694+//#define QDISC_LOG(x) printk x
695+# define QDISC_LOG(x) do { } while (0)
696+
697+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 20)
698+# define QDISC_CREATE(dev, queue, ops, parent) qdisc_create_dflt(dev, ops)
699+# elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
700+# define QDISC_CREATE(dev, queue, ops, parent) qdisc_create_dflt(dev, ops, parent)
701+# elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 37)
702+# define QDISC_CREATE(dev, queue, ops, parent) qdisc_create_dflt(dev, queue, ops, parent)
703+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37) */
704+# define QDISC_CREATE(dev, queue, ops, parent) qdisc_create_dflt(queue, ops, parent)
705+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37) */
706+
707+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
708+# define qdisc_dev(qdisc) (qdisc->dev)
709+# define qdisc_pkt_len(skb) (skb->len)
710+# define QDISC_GET(dev) (dev->qdisc_sleeping)
711+# else
712+# define QDISC_GET(dev) (netdev_get_tx_queue(dev, 0)->qdisc_sleeping)
713+# endif
714+
715+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
716+# define QDISC_SAVED_NUM(dev) 1
717+# elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
718+# define QDISC_SAVED_NUM(dev) dev->num_tx_queues
719+# else
720+# define QDISC_SAVED_NUM(dev) dev->num_tx_queues+1
721+# endif
722+
723+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
724+# define QDISC_IS_BUSY(dev, qdisc) test_bit(__LINK_STATE_SCHED, &dev->state)
725+# elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
726+# define QDISC_IS_BUSY(dev, qdisc) (test_bit(__QDISC_STATE_RUNNING, &qdisc->state) || \
727+ test_bit(__QDISC_STATE_SCHED, &qdisc->state))
728+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36) */
729+# define QDISC_IS_BUSY(dev, qdisc) (qdisc_is_running(qdisc) || \
730+ test_bit(__QDISC_STATE_SCHED, &qdisc->state))
731+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36) */
732+
733+struct VBoxNetQDiscPriv
734+{
735+ /** Pointer to the single child qdisc. */
736+ struct Qdisc *pChild;
737+ /*
738+ * Technically it is possible to have different qdiscs for different TX
739+ * queues so we have to save them all.
740+ */
741+ /** Pointer to the array of saved qdiscs. */
742+ struct Qdisc **ppSaved;
743+ /** Pointer to the net filter instance. */
744+ PVBOXNETFLTINS pVBoxNetFlt;
745+};
746+typedef struct VBoxNetQDiscPriv *PVBOXNETQDISCPRIV;
747+
748+//#define VBOXNETFLT_QDISC_ENQUEUE
749+static int vboxNetFltQdiscEnqueue(struct sk_buff *skb, struct Qdisc *sch)
750+{
751+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
752+ int rc;
753+
754+# ifdef VBOXNETFLT_QDISC_ENQUEUE
755+ if (VALID_PTR(pPriv->pVBoxNetFlt))
756+ {
757+ uint8_t abHdrBuf[sizeof(RTNETETHERHDR) + sizeof(uint32_t) + RTNETIPV4_MIN_LEN];
758+ PCRTNETETHERHDR pEtherHdr;
759+ PINTNETTRUNKSWPORT pSwitchPort;
760+ uint32_t cbHdrs = skb_headlen(skb);
761+
762+ cbHdrs = RT_MIN(cbHdrs, sizeof(abHdrBuf));
763+ pEtherHdr = (PCRTNETETHERHDR)skb_header_pointer(skb, 0, cbHdrs, &abHdrBuf[0]);
764+ if ( pEtherHdr
765+ && (pSwitchPort = pPriv->pVBoxNetFlt->pSwitchPort) != NULL
766+ && VALID_PTR(pSwitchPort)
767+ && cbHdrs >= 6)
768+ {
769+ /** @todo consider reference counting, etc. */
770+ INTNETSWDECISION enmDecision = pSwitchPort->pfnPreRecv(pSwitchPort, pEtherHdr, cbHdrs, INTNETTRUNKDIR_HOST);
771+ if (enmDecision == INTNETSWDECISION_INTNET)
772+ {
773+ struct sk_buff *pBuf = skb_copy(skb, GFP_ATOMIC);
774+ pBuf->pkt_type = PACKET_OUTGOING;
775+ vboxNetFltLinuxForwardToIntNet(pPriv->pVBoxNetFlt, pBuf);
776+ qdisc_drop(skb, sch);
777+ ++sch->bstats.packets;
778+ sch->bstats.bytes += qdisc_pkt_len(skb);
779+ return NET_XMIT_SUCCESS;
780+ }
781+ }
782+ }
783+# endif /* VBOXNETFLT_QDISC_ENQUEUE */
784+ rc = pPriv->pChild->enqueue(skb, pPriv->pChild);
785+ if (rc == NET_XMIT_SUCCESS)
786+ {
787+ ++sch->q.qlen;
788+ ++sch->bstats.packets;
789+ sch->bstats.bytes += qdisc_pkt_len(skb);
790+ }
791+ else
792+ ++sch->qstats.drops;
793+ return rc;
794+}
795+
796+static struct sk_buff *vboxNetFltQdiscDequeue(struct Qdisc *sch)
797+{
798+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
799+# ifdef VBOXNETFLT_QDISC_ENQUEUE
800+ --sch->q.qlen;
801+ return pPriv->pChild->dequeue(pPriv->pChild);
802+# else /* VBOXNETFLT_QDISC_ENQUEUE */
803+ uint8_t abHdrBuf[sizeof(RTNETETHERHDR) + sizeof(uint32_t) + RTNETIPV4_MIN_LEN];
804+ PCRTNETETHERHDR pEtherHdr;
805+ PINTNETTRUNKSWPORT pSwitchPort;
806+ struct sk_buff *pSkb;
807+
808+ QDISC_LOG(("vboxNetFltDequeue: Enter pThis=%p\n", pPriv->pVBoxNetFlt));
809+
810+ while ((pSkb = pPriv->pChild->dequeue(pPriv->pChild)) != NULL)
811+ {
812+ struct sk_buff *pBuf;
813+ INTNETSWDECISION enmDecision;
814+ uint32_t cbHdrs;
815+
816+ --sch->q.qlen;
817+
818+ if (!VALID_PTR(pPriv->pVBoxNetFlt))
819+ break;
820+
821+ cbHdrs = skb_headlen(pSkb);
822+ cbHdrs = RT_MIN(cbHdrs, sizeof(abHdrBuf));
823+ pEtherHdr = (PCRTNETETHERHDR)skb_header_pointer(pSkb, 0, cbHdrs, &abHdrBuf[0]);
824+ if ( !pEtherHdr
825+ || (pSwitchPort = pPriv->pVBoxNetFlt->pSwitchPort) == NULL
826+ || !VALID_PTR(pSwitchPort)
827+ || cbHdrs < 6)
828+ break;
829+
830+ /** @todo consider reference counting, etc. */
831+ enmDecision = pSwitchPort->pfnPreRecv(pSwitchPort, pEtherHdr, cbHdrs, INTNETTRUNKDIR_HOST);
832+ if (enmDecision != INTNETSWDECISION_INTNET)
833+ break;
834+
835+ pBuf = skb_copy(pSkb, GFP_ATOMIC);
836+ pBuf->pkt_type = PACKET_OUTGOING;
837+ QDISC_LOG(("vboxNetFltDequeue: pThis=%p\n", pPriv->pVBoxNetFlt));
838+ vboxNetFltLinuxForwardToIntNet(pPriv->pVBoxNetFlt, pBuf);
839+ qdisc_drop(pSkb, sch);
840+ QDISC_LOG(("VBoxNetFlt: Packet for %02x:%02x:%02x:%02x:%02x:%02x dropped\n",
841+ pSkb->data[0], pSkb->data[1], pSkb->data[2],
842+ pSkb->data[3], pSkb->data[4], pSkb->data[5]));
843+ }
844+
845+ return pSkb;
846+# endif /* VBOXNETFLT_QDISC_ENQUEUE */
847+}
848+
849+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
850+static int vboxNetFltQdiscRequeue(struct sk_buff *skb, struct Qdisc *sch)
851+{
852+ int rc;
853+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
854+
855+ rc = pPriv->pChild->ops->requeue(skb, pPriv->pChild);
856+ if (rc == 0)
857+ {
858+ sch->q.qlen++;
859+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 10)
860+ sch->qstats.requeues++;
861+# endif
862+ }
863+
864+ return rc;
865+}
866+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29) */
867+
868+static unsigned int vboxNetFltQdiscDrop(struct Qdisc *sch)
869+{
870+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
871+ unsigned int cbLen;
872+
873+ if (pPriv->pChild->ops->drop)
874+ {
875+ cbLen = pPriv->pChild->ops->drop(pPriv->pChild);
876+ if (cbLen != 0)
877+ {
878+ ++sch->qstats.drops;
879+ --sch->q.qlen;
880+ return cbLen;
881+ }
882+ }
883+
884+ return 0;
885+}
886+
887+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 25)
888+static int vboxNetFltQdiscInit(struct Qdisc *sch, struct rtattr *opt)
889+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 25) */
890+static int vboxNetFltQdiscInit(struct Qdisc *sch, struct nlattr *opt)
891+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 25) */
892+{
893+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
894+ struct net_device *pDev = qdisc_dev(sch);
895+
896+ pPriv->pVBoxNetFlt = NULL;
897+
898+ pPriv->ppSaved = kcalloc(QDISC_SAVED_NUM(pDev), sizeof(pPriv->ppSaved[0]),
899+ GFP_KERNEL);
900+ if (!pPriv->ppSaved)
901+ return -ENOMEM;
902+
903+ pPriv->pChild = QDISC_CREATE(pDev, netdev_get_tx_queue(pDev, 0),
904+ &pfifo_qdisc_ops,
905+ TC_H_MAKE(TC_H_MAJ(sch->handle),
906+ TC_H_MIN(1)));
907+ if (!pPriv->pChild)
908+ {
909+ kfree(pPriv->ppSaved);
910+ pPriv->ppSaved = NULL;
911+ return -ENOMEM;
912+ }
913+
914+ return 0;
915+}
916+
917+static void vboxNetFltQdiscReset(struct Qdisc *sch)
918+{
919+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
920+
921+ qdisc_reset(pPriv->pChild);
922+ sch->q.qlen = 0;
923+ sch->qstats.backlog = 0;
924+}
925+
926+static void vboxNetFltQdiscDestroy(struct Qdisc* sch)
927+{
928+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
929+ struct net_device *pDev = qdisc_dev(sch);
930+
931+ qdisc_destroy(pPriv->pChild);
932+ pPriv->pChild = NULL;
933+
934+ if (pPriv->ppSaved)
935+ {
936+ int i;
937+ for (i = 0; i < QDISC_SAVED_NUM(pDev); i++)
938+ if (pPriv->ppSaved[i])
939+ qdisc_destroy(pPriv->ppSaved[i]);
940+ kfree(pPriv->ppSaved);
941+ pPriv->ppSaved = NULL;
942+ }
943+}
944+
945+static int vboxNetFltClassGraft(struct Qdisc *sch, unsigned long arg, struct Qdisc *pNew,
946+ struct Qdisc **ppOld)
947+{
948+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
949+
950+ if (pNew == NULL)
951+ pNew = &noop_qdisc;
952+
953+ sch_tree_lock(sch);
954+ *ppOld = pPriv->pChild;
955+ pPriv->pChild = pNew;
956+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 20)
957+ sch->q.qlen = 0;
958+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 20) */
959+ qdisc_tree_decrease_qlen(*ppOld, (*ppOld)->q.qlen);
960+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 20) */
961+ qdisc_reset(*ppOld);
962+ sch_tree_unlock(sch);
963+
964+ return 0;
965+}
966+
967+static struct Qdisc *vboxNetFltClassLeaf(struct Qdisc *sch, unsigned long arg)
968+{
969+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
970+ return pPriv->pChild;
971+}
972+
973+static unsigned long vboxNetFltClassGet(struct Qdisc *sch, u32 classid)
974+{
975+ return 1;
976+}
977+
978+static void vboxNetFltClassPut(struct Qdisc *sch, unsigned long arg)
979+{
980+}
981+
982+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 25)
983+static int vboxNetFltClassChange(struct Qdisc *sch, u32 classid, u32 parentid,
984+ struct rtattr **tca, unsigned long *arg)
985+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 25) */
986+static int vboxNetFltClassChange(struct Qdisc *sch, u32 classid, u32 parentid,
987+ struct nlattr **tca, unsigned long *arg)
988+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 25) */
989+{
990+ return -ENOSYS;
991+}
992+
993+static int vboxNetFltClassDelete(struct Qdisc *sch, unsigned long arg)
994+{
995+ return -ENOSYS;
996+}
997+
998+static void vboxNetFltClassWalk(struct Qdisc *sch, struct qdisc_walker *walker)
999+{
1000+ if (!walker->stop) {
1001+ if (walker->count >= walker->skip)
1002+ if (walker->fn(sch, 1, walker) < 0) {
1003+ walker->stop = 1;
1004+ return;
1005+ }
1006+ walker->count++;
1007+ }
1008+}
1009+
1010+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
1011+static struct tcf_proto **vboxNetFltClassFindTcf(struct Qdisc *sch, unsigned long cl)
1012+{
1013+ return NULL;
1014+}
1015+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32) */
1016+
1017+static int vboxNetFltClassDump(struct Qdisc *sch, unsigned long cl,
1018+ struct sk_buff *skb, struct tcmsg *tcm)
1019+{
1020+ PVBOXNETQDISCPRIV pPriv = qdisc_priv(sch);
1021+
1022+ if (cl != 1)
1023+ return -ENOENT;
1024+
1025+ tcm->tcm_handle |= TC_H_MIN(1);
1026+ tcm->tcm_info = pPriv->pChild->handle;
1027+
1028+ return 0;
1029+}
1030+
1031+
1032+static struct Qdisc_class_ops g_VBoxNetFltClassOps =
1033+{
1034+ .graft = vboxNetFltClassGraft,
1035+ .leaf = vboxNetFltClassLeaf,
1036+ .get = vboxNetFltClassGet,
1037+ .put = vboxNetFltClassPut,
1038+ .change = vboxNetFltClassChange,
1039+ .delete = vboxNetFltClassDelete,
1040+ .walk = vboxNetFltClassWalk,
1041+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32)
1042+ .tcf_chain = vboxNetFltClassFindTcf,
1043+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 32) */
1044+ .dump = vboxNetFltClassDump,
1045+};
1046+
1047+
1048+static struct Qdisc_ops g_VBoxNetFltQDiscOps = {
1049+ .cl_ops = &g_VBoxNetFltClassOps,
1050+ .id = "vboxnetflt",
1051+ .priv_size = sizeof(struct VBoxNetQDiscPriv),
1052+ .enqueue = vboxNetFltQdiscEnqueue,
1053+ .dequeue = vboxNetFltQdiscDequeue,
1054+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1055+ .requeue = vboxNetFltQdiscRequeue,
1056+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1057+ .peek = qdisc_peek_dequeued,
1058+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1059+ .drop = vboxNetFltQdiscDrop,
1060+ .init = vboxNetFltQdiscInit,
1061+ .reset = vboxNetFltQdiscReset,
1062+ .destroy = vboxNetFltQdiscDestroy,
1063+ .owner = THIS_MODULE
1064+};
1065+
1066+/*
1067+ * If our qdisc is already attached to the device (that means the user
1068+ * installed it from command line with 'tc' command) we simply update
1069+ * the pointer to vboxnetflt instance in qdisc's private structure.
1070+ * Otherwise we need to take some additional steps:
1071+ * - Create our qdisc;
1072+ * - Save all references to qdiscs;
1073+ * - Replace our child with the first qdisc reference;
1074+ * - Replace all references so they point to our qdisc.
1075+ */
1076+static void vboxNetFltLinuxQdiscInstall(PVBOXNETFLTINS pThis, struct net_device *pDev)
1077+{
1078+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27)
1079+ int i;
1080+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1081+ PVBOXNETQDISCPRIV pPriv;
1082+
1083+ struct Qdisc *pExisting = QDISC_GET(pDev);
1084+ /* Do not install our qdisc for devices with no TX queues */
1085+ if (!pExisting->enqueue)
1086+ return;
1087+ if (strcmp(pExisting->ops->id, "vboxnetflt"))
1088+ {
1089+ /* The existing qdisc is different from ours, let's create new one. */
1090+ struct Qdisc *pNew = QDISC_CREATE(pDev, netdev_get_tx_queue(pDev, 0),
1091+ &g_VBoxNetFltQDiscOps, TC_H_ROOT);
1092+ if (!pNew)
1093+ return; // TODO: Error?
1094+
1095+ if (!try_module_get(THIS_MODULE))
1096+ {
1097+ /*
1098+ * This may cause a memory leak but calling qdisc_destroy()
1099+ * is not an option as it will call module_put().
1100+ */
1101+ return;
1102+ }
1103+ pPriv = qdisc_priv(pNew);
1104+
1105+ qdisc_destroy(pPriv->pChild);
1106+ pPriv->pChild = QDISC_GET(pDev);
1107+ atomic_inc(&pPriv->pChild->refcnt);
1108+ /*
1109+ * There is no need in deactivating the device or acquiring any locks
1110+ * prior changing qdiscs since we do not destroy the old qdisc.
1111+ * Atomic replacement of pointers is enough.
1112+ */
1113+ /*
1114+ * No need to change reference counters here as we merely move
1115+ * the pointer and the reference counter of the newly allocated
1116+ * qdisc is already 1.
1117+ */
1118+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
1119+ pPriv->ppSaved[0] = pDev->qdisc_sleeping;
1120+ ASMAtomicWritePtr(&pDev->qdisc_sleeping, pNew);
1121+ ASMAtomicWritePtr(&pDev->qdisc, pNew);
1122+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1123+ for (i = 0; i < pDev->num_tx_queues; i++)
1124+ {
1125+ struct netdev_queue *pQueue = netdev_get_tx_queue(pDev, i);
1126+
1127+ pPriv->ppSaved[i] = pQueue->qdisc_sleeping;
1128+ ASMAtomicWritePtr(&pQueue->qdisc_sleeping, pNew);
1129+ ASMAtomicWritePtr(&pQueue->qdisc, pNew);
1130+ if (i)
1131+ atomic_inc(&pNew->refcnt);
1132+ }
1133+ /* Newer kernels store root qdisc in netdev structure as well. */
1134+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
1135+ pPriv->ppSaved[pDev->num_tx_queues] = pDev->qdisc;
1136+ ASMAtomicWritePtr(&pDev->qdisc, pNew);
1137+ atomic_inc(&pNew->refcnt);
1138+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32) */
1139+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1140+ /* Sync the queue len with our child */
1141+ pNew->q.qlen = pPriv->pChild->q.qlen;
1142+ }
1143+ else
1144+ {
1145+ /* We already have vboxnetflt qdisc, let's use it. */
1146+ pPriv = qdisc_priv(pExisting);
1147+ }
1148+ ASMAtomicWritePtr(&pPriv->pVBoxNetFlt, pThis);
1149+ QDISC_LOG(("vboxNetFltLinuxInstallQdisc: pThis=%p\n", pPriv->pVBoxNetFlt));
1150+}
1151+
1152+static void vboxNetFltLinuxQdiscRemove(PVBOXNETFLTINS pThis, struct net_device *pDev)
1153+{
1154+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27)
1155+ int i;
1156+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1157+ PVBOXNETQDISCPRIV pPriv;
1158+ struct Qdisc *pQdisc, *pChild;
1159+ if (!pDev)
1160+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
1161+ if (!VALID_PTR(pDev))
1162+ {
1163+ printk("VBoxNetFlt: Failed to detach qdisc, invalid device pointer: %p\n",
1164+ pDev);
1165+ return; // TODO: Consider returing an error
1166+ }
1167+
1168+
1169+ pQdisc = QDISC_GET(pDev);
1170+ if (strcmp(pQdisc->ops->id, "vboxnetflt"))
1171+ {
1172+ if (pQdisc->enqueue)
1173+ {
1174+ /* Looks like the user has replaced our qdisc manually. */
1175+ printk("VBoxNetFlt: Failed to detach qdisc, wrong qdisc: %s\n",
1176+ pQdisc->ops->id);
1177+ }
1178+ return; // TODO: Consider returing an error
1179+ }
1180+
1181+ pPriv = qdisc_priv(pQdisc);
1182+ Assert(pPriv->pVBoxNetFlt == pThis);
1183+ ASMAtomicWriteNullPtr(&pPriv->pVBoxNetFlt);
1184+ pChild = ASMAtomicXchgPtrT(&pPriv->pChild, &noop_qdisc, struct Qdisc *);
1185+ qdisc_destroy(pChild); /* It won't be the last reference. */
1186+
1187+ QDISC_LOG(("vboxNetFltLinuxQdiscRemove: refcnt=%d num_tx_queues=%d\n",
1188+ atomic_read(&pQdisc->refcnt), pDev->num_tx_queues));
1189+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 27)
1190+ /* Play it safe, make sure the qdisc is not being used. */
1191+ if (pPriv->ppSaved[0])
1192+ {
1193+ ASMAtomicWritePtr(&pDev->qdisc_sleeping, pPriv->ppSaved[0]);
1194+ ASMAtomicWritePtr(&pDev->qdisc, pPriv->ppSaved[0]);
1195+ pPriv->ppSaved[0] = NULL;
1196+ while (QDISC_IS_BUSY(pDev, pQdisc))
1197+ yield();
1198+ qdisc_destroy(pQdisc); /* Destroy reference */
1199+ }
1200+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1201+ for (i = 0; i < pDev->num_tx_queues; i++)
1202+ {
1203+ struct netdev_queue *pQueue = netdev_get_tx_queue(pDev, i);
1204+ if (pPriv->ppSaved[i])
1205+ {
1206+ Assert(pQueue->qdisc_sleeping == pQdisc);
1207+ ASMAtomicWritePtr(&pQueue->qdisc_sleeping, pPriv->ppSaved[i]);
1208+ ASMAtomicWritePtr(&pQueue->qdisc, pPriv->ppSaved[i]);
1209+ pPriv->ppSaved[i] = NULL;
1210+ while (QDISC_IS_BUSY(pDev, pQdisc))
1211+ yield();
1212+ qdisc_destroy(pQdisc); /* Destroy reference */
1213+ }
1214+ }
1215+ /* Newer kernels store root qdisc in netdev structure as well. */
1216+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
1217+ ASMAtomicWritePtr(&pDev->qdisc, pPriv->ppSaved[pDev->num_tx_queues]);
1218+ pPriv->ppSaved[pDev->num_tx_queues] = NULL;
1219+ while (QDISC_IS_BUSY(pDev, pQdisc))
1220+ yield();
1221+ qdisc_destroy(pQdisc); /* Destroy reference */
1222+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32) */
1223+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 27) */
1224+
1225+ /*
1226+ * At this point all references to our qdisc should be gone
1227+ * unless the user had installed it manually.
1228+ */
1229+ QDISC_LOG(("vboxNetFltLinuxRemoveQdisc: pThis=%p\n", pPriv->pVBoxNetFlt));
1230+}
1231+
1232+#endif /* VBOXNETFLT_WITH_QDISC */
1233+
1234+
1235+/**
1236+ * Initialize module.
1237+ *
1238+ * @returns appropriate status code.
1239+ */
1240+static int __init VBoxNetFltLinuxInit(void)
1241+{
1242+ int rc;
1243+ /*
1244+ * Initialize IPRT.
1245+ */
1246+ rc = RTR0Init(0);
1247+ if (RT_SUCCESS(rc))
1248+ {
1249+ Log(("VBoxNetFltLinuxInit\n"));
1250+
1251+ /*
1252+ * Initialize the globals and connect to the support driver.
1253+ *
1254+ * This will call back vboxNetFltOsOpenSupDrv (and maybe vboxNetFltOsCloseSupDrv)
1255+ * for establishing the connect to the support driver.
1256+ */
1257+ memset(&g_VBoxNetFltGlobals, 0, sizeof(g_VBoxNetFltGlobals));
1258+ rc = vboxNetFltInitGlobalsAndIdc(&g_VBoxNetFltGlobals);
1259+ if (RT_SUCCESS(rc))
1260+ {
1261+#ifdef VBOXNETFLT_WITH_QDISC
1262+ /*memcpy(&g_VBoxNetFltQDiscOps, &pfifo_qdisc_ops, sizeof(g_VBoxNetFltQDiscOps));
1263+ strcpy(g_VBoxNetFltQDiscOps.id, "vboxnetflt");
1264+ g_VBoxNetFltQDiscOps.owner = THIS_MODULE;*/
1265+ rc = register_qdisc(&g_VBoxNetFltQDiscOps);
1266+ if (rc)
1267+ {
1268+ LogRel(("VBoxNetFlt: Failed to registered qdisc: %d\n", rc));
1269+ return rc;
1270+ }
1271+#endif /* VBOXNETFLT_WITH_QDISC */
1272+ LogRel(("VBoxNetFlt: Successfully started.\n"));
1273+ return 0;
1274+ }
1275+
1276+ LogRel(("VBoxNetFlt: failed to initialize device extension (rc=%d)\n", rc));
1277+ RTR0Term();
1278+ }
1279+ else
1280+ LogRel(("VBoxNetFlt: failed to initialize IPRT (rc=%d)\n", rc));
1281+
1282+ memset(&g_VBoxNetFltGlobals, 0, sizeof(g_VBoxNetFltGlobals));
1283+ return -RTErrConvertToErrno(rc);
1284+}
1285+
1286+
1287+/**
1288+ * Unload the module.
1289+ *
1290+ * @todo We have to prevent this if we're busy!
1291+ */
1292+static void __exit VBoxNetFltLinuxUnload(void)
1293+{
1294+ int rc;
1295+ Log(("VBoxNetFltLinuxUnload\n"));
1296+ Assert(vboxNetFltCanUnload(&g_VBoxNetFltGlobals));
1297+
1298+#ifdef VBOXNETFLT_WITH_QDISC
1299+ unregister_qdisc(&g_VBoxNetFltQDiscOps);
1300+#endif /* VBOXNETFLT_WITH_QDISC */
1301+ /*
1302+ * Undo the work done during start (in reverse order).
1303+ */
1304+ rc = vboxNetFltTryDeleteIdcAndGlobals(&g_VBoxNetFltGlobals);
1305+ AssertRC(rc); NOREF(rc);
1306+
1307+ RTR0Term();
1308+
1309+ memset(&g_VBoxNetFltGlobals, 0, sizeof(g_VBoxNetFltGlobals));
1310+
1311+ Log(("VBoxNetFltLinuxUnload - done\n"));
1312+}
1313+
1314+
1315+/**
1316+ * Experiment where we filter traffic from the host to the internal network
1317+ * before it reaches the NIC driver.
1318+ *
1319+ * The current code uses a very ugly hack and only works on kernels using the
1320+ * net_device_ops (>= 2.6.29). It has been shown to give us a
1321+ * performance boost of 60-100% though. So, we have to find some less hacky way
1322+ * of getting this job done eventually.
1323+ *
1324+ * #define VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
1325+ */
1326+#ifdef VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
1327+
1328+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1329+
1330+# include <linux/ethtool.h>
1331+
1332+typedef struct ethtool_ops OVR_OPSTYPE;
1333+# define OVR_OPS ethtool_ops
1334+# define OVR_XMIT pfnStartXmit
1335+
1336+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1337+
1338+typedef struct net_device_ops OVR_OPSTYPE;
1339+# define OVR_OPS netdev_ops
1340+# define OVR_XMIT pOrgOps->ndo_start_xmit
1341+
1342+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1343+
1344+/**
1345+ * The overridden net_device_ops of the device we're attached to.
1346+ *
1347+ * As there is no net_device_ops structure in pre-2.6.29 kernels we override
1348+ * ethtool_ops instead along with hard_start_xmit callback in net_device
1349+ * structure.
1350+ *
1351+ * This is a very dirty hack that was created to explore how much we can improve
1352+ * the host to guest transfers by not CC'ing the NIC. It turns out to be
1353+ * the only way to filter outgoing packets for devices without TX queue.
1354+ */
1355+typedef struct VBoxNetDeviceOpsOverride
1356+{
1357+ /** Our overridden ops. */
1358+ OVR_OPSTYPE Ops;
1359+ /** Magic word. */
1360+ uint32_t u32Magic;
1361+ /** Pointer to the original ops. */
1362+ OVR_OPSTYPE const *pOrgOps;
1363+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1364+ /** Pointer to the original hard_start_xmit function. */
1365+ int (*pfnStartXmit)(struct sk_buff *pSkb, struct net_device *pDev);
1366+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29) */
1367+ /** Pointer to the net filter instance. */
1368+ PVBOXNETFLTINS pVBoxNetFlt;
1369+ /** The number of filtered packages. */
1370+ uint64_t cFiltered;
1371+ /** The total number of packets */
1372+ uint64_t cTotal;
1373+} VBOXNETDEVICEOPSOVERRIDE, *PVBOXNETDEVICEOPSOVERRIDE;
1374+/** VBOXNETDEVICEOPSOVERRIDE::u32Magic value. */
1375+#define VBOXNETDEVICEOPSOVERRIDE_MAGIC UINT32_C(0x00c0ffee)
1376+
1377+/**
1378+ * ndo_start_xmit wrapper that drops packets that shouldn't go to the wire
1379+ * because they belong on the internal network.
1380+ *
1381+ * @returns NETDEV_TX_XXX.
1382+ * @param pSkb The socket buffer to transmit.
1383+ * @param pDev The net device.
1384+ */
1385+static int vboxNetFltLinuxStartXmitFilter(struct sk_buff *pSkb, struct net_device *pDev)
1386+{
1387+ PVBOXNETDEVICEOPSOVERRIDE pOverride = (PVBOXNETDEVICEOPSOVERRIDE)pDev->OVR_OPS;
1388+ uint8_t abHdrBuf[sizeof(RTNETETHERHDR) + sizeof(uint32_t) + RTNETIPV4_MIN_LEN];
1389+ PCRTNETETHERHDR pEtherHdr;
1390+ PINTNETTRUNKSWPORT pSwitchPort;
1391+ uint32_t cbHdrs;
1392+
1393+
1394+ /*
1395+ * Validate the override structure.
1396+ *
1397+ * Note! We're racing vboxNetFltLinuxUnhookDev here. If this was supposed
1398+ * to be production quality code, we would have to be much more
1399+ * careful here and avoid the race.
1400+ */
1401+ if ( !VALID_PTR(pOverride)
1402+ || pOverride->u32Magic != VBOXNETDEVICEOPSOVERRIDE_MAGIC
1403+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29)
1404+ || !VALID_PTR(pOverride->pOrgOps)
1405+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1406+ )
1407+ {
1408+ printk("vboxNetFltLinuxStartXmitFilter: bad override %p\n", pOverride);
1409+ dev_kfree_skb(pSkb);
1410+ return NETDEV_TX_OK;
1411+ }
1412+ pOverride->cTotal++;
1413+
1414+ /*
1415+ * Do the filtering base on the default OUI of our virtual NICs
1416+ *
1417+ * Note! In a real solution, we would ask the switch whether the
1418+ * destination MAC is 100% to be on the internal network and then
1419+ * drop it.
1420+ */
1421+ cbHdrs = skb_headlen(pSkb);
1422+ cbHdrs = RT_MIN(cbHdrs, sizeof(abHdrBuf));
1423+ pEtherHdr = (PCRTNETETHERHDR)skb_header_pointer(pSkb, 0, cbHdrs, &abHdrBuf[0]);
1424+ if ( pEtherHdr
1425+ && VALID_PTR(pOverride->pVBoxNetFlt)
1426+ && (pSwitchPort = pOverride->pVBoxNetFlt->pSwitchPort) != NULL
1427+ && VALID_PTR(pSwitchPort)
1428+ && cbHdrs >= 6)
1429+ {
1430+ INTNETSWDECISION enmDecision;
1431+
1432+ /** @todo consider reference counting, etc. */
1433+ enmDecision = pSwitchPort->pfnPreRecv(pSwitchPort, pEtherHdr, cbHdrs, INTNETTRUNKDIR_HOST);
1434+ if (enmDecision == INTNETSWDECISION_INTNET)
1435+ {
1436+ dev_kfree_skb(pSkb);
1437+ pOverride->cFiltered++;
1438+ return NETDEV_TX_OK;
1439+ }
1440+ }
1441+
1442+ return pOverride->OVR_XMIT(pSkb, pDev);
1443+}
1444+
1445+/**
1446+ * Hooks the device ndo_start_xmit operation of the device.
1447+ *
1448+ * @param pThis The net filter instance.
1449+ * @param pDev The net device.
1450+ */
1451+static void vboxNetFltLinuxHookDev(PVBOXNETFLTINS pThis, struct net_device *pDev)
1452+{
1453+ PVBOXNETDEVICEOPSOVERRIDE pOverride;
1454+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
1455+
1456+ /* Cancel override if ethtool_ops is missing (host-only case, #5712) */
1457+ if (!VALID_PTR(pDev->OVR_OPS))
1458+ return;
1459+ pOverride = RTMemAlloc(sizeof(*pOverride));
1460+ if (!pOverride)
1461+ return;
1462+ pOverride->pOrgOps = pDev->OVR_OPS;
1463+ pOverride->Ops = *pDev->OVR_OPS;
1464+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1465+ pOverride->pfnStartXmit = pDev->hard_start_xmit;
1466+# else /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1467+ pOverride->Ops.ndo_start_xmit = vboxNetFltLinuxStartXmitFilter;
1468+# endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29) */
1469+ pOverride->u32Magic = VBOXNETDEVICEOPSOVERRIDE_MAGIC;
1470+ pOverride->cTotal = 0;
1471+ pOverride->cFiltered = 0;
1472+ pOverride->pVBoxNetFlt = pThis;
1473+
1474+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp); /* (this isn't necessary, but so what) */
1475+ ASMAtomicWritePtr((void * volatile *)&pDev->OVR_OPS, pOverride);
1476+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1477+ ASMAtomicXchgPtr((void * volatile *)&pDev->hard_start_xmit, vboxNetFltLinuxStartXmitFilter);
1478+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29) */
1479+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
1480+}
1481+
1482+/**
1483+ * Undos what vboxNetFltLinuxHookDev did.
1484+ *
1485+ * @param pThis The net filter instance.
1486+ * @param pDev The net device. Can be NULL, in which case
1487+ * we'll try retrieve it from @a pThis.
1488+ */
1489+static void vboxNetFltLinuxUnhookDev(PVBOXNETFLTINS pThis, struct net_device *pDev)
1490+{
1491+ PVBOXNETDEVICEOPSOVERRIDE pOverride;
1492+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
1493+
1494+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
1495+ if (!pDev)
1496+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
1497+ if (VALID_PTR(pDev))
1498+ {
1499+ pOverride = (PVBOXNETDEVICEOPSOVERRIDE)pDev->OVR_OPS;
1500+ if ( VALID_PTR(pOverride)
1501+ && pOverride->u32Magic == VBOXNETDEVICEOPSOVERRIDE_MAGIC
1502+ && VALID_PTR(pOverride->pOrgOps)
1503+ )
1504+ {
1505+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29)
1506+ ASMAtomicWritePtr((void * volatile *)&pDev->hard_start_xmit, pOverride->pfnStartXmit);
1507+# endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 29) */
1508+ ASMAtomicWritePtr((void const * volatile *)&pDev->OVR_OPS, pOverride->pOrgOps);
1509+ ASMAtomicWriteU32(&pOverride->u32Magic, 0);
1510+ }
1511+ else
1512+ pOverride = NULL;
1513+ }
1514+ else
1515+ pOverride = NULL;
1516+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
1517+
1518+ if (pOverride)
1519+ {
1520+ printk("vboxnetflt: dropped %llu out of %llu packets\n", pOverride->cFiltered, pOverride->cTotal);
1521+ RTMemFree(pOverride);
1522+ }
1523+}
1524+
1525+#endif /* VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT */
1526+
1527+
1528+/**
1529+ * Reads and retains the host interface handle.
1530+ *
1531+ * @returns The handle, NULL if detached.
1532+ * @param pThis The instance.
1533+ */
1534+DECLINLINE(struct net_device *) vboxNetFltLinuxRetainNetDev(PVBOXNETFLTINS pThis)
1535+{
1536+#if 0
1537+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
1538+ struct net_device *pDev = NULL;
1539+
1540+ Log(("vboxNetFltLinuxRetainNetDev\n"));
1541+ /*
1542+ * Be careful here to avoid problems racing the detached callback.
1543+ */
1544+ RTSpinlockAcquire(pThis->hSpinlock, &Tmp);
1545+ if (!ASMAtomicUoReadBool(&pThis->fDisconnectedFromHost))
1546+ {
1547+ pDev = (struct net_device *)ASMAtomicUoReadPtr((void * volatile *)&pThis->u.s.pDev);
1548+ if (pDev)
1549+ {
1550+ dev_hold(pDev);
1551+ Log(("vboxNetFltLinuxRetainNetDev: Device %p(%s) retained. ref=%d\n",
1552+ pDev, pDev->name,
1553+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
1554+ netdev_refcnt_read(pDev)
1555+#else
1556+ atomic_read(&pDev->refcnt)
1557+#endif
1558+ ));
1559+ }
1560+ }
1561+ RTSpinlockRelease(pThis->hSpinlock, &Tmp);
1562+
1563+ Log(("vboxNetFltLinuxRetainNetDev - done\n"));
1564+ return pDev;
1565+#else
1566+ return ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
1567+#endif
1568+}
1569+
1570+
1571+/**
1572+ * Release the host interface handle previously retained
1573+ * by vboxNetFltLinuxRetainNetDev.
1574+ *
1575+ * @param pThis The instance.
1576+ * @param pDev The vboxNetFltLinuxRetainNetDev
1577+ * return value, NULL is fine.
1578+ */
1579+DECLINLINE(void) vboxNetFltLinuxReleaseNetDev(PVBOXNETFLTINS pThis, struct net_device *pDev)
1580+{
1581+#if 0
1582+ Log(("vboxNetFltLinuxReleaseNetDev\n"));
1583+ NOREF(pThis);
1584+ if (pDev)
1585+ {
1586+ dev_put(pDev);
1587+ Log(("vboxNetFltLinuxReleaseNetDev: Device %p(%s) released. ref=%d\n",
1588+ pDev, pDev->name,
1589+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
1590+ netdev_refcnt_read(pDev)
1591+#else
1592+ atomic_read(&pDev->refcnt)
1593+#endif
1594+ ));
1595+ }
1596+ Log(("vboxNetFltLinuxReleaseNetDev - done\n"));
1597+#endif
1598+}
1599+
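+/* Tag scheme: a magic constant (0xA1C9) is combined with the device ifindex and
+ * stored in the last four bytes of skb->cb, so vboxNetFltLinuxSkBufIsOur can
+ * recognize buffers this driver injected itself. */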
1600+#define VBOXNETFLT_CB_TAG(skb) (0xA1C90000 | (skb->dev->ifindex & 0xFFFF))
1601+#define VBOXNETFLT_SKB_TAG(skb) (*(uint32_t*)&((skb)->cb[sizeof((skb)->cb)-sizeof(uint32_t)]))
1602+
1603+/**
1604+ * Checks whether this is an sk_buff created by vboxNetFltLinuxSkBufFromSG,
1605+ * i.e. a buffer which we're pushing and should be ignored by the filter callbacks.
1606+ *
1607+ * @returns true / false accordingly.
1608+ * @param pBuf The sk_buff.
1609+ */
1610+DECLINLINE(bool) vboxNetFltLinuxSkBufIsOur(struct sk_buff *pBuf)
1611+{
1612+ return VBOXNETFLT_SKB_TAG(pBuf) == VBOXNETFLT_CB_TAG(pBuf);
1613+}
1614+
1615+
1616+/**
1617+ * Internal worker that creates a Linux sk_buff for a
1618+ * (scatter/)gather list.
1619+ *
1620+ * @returns Pointer to the sk_buff.
1621+ * @param pThis The instance.
1622+ * @param pSG The (scatter/)gather list.
1623+ * @param fDstWire Set if the destination is the wire.
1624+ */
1625+static struct sk_buff *vboxNetFltLinuxSkBufFromSG(PVBOXNETFLTINS pThis, PINTNETSG pSG, bool fDstWire)
1626+{
1627+ struct sk_buff *pPkt;
1628+ struct net_device *pDev;
1629+ unsigned fGsoType = 0;
1630+
1631+ if (pSG->cbTotal == 0)
1632+ {
1633+ LogRel(("VBoxNetFlt: Dropped empty packet coming from internal network.\n"));
1634+ return NULL;
1635+ }
1636+
1637+ /** @todo We should use fragments mapping the SG buffers with large packets.
1638+ * 256 bytes seems to be a threshold used a lot for this. It
1639+ * requires some nasty work on the intnet side though... */
1640+ /*
1641+ * Allocate a packet and copy over the data.
1642+ */
1643+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
1644+ pPkt = dev_alloc_skb(pSG->cbTotal + NET_IP_ALIGN);
1645+ if (RT_UNLIKELY(!pPkt))
1646+ {
1647+ Log(("vboxNetFltLinuxSkBufFromSG: Failed to allocate sk_buff(%u).\n", pSG->cbTotal));
1648+ pSG->pvUserData = NULL;
1649+ return NULL;
1650+ }
1651+ pPkt->dev = pDev;
1652+ pPkt->ip_summed = CHECKSUM_NONE;
1653+
1654+ /* Align IP header on 16-byte boundary: 2 + 14 (ethernet hdr size). */
1655+ skb_reserve(pPkt, NET_IP_ALIGN);
1656+
1657+ /* Copy the segments. */
1658+ skb_put(pPkt, pSG->cbTotal);
1659+ IntNetSgRead(pSG, pPkt->data);
1660+
1661+#if defined(VBOXNETFLT_WITH_GSO_XMIT_WIRE) || defined(VBOXNETFLT_WITH_GSO_XMIT_HOST)
1662+ /*
1663+ * Setup GSO if used by this packet.
1664+ */
1665+ switch ((PDMNETWORKGSOTYPE)pSG->GsoCtx.u8Type)
1666+ {
1667+ default:
1668+ AssertMsgFailed(("%u (%s)\n", pSG->GsoCtx.u8Type, PDMNetGsoTypeName((PDMNETWORKGSOTYPE)pSG->GsoCtx.u8Type) ));
1669+ /* fall thru */
1670+ case PDMNETWORKGSOTYPE_INVALID:
1671+ fGsoType = 0;
1672+ break;
1673+ case PDMNETWORKGSOTYPE_IPV4_TCP:
1674+ fGsoType = SKB_GSO_TCPV4;
1675+ break;
1676+ case PDMNETWORKGSOTYPE_IPV4_UDP:
1677+ fGsoType = SKB_GSO_UDP;
1678+ break;
1679+ case PDMNETWORKGSOTYPE_IPV6_TCP:
1680+ fGsoType = SKB_GSO_TCPV6;
1681+ break;
1682+ }
1683+ if (fGsoType)
1684+ {
1685+ struct skb_shared_info *pShInfo = skb_shinfo(pPkt);
1686+
1687+ pShInfo->gso_type = fGsoType | SKB_GSO_DODGY;
1688+ pShInfo->gso_size = pSG->GsoCtx.cbMaxSeg;
1689+ pShInfo->gso_segs = PDMNetGsoCalcSegmentCount(&pSG->GsoCtx, pSG->cbTotal);
1690+
1691+ /*
1692+ * We need to set checksum fields even if the packet goes to the host
1693+ * directly as it may be immediately forwarded by IP layer @bugref{5020}.
1694+ */
1695+ Assert(skb_headlen(pPkt) >= pSG->GsoCtx.cbHdrs);
1696+ pPkt->ip_summed = CHECKSUM_PARTIAL;
1697+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 22)
1698+ pPkt->csum_start = skb_headroom(pPkt) + pSG->GsoCtx.offHdr2;
1699+ if (fGsoType & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
1700+ pPkt->csum_offset = RT_OFFSETOF(RTNETTCP, th_sum);
1701+ else
1702+ pPkt->csum_offset = RT_OFFSETOF(RTNETUDP, uh_sum);
1703+# else
1704+ pPkt->h.raw = pPkt->data + pSG->GsoCtx.offHdr2;
1705+ if (fGsoType & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
1706+ pPkt->csum = RT_OFFSETOF(RTNETTCP, th_sum);
1707+ else
1708+ pPkt->csum = RT_OFFSETOF(RTNETUDP, uh_sum);
1709+# endif
1710+ if (!fDstWire)
1711+ PDMNetGsoPrepForDirectUse(&pSG->GsoCtx, pPkt->data, pSG->cbTotal, PDMNETCSUMTYPE_PSEUDO);
1712+ }
1713+#endif /* VBOXNETFLT_WITH_GSO_XMIT_WIRE || VBOXNETFLT_WITH_GSO_XMIT_HOST */
1714+
1715+ /*
1716+ * Finish up the socket buffer.
1717+ */
1718+ pPkt->protocol = eth_type_trans(pPkt, pDev);
1719+ if (fDstWire)
1720+ {
1721+ VBOX_SKB_RESET_NETWORK_HDR(pPkt);
1722+
1723+ /* Restore ethernet header back. */
1724+ skb_push(pPkt, ETH_HLEN); /** @todo VLAN: +4 if VLAN? */
1725+ VBOX_SKB_RESET_MAC_HDR(pPkt);
1726+ }
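+ /* Tag the skb so vboxNetFltLinuxPacketHandler recognizes it as our own injected
+ * buffer and drops it instead of looping it back into the internal network. */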
1727+ VBOXNETFLT_SKB_TAG(pPkt) = VBOXNETFLT_CB_TAG(pPkt);
1728+
1729+ return pPkt;
1730+}
1731+
1732+
1733+/**
1734+ * Initializes a SG list from an sk_buff.
1735+ *
1736+ * @note The segments point into the sk_buff; no data is copied.
1737+ * @param pThis The instance.
1738+ * @param pBuf The sk_buff.
1739+ * @param pSG The SG.
1740+ * @param cSegs The number of segments allocated for the SG.
1741+ * This should match the number of segments the
1742+ * sk_buff requires exactly!
1743+ * @param fSrc The source of the frame.
1744+ * @param pGsoCtx Pointer to the GSO context if it's a GSO
1745+ * internal network frame. NULL if regular frame.
1746+ */
1747+DECLINLINE(void) vboxNetFltLinuxSkBufToSG(PVBOXNETFLTINS pThis, struct sk_buff *pBuf, PINTNETSG pSG,
1748+ unsigned cSegs, uint32_t fSrc, PCPDMNETWORKGSO pGsoCtx)
1749+{
1750+ int i;
1751+ NOREF(pThis);
1752+
1753+ Assert(!skb_shinfo(pBuf)->frag_list);
1754+
1755+ if (!pGsoCtx)
1756+ IntNetSgInitTempSegs(pSG, pBuf->len, cSegs, 0 /*cSegsUsed*/);
1757+ else
1758+ IntNetSgInitTempSegsGso(pSG, pBuf->len, cSegs, 0 /*cSegsUsed*/, pGsoCtx);
1759+
1760+#ifdef VBOXNETFLT_SG_SUPPORT
1761+ pSG->aSegs[0].cb = skb_headlen(pBuf);
1762+ pSG->aSegs[0].pv = pBuf->data;
1763+ pSG->aSegs[0].Phys = NIL_RTHCPHYS;
1764+
1765+ for (i = 0; i < skb_shinfo(pBuf)->nr_frags; i++)
1766+ {
1767+ skb_frag_t *pFrag = &skb_shinfo(pBuf)->frags[i];
1768+ pSG->aSegs[i+1].cb = pFrag->size;
1769+ pSG->aSegs[i+1].pv = kmap(pFrag->page);
1770+ printk("%p = kmap()\n", pSG->aSegs[i+1].pv);
1771+ pSG->aSegs[i+1].Phys = NIL_RTHCPHYS;
1772+ }
1773+ ++i;
1774+
1775+#else
1776+ pSG->aSegs[0].cb = pBuf->len;
1777+ pSG->aSegs[0].pv = pBuf->data;
1778+ pSG->aSegs[0].Phys = NIL_RTHCPHYS;
1779+ i = 1;
1780+#endif
1781+
1782+ pSG->cSegsUsed = i;
1783+
1784+#ifdef PADD_RUNT_FRAMES_FROM_HOST
1785+ /*
1786+ * Add a trailer if the frame is too small.
1787+ *
1788+ * Since we're getting to the packet before it is framed, it has not
1789+ * yet been padded. The current solution is to add a segment pointing
1790+ * to a buffer containing all zeros and pray that works for all frames...
1791+ */
1792+ if (pSG->cbTotal < 60 && (fSrc & INTNETTRUNKDIR_HOST))
1793+ {
1794+ static uint8_t const s_abZero[128] = {0};
1795+
1796+ AssertReturnVoid(i < cSegs);
1797+
1798+ pSG->aSegs[i].Phys = NIL_RTHCPHYS;
1799+ pSG->aSegs[i].pv = (void *)&s_abZero[0];
1800+ pSG->aSegs[i].cb = 60 - pSG->cbTotal;
1801+ pSG->cbTotal = 60;
1802+ pSG->cSegsUsed++;
1803+ Assert(i + 1 <= pSG->cSegsAlloc);
1804+ }
1805+#endif
1806+
1807+ Log4(("vboxNetFltLinuxSkBufToSG: allocated=%d, segments=%d frags=%d next=%p frag_list=%p pkt_type=%x fSrc=%x\n",
1808+ pSG->cSegsAlloc, pSG->cSegsUsed, skb_shinfo(pBuf)->nr_frags, pBuf->next, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type, fSrc));
1809+ for (i = 0; i < pSG->cSegsUsed; i++)
1810+ Log4(("vboxNetFltLinuxSkBufToSG: #%d: cb=%d pv=%p\n",
1811+ i, pSG->aSegs[i].cb, pSG->aSegs[i].pv));
1812+}
1813+
1814+/**
1815+ * Packet handler; called by the kernel for every frame seen on the interface.
1816+ *
1817+ * @returns 0 (the return value is ignored by the kernel).
1818+ * @param pBuf The sk_buff.
1819+ * @param pSkbDev The device the packet was received on.
1820+ * @param pPacketType Our packet type registration, used to locate the
1821+ * net filter instance.
1822+ * @param pOrigDev The originating device (2.6.14 and later only).
1823+ */
1824+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 14)
1825+static int vboxNetFltLinuxPacketHandler(struct sk_buff *pBuf,
1826+ struct net_device *pSkbDev,
1827+ struct packet_type *pPacketType,
1828+ struct net_device *pOrigDev)
1829+#else
1830+static int vboxNetFltLinuxPacketHandler(struct sk_buff *pBuf,
1831+ struct net_device *pSkbDev,
1832+ struct packet_type *pPacketType)
1833+#endif
1834+{
1835+ PVBOXNETFLTINS pThis;
1836+ struct net_device *pDev;
1837+ LogFlow(("vboxNetFltLinuxPacketHandler: pBuf=%p pSkbDev=%p pPacketType=%p\n",
1838+ pBuf, pSkbDev, pPacketType));
1839+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 18)
1840+ Log3(("vboxNetFltLinuxPacketHandler: skb len=%u data_len=%u truesize=%u next=%p nr_frags=%u gso_size=%u gso_seqs=%u gso_type=%x frag_list=%p pkt_type=%x\n",
1841+ pBuf->len, pBuf->data_len, pBuf->truesize, pBuf->next, skb_shinfo(pBuf)->nr_frags, skb_shinfo(pBuf)->gso_size, skb_shinfo(pBuf)->gso_segs, skb_shinfo(pBuf)->gso_type, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type));
1842+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 22)
1843+ Log4(("vboxNetFltLinuxPacketHandler: packet dump follows:\n%.*Rhxd\n", pBuf->len-pBuf->data_len, skb_mac_header(pBuf)));
1844+# endif
1845+#else
1846+ Log3(("vboxNetFltLinuxPacketHandler: skb len=%u data_len=%u truesize=%u next=%p nr_frags=%u tso_size=%u tso_seqs=%u frag_list=%p pkt_type=%x\n",
1847+ pBuf->len, pBuf->data_len, pBuf->truesize, pBuf->next, skb_shinfo(pBuf)->nr_frags, skb_shinfo(pBuf)->tso_size, skb_shinfo(pBuf)->tso_segs, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type));
1848+#endif
1849+ /*
1850+ * Drop it immediately?
1851+ */
1852+ if (!pBuf)
1853+ return 0;
1854+
1855+ pThis = VBOX_FLT_PT_TO_INST(pPacketType);
1856+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
1857+ if (pDev != pSkbDev)
1858+ {
1859+ Log(("vboxNetFltLinuxPacketHandler: Devices do not match, pThis may be wrong! pThis=%p\n", pThis));
1860+ return 0;
1861+ }
1862+
1863+ Log4(("vboxNetFltLinuxPacketHandler: pBuf->cb dump:\n%.*Rhxd\n", sizeof(pBuf->cb), pBuf->cb));
1864+ if (vboxNetFltLinuxSkBufIsOur(pBuf))
1865+ {
1866+ Log2(("vboxNetFltLinuxPacketHandler: got our own sk_buff, drop it.\n"));
1867+ dev_kfree_skb(pBuf);
1868+ return 0;
1869+ }
1870+
1871+#ifndef VBOXNETFLT_SG_SUPPORT
1872+ {
1873+ /*
1874+ * Get rid of fragmented packets, they cause too much trouble.
1875+ */
1876+ struct sk_buff *pCopy = skb_copy(pBuf, GFP_ATOMIC);
1877+ kfree_skb(pBuf);
1878+ if (!pCopy)
1879+ {
1880+ LogRel(("VBoxNetFlt: Failed to allocate packet buffer, dropping the packet.\n"));
1881+ return 0;
1882+ }
1883+ pBuf = pCopy;
1884+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 18)
1885+ Log3(("vboxNetFltLinuxPacketHandler: skb copy len=%u data_len=%u truesize=%u next=%p nr_frags=%u gso_size=%u gso_seqs=%u gso_type=%x frag_list=%p pkt_type=%x\n",
1886+ pBuf->len, pBuf->data_len, pBuf->truesize, pBuf->next, skb_shinfo(pBuf)->nr_frags, skb_shinfo(pBuf)->gso_size, skb_shinfo(pBuf)->gso_segs, skb_shinfo(pBuf)->gso_type, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type));
1887+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 22)
1888+ Log4(("vboxNetFltLinuxPacketHandler: packet dump follows:\n%.*Rhxd\n", pBuf->len-pBuf->data_len, skb_mac_header(pBuf)));
1889+# endif
1890+# else
1891+ Log3(("vboxNetFltLinuxPacketHandler: skb copy len=%u data_len=%u truesize=%u next=%p nr_frags=%u tso_size=%u tso_seqs=%u frag_list=%p pkt_type=%x\n",
1892+ pBuf->len, pBuf->data_len, pBuf->truesize, pBuf->next, skb_shinfo(pBuf)->nr_frags, skb_shinfo(pBuf)->tso_size, skb_shinfo(pBuf)->tso_segs, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type));
1893+# endif
1894+ }
1895+#endif
1896+
1897+#ifdef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
1898+ /* Forward it to the internal network. */
1899+ vboxNetFltLinuxForwardToIntNet(pThis, pBuf);
1900+#else
1901+ /* Add the packet to transmit queue and schedule the bottom half. */
1902+ skb_queue_tail(&pThis->u.s.XmitQueue, pBuf);
1903+ schedule_work(&pThis->u.s.XmitTask);
1904+ Log4(("vboxNetFltLinuxPacketHandler: scheduled work %p for sk_buff %p\n",
1905+ &pThis->u.s.XmitTask, pBuf));
1906+#endif
1907+
1908+ /* It does not really matter what we return, it is ignored by the kernel. */
1909+ return 0;
1910+}
1911+
1912+/**
1913+ * Calculate the number of INTNETSEG segments the socket buffer will need.
1914+ *
1915+ * @returns Segment count.
1916+ * @param pBuf The socket buffer.
1917+ */
1918+DECLINLINE(unsigned) vboxNetFltLinuxCalcSGSegments(struct sk_buff *pBuf)
1919+{
1920+#ifdef VBOXNETFLT_SG_SUPPORT
1921+ unsigned cSegs = 1 + skb_shinfo(pBuf)->nr_frags;
1922+#else
1923+ unsigned cSegs = 1;
1924+#endif
1925+#ifdef PADD_RUNT_FRAMES_FROM_HOST
1926+ /* vboxNetFltLinuxSkBufToSG adds a padding segment if it's a runt. */
1927+ if (pBuf->len < 60)
1928+ cSegs++;
1929+#endif
1930+ return cSegs;
1931+}
1932+
1933+/**
1934+ * Destroy the intnet scatter / gather buffer created by
1935+ * vboxNetFltLinuxSkBufToSG.
1936+ */
1937+static void vboxNetFltLinuxDestroySG(PINTNETSG pSG)
1938+{
1939+#ifdef VBOXNETFLT_SG_SUPPORT
1940+ int i;
1941+
1942+ for (i = 0; i < skb_shinfo(pBuf)->nr_frags; i++)
1943+ {
1944+ printk("kunmap(%p)\n", pSG->aSegs[i+1].pv);
1945+ kunmap(pSG->aSegs[i+1].pv);
1946+ }
1947+#endif
1948+ NOREF(pSG);
1949+}
1950+
1951+#ifdef LOG_ENABLED
1952+/**
1953+ * Logging helper.
1954+ */
1955+static void vboxNetFltDumpPacket(PINTNETSG pSG, bool fEgress, const char *pszWhere, int iIncrement)
1956+{
1957+ uint8_t *pInt, *pExt;
1958+ static int iPacketNo = 1;
1959+ iPacketNo += iIncrement;
1960+ if (fEgress)
1961+ {
1962+ pExt = pSG->aSegs[0].pv;
1963+ pInt = pExt + 6;
1964+ }
1965+ else
1966+ {
1967+ pInt = pSG->aSegs[0].pv;
1968+ pExt = pInt + 6;
1969+ }
1970+ Log(("VBoxNetFlt: (int)%02x:%02x:%02x:%02x:%02x:%02x"
1971+ " %s (%s)%02x:%02x:%02x:%02x:%02x:%02x (%u bytes) packet #%u\n",
1972+ pInt[0], pInt[1], pInt[2], pInt[3], pInt[4], pInt[5],
1973+ fEgress ? "-->" : "<--", pszWhere,
1974+ pExt[0], pExt[1], pExt[2], pExt[3], pExt[4], pExt[5],
1975+ pSG->cbTotal, iPacketNo));
1976+ Log3(("%.*Rhxd\n", pSG->aSegs[0].cb, pSG->aSegs[0].pv));
1977+}
1978+#else
1979+# define vboxNetFltDumpPacket(a, b, c, d) do {} while (0)
1980+#endif
1981+
1982+#ifdef VBOXNETFLT_WITH_GSO_RECV
1983+
1984+/**
1985+ * Worker for vboxNetFltLinuxForwardToIntNet that checks if we can forward a
1986+ * GSO socket buffer without having to segment it.
1987+ *
1988+ * @returns true on success, false if needs segmenting.
1989+ * @param pThis The net filter instance.
1990+ * @param pSkb The GSO socket buffer.
1991+ * @param fSrc The source.
1992+ * @param pGsoCtx Where to return the GSO context on success.
1993+ */
1994+static bool vboxNetFltLinuxCanForwardAsGso(PVBOXNETFLTINS pThis, struct sk_buff *pSkb, uint32_t fSrc,
1995+ PPDMNETWORKGSO pGsoCtx)
1996+{
1997+ PDMNETWORKGSOTYPE enmGsoType;
1998+ uint16_t uEtherType;
1999+ unsigned int cbTransport;
2000+ unsigned int offTransport;
2001+ unsigned int cbTransportHdr;
2002+ unsigned uProtocol;
2003+ union
2004+ {
2005+ RTNETIPV4 IPv4;
2006+ RTNETIPV6 IPv6;
2007+ RTNETTCP Tcp;
2008+ uint8_t ab[40];
2009+ uint16_t au16[40/2];
2010+ uint32_t au32[40/4];
2011+ } Buf;
2012+
2013+ /*
2014+ * Check the GSO properties of the socket buffer and make sure it fits.
2015+ */
2016+ /** @todo Figure out how to handle SKB_GSO_TCP_ECN! */
2017+ if (RT_UNLIKELY( skb_shinfo(pSkb)->gso_type & ~(SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_TCPV6 | SKB_GSO_TCPV4) ))
2018+ {
2019+ Log5(("vboxNetFltLinuxCanForwardAsGso: gso_type=%#x\n", skb_shinfo(pSkb)->gso_type));
2020+ return false;
2021+ }
2022+ if (RT_UNLIKELY( skb_shinfo(pSkb)->gso_size < 1
2023+ || pSkb->len > VBOX_MAX_GSO_SIZE ))
2024+ {
2025+ Log5(("vboxNetFltLinuxCanForwardAsGso: gso_size=%#x skb_len=%#x (max=%#x)\n", skb_shinfo(pSkb)->gso_size, pSkb->len, VBOX_MAX_GSO_SIZE));
2026+ return false;
2027+ }
2028+ /*
2029+ * It is possible to receive GSO packets from wire if GRO is enabled.
2030+ */
2031+ if (RT_UNLIKELY(fSrc & INTNETTRUNKDIR_WIRE))
2032+ {
2033+ Log5(("vboxNetFltLinuxCanForwardAsGso: fSrc=wire\n"));
2034+#ifdef VBOXNETFLT_WITH_GRO
2035+ /*
2036+ * The packet came from the wire and the driver has already consumed
2037+ * the mac header. We need to restore it.
2038+ */
2039+ pSkb->mac_len = skb_network_header(pSkb) - skb_mac_header(pSkb);
2040+ skb_push(pSkb, pSkb->mac_len);
2041+ Log5(("vboxNetFltLinuxCanForwardAsGso: mac_len=%d data=%p mac_header=%p network_header=%p\n",
2042+ pSkb->mac_len, pSkb->data, skb_mac_header(pSkb), skb_network_header(pSkb)));
2043+#else /* !VBOXNETFLT_WITH_GRO */
2044+ /* Older kernels didn't have GRO. */
2045+ return false;
2046+#endif /* !VBOXNETFLT_WITH_GRO */
2047+ }
2048+ else
2049+ {
2050+ /*
2051+ * skb_gso_segment does the following. Do we need to do it as well?
2052+ */
2053+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 22)
2054+ skb_reset_mac_header(pSkb);
2055+ pSkb->mac_len = pSkb->network_header - pSkb->mac_header;
2056+#else
2057+ pSkb->mac.raw = pSkb->data;
2058+ pSkb->mac_len = pSkb->nh.raw - pSkb->data;
2059+#endif
2060+ }
2061+
2062+ /*
2063+ * Switch on the ethertype.
2064+ */
2065+ uEtherType = pSkb->protocol;
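+ /* For 802.1Q frames the outer protocol field holds the VLAN ethertype; the real
+ * ethertype follows the 4-byte VLAN tag. */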
2066+ if ( uEtherType == RT_H2N_U16_C(RTNET_ETHERTYPE_VLAN)
2067+ && pSkb->mac_len == sizeof(RTNETETHERHDR) + sizeof(uint32_t))
2068+ {
2069+ uint16_t const *puEtherType = skb_header_pointer(pSkb, sizeof(RTNETETHERHDR) + sizeof(uint16_t), sizeof(uint16_t), &Buf);
2070+ if (puEtherType)
2071+ uEtherType = *puEtherType;
2072+ }
2073+ switch (uEtherType)
2074+ {
2075+ case RT_H2N_U16_C(RTNET_ETHERTYPE_IPV4):
2076+ {
2077+ unsigned int cbHdr;
2078+ PCRTNETIPV4 pIPv4 = (PCRTNETIPV4)skb_header_pointer(pSkb, pSkb->mac_len, sizeof(Buf.IPv4), &Buf);
2079+ if (RT_UNLIKELY(!pIPv4))
2080+ {
2081+ Log5(("vboxNetFltLinuxCanForwardAsGso: failed to access IPv4 hdr\n"));
2082+ return false;
2083+ }
2084+
2085+ cbHdr = pIPv4->ip_hl * 4;
2086+ cbTransport = RT_N2H_U16(pIPv4->ip_len);
2087+ if (RT_UNLIKELY( cbHdr < RTNETIPV4_MIN_LEN
2088+ || cbHdr > cbTransport ))
2089+ {
2090+ Log5(("vboxNetFltLinuxCanForwardAsGso: invalid IPv4 lengths: ip_hl=%u ip_len=%u\n", pIPv4->ip_hl, RT_N2H_U16(pIPv4->ip_len)));
2091+ return false;
2092+ }
2093+ cbTransport -= cbHdr;
2094+ offTransport = pSkb->mac_len + cbHdr;
2095+ uProtocol = pIPv4->ip_p;
2096+ if (uProtocol == RTNETIPV4_PROT_TCP)
2097+ enmGsoType = PDMNETWORKGSOTYPE_IPV4_TCP;
2098+ else if (uProtocol == RTNETIPV4_PROT_UDP)
2099+ enmGsoType = PDMNETWORKGSOTYPE_IPV4_UDP;
2100+ else /** @todo IPv6: 4to6 tunneling */
2101+ enmGsoType = PDMNETWORKGSOTYPE_INVALID;
2102+ break;
2103+ }
2104+
2105+ case RT_H2N_U16_C(RTNET_ETHERTYPE_IPV6):
2106+ {
2107+ PCRTNETIPV6 pIPv6 = (PCRTNETIPV6)skb_header_pointer(pSkb, pSkb->mac_len, sizeof(Buf.IPv6), &Buf);
2108+ if (RT_UNLIKELY(!pIPv6))
2109+ {
2110+ Log5(("vboxNetFltLinuxCanForwardAsGso: failed to access IPv6 hdr\n"));
2111+ return false;
2112+ }
2113+
2114+ cbTransport = RT_N2H_U16(pIPv6->ip6_plen);
2115+ offTransport = pSkb->mac_len + sizeof(RTNETIPV6);
2116+ uProtocol = pIPv6->ip6_nxt;
2117+ /** @todo IPv6: Dig our way out of the other headers. */
2118+ if (uProtocol == RTNETIPV4_PROT_TCP)
2119+ enmGsoType = PDMNETWORKGSOTYPE_IPV6_TCP;
2120+ else if (uProtocol == RTNETIPV4_PROT_UDP)
2121+ enmGsoType = PDMNETWORKGSOTYPE_IPV6_UDP;
2122+ else
2123+ enmGsoType = PDMNETWORKGSOTYPE_INVALID;
2124+ break;
2125+ }
2126+
2127+ default:
2128+ Log5(("vboxNetFltLinuxCanForwardAsGso: uEtherType=%#x\n", RT_H2N_U16(uEtherType)));
2129+ return false;
2130+ }
2131+
2132+ if (enmGsoType == PDMNETWORKGSOTYPE_INVALID)
2133+ {
2134+ Log5(("vboxNetFltLinuxCanForwardAsGso: Unsupported protocol %d\n", uProtocol));
2135+ return false;
2136+ }
2137+
2138+ if (RT_UNLIKELY( offTransport + cbTransport <= offTransport
2139+ || offTransport + cbTransport > pSkb->len
2140+ || cbTransport < (uProtocol == RTNETIPV4_PROT_TCP ? RTNETTCP_MIN_LEN : RTNETUDP_MIN_LEN)) )
2141+ {
2142+ Log5(("vboxNetFltLinuxCanForwardAsGso: Bad transport length; off=%#x + cb=%#x => %#x; skb_len=%#x (%s)\n",
2143+ offTransport, cbTransport, offTransport + cbTransport, pSkb->len, PDMNetGsoTypeName(enmGsoType) ));
2144+ return false;
2145+ }
2146+
2147+ /*
2148+ * Check the TCP/UDP bits.
2149+ */
2150+ if (uProtocol == RTNETIPV4_PROT_TCP)
2151+ {
2152+ PCRTNETTCP pTcp = (PCRTNETTCP)skb_header_pointer(pSkb, offTransport, sizeof(Buf.Tcp), &Buf);
2153+ if (RT_UNLIKELY(!pTcp))
2154+ {
2155+ Log5(("vboxNetFltLinuxCanForwardAsGso: failed to access TCP hdr\n"));
2156+ return false;
2157+ }
2158+
2159+ cbTransportHdr = pTcp->th_off * 4;
2160+ if (RT_UNLIKELY( cbTransportHdr < RTNETTCP_MIN_LEN
2161+ || cbTransportHdr > cbTransport
2162+ || offTransport + cbTransportHdr >= UINT8_MAX
2163+ || offTransport + cbTransportHdr >= pSkb->len ))
2164+ {
2165+ Log5(("vboxNetFltLinuxCanForwardAsGso: No space for TCP header; off=%#x cb=%#x skb_len=%#x\n", offTransport, cbTransportHdr, pSkb->len));
2166+ return false;
2167+ }
2168+
2169+ }
2170+ else
2171+ {
2172+ Assert(uProtocol == RTNETIPV4_PROT_UDP);
2173+ cbTransportHdr = sizeof(RTNETUDP);
2174+ if (RT_UNLIKELY( offTransport + cbTransportHdr >= UINT8_MAX
2175+ || offTransport + cbTransportHdr >= pSkb->len ))
2176+ {
2177+ Log5(("vboxNetFltLinuxCanForwardAsGso: No space for UDP header; off=%#x skb_len=%#x\n", offTransport, pSkb->len));
2178+ return false;
2179+ }
2180+ }
2181+
2182+ /*
2183+ * We're good, init the GSO context.
2184+ */
2185+ pGsoCtx->u8Type = enmGsoType;
2186+ pGsoCtx->cbHdrs = offTransport + cbTransportHdr;
2187+ pGsoCtx->cbMaxSeg = skb_shinfo(pSkb)->gso_size;
2188+ pGsoCtx->offHdr1 = pSkb->mac_len;
2189+ pGsoCtx->offHdr2 = offTransport;
2190+ pGsoCtx->au8Unused[0] = 0;
2191+ pGsoCtx->au8Unused[1] = 0;
2192+
2193+ return true;
2194+}
2195+
2196+/**
2197+ * Forward the socket buffer as a GSO internal network frame.
2198+ *
2199+ * @returns IPRT status code.
2200+ * @param pThis The net filter instance.
2201+ * @param pSkb The GSO socket buffer.
2202+ * @param fSrc The source.
2203+ * @param pGsoCtx The GSO context set up by vboxNetFltLinuxCanForwardAsGso.
2204+ */
2205+static int vboxNetFltLinuxForwardAsGso(PVBOXNETFLTINS pThis, struct sk_buff *pSkb, uint32_t fSrc, PCPDMNETWORKGSO pGsoCtx)
2206+{
2207+ int rc;
2208+ unsigned cSegs = vboxNetFltLinuxCalcSGSegments(pSkb);
2209+ if (RT_LIKELY(cSegs <= MAX_SKB_FRAGS + 1))
2210+ {
2211+ PINTNETSG pSG = (PINTNETSG)alloca(RT_OFFSETOF(INTNETSG, aSegs[cSegs]));
2212+ if (RT_LIKELY(pSG))
2213+ {
2214+ vboxNetFltLinuxSkBufToSG(pThis, pSkb, pSG, cSegs, fSrc, pGsoCtx);
2215+
2216+ vboxNetFltDumpPacket(pSG, false, (fSrc & INTNETTRUNKDIR_HOST) ? "host" : "wire", 1);
2217+ pThis->pSwitchPort->pfnRecv(pThis->pSwitchPort, NULL /* pvIf */, pSG, fSrc);
2218+
2219+ vboxNetFltLinuxDestroySG(pSG);
2220+ rc = VINF_SUCCESS;
2221+ }
2222+ else
2223+ {
2224+ Log(("VBoxNetFlt: Dropping the sk_buff (failure case).\n"));
2225+ rc = VERR_NO_MEMORY;
2226+ }
2227+ }
2228+ else
2229+ {
2230+ Log(("VBoxNetFlt: Bad sk_buff? cSegs=%#x.\n", cSegs));
2231+ rc = VERR_INTERNAL_ERROR_3;
2232+ }
2233+
2234+ Log4(("VBoxNetFlt: Dropping the sk_buff.\n"));
2235+ dev_kfree_skb(pSkb);
2236+ return rc;
2237+}
2238+
2239+#endif /* VBOXNETFLT_WITH_GSO_RECV */
2240+
2241+/**
2242+ * Worker for vboxNetFltLinuxForwardToIntNet.
2243+ *
2244+ * @returns VINF_SUCCESS, VERR_NO_MEMORY or VERR_INTERNAL_ERROR_3.
2245+ * @param pThis The net filter instance.
2246+ * @param pBuf The socket buffer.
2247+ * @param fSrc The source.
2248+ */
2249+static int vboxNetFltLinuxForwardSegment(PVBOXNETFLTINS pThis, struct sk_buff *pBuf, uint32_t fSrc)
2250+{
2251+ int rc;
2252+ unsigned cSegs = vboxNetFltLinuxCalcSGSegments(pBuf);
2253+ if (cSegs <= MAX_SKB_FRAGS + 1)
2254+ {
2255+ PINTNETSG pSG = (PINTNETSG)alloca(RT_OFFSETOF(INTNETSG, aSegs[cSegs]));
2256+ if (RT_LIKELY(pSG))
2257+ {
2258+ if (fSrc & INTNETTRUNKDIR_WIRE)
2259+ {
2260+ /*
2261+ * The packet came from wire, ethernet header was removed by device driver.
2262+ * Restore it.
2263+ */
2264+ skb_push(pBuf, ETH_HLEN);
2265+ }
2266+
2267+ vboxNetFltLinuxSkBufToSG(pThis, pBuf, pSG, cSegs, fSrc, NULL /*pGsoCtx*/);
2268+
2269+ vboxNetFltDumpPacket(pSG, false, (fSrc & INTNETTRUNKDIR_HOST) ? "host" : "wire", 1);
2270+ pThis->pSwitchPort->pfnRecv(pThis->pSwitchPort, NULL /* pvIf */, pSG, fSrc);
2271+
2272+ vboxNetFltLinuxDestroySG(pSG);
2273+ rc = VINF_SUCCESS;
2274+ }
2275+ else
2276+ {
2277+ Log(("VBoxNetFlt: Failed to allocate SG buffer.\n"));
2278+ rc = VERR_NO_MEMORY;
2279+ }
2280+ }
2281+ else
2282+ {
2283+ Log(("VBoxNetFlt: Bad sk_buff? cSegs=%#x.\n", cSegs));
2284+ rc = VERR_INTERNAL_ERROR_3;
2285+ }
2286+
2287+ Log4(("VBoxNetFlt: Dropping the sk_buff.\n"));
2288+ dev_kfree_skb(pBuf);
2289+ return rc;
2290+}
2291+
2292+/**
2293+ * Forwards a socket buffer to the internal network.
2294+ * @param pBuf The socket buffer. This is consumed by this function.
2295+ */
2296+static void vboxNetFltLinuxForwardToIntNet(PVBOXNETFLTINS pThis, struct sk_buff *pBuf)
2297+{
2298+ uint32_t fSrc = pBuf->pkt_type == PACKET_OUTGOING ? INTNETTRUNKDIR_HOST : INTNETTRUNKDIR_WIRE;
2299+
2300+#ifdef VBOXNETFLT_WITH_GSO
2301+ if (skb_is_gso(pBuf))
2302+ {
2303+ PDMNETWORKGSO GsoCtx;
2304+ Log3(("vboxNetFltLinuxForwardToIntNet: skb len=%u data_len=%u truesize=%u next=%p nr_frags=%u gso_size=%u gso_seqs=%u gso_type=%x frag_list=%p pkt_type=%x ip_summed=%d\n",
2305+ pBuf->len, pBuf->data_len, pBuf->truesize, pBuf->next, skb_shinfo(pBuf)->nr_frags, skb_shinfo(pBuf)->gso_size, skb_shinfo(pBuf)->gso_segs, skb_shinfo(pBuf)->gso_type, skb_shinfo(pBuf)->frag_list, pBuf->pkt_type, pBuf->ip_summed));
2306+# ifdef VBOXNETFLT_WITH_GSO_RECV
2307+ if ( (skb_shinfo(pBuf)->gso_type & (SKB_GSO_UDP | SKB_GSO_TCPV6 | SKB_GSO_TCPV4))
2308+ && vboxNetFltLinuxCanForwardAsGso(pThis, pBuf, fSrc, &GsoCtx) )
2309+ vboxNetFltLinuxForwardAsGso(pThis, pBuf, fSrc, &GsoCtx);
2310+ else
2311+# endif
2312+ {
2313+ /* Need to segment the packet */
2314+ struct sk_buff *pNext;
2315+ struct sk_buff *pSegment = skb_gso_segment(pBuf, 0 /*supported features*/);
2316+ if (IS_ERR(pSegment))
2317+ {
2318+ dev_kfree_skb(pBuf);
2319+ LogRel(("VBoxNetFlt: Failed to segment a packet (%d).\n", PTR_ERR(pSegment)));
2320+ return;
2321+ }
2322+
2323+ for (; pSegment; pSegment = pNext)
2324+ {
2325+ Log3(("vboxNetFltLinuxForwardToIntNet: segment len=%u data_len=%u truesize=%u next=%p nr_frags=%u gso_size=%u gso_seqs=%u gso_type=%x frag_list=%p pkt_type=%x\n",
2326+ pSegment->len, pSegment->data_len, pSegment->truesize, pSegment->next, skb_shinfo(pSegment)->nr_frags, skb_shinfo(pSegment)->gso_size, skb_shinfo(pSegment)->gso_segs, skb_shinfo(pSegment)->gso_type, skb_shinfo(pSegment)->frag_list, pSegment->pkt_type));
2327+ pNext = pSegment->next;
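+ /* Unlink the segment from the chain; vboxNetFltLinuxForwardSegment consumes it. */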
2328+ pSegment->next = 0;
2329+ vboxNetFltLinuxForwardSegment(pThis, pSegment, fSrc);
2330+ }
2331+ dev_kfree_skb(pBuf);
2332+ }
2333+ }
2334+ else
2335+#endif /* VBOXNETFLT_WITH_GSO */
2336+ {
2337+ if (pBuf->ip_summed == CHECKSUM_PARTIAL && pBuf->pkt_type == PACKET_OUTGOING)
2338+ {
2339+#if LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 18)
2340+ /*
2341+ * Try to work around the problem with CentOS 4.7 and 5.2 (2.6.9
2342+ * and 2.6.18 kernels): they pass a wrong 'h' pointer down. We take the IP
2343+ * header length from the header itself and reconstruct the 'h' pointer
2344+ * to the TCP (or whatever) header.
2345+ */
2346+ unsigned char *tmp = pBuf->h.raw;
2347+ if (pBuf->h.raw == pBuf->nh.raw && pBuf->protocol == htons(ETH_P_IP))
2348+ pBuf->h.raw = pBuf->nh.raw + pBuf->nh.iph->ihl * 4;
2349+#endif /* LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 18) */
2350+ if (VBOX_SKB_CHECKSUM_HELP(pBuf))
2351+ {
2352+ LogRel(("VBoxNetFlt: Failed to compute checksum, dropping the packet.\n"));
2353+ dev_kfree_skb(pBuf);
2354+ return;
2355+ }
2356+#if LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 18)
2357+ /* Restore the original (wrong) pointer. */
2358+ pBuf->h.raw = tmp;
2359+#endif /* LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 18) */
2360+ }
2361+ vboxNetFltLinuxForwardSegment(pThis, pBuf, fSrc);
2362+ }
2363+}
2364+
2365+#ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
2366+/**
2367+ * Work queue handler that forwards the socket buffers queued by
2368+ * vboxNetFltLinuxPacketHandler to the internal network.
2369+ *
2370+ * @param pWork The work queue.
2371+ */
2372+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 20)
2373+static void vboxNetFltLinuxXmitTask(struct work_struct *pWork)
2374+# else
2375+static void vboxNetFltLinuxXmitTask(void *pWork)
2376+# endif
2377+{
2378+ PVBOXNETFLTINS pThis = VBOX_FLT_XT_TO_INST(pWork);
2379+ struct sk_buff *pBuf;
2380+
2381+ Log4(("vboxNetFltLinuxXmitTask: Got work %p.\n", pWork));
2382+
2383+ /*
2384+ * Active? Retain the instance and increment the busy counter.
2385+ */
2386+ if (vboxNetFltTryRetainBusyActive(pThis))
2387+ {
2388+ while ((pBuf = skb_dequeue(&pThis->u.s.XmitQueue)) != NULL)
2389+ vboxNetFltLinuxForwardToIntNet(pThis, pBuf);
2390+
2391+ vboxNetFltRelease(pThis, true /* fBusy */);
2392+ }
2393+ else
2394+ {
2395+ /** @todo Shouldn't we just drop the packets here? There is little point in
2396+ * making them accumulate when the VM is paused and it'll only waste
2397+ * kernel memory anyway... Hmm. maybe wait a short while (2-5 secs)
2398+ * before start draining the packets (goes for the intnet ring buf
2399+ * too)? */
2400+ }
2401+}
2402+#endif /* !VBOXNETFLT_LINUX_NO_XMIT_QUEUE */
2403+
2404+/**
2405+ * Reports the GSO capabilities of the hardware NIC.
2406+ *
2407+ * @param pThis The net filter instance. The caller holds a
2408+ * reference to this.
2409+ */
2410+static void vboxNetFltLinuxReportNicGsoCapabilities(PVBOXNETFLTINS pThis)
2411+{
2412+#ifdef VBOXNETFLT_WITH_GSO_XMIT_WIRE
2413+ if (vboxNetFltTryRetainBusyNotDisconnected(pThis))
2414+ {
2415+ struct net_device *pDev;
2416+ PINTNETTRUNKSWPORT pSwitchPort;
2417+ unsigned int fFeatures;
2418+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
2419+
2420+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2421+
2422+ pSwitchPort = pThis->pSwitchPort; /* this doesn't need to be here, but it doesn't harm. */
2423+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
2424+ if (pDev)
2425+ fFeatures = pDev->features;
2426+ else
2427+ fFeatures = 0;
2428+
2429+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2430+
2431+ if (pThis->pSwitchPort)
2432+ {
2433+ /* Set/update the GSO capabilities of the NIC. */
2434+ uint32_t fGsoCapabilites = 0;
2435+ if (fFeatures & NETIF_F_TSO)
2436+ fGsoCapabilites |= RT_BIT_32(PDMNETWORKGSOTYPE_IPV4_TCP);
2437+ if (fFeatures & NETIF_F_TSO6)
2438+ fGsoCapabilites |= RT_BIT_32(PDMNETWORKGSOTYPE_IPV6_TCP);
2439+# if 0 /** @todo GSO: Test UDP offloading (UFO) on linux. */
2440+ if (fFeatures & NETIF_F_UFO)
2441+ fGsoCapabilites |= RT_BIT_32(PDMNETWORKGSOTYPE_IPV4_UDP);
2442+ if (fFeatures & NETIF_F_UFO)
2443+ fGsoCapabilites |= RT_BIT_32(PDMNETWORKGSOTYPE_IPV6_UDP);
2444+# endif
2445+ pThis->pSwitchPort->pfnReportGsoCapabilities(pThis->pSwitchPort, fGsoCapabilites, INTNETTRUNKDIR_WIRE);
2446+ }
2447+
2448+ vboxNetFltRelease(pThis, true /*fBusy*/);
2449+ }
2450+#endif /* VBOXNETFLT_WITH_GSO_XMIT_WIRE */
2451+}
2452+
2453+/**
2454+ * Helper that determines whether the host (ignoring us) is operating the
2455+ * interface in promiscuous mode or not.
2456+ */
2457+static bool vboxNetFltLinuxPromiscuous(PVBOXNETFLTINS pThis)
2458+{
2459+ bool fRc = false;
2460+ struct net_device * pDev = vboxNetFltLinuxRetainNetDev(pThis);
2461+ if (pDev)
2462+ {
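+ /* Subtract our own promiscuity reference (fPromiscuousSet) so only the host's
+ * remaining references are reported. */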
2463+ fRc = !!(pDev->promiscuity - (ASMAtomicUoReadBool(&pThis->u.s.fPromiscuousSet) & 1));
2464+ LogFlow(("vboxNetFltPortOsIsPromiscuous: returns %d, pDev->promiscuity=%d, fPromiscuousSet=%d\n",
2465+ fRc, pDev->promiscuity, pThis->u.s.fPromiscuousSet));
2466+ vboxNetFltLinuxReleaseNetDev(pThis, pDev);
2467+ }
2468+ return fRc;
2469+}
2470+
2471+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 36)
2472+/**
2473+ * Helper for detecting TAP devices.
2474+ */
2475+static bool vboxNetFltIsTapDevice(PVBOXNETFLTINS pThis, struct net_device *pDev)
2476+{
2477+ if (pDev->ethtool_ops && pDev->ethtool_ops->get_drvinfo)
2478+ {
2479+ struct ethtool_drvinfo Info;
2480+
2481+ memset(&Info, 0, sizeof(Info));
2482+ Info.cmd = ETHTOOL_GDRVINFO;
2483+ pDev->ethtool_ops->get_drvinfo(pDev, &Info);
2484+ Log3(("vboxNetFltIsTapDevice: driver=%s version=%s bus_info=%s\n",
2485+ Info.driver, Info.version, Info.bus_info));
2486+
2487+ return !strncmp(Info.driver, "tun", 4)
2488+ && !strncmp(Info.bus_info, "tap", 4);
2489+ }
2490+
2491+ return false;
2492+}
2493+
2494+/**
2495+ * Helper for updating the link state of TAP devices.
2496+ * Only TAP devices are affected.
2497+ */
2498+static void vboxNetFltSetTapLinkState(PVBOXNETFLTINS pThis, struct net_device *pDev, bool fLinkUp)
2499+{
2500+ if (vboxNetFltIsTapDevice(pThis, pDev))
2501+ {
2502+ Log3(("vboxNetFltSetTapLinkState: bringing %s tap device link state\n",
2503+ fLinkUp ? "up" : "down"));
2504+ netif_tx_lock_bh(pDev);
2505+ if (fLinkUp)
2506+ netif_carrier_on(pDev);
2507+ else
2508+ netif_carrier_off(pDev);
2509+ netif_tx_unlock_bh(pDev);
2510+ }
2511+}
2512+#else /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36) */
2513+DECLINLINE(void) vboxNetFltSetTapLinkState(PVBOXNETFLTINS pThis, struct net_device *pDev, bool fLinkUp)
2514+{
2515+ /* Nothing to do for pre-2.6.36 kernels. */
2516+}
2517+#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36) */
2518+
2519+/**
2520+ * Internal worker for vboxNetFltLinuxNotifierCallback.
2521+ *
2522+ * @returns VBox status code.
2523+ * @param pThis The instance.
2524+ * @param pDev The net device to attach to; the caller has already
2525+ * matched its name against pThis->szName.
2526+ */
2527+static int vboxNetFltLinuxAttachToInterface(PVBOXNETFLTINS pThis, struct net_device *pDev)
2528+{
2529+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
2530+ LogFlow(("vboxNetFltLinuxAttachToInterface: pThis=%p (%s)\n", pThis, pThis->szName));
2531+
2532+ /*
2533+ * Retain and store the device.
2534+ */
2535+ dev_hold(pDev);
2536+
2537+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2538+ ASMAtomicUoWritePtr(&pThis->u.s.pDev, pDev);
2539+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2540+
2541+ Log(("vboxNetFltLinuxAttachToInterface: Device %p(%s) retained. ref=%d\n",
2542+ pDev, pDev->name,
2543+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
2544+ netdev_refcnt_read(pDev)
2545+#else
2546+ atomic_read(&pDev->refcnt)
2547+#endif
2548+ ));
2549+ Log(("vboxNetFltLinuxAttachToInterface: Got pDev=%p pThis=%p pThis->u.s.pDev=%p\n",
2550+ pDev, pThis, ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *)));
2551+
2552+ /* Get the mac address while we still have a valid net_device reference. */
2553+ memcpy(&pThis->u.s.MacAddr, pDev->dev_addr, sizeof(pThis->u.s.MacAddr));
2554+
2555+ /*
2556+ * Install a packet filter for this device with a protocol wildcard (ETH_P_ALL).
2557+ */
2558+ pThis->u.s.PacketType.type = __constant_htons(ETH_P_ALL);
2559+ pThis->u.s.PacketType.dev = pDev;
2560+ pThis->u.s.PacketType.func = vboxNetFltLinuxPacketHandler;
2561+ dev_add_pack(&pThis->u.s.PacketType);
2562+
2563+#ifdef VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
2564+ vboxNetFltLinuxHookDev(pThis, pDev);
2565+#endif
2566+#ifdef VBOXNETFLT_WITH_QDISC
2567+ vboxNetFltLinuxQdiscInstall(pThis, pDev);
2568+#endif /* VBOXNETFLT_WITH_QDISC */
2569+
2570+ /*
2571+ * If attaching to a TAP interface we need to bring the link state up
2572+ * ourselves, starting with kernel 2.6.36.
2573+ */
2574+ vboxNetFltSetTapLinkState(pThis, pDev, true);
2575+
2576+ /*
2577+ * Set indicators that require the spinlock. Be a bit paranoid about racing
2578+ * the device notification handler.
2579+ */
2580+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2581+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
2582+ if (pDev)
2583+ {
2584+ ASMAtomicUoWriteBool(&pThis->fDisconnectedFromHost, false);
2585+ ASMAtomicUoWriteBool(&pThis->u.s.fRegistered, true);
2586+ pDev = NULL; /* don't dereference it */
2587+ }
2588+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2589+ Log(("vboxNetFltLinuxAttachToInterface: this=%p: Packet handler installed.\n", pThis));
2590+
2591+ /*
2592+ * If the above succeeded report GSO capabilities, if not undo and
2593+ * release the device.
2594+ */
2595+ if (!pDev)
2596+ {
2597+ Assert(pThis->pSwitchPort);
2598+ if (vboxNetFltTryRetainBusyNotDisconnected(pThis))
2599+ {
2600+ vboxNetFltLinuxReportNicGsoCapabilities(pThis);
2601+ pThis->pSwitchPort->pfnReportMacAddress(pThis->pSwitchPort, &pThis->u.s.MacAddr);
2602+ pThis->pSwitchPort->pfnReportPromiscuousMode(pThis->pSwitchPort, vboxNetFltLinuxPromiscuous(pThis));
2603+ pThis->pSwitchPort->pfnReportNoPreemptDsts(pThis->pSwitchPort, INTNETTRUNKDIR_WIRE | INTNETTRUNKDIR_HOST);
2604+ vboxNetFltRelease(pThis, true /*fBusy*/);
2605+ }
2606+ }
2607+ else
2608+ {
2609+#ifdef VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
2610+ vboxNetFltLinuxUnhookDev(pThis, pDev);
2611+#endif
2612+#ifdef VBOXNETFLT_WITH_QDISC
2613+ vboxNetFltLinuxQdiscRemove(pThis, pDev);
2614+#endif /* VBOXNETFLT_WITH_QDISC */
2615+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2616+ ASMAtomicUoWriteNullPtr(&pThis->u.s.pDev);
2617+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2618+ dev_put(pDev);
2619+ Log(("vboxNetFltLinuxAttachToInterface: Device %p(%s) released. ref=%d\n",
2620+ pDev, pDev->name,
2621+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
2622+ netdev_refcnt_read(pDev)
2623+#else
2624+ atomic_read(&pDev->refcnt)
2625+#endif
2626+ ));
2627+ }
2628+
2629+ LogRel(("VBoxNetFlt: attached to '%s' / %.*Rhxs\n", pThis->szName, sizeof(pThis->u.s.MacAddr), &pThis->u.s.MacAddr));
2630+ return VINF_SUCCESS;
2631+}
2632+
2633+
2634+static int vboxNetFltLinuxUnregisterDevice(PVBOXNETFLTINS pThis, struct net_device *pDev)
2635+{
2636+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
2637+
2638+ Assert(!pThis->fDisconnectedFromHost);
2639+
2640+#ifdef VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
2641+ vboxNetFltLinuxUnhookDev(pThis, pDev);
2642+#endif
2643+#ifdef VBOXNETFLT_WITH_QDISC
2644+ vboxNetFltLinuxQdiscRemove(pThis, pDev);
2645+#endif /* VBOXNETFLT_WITH_QDISC */
2646+
2647+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2648+ ASMAtomicWriteBool(&pThis->u.s.fRegistered, false);
2649+ ASMAtomicWriteBool(&pThis->fDisconnectedFromHost, true);
2650+ ASMAtomicUoWriteNullPtr(&pThis->u.s.pDev);
2651+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2652+
2653+ dev_remove_pack(&pThis->u.s.PacketType);
2654+#ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
2655+ skb_queue_purge(&pThis->u.s.XmitQueue);
2656+#endif
2657+ Log(("vboxNetFltLinuxUnregisterDevice: this=%p: Packet handler removed, xmit queue purged.\n", pThis));
2658+ Log(("vboxNetFltLinuxUnregisterDevice: Device %p(%s) released. ref=%d\n",
2659+ pDev, pDev->name,
2660+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
2661+ netdev_refcnt_read(pDev)
2662+#else
2663+ atomic_read(&pDev->refcnt)
2664+#endif
2665+ ));
2666+ dev_put(pDev);
2667+
2668+ return NOTIFY_OK;
2669+}
2670+
2671+static int vboxNetFltLinuxDeviceIsUp(PVBOXNETFLTINS pThis, struct net_device *pDev)
2672+{
2673+ /* Check if we are not suspended and promiscuous mode has not been set. */
2674+ if ( pThis->enmTrunkState == INTNETTRUNKIFSTATE_ACTIVE
2675+ && !ASMAtomicUoReadBool(&pThis->u.s.fPromiscuousSet))
2676+ {
2677+ /* Note that there is no need for locking as the kernel got hold of the lock already. */
2678+ dev_set_promiscuity(pDev, 1);
2679+ ASMAtomicWriteBool(&pThis->u.s.fPromiscuousSet, true);
2680+ Log(("vboxNetFltLinuxDeviceIsUp: enabled promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2681+ }
2682+ else
2683+ Log(("vboxNetFltLinuxDeviceIsUp: no need to enable promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2684+ return NOTIFY_OK;
2685+}
2686+
2687+static int vboxNetFltLinuxDeviceGoingDown(PVBOXNETFLTINS pThis, struct net_device *pDev)
2688+{
2689+ /* Undo promiscuous mode if we have set it. */
2690+ if (ASMAtomicUoReadBool(&pThis->u.s.fPromiscuousSet))
2691+ {
2692+ /* Note that there is no need for locking as the kernel got hold of the lock already. */
2693+ dev_set_promiscuity(pDev, -1);
2694+ ASMAtomicWriteBool(&pThis->u.s.fPromiscuousSet, false);
2695+ Log(("vboxNetFltLinuxDeviceGoingDown: disabled promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2696+ }
2697+ else
2698+ Log(("vboxNetFltLinuxDeviceGoingDown: no need to disable promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2699+ return NOTIFY_OK;
2700+}
2701+
2702+#ifdef LOG_ENABLED
2703+/** Stringify the NETDEV_XXX constants. */
2704+static const char *vboxNetFltLinuxGetNetDevEventName(unsigned long ulEventType)
2705+{
2706+ const char *pszEvent = "NETDEV_<unknown>";
2707+ switch (ulEventType)
2708+ {
2709+ case NETDEV_REGISTER: pszEvent = "NETDEV_REGISTER"; break;
2710+ case NETDEV_UNREGISTER: pszEvent = "NETDEV_UNREGISTER"; break;
2711+ case NETDEV_UP: pszEvent = "NETDEV_UP"; break;
2712+ case NETDEV_DOWN: pszEvent = "NETDEV_DOWN"; break;
2713+ case NETDEV_REBOOT: pszEvent = "NETDEV_REBOOT"; break;
2714+ case NETDEV_CHANGENAME: pszEvent = "NETDEV_CHANGENAME"; break;
2715+ case NETDEV_CHANGE: pszEvent = "NETDEV_CHANGE"; break;
2716+ case NETDEV_CHANGEMTU: pszEvent = "NETDEV_CHANGEMTU"; break;
2717+ case NETDEV_CHANGEADDR: pszEvent = "NETDEV_CHANGEADDR"; break;
2718+ case NETDEV_GOING_DOWN: pszEvent = "NETDEV_GOING_DOWN"; break;
2719+# ifdef NETDEV_FEAT_CHANGE
2720+ case NETDEV_FEAT_CHANGE: pszEvent = "NETDEV_FEAT_CHANGE"; break;
2721+# endif
2722+ }
2723+ return pszEvent;
2724+}
2725+#endif /* LOG_ENABLED */
2726+
2727+/**
2728+ * Callback for listening to netdevice events.
2729+ *
2730+ * This handles rediscovery, cleanup on unregistration, promiscuity on
2731+ * up/down, and GSO feature changes from ethtool.
2732+ *
2733+ * @returns NOTIFY_OK
2734+ * @param self Pointer to our notifier registration block.
2735+ * @param ulEventType The event.
2736+ * @param ptr Event specific, but it is usually the device it
2737+ * relates to.
2738+ */
2739+static int vboxNetFltLinuxNotifierCallback(struct notifier_block *self, unsigned long ulEventType, void *ptr)
2740+
2741+{
2742+ PVBOXNETFLTINS pThis = VBOX_FLT_NB_TO_INST(self);
2743+ struct net_device *pDev = (struct net_device *)ptr;
2744+ int rc = NOTIFY_OK;
2745+
2746+ Log(("VBoxNetFlt: got event %s(0x%lx) on %s, pDev=%p pThis=%p pThis->u.s.pDev=%p\n",
2747+ vboxNetFltLinuxGetNetDevEventName(ulEventType), ulEventType, pDev->name, pDev, pThis, ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *)));
2748+ if ( ulEventType == NETDEV_REGISTER
2749+ && !strcmp(pDev->name, pThis->szName))
2750+ {
2751+ vboxNetFltLinuxAttachToInterface(pThis, pDev);
2752+ }
2753+ else
2754+ {
2755+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
2756+ if (pDev == ptr)
2757+ {
2758+ switch (ulEventType)
2759+ {
2760+ case NETDEV_UNREGISTER:
2761+ rc = vboxNetFltLinuxUnregisterDevice(pThis, pDev);
2762+ break;
2763+ case NETDEV_UP:
2764+ rc = vboxNetFltLinuxDeviceIsUp(pThis, pDev);
2765+ break;
2766+ case NETDEV_GOING_DOWN:
2767+ rc = vboxNetFltLinuxDeviceGoingDown(pThis, pDev);
2768+ break;
2769+ case NETDEV_CHANGENAME:
2770+ break;
2771+#ifdef NETDEV_FEAT_CHANGE
2772+ case NETDEV_FEAT_CHANGE:
2773+ vboxNetFltLinuxReportNicGsoCapabilities(pThis);
2774+ break;
2775+#endif
2776+ }
2777+ }
2778+ }
2779+
2780+ return rc;
2781+}
2782+
2783+bool vboxNetFltOsMaybeRediscovered(PVBOXNETFLTINS pThis)
2784+{
2785+ return !ASMAtomicUoReadBool(&pThis->fDisconnectedFromHost);
2786+}
2787+
2788+int vboxNetFltPortOsXmit(PVBOXNETFLTINS pThis, void *pvIfData, PINTNETSG pSG, uint32_t fDst)
2789+{
2790+ struct net_device * pDev;
2791+ int err;
2792+ int rc = VINF_SUCCESS;
2793+ NOREF(pvIfData);
2794+
2795+ LogFlow(("vboxNetFltPortOsXmit: pThis=%p (%s)\n", pThis, pThis->szName));
2796+
2797+ pDev = vboxNetFltLinuxRetainNetDev(pThis);
2798+ if (pDev)
2799+ {
2800+ /*
2801+ * Create a sk_buff for the gather list and push it onto the wire.
2802+ */
2803+ if (fDst & INTNETTRUNKDIR_WIRE)
2804+ {
2805+ struct sk_buff *pBuf = vboxNetFltLinuxSkBufFromSG(pThis, pSG, true);
2806+ if (pBuf)
2807+ {
2808+ vboxNetFltDumpPacket(pSG, true, "wire", 1);
2809+ Log4(("vboxNetFltPortOsXmit: pBuf->cb dump:\n%.*Rhxd\n", sizeof(pBuf->cb), pBuf->cb));
2810+ Log4(("vboxNetFltPortOsXmit: dev_queue_xmit(%p)\n", pBuf));
2811+ err = dev_queue_xmit(pBuf);
2812+ if (err)
2813+ rc = RTErrConvertFromErrno(err);
2814+ }
2815+ else
2816+ rc = VERR_NO_MEMORY;
2817+ }
2818+
2819+ /*
2820+ * Create a sk_buff for the gather list and push it onto the host stack.
2821+ */
2822+ if (fDst & INTNETTRUNKDIR_HOST)
2823+ {
2824+ struct sk_buff *pBuf = vboxNetFltLinuxSkBufFromSG(pThis, pSG, false);
2825+ if (pBuf)
2826+ {
2827+ vboxNetFltDumpPacket(pSG, true, "host", (fDst & INTNETTRUNKDIR_WIRE) ? 0 : 1);
2828+ Log4(("vboxNetFltPortOsXmit: pBuf->cb dump:\n%.*Rhxd\n", sizeof(pBuf->cb), pBuf->cb));
2829+ Log4(("vboxNetFltPortOsXmit: netif_rx_ni(%p)\n", pBuf));
2830+ err = netif_rx_ni(pBuf);
2831+ if (err)
2832+ rc = RTErrConvertFromErrno(err);
2833+ }
2834+ else
2835+ rc = VERR_NO_MEMORY;
2836+ }
2837+
2838+ vboxNetFltLinuxReleaseNetDev(pThis, pDev);
2839+ }
2840+
2841+ return rc;
2842+}
2843+
2844+
2845+void vboxNetFltPortOsSetActive(PVBOXNETFLTINS pThis, bool fActive)
2846+{
2847+ struct net_device * pDev;
2848+
2849+ LogFlow(("vboxNetFltPortOsSetActive: pThis=%p (%s), fActive=%s, fDisablePromiscuous=%s\n",
2850+ pThis, pThis->szName, fActive?"true":"false",
2851+ pThis->fDisablePromiscuous?"true":"false"));
2852+
2853+ if (pThis->fDisablePromiscuous)
2854+ return;
2855+
2856+ pDev = vboxNetFltLinuxRetainNetDev(pThis);
2857+ if (pDev)
2858+ {
2859+ /*
2860+ * This API is a bit weird, the best reference is the code.
2861+ *
2862+ * Also, we have a bit of a race condition wrt the maintenance of
2863+ * the host interface promiscuity for vboxNetFltPortOsIsPromiscuous.
2864+ */
2865+#ifdef LOG_ENABLED
2866+ u_int16_t fIf;
2867+ unsigned const cPromiscBefore = pDev->promiscuity;
2868+#endif
2869+ if (fActive)
2870+ {
2871+ Assert(!pThis->u.s.fPromiscuousSet);
2872+
2873+ rtnl_lock();
2874+ dev_set_promiscuity(pDev, 1);
2875+ rtnl_unlock();
2876+ pThis->u.s.fPromiscuousSet = true;
2877+ Log(("vboxNetFltPortOsSetActive: enabled promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2878+ }
2879+ else
2880+ {
2881+ if (pThis->u.s.fPromiscuousSet)
2882+ {
2883+ rtnl_lock();
2884+ dev_set_promiscuity(pDev, -1);
2885+ rtnl_unlock();
2886+ Log(("vboxNetFltPortOsSetActive: disabled promiscuous mode on %s (%d)\n", pThis->szName, pDev->promiscuity));
2887+ }
2888+ pThis->u.s.fPromiscuousSet = false;
2889+
2890+#ifdef LOG_ENABLED
2891+ fIf = dev_get_flags(pDev);
2892+ Log(("VBoxNetFlt: fIf=%#x; %d->%d\n", fIf, cPromiscBefore, pDev->promiscuity));
2893+#endif
2894+ }
2895+
2896+ vboxNetFltLinuxReleaseNetDev(pThis, pDev);
2897+ }
2898+}
2899+
2900+
2901+int vboxNetFltOsDisconnectIt(PVBOXNETFLTINS pThis)
2902+{
2903+#ifdef VBOXNETFLT_WITH_QDISC
2904+ vboxNetFltLinuxQdiscRemove(pThis, NULL);
2905+#endif /* VBOXNETFLT_WITH_QDISC */
2906+ /*
2907+ * Remove the packet handler when we get disconnected from the internal
2908+ * switch, as we don't want the handler to forward packets to a disconnected switch.
2909+ */
2910+ dev_remove_pack(&pThis->u.s.PacketType);
2911+ return VINF_SUCCESS;
2912+}
2913+
2914+
2915+int vboxNetFltOsConnectIt(PVBOXNETFLTINS pThis)
2916+{
2917+ /*
2918+ * Report the GSO capabilities of the host and device (if connected).
2919+ * Note! No need to mark ourselves busy here.
2920+ */
2921+ /** @todo duplicate work here now? Attach */
2922+#if defined(VBOXNETFLT_WITH_GSO_XMIT_HOST)
2923+ pThis->pSwitchPort->pfnReportGsoCapabilities(pThis->pSwitchPort,
2924+ 0
2925+ | RT_BIT_32(PDMNETWORKGSOTYPE_IPV4_TCP)
2926+ | RT_BIT_32(PDMNETWORKGSOTYPE_IPV6_TCP)
2927+# if 0 /** @todo GSO: Test UDP offloading (UFO) on linux. */
2928+ | RT_BIT_32(PDMNETWORKGSOTYPE_IPV4_UDP)
2929+ | RT_BIT_32(PDMNETWORKGSOTYPE_IPV6_UDP)
2930+# endif
2931+ , INTNETTRUNKDIR_HOST);
2932+
2933+#endif
2934+ vboxNetFltLinuxReportNicGsoCapabilities(pThis);
2935+
2936+ return VINF_SUCCESS;
2937+}
2938+
2939+
2940+void vboxNetFltOsDeleteInstance(PVBOXNETFLTINS pThis)
2941+{
2942+ struct net_device *pDev;
2943+ bool fRegistered;
2944+ RTSPINLOCKTMP Tmp = RTSPINLOCKTMP_INITIALIZER;
2945+
2946+#ifdef VBOXNETFLT_WITH_FILTER_HOST2GUEST_SKBS_EXPERIMENT
2947+ vboxNetFltLinuxUnhookDev(pThis, NULL);
2948+#endif
2949+
2950+ /** @todo This code may race vboxNetFltLinuxUnregisterDevice (very very
2951+ * unlikely, but none the less). Since it doesn't actually update the
2952+ * state (just reads it), it is likely to panic in some interesting
2953+ * ways. */
2954+
2955+ RTSpinlockAcquireNoInts(pThis->hSpinlock, &Tmp);
2956+ pDev = ASMAtomicUoReadPtrT(&pThis->u.s.pDev, struct net_device *);
2957+ fRegistered = ASMAtomicUoReadBool(&pThis->u.s.fRegistered);
2958+ RTSpinlockReleaseNoInts(pThis->hSpinlock, &Tmp);
2959+
2960+ if (fRegistered)
2961+ {
2962+ vboxNetFltSetTapLinkState(pThis, pDev, false);
2963+
2964+#ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
2965+ skb_queue_purge(&pThis->u.s.XmitQueue);
2966+#endif
2967+ Log(("vboxNetFltOsDeleteInstance: this=%p: Packet handler removed, xmit queue purged.\n", pThis));
2968+ Log(("vboxNetFltOsDeleteInstance: Device %p(%s) released. ref=%d\n",
2969+ pDev, pDev->name,
2970+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
2971+ netdev_refcnt_read(pDev)
2972+#else
2973+ atomic_read(&pDev->refcnt)
2974+#endif
2975+ ));
2976+ dev_put(pDev);
2977+ }
2978+ Log(("vboxNetFltOsDeleteInstance: this=%p: Notifier removed.\n", pThis));
2979+ unregister_netdevice_notifier(&pThis->u.s.Notifier);
2980+ module_put(THIS_MODULE);
2981+}
2982+
2983+
2984+int vboxNetFltOsInitInstance(PVBOXNETFLTINS pThis, void *pvContext)
2985+{
2986+ int err;
2987+ NOREF(pvContext);
2988+
2989+ pThis->u.s.Notifier.notifier_call = vboxNetFltLinuxNotifierCallback;
2990+ err = register_netdevice_notifier(&pThis->u.s.Notifier);
2991+ if (err)
2992+ return VERR_INTNET_FLT_IF_FAILED;
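+ /* register_netdevice_notifier replays NETDEV_REGISTER for devices that already
+ * exist, so if the named interface is present the callback has attached us by now. */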
2993+ if (!pThis->u.s.fRegistered)
2994+ {
2995+ unregister_netdevice_notifier(&pThis->u.s.Notifier);
2996+ LogRel(("VBoxNetFlt: failed to find %s.\n", pThis->szName));
2997+ return VERR_INTNET_FLT_IF_NOT_FOUND;
2998+ }
2999+
3000+ Log(("vboxNetFltOsInitInstance: this=%p: Notifier installed.\n", pThis));
3001+ if ( pThis->fDisconnectedFromHost
3002+ || !try_module_get(THIS_MODULE))
3003+ return VERR_INTNET_FLT_IF_FAILED;
3004+
3005+ return VINF_SUCCESS;
3006+}
3007+
3008+int vboxNetFltOsPreInitInstance(PVBOXNETFLTINS pThis)
3009+{
3010+ /*
3011+ * Init the linux specific members.
3012+ */
3013+ ASMAtomicUoWriteNullPtr(&pThis->u.s.pDev);
3014+ pThis->u.s.fRegistered = false;
3015+ pThis->u.s.fPromiscuousSet = false;
3016+ memset(&pThis->u.s.PacketType, 0, sizeof(pThis->u.s.PacketType));
3017+#ifndef VBOXNETFLT_LINUX_NO_XMIT_QUEUE
3018+ skb_queue_head_init(&pThis->u.s.XmitQueue);
3019+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 20)
3020+ INIT_WORK(&pThis->u.s.XmitTask, vboxNetFltLinuxXmitTask);
3021+# else
3022+ INIT_WORK(&pThis->u.s.XmitTask, vboxNetFltLinuxXmitTask, &pThis->u.s.XmitTask);
3023+# endif
3024+#endif
3025+
3026+ return VINF_SUCCESS;
3027+}
3028+
3029+
3030+void vboxNetFltPortOsNotifyMacAddress(PVBOXNETFLTINS pThis, void *pvIfData, PCRTMAC pMac)
3031+{
3032+ NOREF(pThis); NOREF(pvIfData); NOREF(pMac);
3033+}
3034+
3035+
3036+int vboxNetFltPortOsConnectInterface(PVBOXNETFLTINS pThis, void *pvIf, void **pvIfData)
3037+{
3038+ /* Nothing to do */
3039+ NOREF(pThis); NOREF(pvIf); NOREF(pvIfData);
3040+ return VINF_SUCCESS;
3041+}
3042+
3043+
3044+int vboxNetFltPortOsDisconnectInterface(PVBOXNETFLTINS pThis, void *pvIfData)
3045+{
3046+ /* Nothing to do */
3047+ NOREF(pThis); NOREF(pvIfData);
3048+ return VINF_SUCCESS;
3049+}
3050+
3051
3052=== added directory '.pc/39-kernel-35.patch'
3053=== added directory '.pc/39-kernel-35.patch/src'
3054=== added directory '.pc/39-kernel-35.patch/src/VBox'
3055=== added directory '.pc/39-kernel-35.patch/src/VBox/Additions'
3056=== added directory '.pc/39-kernel-35.patch/src/VBox/Additions/linux'
3057=== added directory '.pc/39-kernel-35.patch/src/VBox/Additions/linux/sharedfolders'
3058=== added file '.pc/39-kernel-35.patch/src/VBox/Additions/linux/sharedfolders/vfsmod.c'
3059--- .pc/39-kernel-35.patch/src/VBox/Additions/linux/sharedfolders/vfsmod.c 1970-01-01 00:00:00 +0000
3060+++ .pc/39-kernel-35.patch/src/VBox/Additions/linux/sharedfolders/vfsmod.c 2013-03-14 12:42:23 +0000
3061@@ -0,0 +1,596 @@
3062+/** @file
3063+ *
3064+ * vboxsf -- VirtualBox Guest Additions for Linux:
3065+ * Virtual File System for VirtualBox Shared Folders
3066+ *
3067+ * Module initialization/finalization
3068+ * File system registration/deregistration
3069+ * Superblock reading
3070+ * Few utility functions
3071+ */
3072+
3073+/*
3074+ * Copyright (C) 2006-2010 Oracle Corporation
3075+ *
3076+ * This file is part of VirtualBox Open Source Edition (OSE), as
3077+ * available from http://www.virtualbox.org. This file is free software;
3078+ * you can redistribute it and/or modify it under the terms of the GNU
3079+ * General Public License (GPL) as published by the Free Software
3080+ * Foundation, in version 2 as it comes in the "COPYING" file of the
3081+ * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
3082+ * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
3083+ */
3084+
3085+/**
3086+ * @note Anyone wishing to make changes here might wish to take a look at
3087+ * http://www.atnf.csiro.au/people/rgooch/linux/vfs.txt
3088+ * which seems to be the closest there is to official documentation on
3089+ * writing filesystem drivers for Linux.
3090+ */
3091+
3092+#include "vfsmod.h"
3093+
3094+MODULE_DESCRIPTION(VBOX_PRODUCT " VFS Module for Host File System Access");
3095+MODULE_AUTHOR(VBOX_VENDOR);
3096+MODULE_LICENSE("GPL");
3097+#ifdef MODULE_VERSION
3098+MODULE_VERSION(VBOX_VERSION_STRING " (interface " RT_XSTR(VMMDEV_VERSION) ")");
3099+#endif
3100+
3101+/* globals */
3102+VBSFCLIENT client_handle;
3103+
3104+/* forward declarations */
3105+static struct super_operations sf_super_ops;
3106+
3107+/* allocate global info, try to map host share */
3108+static int sf_glob_alloc(struct vbsf_mount_info_new *info, struct sf_glob_info **sf_gp)
3109+{
3110+ int err, rc;
3111+ SHFLSTRING *str_name;
3112+ size_t name_len, str_len;
3113+ struct sf_glob_info *sf_g;
3114+
3115+ TRACE();
3116+ sf_g = kmalloc(sizeof(*sf_g), GFP_KERNEL);
3117+ if (!sf_g)
3118+ {
3119+ err = -ENOMEM;
3120+ LogRelFunc(("could not allocate memory for global info\n"));
3121+ goto fail0;
3122+ }
3123+
3124+ RT_ZERO(*sf_g);
3125+
3126+ if ( info->nullchar != '\0'
3127+ || info->signature[0] != VBSF_MOUNT_SIGNATURE_BYTE_0
3128+ || info->signature[1] != VBSF_MOUNT_SIGNATURE_BYTE_1
3129+ || info->signature[2] != VBSF_MOUNT_SIGNATURE_BYTE_2)
3130+ {
3131+ /* An old version of mount.vboxsf made the syscall. Translate the
3132+ * old parameters to the new structure. */
3133+ struct vbsf_mount_info_old *info_old = (struct vbsf_mount_info_old *)info;
3134+ static struct vbsf_mount_info_new info_compat;
3135+
3136+ info = &info_compat;
3137+ memset(info, 0, sizeof(*info));
3138+ memcpy(&info->name, &info_old->name, MAX_HOST_NAME);
3139+ memcpy(&info->nls_name, &info_old->nls_name, MAX_NLS_NAME);
3140+ info->length = offsetof(struct vbsf_mount_info_new, dmode);
3141+ info->uid = info_old->uid;
3142+ info->gid = info_old->gid;
3143+ info->ttl = info_old->ttl;
3144+ }
3145+
3146+ info->name[sizeof(info->name) - 1] = 0;
3147+ info->nls_name[sizeof(info->nls_name) - 1] = 0;
3148+
3149+ name_len = strlen(info->name);
3150+ if (name_len > 0xfffe)
3151+ {
3152+ err = -ENAMETOOLONG;
3153+ LogFunc(("map name too big\n"));
3154+ goto fail1;
3155+ }
3156+
3157+ str_len = offsetof(SHFLSTRING, String.utf8) + name_len + 1;
3158+ str_name = kmalloc(str_len, GFP_KERNEL);
3159+ if (!str_name)
3160+ {
3161+ err = -ENOMEM;
3162+ LogRelFunc(("could not allocate memory for host name\n"));
3163+ goto fail1;
3164+ }
3165+
3166+ str_name->u16Length = name_len;
3167+ str_name->u16Size = name_len + 1;
3168+ memcpy(str_name->String.utf8, info->name, name_len + 1);
3169+
3170+ if (info->nls_name[0] && strcmp(info->nls_name, "utf8"))
3171+ {
3172+ sf_g->nls = load_nls(info->nls_name);
3173+ if (!sf_g->nls)
3174+ {
3175+ err = -EINVAL;
3176+ LogFunc(("failed to load nls %s\n", info->nls_name));
3177+ goto fail1;
3178+ }
3179+ }
3180+ else
3181+ sf_g->nls = NULL;
3182+
3183+ rc = vboxCallMapFolder(&client_handle, str_name, &sf_g->map);
3184+ kfree(str_name);
3185+
3186+ if (RT_FAILURE(rc))
3187+ {
3188+ err = -EPROTO;
3189+ LogFunc(("vboxCallMapFolder failed rc=%d\n", rc));
3190+ goto fail2;
3191+ }
3192+
3193+ sf_g->ttl = info->ttl;
3194+ sf_g->uid = info->uid;
3195+ sf_g->gid = info->gid;
3196+
3197+ if ((unsigned)info->length >= sizeof(struct vbsf_mount_info_new))
3198+ {
3199+ /* new fields */
3200+ sf_g->dmode = info->dmode;
3201+ sf_g->fmode = info->fmode;
3202+ sf_g->dmask = info->dmask;
3203+ sf_g->fmask = info->fmask;
3204+ }
3205+ else
3206+ {
3207+ sf_g->dmode = ~0;
3208+ sf_g->fmode = ~0;
3209+ }
3210+
3211+ *sf_gp = sf_g;
3212+ return 0;
3213+
3214+fail2:
3215+ if (sf_g->nls)
3216+ unload_nls(sf_g->nls);
3217+
3218+fail1:
3219+ kfree(sf_g);
3220+
3221+fail0:
3222+ return err;
3223+}
3224+
3225+/* unmap the share and free global info [sf_g] */
3226+static void
3227+sf_glob_free(struct sf_glob_info *sf_g)
3228+{
3229+ int rc;
3230+
3231+ TRACE();
3232+ rc = vboxCallUnmapFolder(&client_handle, &sf_g->map);
3233+ if (RT_FAILURE(rc))
3234+ LogFunc(("vboxCallUnmapFolder failed rc=%d\n", rc));
3235+
3236+ if (sf_g->nls)
3237+ unload_nls(sf_g->nls);
3238+
3239+ kfree(sf_g);
3240+}
3241+
3242+/**
3243+ * This is called (by sf_read_super_[24|26]) when vfs mounts the fs and
3244+ * wants to read super_block.
3245+ *
3246+ * calls [sf_glob_alloc] to map the folder and allocate global
3247+ * information structure.
3248+ *
3249+ * initializes [sb], initializes root inode and dentry.
3250+ *
3251+ * should respect [flags]
3252+ */
3253+static int sf_read_super_aux(struct super_block *sb, void *data, int flags)
3254+{
3255+ int err;
3256+ struct dentry *droot;
3257+ struct inode *iroot;
3258+ struct sf_inode_info *sf_i;
3259+ struct sf_glob_info *sf_g;
3260+ SHFLFSOBJINFO fsinfo;
3261+ struct vbsf_mount_info_new *info;
3262+ bool fInodePut = true;
3263+
3264+ TRACE();
3265+ if (!data)
3266+ {
3267+ LogFunc(("no mount info specified\n"));
3268+ return -EINVAL;
3269+ }
3270+
3271+ info = data;
3272+
3273+ if (flags & MS_REMOUNT)
3274+ {
3275+ LogFunc(("remounting is not supported\n"));
3276+ return -ENOSYS;
3277+ }
3278+
3279+ err = sf_glob_alloc(info, &sf_g);
3280+ if (err)
3281+ goto fail0;
3282+
3283+ sf_i = kmalloc(sizeof (*sf_i), GFP_KERNEL);
3284+ if (!sf_i)
3285+ {
3286+ err = -ENOMEM;
3287+ LogRelFunc(("could not allocate memory for root inode info\n"));
3288+ goto fail1;
3289+ }
3290+
3291+ sf_i->handle = SHFL_HANDLE_NIL;
3292+ sf_i->path = kmalloc(sizeof(SHFLSTRING) + 1, GFP_KERNEL);
3293+ if (!sf_i->path)
3294+ {
3295+ err = -ENOMEM;
3296+ LogRelFunc(("could not allocate memory for root inode path\n"));
3297+ goto fail2;
3298+ }
3299+
3300+ sf_i->path->u16Length = 1;
3301+ sf_i->path->u16Size = 2;
3302+ sf_i->path->String.utf8[0] = '/';
3303+ sf_i->path->String.utf8[1] = 0;
3304+
3305+ err = sf_stat(__func__, sf_g, sf_i->path, &fsinfo, 0);
3306+ if (err)
3307+ {
3308+ LogFunc(("could not stat root of share\n"));
3309+ goto fail3;
3310+ }
3311+
3312+ sb->s_magic = 0xface;
3313+ sb->s_blocksize = 1024;
3314+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 3)
3315+ /* Required for seek/sendfile.
3316+ *
3317+ * Must be less than or equal to INT64_MAX despite the fact that the
3318+ * declaration of this variable is unsigned long long. See determination
3319+ * of 'loff_t max' in fs/read_write.c / do_sendfile(). I don't know the
3320+ * correct limit but MAX_LFS_FILESIZE (8TB-1 on 32-bit boxes) takes the
3321+ * page cache into account and is the suggested limit. */
3322+# if defined MAX_LFS_FILESIZE
3323+ sb->s_maxbytes = MAX_LFS_FILESIZE;
3324+# else
3325+ sb->s_maxbytes = 0x7fffffffffffffffULL;
3326+# endif
3327+#endif
3328+ sb->s_op = &sf_super_ops;
3329+
3330+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 25)
3331+ iroot = iget_locked(sb, 0);
3332+#else
3333+ iroot = iget(sb, 0);
3334+#endif
3335+ if (!iroot)
3336+ {
3337+ err = -ENOMEM; /* XXX */
3338+ LogFunc(("could not get root inode\n"));
3339+ goto fail3;
3340+ }
3341+
3342+ if (sf_init_backing_dev(sf_g))
3343+ {
3344+ err = -EINVAL;
3345+ LogFunc(("could not init bdi\n"));
3346+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 25)
3347+ unlock_new_inode(iroot);
3348+#endif
3349+ goto fail4;
3350+ }
3351+
3352+ sf_init_inode(sf_g, iroot, &fsinfo);
3353+ SET_INODE_INFO(iroot, sf_i);
3354+
3355+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 25)
3356+ unlock_new_inode(iroot);
3357+#endif
3358+
3359+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 4, 0)
3360+ droot = d_make_root(iroot);
3361+#else
3362+ droot = d_alloc_root(iroot);
3363+#endif
3364+ if (!droot)
3365+ {
3366+ err = -ENOMEM; /* XXX */
3367+ LogFunc(("d_alloc_root failed\n"));
3368+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 4, 0)
3369+ fInodePut = false;
3370+#endif
3371+ goto fail5;
3372+ }
3373+
3374+ sb->s_root = droot;
3375+ SET_GLOB_INFO(sb, sf_g);
3376+ return 0;
3377+
3378+fail5:
3379+ sf_done_backing_dev(sf_g);
3380+
3381+fail4:
3382+ if (fInodePut)
3383+ iput(iroot);
3384+
3385+fail3:
3386+ kfree(sf_i->path);
3387+
3388+fail2:
3389+ kfree(sf_i);
3390+
3391+fail1:
3392+ sf_glob_free(sf_g);
3393+
3394+fail0:
3395+ return err;
3396+}
3397+
3398+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 0)
3399+static struct super_block *
3400+sf_read_super_24(struct super_block *sb, void *data, int flags)
3401+{
3402+ int err;
3403+
3404+ TRACE();
3405+ err = sf_read_super_aux(sb, data, flags);
3406+ if (err)
3407+ return NULL;
3408+
3409+ return sb;
3410+}
3411+#endif
3412+
3413+/* this is called when vfs is about to destroy the [inode]. all
3414+ resources associated with this [inode] must be cleared here */
3415+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
3416+static void sf_clear_inode(struct inode *inode)
3417+{
3418+ struct sf_inode_info *sf_i;
3419+
3420+ TRACE();
3421+ sf_i = GET_INODE_INFO(inode);
3422+ if (!sf_i)
3423+ return;
3424+
3425+ BUG_ON(!sf_i->path);
3426+ kfree(sf_i->path);
3427+ kfree(sf_i);
3428+ SET_INODE_INFO(inode, NULL);
3429+}
3430+#else
3431+static void sf_evict_inode(struct inode *inode)
3432+{
3433+ struct sf_inode_info *sf_i;
3434+
3435+ TRACE();
3436+ truncate_inode_pages(&inode->i_data, 0);
3437+ end_writeback(inode);
3438+
3439+ sf_i = GET_INODE_INFO(inode);
3440+ if (!sf_i)
3441+ return;
3442+
3443+ BUG_ON(!sf_i->path);
3444+ kfree(sf_i->path);
3445+ kfree(sf_i);
3446+ SET_INODE_INFO(inode, NULL);
3447+}
3448+#endif
3449+
3450+/* this is called by vfs when it wants to populate [inode] with data.
3451+ the only thing that is known about inode at this point is its index
3452+ hence we can't do anything here; we leave it to lookup (or whoever)
3453+ to properly fill the [inode] later. */
3454+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 25)
3455+static void sf_read_inode(struct inode *inode)
3456+{
3457+}
3458+#endif
3459+
3460+/* vfs is done with [sb] (umount called); call [sf_glob_free] to unmap
3461+ the folder and free [sf_g] */
3462+static void sf_put_super(struct super_block *sb)
3463+{
3464+ struct sf_glob_info *sf_g;
3465+
3466+ sf_g = GET_GLOB_INFO(sb);
3467+ BUG_ON(!sf_g);
3468+ sf_done_backing_dev(sf_g);
3469+ sf_glob_free(sf_g);
3470+}
3471+
3472+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 18)
3473+static int sf_statfs(struct super_block *sb, STRUCT_STATFS *stat)
3474+{
3475+ return sf_get_volume_info(sb, stat);
3476+}
3477+#else
3478+static int sf_statfs(struct dentry *dentry, STRUCT_STATFS *stat)
3479+{
3480+ struct super_block *sb = dentry->d_inode->i_sb;
3481+ return sf_get_volume_info(sb, stat);
3482+}
3483+#endif
3484+
3485+static int sf_remount_fs(struct super_block *sb, int *flags, char *data)
3486+{
3487+ TRACE();
3488+ return -ENOSYS;
3489+}
3490+
3491+static struct super_operations sf_super_ops =
3492+{
3493+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
3494+ .clear_inode = sf_clear_inode,
3495+#else
3496+ .evict_inode = sf_evict_inode,
3497+#endif
3498+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 25)
3499+ .read_inode = sf_read_inode,
3500+#endif
3501+ .put_super = sf_put_super,
3502+ .statfs = sf_statfs,
3503+ .remount_fs = sf_remount_fs
3504+};
3505+
3506+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 0)
3507+static DECLARE_FSTYPE(vboxsf_fs_type, "vboxsf", sf_read_super_24, 0);
3508+#else
3509+static int
3510+sf_read_super_26(struct super_block *sb, void *data, int flags)
3511+{
3512+ int err;
3513+
3514+ TRACE();
3515+ err = sf_read_super_aux(sb, data, flags);
3516+ if (err)
3517+ printk(KERN_DEBUG "sf_read_super_aux err=%d\n", err);
3518+
3519+ return err;
3520+}
3521+
3522+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 18)
3523+static struct super_block *sf_get_sb(struct file_system_type *fs_type, int flags,
3524+ const char *dev_name, void *data)
3525+{
3526+ TRACE();
3527+ return get_sb_nodev(fs_type, flags, data, sf_read_super_26);
3528+}
3529+# elif LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 39)
3530+static int sf_get_sb(struct file_system_type *fs_type, int flags,
3531+ const char *dev_name, void *data, struct vfsmount *mnt)
3532+{
3533+ TRACE();
3534+ return get_sb_nodev(fs_type, flags, data, sf_read_super_26, mnt);
3535+}
3536+# else
3537+static struct dentry *sf_mount(struct file_system_type *fs_type, int flags,
3538+ const char *dev_name, void *data)
3539+{
3540+ TRACE();
3541+ return mount_nodev(fs_type, flags, data, sf_read_super_26);
3542+}
3543+# endif
3544+
3545+static struct file_system_type vboxsf_fs_type =
3546+{
3547+ .owner = THIS_MODULE,
3548+ .name = "vboxsf",
3549+# if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 39)
3550+ .get_sb = sf_get_sb,
3551+# else
3552+ .mount = sf_mount,
3553+# endif
3554+ .kill_sb = kill_anon_super
3555+};
3556+#endif
3557+
3558+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0)
3559+static int follow_symlinks = 0;
3560+module_param(follow_symlinks, int, 0);
3561+MODULE_PARM_DESC(follow_symlinks, "Let host resolve symlinks rather than showing them");
3562+#endif
3563+
3564+/* Module initialization/finalization handlers */
3565+static int __init init(void)
3566+{
3567+ int rcVBox;
3568+ int rcRet = 0;
3569+ int err;
3570+
3571+ TRACE();
3572+
3573+ if (sizeof(struct vbsf_mount_info_new) > PAGE_SIZE)
3574+ {
3575+ printk(KERN_ERR
3576+ "Mount information structure is too large %lu\n"
3577+ "Must be less than or equal to %lu\n",
3578+ (unsigned long)sizeof (struct vbsf_mount_info_new),
3579+ (unsigned long)PAGE_SIZE);
3580+ return -EINVAL;
3581+ }
3582+
3583+ err = register_filesystem(&vboxsf_fs_type);
3584+ if (err)
3585+ {
3586+ LogFunc(("register_filesystem err=%d\n", err));
3587+ return err;
3588+ }
3589+
3590+ rcVBox = vboxInit();
3591+ if (RT_FAILURE(rcVBox))
3592+ {
3593+ LogRelFunc(("vboxInit failed, rc=%d\n", rcVBox));
3594+ rcRet = -EPROTO;
3595+ goto fail0;
3596+ }
3597+
3598+ rcVBox = vboxConnect(&client_handle);
3599+ if (RT_FAILURE(rcVBox))
3600+ {
3601+ LogRelFunc(("vboxConnect failed, rc=%d\n", rcVBox));
3602+ rcRet = -EPROTO;
3603+ goto fail1;
3604+ }
3605+
3606+ rcVBox = vboxCallSetUtf8(&client_handle);
3607+ if (RT_FAILURE(rcVBox))
3608+ {
3609+ LogRelFunc(("vboxCallSetUtf8 failed, rc=%d\n", rcVBox));
3610+ rcRet = -EPROTO;
3611+ goto fail2;
3612+ }
3613+
3614+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0)
3615+ if (!follow_symlinks)
3616+ {
3617+ rcVBox = vboxCallSetSymlinks(&client_handle);
3618+ if (RT_FAILURE(rcVBox))
3619+ {
3620+ printk(KERN_WARNING
3621+ "vboxsf: Host unable to show symlinks, rc=%d\n",
3622+ rcVBox);
3623+ }
3624+ }
3625+#endif
3626+
3627+ printk(KERN_DEBUG
3628+ "vboxsf: Successfully loaded version " VBOX_VERSION_STRING
3629+ " (interface " RT_XSTR(VMMDEV_VERSION) ")\n");
3630+
3631+ return 0;
3632+
3633+fail2:
3634+ vboxDisconnect(&client_handle);
3635+
3636+fail1:
3637+ vboxUninit();
3638+
3639+fail0:
3640+ unregister_filesystem(&vboxsf_fs_type);
3641+ return rcRet;
3642+}
3643+
3644+static void __exit fini(void)
3645+{
3646+ TRACE();
3647+
3648+ vboxDisconnect(&client_handle);
3649+ vboxUninit();
3650+ unregister_filesystem(&vboxsf_fs_type);
3651+}
3652+
3653+module_init(init);
3654+module_exit(fini);
3655+
3656+/* C++ hack */
3657+int __gxx_personality_v0 = 0xdeadbeef;
3658
3659=== added directory '.pc/39-kernel-35.patch/src/VBox/Runtime'
3660=== added directory '.pc/39-kernel-35.patch/src/VBox/Runtime/r0drv'
3661=== added directory '.pc/39-kernel-35.patch/src/VBox/Runtime/r0drv/linux'
3662=== added file '.pc/39-kernel-35.patch/src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c'
3663--- .pc/39-kernel-35.patch/src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c 1970-01-01 00:00:00 +0000
3664+++ .pc/39-kernel-35.patch/src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c 2013-03-14 12:42:23 +0000
3665@@ -0,0 +1,1550 @@
3666+/* $Revision: 75790 $ */
3667+/** @file
3668+ * IPRT - Ring-0 Memory Objects, Linux.
3669+ */
3670+
3671+/*
3672+ * Copyright (C) 2006-2007 Oracle Corporation
3673+ *
3674+ * This file is part of VirtualBox Open Source Edition (OSE), as
3675+ * available from http://www.virtualbox.org. This file is free software;
3676+ * you can redistribute it and/or modify it under the terms of the GNU
3677+ * General Public License (GPL) as published by the Free Software
3678+ * Foundation, in version 2 as it comes in the "COPYING" file of the
3679+ * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
3680+ * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
3681+ *
3682+ * The contents of this file may alternatively be used under the terms
3683+ * of the Common Development and Distribution License Version 1.0
3684+ * (CDDL) only, as it comes in the "COPYING.CDDL" file of the
3685+ * VirtualBox OSE distribution, in which case the provisions of the
3686+ * CDDL are applicable instead of those of the GPL.
3687+ *
3688+ * You may elect to license modified versions of this file under the
3689+ * terms and conditions of either the GPL or the CDDL or both.
3690+ */
3691+
3692+
3693+/*******************************************************************************
3694+* Header Files *
3695+*******************************************************************************/
3696+#include "the-linux-kernel.h"
3697+
3698+#include <iprt/memobj.h>
3699+#include <iprt/alloc.h>
3700+#include <iprt/assert.h>
3701+#include <iprt/log.h>
3702+#include <iprt/process.h>
3703+#include <iprt/string.h>
3704+#include "internal/memobj.h"
3705+
3706+
3707+/*******************************************************************************
3708+* Defined Constants And Macros *
3709+*******************************************************************************/
3710+/* early 2.6 kernels */
3711+#ifndef PAGE_SHARED_EXEC
3712+# define PAGE_SHARED_EXEC PAGE_SHARED
3713+#endif
3714+#ifndef PAGE_READONLY_EXEC
3715+# define PAGE_READONLY_EXEC PAGE_READONLY
3716+#endif
3717+
3718+/*
3719+ * 2.6.29+ kernels don't work with remap_pfn_range() anymore because
3720+ * track_pfn_vma_new() is apparently not defined for non-RAM pages.
3721+ * It should be safe to use vm_insert_page() on older kernels as well.
3722+ */
3723+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 23)
3724+# define VBOX_USE_INSERT_PAGE
3725+#endif
3726+#if defined(CONFIG_X86_PAE) \
3727+ && ( defined(HAVE_26_STYLE_REMAP_PAGE_RANGE) \
3728+ || ( LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0) \
3729+ && LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 11)))
3730+# define VBOX_USE_PAE_HACK
3731+#endif
3732+
3733+
3734+/*******************************************************************************
3735+* Structures and Typedefs *
3736+*******************************************************************************/
3737+/**
3738+ * The Linux version of the memory object structure.
3739+ */
3740+typedef struct RTR0MEMOBJLNX
3741+{
3742+ /** The core structure. */
3743+ RTR0MEMOBJINTERNAL Core;
3744+ /** Set if the allocation is contiguous.
3745+ * This means it has to be given back as one chunk. */
3746+ bool fContiguous;
3747+ /** Set if we've vmap'ed the memory into ring-0. */
3748+ bool fMappedToRing0;
3749+ /** The number of pages in the apPages array. */
3750+ size_t cPages;
3751+ /** Array of struct page pointers. (variable size) */
3752+ struct page *apPages[1];
3753+} RTR0MEMOBJLNX, *PRTR0MEMOBJLNX;
3754+
3755+
3756+static void rtR0MemObjLinuxFreePages(PRTR0MEMOBJLNX pMemLnx);
3757+
3758+
3759+/**
3760+ * Helper that converts from a RTR0PROCESS handle to a linux task.
3761+ *
3762+ * @returns The corresponding Linux task.
3763+ * @param R0Process IPRT ring-0 process handle.
3764+ */
3765+static struct task_struct *rtR0ProcessToLinuxTask(RTR0PROCESS R0Process)
3766+{
3767+ /** @todo fix rtR0ProcessToLinuxTask!! */
3768+ return R0Process == RTR0ProcHandleSelf() ? current : NULL;
3769+}
3770+
3771+
3772+/**
3773+ * Compute order. Some functions allocate 2^order pages.
3774+ *
3775+ * @returns order.
3776+ * @param cPages Number of pages.
3777+ */
3778+static int rtR0MemObjLinuxOrder(size_t cPages)
3779+{
3780+ int iOrder;
3781+ size_t cTmp;
3782+
3783+ for (iOrder = 0, cTmp = cPages; cTmp >>= 1; ++iOrder)
3784+ ;
3785+ if (cPages & ~((size_t)1 << iOrder))
3786+ ++iOrder;
3787+
3788+ return iOrder;
3789+}
3790+
3791+
3792+/**
3793+ * Converts from RTMEM_PROT_* to Linux PAGE_*.
3794+ *
3795+ * @returns Linux page protection constant.
3796+ * @param fProt The IPRT protection mask.
3797+ * @param fKernel Whether it applies to kernel or user space.
3798+ */
3799+static pgprot_t rtR0MemObjLinuxConvertProt(unsigned fProt, bool fKernel)
3800+{
3801+ switch (fProt)
3802+ {
3803+ default:
3804+ AssertMsgFailed(("%#x %d\n", fProt, fKernel));
3805+ case RTMEM_PROT_NONE:
3806+ return PAGE_NONE;
3807+
3808+ case RTMEM_PROT_READ:
3809+ return fKernel ? PAGE_KERNEL_RO : PAGE_READONLY;
3810+
3811+ case RTMEM_PROT_WRITE:
3812+ case RTMEM_PROT_WRITE | RTMEM_PROT_READ:
3813+ return fKernel ? PAGE_KERNEL : PAGE_SHARED;
3814+
3815+ case RTMEM_PROT_EXEC:
3816+ case RTMEM_PROT_EXEC | RTMEM_PROT_READ:
3817+#if defined(RT_ARCH_X86) || defined(RT_ARCH_AMD64)
3818+ if (fKernel)
3819+ {
3820+ pgprot_t fPg = MY_PAGE_KERNEL_EXEC;
3821+ pgprot_val(fPg) &= ~_PAGE_RW;
3822+ return fPg;
3823+ }
3824+ return PAGE_READONLY_EXEC;
3825+#else
3826+ return fKernel ? MY_PAGE_KERNEL_EXEC : PAGE_READONLY_EXEC;
3827+#endif
3828+
3829+ case RTMEM_PROT_WRITE | RTMEM_PROT_EXEC:
3830+ case RTMEM_PROT_WRITE | RTMEM_PROT_EXEC | RTMEM_PROT_READ:
3831+ return fKernel ? MY_PAGE_KERNEL_EXEC : PAGE_SHARED_EXEC;
3832+ }
3833+}
3834+
3835+
3836+/**
3837+ * Internal worker that allocates physical pages and creates the memory object for them.
3838+ *
3839+ * @returns IPRT status code.
3840+ * @param ppMemLnx Where to store the memory object pointer.
3841+ * @param enmType The object type.
3842+ * @param cb The number of bytes to allocate.
3843+ * @param uAlignment The alignment of the physical memory.
3844+ * Only valid if fContiguous == true, ignored otherwise.
3845+ * @param fFlagsLnx The page allocation flags (GFPs).
3846+ * @param fContiguous Whether the allocation must be contiguous.
3847+ */
3848+static int rtR0MemObjLinuxAllocPages(PRTR0MEMOBJLNX *ppMemLnx, RTR0MEMOBJTYPE enmType, size_t cb,
3849+ size_t uAlignment, unsigned fFlagsLnx, bool fContiguous)
3850+{
3851+ size_t iPage;
3852+ size_t const cPages = cb >> PAGE_SHIFT;
3853+ struct page *paPages;
3854+
3855+ /*
3856+ * Allocate a memory object structure that's large enough to contain
3857+ * the page pointer array.
3858+ */
3859+ PRTR0MEMOBJLNX pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(RT_OFFSETOF(RTR0MEMOBJLNX, apPages[cPages]), enmType, NULL, cb);
3860+ if (!pMemLnx)
3861+ return VERR_NO_MEMORY;
3862+ pMemLnx->cPages = cPages;
3863+
3864+ if (cPages > 255)
3865+ {
3866+# ifdef __GFP_REPEAT
3867+ /* Try hard to allocate the memory, but the allocation attempt might fail. */
3868+ fFlagsLnx |= __GFP_REPEAT;
3869+# endif
3870+# ifdef __GFP_NOMEMALLOC
3871+ /* Introduced with Linux 2.6.12: Don't use emergency reserves */
3872+ fFlagsLnx |= __GFP_NOMEMALLOC;
3873+# endif
3874+ }
3875+
3876+ /*
3877+ * Allocate the pages.
3878+ * For small allocations we'll try contiguous first and then fall back on page by page.
3879+ */
3880+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
3881+ if ( fContiguous
3882+ || cb <= PAGE_SIZE * 2)
3883+ {
3884+# ifdef VBOX_USE_INSERT_PAGE
3885+ paPages = alloc_pages(fFlagsLnx | __GFP_COMP, rtR0MemObjLinuxOrder(cPages));
3886+# else
3887+ paPages = alloc_pages(fFlagsLnx, rtR0MemObjLinuxOrder(cPages));
3888+# endif
3889+ if (paPages)
3890+ {
3891+ fContiguous = true;
3892+ for (iPage = 0; iPage < cPages; iPage++)
3893+ pMemLnx->apPages[iPage] = &paPages[iPage];
3894+ }
3895+ else if (fContiguous)
3896+ {
3897+ rtR0MemObjDelete(&pMemLnx->Core);
3898+ return VERR_NO_MEMORY;
3899+ }
3900+ }
3901+
3902+ if (!fContiguous)
3903+ {
3904+ for (iPage = 0; iPage < cPages; iPage++)
3905+ {
3906+ pMemLnx->apPages[iPage] = alloc_page(fFlagsLnx);
3907+ if (RT_UNLIKELY(!pMemLnx->apPages[iPage]))
3908+ {
3909+ while (iPage-- > 0)
3910+ __free_page(pMemLnx->apPages[iPage]);
3911+ rtR0MemObjDelete(&pMemLnx->Core);
3912+ return VERR_NO_MEMORY;
3913+ }
3914+ }
3915+ }
3916+
3917+#else /* < 2.4.22 */
3918+ /** @todo figure out why we didn't allocate page-by-page on 2.4.21 and older... */
3919+ paPages = alloc_pages(fFlagsLnx, rtR0MemObjLinuxOrder(cPages));
3920+ if (!paPages)
3921+ {
3922+ rtR0MemObjDelete(&pMemLnx->Core);
3923+ return VERR_NO_MEMORY;
3924+ }
3925+ for (iPage = 0; iPage < cPages; iPage++)
3926+ {
3927+ pMemLnx->apPages[iPage] = &paPages[iPage];
3928+ MY_SET_PAGES_EXEC(pMemLnx->apPages[iPage], 1);
3929+ if (PageHighMem(pMemLnx->apPages[iPage]))
3930+ BUG();
3931+ }
3932+
3933+ fContiguous = true;
3934+#endif /* < 2.4.22 */
3935+ pMemLnx->fContiguous = fContiguous;
3936+
3937+ /*
3938+ * Reserve the pages.
3939+ */
3940+ for (iPage = 0; iPage < cPages; iPage++)
3941+ SetPageReserved(pMemLnx->apPages[iPage]);
3942+
3943+ /*
3944+ * Note that the physical address of memory allocated with alloc_pages(flags, order)
3945+ * is always 2^(PAGE_SHIFT+order)-aligned.
3946+ */
3947+ if ( fContiguous
3948+ && uAlignment > PAGE_SIZE)
3949+ {
3950+ /*
3951+ * Check for alignment constraints. The physical address of memory allocated with
3952+ * alloc_pages(flags, order) is always 2^(PAGE_SHIFT+order)-aligned.
3953+ */
3954+ if (RT_UNLIKELY(page_to_phys(pMemLnx->apPages[0]) & (uAlignment - 1)))
3955+ {
3956+ /*
3957+ * This should never happen!
3958+ */
3959+ printk("rtR0MemObjLinuxAllocPages(cb=0x%lx, uAlignment=0x%lx): alloc_pages(..., %d) returned physical memory at 0x%lx!\n",
3960+ (unsigned long)cb, (unsigned long)uAlignment, rtR0MemObjLinuxOrder(cPages), (unsigned long)page_to_phys(pMemLnx->apPages[0]));
3961+ rtR0MemObjLinuxFreePages(pMemLnx);
3962+ return VERR_NO_MEMORY;
3963+ }
3964+ }
3965+
3966+ *ppMemLnx = pMemLnx;
3967+ return VINF_SUCCESS;
3968+}
3969+
3970+
3971+/**
3972+ * Frees the physical pages allocated by the rtR0MemObjLinuxAllocPages() call.
3973+ *
3974+ * This method does NOT free the object.
3975+ *
3977+ * @param pMemLnx The object whose physical pages should be freed.
3977+ */
3978+static void rtR0MemObjLinuxFreePages(PRTR0MEMOBJLNX pMemLnx)
3979+{
3980+ size_t iPage = pMemLnx->cPages;
3981+ if (iPage > 0)
3982+ {
3983+ /*
3984+ * Restore the page flags.
3985+ */
3986+ while (iPage-- > 0)
3987+ {
3988+ ClearPageReserved(pMemLnx->apPages[iPage]);
3989+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
3990+#else
3991+ MY_SET_PAGES_NOEXEC(pMemLnx->apPages[iPage], 1);
3992+#endif
3993+ }
3994+
3995+ /*
3996+ * Free the pages.
3997+ */
3998+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
3999+ if (!pMemLnx->fContiguous)
4000+ {
4001+ iPage = pMemLnx->cPages;
4002+ while (iPage-- > 0)
4003+ __free_page(pMemLnx->apPages[iPage]);
4004+ }
4005+ else
4006+#endif
4007+ __free_pages(pMemLnx->apPages[0], rtR0MemObjLinuxOrder(pMemLnx->cPages));
4008+
4009+ pMemLnx->cPages = 0;
4010+ }
4011+}
4012+
4013+
4014+/**
4015+ * Maps the allocation into ring-0.
4016+ *
4017+ * This will update the RTR0MEMOBJLNX::Core.pv and RTR0MEMOBJ::fMappedToRing0 members.
4018+ *
4019+ * Contiguous mappings that aren't in 'high' memory will already be mapped into kernel
4020+ * space, so we'll use that mapping if possible. If execute access is required, we'll
4021+ * play safe and do our own mapping.
4022+ *
4023+ * @returns IPRT status code.
4024+ * @param pMemLnx The linux memory object to map.
4025+ * @param fExecutable Whether execute access is required.
4026+ */
4027+static int rtR0MemObjLinuxVMap(PRTR0MEMOBJLNX pMemLnx, bool fExecutable)
4028+{
4029+ int rc = VINF_SUCCESS;
4030+
4031+ /*
4032+ * Choose mapping strategy.
4033+ */
4034+ bool fMustMap = fExecutable
4035+ || !pMemLnx->fContiguous;
4036+ if (!fMustMap)
4037+ {
4038+ size_t iPage = pMemLnx->cPages;
4039+ while (iPage-- > 0)
4040+ if (PageHighMem(pMemLnx->apPages[iPage]))
4041+ {
4042+ fMustMap = true;
4043+ break;
4044+ }
4045+ }
4046+
4047+ Assert(!pMemLnx->Core.pv);
4048+ Assert(!pMemLnx->fMappedToRing0);
4049+
4050+ if (fMustMap)
4051+ {
4052+ /*
4053+ * Use vmap - 2.4.22 and later.
4054+ */
4055+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
4056+ pgprot_t fPg;
4057+ pgprot_val(fPg) = _PAGE_PRESENT | _PAGE_RW;
4058+# ifdef _PAGE_NX
4059+ if (!fExecutable)
4060+ pgprot_val(fPg) |= _PAGE_NX;
4061+# endif
4062+
4063+# ifdef VM_MAP
4064+ pMemLnx->Core.pv = vmap(&pMemLnx->apPages[0], pMemLnx->cPages, VM_MAP, fPg);
4065+# else
4066+ pMemLnx->Core.pv = vmap(&pMemLnx->apPages[0], pMemLnx->cPages, VM_ALLOC, fPg);
4067+# endif
4068+ if (pMemLnx->Core.pv)
4069+ pMemLnx->fMappedToRing0 = true;
4070+ else
4071+ rc = VERR_MAP_FAILED;
4072+#else /* < 2.4.22 */
4073+ rc = VERR_NOT_SUPPORTED;
4074+#endif
4075+ }
4076+ else
4077+ {
4078+ /*
4079+ * Use the kernel RAM mapping.
4080+ */
4081+ pMemLnx->Core.pv = phys_to_virt(page_to_phys(pMemLnx->apPages[0]));
4082+ Assert(pMemLnx->Core.pv);
4083+ }
4084+
4085+ return rc;
4086+}
4087+
4088+
4089+/**
4090+ * Undoes what rtR0MemObjLinuxVMap() did.
4091+ *
4092+ * @param pMemLnx The linux memory object.
4093+ */
4094+static void rtR0MemObjLinuxVUnmap(PRTR0MEMOBJLNX pMemLnx)
4095+{
4096+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
4097+ if (pMemLnx->fMappedToRing0)
4098+ {
4099+ Assert(pMemLnx->Core.pv);
4100+ vunmap(pMemLnx->Core.pv);
4101+ pMemLnx->fMappedToRing0 = false;
4102+ }
4103+#else /* < 2.4.22 */
4104+ Assert(!pMemLnx->fMappedToRing0);
4105+#endif
4106+ pMemLnx->Core.pv = NULL;
4107+}
4108+
4109+
4110+DECLHIDDEN(int) rtR0MemObjNativeFree(RTR0MEMOBJ pMem)
4111+{
4112+ PRTR0MEMOBJLNX pMemLnx = (PRTR0MEMOBJLNX)pMem;
4113+
4114+ /*
4115+ * Release any memory that we've allocated or locked.
4116+ */
4117+ switch (pMemLnx->Core.enmType)
4118+ {
4119+ case RTR0MEMOBJTYPE_LOW:
4120+ case RTR0MEMOBJTYPE_PAGE:
4121+ case RTR0MEMOBJTYPE_CONT:
4122+ case RTR0MEMOBJTYPE_PHYS:
4123+ case RTR0MEMOBJTYPE_PHYS_NC:
4124+ rtR0MemObjLinuxVUnmap(pMemLnx);
4125+ rtR0MemObjLinuxFreePages(pMemLnx);
4126+ break;
4127+
4128+ case RTR0MEMOBJTYPE_LOCK:
4129+ if (pMemLnx->Core.u.Lock.R0Process != NIL_RTR0PROCESS)
4130+ {
4131+ struct task_struct *pTask = rtR0ProcessToLinuxTask(pMemLnx->Core.u.Lock.R0Process);
4132+ size_t iPage;
4133+ Assert(pTask);
4134+ if (pTask && pTask->mm)
4135+ down_read(&pTask->mm->mmap_sem);
4136+
4137+ iPage = pMemLnx->cPages;
4138+ while (iPage-- > 0)
4139+ {
4140+ if (!PageReserved(pMemLnx->apPages[iPage]))
4141+ SetPageDirty(pMemLnx->apPages[iPage]);
4142+ page_cache_release(pMemLnx->apPages[iPage]);
4143+ }
4144+
4145+ if (pTask && pTask->mm)
4146+ up_read(&pTask->mm->mmap_sem);
4147+ }
4148+ /* else: kernel memory - nothing to do here. */
4149+ break;
4150+
4151+ case RTR0MEMOBJTYPE_RES_VIRT:
4152+ Assert(pMemLnx->Core.pv);
4153+ if (pMemLnx->Core.u.ResVirt.R0Process != NIL_RTR0PROCESS)
4154+ {
4155+ struct task_struct *pTask = rtR0ProcessToLinuxTask(pMemLnx->Core.u.Lock.R0Process);
4156+ Assert(pTask);
4157+ if (pTask && pTask->mm)
4158+ {
4159+ down_write(&pTask->mm->mmap_sem);
4160+ MY_DO_MUNMAP(pTask->mm, (unsigned long)pMemLnx->Core.pv, pMemLnx->Core.cb);
4161+ up_write(&pTask->mm->mmap_sem);
4162+ }
4163+ }
4164+ else
4165+ {
4166+ vunmap(pMemLnx->Core.pv);
4167+
4168+ Assert(pMemLnx->cPages == 1 && pMemLnx->apPages[0] != NULL);
4169+ __free_page(pMemLnx->apPages[0]);
4170+ pMemLnx->apPages[0] = NULL;
4171+ pMemLnx->cPages = 0;
4172+ }
4173+ pMemLnx->Core.pv = NULL;
4174+ break;
4175+
4176+ case RTR0MEMOBJTYPE_MAPPING:
4177+ Assert(pMemLnx->cPages == 0); Assert(pMemLnx->Core.pv);
4178+ if (pMemLnx->Core.u.ResVirt.R0Process != NIL_RTR0PROCESS)
4179+ {
4180+ struct task_struct *pTask = rtR0ProcessToLinuxTask(pMemLnx->Core.u.Lock.R0Process);
4181+ Assert(pTask);
4182+ if (pTask && pTask->mm)
4183+ {
4184+ down_write(&pTask->mm->mmap_sem);
4185+ MY_DO_MUNMAP(pTask->mm, (unsigned long)pMemLnx->Core.pv, pMemLnx->Core.cb);
4186+ up_write(&pTask->mm->mmap_sem);
4187+ }
4188+ }
4189+ else
4190+ vunmap(pMemLnx->Core.pv);
4191+ pMemLnx->Core.pv = NULL;
4192+ break;
4193+
4194+ default:
4195+ AssertMsgFailed(("enmType=%d\n", pMemLnx->Core.enmType));
4196+ return VERR_INTERNAL_ERROR;
4197+ }
4198+ return VINF_SUCCESS;
4199+}
4200+
4201+
4202+DECLHIDDEN(int) rtR0MemObjNativeAllocPage(PPRTR0MEMOBJINTERNAL ppMem, size_t cb, bool fExecutable)
4203+{
4204+ PRTR0MEMOBJLNX pMemLnx;
4205+ int rc;
4206+
4207+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
4208+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_PAGE, cb, PAGE_SIZE, GFP_HIGHUSER, false /* non-contiguous */);
4209+#else
4210+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_PAGE, cb, PAGE_SIZE, GFP_USER, false /* non-contiguous */);
4211+#endif
4212+ if (RT_SUCCESS(rc))
4213+ {
4214+ rc = rtR0MemObjLinuxVMap(pMemLnx, fExecutable);
4215+ if (RT_SUCCESS(rc))
4216+ {
4217+ *ppMem = &pMemLnx->Core;
4218+ return rc;
4219+ }
4220+
4221+ rtR0MemObjLinuxFreePages(pMemLnx);
4222+ rtR0MemObjDelete(&pMemLnx->Core);
4223+ }
4224+
4225+ return rc;
4226+}
4227+
4228+
4229+DECLHIDDEN(int) rtR0MemObjNativeAllocLow(PPRTR0MEMOBJINTERNAL ppMem, size_t cb, bool fExecutable)
4230+{
4231+ PRTR0MEMOBJLNX pMemLnx;
4232+ int rc;
4233+
4234+ /* Try to avoid GFP_DMA. GFP_DMA32 was introduced with Linux 2.6.15. */
4235+#if (defined(RT_ARCH_AMD64) || defined(CONFIG_X86_PAE)) && defined(GFP_DMA32)
4236+ /* ZONE_DMA32: 0-4GB */
4237+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_LOW, cb, PAGE_SIZE, GFP_DMA32, false /* non-contiguous */);
4238+ if (RT_FAILURE(rc))
4239+#endif
4240+#ifdef RT_ARCH_AMD64
4241+ /* ZONE_DMA: 0-16MB */
4242+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_LOW, cb, PAGE_SIZE, GFP_DMA, false /* non-contiguous */);
4243+#else
4244+# ifdef CONFIG_X86_PAE
4245+# endif
4246+ /* ZONE_NORMAL: 0-896MB */
4247+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_LOW, cb, PAGE_SIZE, GFP_USER, false /* non-contiguous */);
4248+#endif
4249+ if (RT_SUCCESS(rc))
4250+ {
4251+ rc = rtR0MemObjLinuxVMap(pMemLnx, fExecutable);
4252+ if (RT_SUCCESS(rc))
4253+ {
4254+ *ppMem = &pMemLnx->Core;
4255+ return rc;
4256+ }
4257+
4258+ rtR0MemObjLinuxFreePages(pMemLnx);
4259+ rtR0MemObjDelete(&pMemLnx->Core);
4260+ }
4261+
4262+ return rc;
4263+}
4264+
4265+
4266+DECLHIDDEN(int) rtR0MemObjNativeAllocCont(PPRTR0MEMOBJINTERNAL ppMem, size_t cb, bool fExecutable)
4267+{
4268+ PRTR0MEMOBJLNX pMemLnx;
4269+ int rc;
4270+
4271+#if (defined(RT_ARCH_AMD64) || defined(CONFIG_X86_PAE)) && defined(GFP_DMA32)
4272+ /* ZONE_DMA32: 0-4GB */
4273+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_CONT, cb, PAGE_SIZE, GFP_DMA32, true /* contiguous */);
4274+ if (RT_FAILURE(rc))
4275+#endif
4276+#ifdef RT_ARCH_AMD64
4277+ /* ZONE_DMA: 0-16MB */
4278+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_CONT, cb, PAGE_SIZE, GFP_DMA, true /* contiguous */);
4279+#else
4280+ /* ZONE_NORMAL (32-bit hosts): 0-896MB */
4281+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, RTR0MEMOBJTYPE_CONT, cb, PAGE_SIZE, GFP_USER, true /* contiguous */);
4282+#endif
4283+ if (RT_SUCCESS(rc))
4284+ {
4285+ rc = rtR0MemObjLinuxVMap(pMemLnx, fExecutable);
4286+ if (RT_SUCCESS(rc))
4287+ {
4288+#if defined(RT_STRICT) && (defined(RT_ARCH_AMD64) || defined(CONFIG_HIGHMEM64G))
4289+ size_t iPage = pMemLnx->cPages;
4290+ while (iPage-- > 0)
4291+ Assert(page_to_phys(pMemLnx->apPages[iPage]) < _4G);
4292+#endif
4293+ pMemLnx->Core.u.Cont.Phys = page_to_phys(pMemLnx->apPages[0]);
4294+ *ppMem = &pMemLnx->Core;
4295+ return rc;
4296+ }
4297+
4298+ rtR0MemObjLinuxFreePages(pMemLnx);
4299+ rtR0MemObjDelete(&pMemLnx->Core);
4300+ }
4301+
4302+ return rc;
4303+}
4304+
4305+
4306+/**
4307+ * Worker for rtR0MemObjLinuxAllocPhysSub that tries one allocation strategy.
4308+ *
4309+ * @returns IPRT status.
4310+ * @param ppMem Where to store the memory object pointer on success.
4311+ * @param enmType The object type.
4312+ * @param cb The size of the allocation.
4313+ * @param uAlignment The alignment of the physical memory.
4314+ * Only valid for fContiguous == true, ignored otherwise.
4315+ * @param PhysHighest See rtR0MemObjNativeAllocPhys.
4316+ * @param fGfp The Linux GFP flags to use for the allocation.
4317+ */
4318+static int rtR0MemObjLinuxAllocPhysSub2(PPRTR0MEMOBJINTERNAL ppMem, RTR0MEMOBJTYPE enmType,
4319+ size_t cb, size_t uAlignment, RTHCPHYS PhysHighest, unsigned fGfp)
4320+{
4321+ PRTR0MEMOBJLNX pMemLnx;
4322+ int rc;
4323+
4324+ rc = rtR0MemObjLinuxAllocPages(&pMemLnx, enmType, cb, uAlignment, fGfp,
4325+ enmType == RTR0MEMOBJTYPE_PHYS /* contiguous / non-contiguous */);
4326+ if (RT_FAILURE(rc))
4327+ return rc;
4328+
4329+ /*
4330+ * Check the addresses if necessary. (Can be optimized a bit for PHYS.)
4331+ */
4332+ if (PhysHighest != NIL_RTHCPHYS)
4333+ {
4334+ size_t iPage = pMemLnx->cPages;
4335+ while (iPage-- > 0)
4336+ if (page_to_phys(pMemLnx->apPages[iPage]) >= PhysHighest)
4337+ {
4338+ rtR0MemObjLinuxFreePages(pMemLnx);
4339+ rtR0MemObjDelete(&pMemLnx->Core);
4340+ return VERR_NO_MEMORY;
4341+ }
4342+ }
4343+
4344+ /*
4345+ * Complete the object.
4346+ */
4347+ if (enmType == RTR0MEMOBJTYPE_PHYS)
4348+ {
4349+ pMemLnx->Core.u.Phys.PhysBase = page_to_phys(pMemLnx->apPages[0]);
4350+ pMemLnx->Core.u.Phys.fAllocated = true;
4351+ }
4352+ *ppMem = &pMemLnx->Core;
4353+ return rc;
4354+}
4355+
4356+
4357+/**
4358+ * Worker for rtR0MemObjNativeAllocPhys and rtR0MemObjNativeAllocPhysNC.
4359+ *
4360+ * @returns IPRT status.
4361+ * @param ppMem Where to store the memory object pointer on success.
4362+ * @param enmType The object type.
4363+ * @param cb The size of the allocation.
4364+ * @param uAlignment The alignment of the physical memory.
4365+ * Only valid for enmType == RTR0MEMOBJTYPE_PHYS, ignored otherwise.
4366+ * @param PhysHighest See rtR0MemObjNativeAllocPhys.
4367+ */
4368+static int rtR0MemObjLinuxAllocPhysSub(PPRTR0MEMOBJINTERNAL ppMem, RTR0MEMOBJTYPE enmType,
4369+ size_t cb, size_t uAlignment, RTHCPHYS PhysHighest)
4370+{
4371+ int rc;
4372+
4373+ /*
4374+ * There are two clear cases and that's the <=16MB and anything-goes ones.
4375+ * When the physical address limit is somewhere in-between those two we'll
4376+ * just have to try, starting with HIGHUSER and working our way thru the
4377+ * different types, hoping we'll get lucky.
4378+ *
4379+ * We should probably move this physical address restriction logic up to
4380+ * the page alloc function as it would be more efficient there. But since
4381+ * we don't expect this to be a performance issue just yet it can wait.
4382+ */
4383+ if (PhysHighest == NIL_RTHCPHYS)
4384+ /* ZONE_HIGHMEM: the whole physical memory */
4385+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_HIGHUSER);
4386+ else if (PhysHighest <= _1M * 16)
4387+ /* ZONE_DMA: 0-16MB */
4388+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_DMA);
4389+ else
4390+ {
4391+ rc = VERR_NO_MEMORY;
4392+ if (RT_FAILURE(rc))
4393+ /* ZONE_HIGHMEM: the whole physical memory */
4394+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_HIGHUSER);
4395+ if (RT_FAILURE(rc))
4396+ /* ZONE_NORMAL: 0-896MB */
4397+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_USER);
4398+#ifdef GFP_DMA32
4399+ if (RT_FAILURE(rc))
4400+ /* ZONE_DMA32: 0-4GB */
4401+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_DMA32);
4402+#endif
4403+ if (RT_FAILURE(rc))
4404+ /* ZONE_DMA: 0-16MB */
4405+ rc = rtR0MemObjLinuxAllocPhysSub2(ppMem, enmType, cb, uAlignment, PhysHighest, GFP_DMA);
4406+ }
4407+ return rc;
4408+}
4409+
4410+
4411+/**
4412+ * Translates a kernel virtual address to a linux page structure by walking the
4413+ * page tables.
4414+ *
4415+ * @note We do assume that the page tables will not change as we are walking
4416+ * them. This assumption is rather forced by the fact that I could not
4417+ * immediately see any way of preventing this from happening. So, we
4418+ * take some extra care when accessing them.
4419+ *
4420+ * Because of this, we don't want to use this function on memory where
4421+ * attribute changes to nearby pages are likely to cause large pages to
4422+ * be used or split up. So, don't use this for the linear mapping of
4423+ * physical memory.
4424+ *
4425+ * @returns Pointer to the page structure or NULL if it could not be found.
4426+ * @param pv The kernel virtual address.
4427+ */
4428+static struct page *rtR0MemObjLinuxVirtToPage(void *pv)
4429+{
4430+ unsigned long ulAddr = (unsigned long)pv;
4431+ unsigned long pfn;
4432+ struct page *pPage;
4433+ pte_t *pEntry;
4434+ union
4435+ {
4436+ pgd_t Global;
4437+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 11)
4438+ pud_t Upper;
4439+#endif
4440+ pmd_t Middle;
4441+ pte_t Entry;
4442+ } u;
4443+
4444+ /* Should this happen in a situation this code will be called in? And if
4445+ * so, can it change under our feet? See also
4446+ * "Documentation/vm/active_mm.txt" in the kernel sources. */
4447+ if (RT_UNLIKELY(!current->active_mm))
4448+ return NULL;
4449+ u.Global = *pgd_offset(current->active_mm, ulAddr);
4450+ if (RT_UNLIKELY(pgd_none(u.Global)))
4451+ return NULL;
4452+
4453+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 11)
4454+ u.Upper = *pud_offset(&u.Global, ulAddr);
4455+ if (RT_UNLIKELY(pud_none(u.Upper)))
4456+ return NULL;
4457+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 25)
4458+ if (pud_large(u.Upper))
4459+ {
4460+ pPage = pud_page(u.Upper);
4461+ AssertReturn(pPage, NULL);
4462+ pfn = page_to_pfn(pPage); /* doing the safe way... */
4463+ pfn += (ulAddr >> PAGE_SHIFT) & ((UINT32_C(1) << (PUD_SHIFT - PAGE_SHIFT)) - 1);
4464+ return pfn_to_page(pfn);
4465+ }
4466+# endif
4467+
4468+ u.Middle = *pmd_offset(&u.Upper, ulAddr);
4469+#else /* < 2.6.11 */
4470+ u.Middle = *pmd_offset(&u.Global, ulAddr);
4471+#endif /* < 2.6.11 */
4472+ if (RT_UNLIKELY(pmd_none(u.Middle)))
4473+ return NULL;
4474+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0)
4475+ if (pmd_large(u.Middle))
4476+ {
4477+ pPage = pmd_page(u.Middle);
4478+ AssertReturn(pPage, NULL);
4479+ pfn = page_to_pfn(pPage); /* doing the safe way... */
4480+ pfn += (ulAddr >> PAGE_SHIFT) & ((UINT32_C(1) << (PMD_SHIFT - PAGE_SHIFT)) - 1);
4481+ return pfn_to_page(pfn);
4482+ }
4483+#endif
4484+
4485+/* As usual, RHEL 3 had pte_offset_map earlier. */
4486+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 5, 5) || defined(pte_offset_map)
4487+ pEntry = pte_offset_map(&u.Middle, ulAddr);
4488+#else
4489+ pEntry = pte_offset(&u.Middle, ulAddr);
4490+#endif
4491+ if (RT_UNLIKELY(!pEntry))
4492+ return NULL;
4493+ u.Entry = *pEntry;
4494+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 5, 5) || defined(pte_offset_map)
4495+ pte_unmap(pEntry);
4496+#endif
4497+
4498+ if (RT_UNLIKELY(!pte_present(u.Entry)))
4499+ return NULL;
4500+ return pte_page(u.Entry);
4501+}
4502+
4503+
4504+DECLHIDDEN(int) rtR0MemObjNativeAllocPhys(PPRTR0MEMOBJINTERNAL ppMem, size_t cb, RTHCPHYS PhysHighest, size_t uAlignment)
4505+{
4506+ return rtR0MemObjLinuxAllocPhysSub(ppMem, RTR0MEMOBJTYPE_PHYS, cb, uAlignment, PhysHighest);
4507+}
4508+
4509+
4510+DECLHIDDEN(int) rtR0MemObjNativeAllocPhysNC(PPRTR0MEMOBJINTERNAL ppMem, size_t cb, RTHCPHYS PhysHighest)
4511+{
4512+ return rtR0MemObjLinuxAllocPhysSub(ppMem, RTR0MEMOBJTYPE_PHYS_NC, cb, PAGE_SIZE, PhysHighest);
4513+}
4514+
4515+
4516+DECLHIDDEN(int) rtR0MemObjNativeEnterPhys(PPRTR0MEMOBJINTERNAL ppMem, RTHCPHYS Phys, size_t cb, uint32_t uCachePolicy)
4517+{
4518+ /*
4519+ * All we need to do here is to validate that we can use
4520+ * ioremap on the specified address (32/64-bit dma_addr_t).
4521+ */
4522+ PRTR0MEMOBJLNX pMemLnx;
4523+ dma_addr_t PhysAddr = Phys;
4524+ AssertMsgReturn(PhysAddr == Phys, ("%#llx\n", (unsigned long long)Phys), VERR_ADDRESS_TOO_BIG);
4525+
4526+ pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(sizeof(*pMemLnx), RTR0MEMOBJTYPE_PHYS, NULL, cb);
4527+ if (!pMemLnx)
4528+ return VERR_NO_MEMORY;
4529+
4530+ pMemLnx->Core.u.Phys.PhysBase = PhysAddr;
4531+ pMemLnx->Core.u.Phys.fAllocated = false;
4532+ pMemLnx->Core.u.Phys.uCachePolicy = uCachePolicy;
4533+ Assert(!pMemLnx->cPages);
4534+ *ppMem = &pMemLnx->Core;
4535+ return VINF_SUCCESS;
4536+}
4537+
4538+
4539+DECLHIDDEN(int) rtR0MemObjNativeLockUser(PPRTR0MEMOBJINTERNAL ppMem, RTR3PTR R3Ptr, size_t cb, uint32_t fAccess, RTR0PROCESS R0Process)
4540+{
4541+ const int cPages = cb >> PAGE_SHIFT;
4542+ struct task_struct *pTask = rtR0ProcessToLinuxTask(R0Process);
4543+ struct vm_area_struct **papVMAs;
4544+ PRTR0MEMOBJLNX pMemLnx;
4545+ int rc = VERR_NO_MEMORY;
4546+ NOREF(fAccess);
4547+
4548+ /*
4549+ * Check for valid task and size overflows.
4550+ */
4551+ if (!pTask)
4552+ return VERR_NOT_SUPPORTED;
4553+ if (((size_t)cPages << PAGE_SHIFT) != cb)
4554+ return VERR_OUT_OF_RANGE;
4555+
4556+ /*
4557+ * Allocate the memory object and a temporary buffer for the VMAs.
4558+ */
4559+ pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(RT_OFFSETOF(RTR0MEMOBJLNX, apPages[cPages]), RTR0MEMOBJTYPE_LOCK, (void *)R3Ptr, cb);
4560+ if (!pMemLnx)
4561+ return VERR_NO_MEMORY;
4562+
4563+ papVMAs = (struct vm_area_struct **)RTMemAlloc(sizeof(*papVMAs) * cPages);
4564+ if (papVMAs)
4565+ {
4566+ down_read(&pTask->mm->mmap_sem);
4567+
4568+ /*
4569+ * Get user pages.
4570+ */
4571+ rc = get_user_pages(pTask, /* Task for fault accounting. */
4572+ pTask->mm, /* Whose pages. */
4573+ R3Ptr, /* Where from. */
4574+ cPages, /* How many pages. */
4575+ 1, /* Write to memory. */
4576+ 0, /* force. */
4577+ &pMemLnx->apPages[0], /* Page array. */
4578+ papVMAs); /* vmas */
4579+ if (rc == cPages)
4580+ {
4581+ /*
4582+ * Flush dcache (required?), protect against fork and _really_ pin the page
4583+ * table entries. get_user_pages() will protect against swapping out the
4584+ * pages but it will NOT protect against removing page table entries. This
4585+ * can be achieved with
4586+ * - using mlock / mmap(..., MAP_LOCKED, ...) from userland. This requires
4587+ * an appropriate limit set up with setrlimit(..., RLIMIT_MEMLOCK, ...).
4588+ * Usual Linux distributions support only a limited size of locked pages
4589+ * (e.g. 32KB).
4590+ * - setting the PageReserved bit (as we do in rtR0MemObjLinuxAllocPages()),
4591+ * or by
4592+ * - setting the VM_LOCKED flag. This is the same as doing mlock() without
4593+ * a range check.
4594+ */
4595+ /** @todo The Linux fork() protection will require more work if this API
4596+ * is to be used for anything but locking VM pages. */
4597+ while (rc-- > 0)
4598+ {
4599+ flush_dcache_page(pMemLnx->apPages[rc]);
4600+ papVMAs[rc]->vm_flags |= (VM_DONTCOPY | VM_LOCKED);
4601+ }
4602+
4603+ up_read(&pTask->mm->mmap_sem);
4604+
4605+ RTMemFree(papVMAs);
4606+
4607+ pMemLnx->Core.u.Lock.R0Process = R0Process;
4608+ pMemLnx->cPages = cPages;
4609+ Assert(!pMemLnx->fMappedToRing0);
4610+ *ppMem = &pMemLnx->Core;
4611+
4612+ return VINF_SUCCESS;
4613+ }
4614+
4615+ /*
4616+ * Failed - we need to unlock any pages that we succeeded to lock.
4617+ */
4618+ while (rc-- > 0)
4619+ {
4620+ if (!PageReserved(pMemLnx->apPages[rc]))
4621+ SetPageDirty(pMemLnx->apPages[rc]);
4622+ page_cache_release(pMemLnx->apPages[rc]);
4623+ }
4624+
4625+ up_read(&pTask->mm->mmap_sem);
4626+
4627+ RTMemFree(papVMAs);
4628+ rc = VERR_LOCK_FAILED;
4629+ }
4630+
4631+ rtR0MemObjDelete(&pMemLnx->Core);
4632+ return rc;
4633+}
4634+
4635+
4636+DECLHIDDEN(int) rtR0MemObjNativeLockKernel(PPRTR0MEMOBJINTERNAL ppMem, void *pv, size_t cb, uint32_t fAccess)
4637+{
4638+ void *pvLast = (uint8_t *)pv + cb - 1;
4639+ size_t const cPages = cb >> PAGE_SHIFT;
4640+ PRTR0MEMOBJLNX pMemLnx;
4641+ bool fLinearMapping;
4642+ int rc;
4643+ uint8_t *pbPage;
4644+ size_t iPage;
4645+ NOREF(fAccess);
4646+
4647+ if ( !RTR0MemKernelIsValidAddr(pv)
4648+ || !RTR0MemKernelIsValidAddr(pv + cb))
4649+ return VERR_INVALID_PARAMETER;
4650+
4651+ /*
4652+ * The lower part of the kernel memory has a linear mapping between
4653+ * physical and virtual addresses. So we take a short cut here. This is
4654+ * assumed to be the cleanest way to handle those addresses (and the code
4655+ * is well tested, though the test for determining it is not very nice).
4656+ * If we ever decide it isn't we can still remove it.
4657+ */
4658+#if 0
4659+ fLinearMapping = (unsigned long)pvLast < VMALLOC_START;
4660+#else
4661+ fLinearMapping = (unsigned long)pv >= (unsigned long)__va(0)
4662+ && (unsigned long)pvLast < (unsigned long)high_memory;
4663+#endif
4664+
4665+ /*
4666+ * Allocate the memory object.
4667+ */
4668+ pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(RT_OFFSETOF(RTR0MEMOBJLNX, apPages[cPages]), RTR0MEMOBJTYPE_LOCK, pv, cb);
4669+ if (!pMemLnx)
4670+ return VERR_NO_MEMORY;
4671+
4672+ /*
4673+ * Gather the pages.
4674+ * We ASSUME all kernel pages are non-swappable and non-movable.
4675+ */
4676+ rc = VINF_SUCCESS;
4677+ pbPage = (uint8_t *)pvLast;
4678+ iPage = cPages;
4679+ if (!fLinearMapping)
4680+ {
4681+ while (iPage-- > 0)
4682+ {
4683+ struct page *pPage = rtR0MemObjLinuxVirtToPage(pbPage);
4684+ if (RT_UNLIKELY(!pPage))
4685+ {
4686+ rc = VERR_LOCK_FAILED;
4687+ break;
4688+ }
4689+ pMemLnx->apPages[iPage] = pPage;
4690+ pbPage -= PAGE_SIZE;
4691+ }
4692+ }
4693+ else
4694+ {
4695+ while (iPage-- > 0)
4696+ {
4697+ pMemLnx->apPages[iPage] = virt_to_page(pbPage);
4698+ pbPage -= PAGE_SIZE;
4699+ }
4700+ }
4701+ if (RT_SUCCESS(rc))
4702+ {
4703+ /*
4704+ * Complete the memory object and return.
4705+ */
4706+ pMemLnx->Core.u.Lock.R0Process = NIL_RTR0PROCESS;
4707+ pMemLnx->cPages = cPages;
4708+ Assert(!pMemLnx->fMappedToRing0);
4709+ *ppMem = &pMemLnx->Core;
4710+
4711+ return VINF_SUCCESS;
4712+ }
4713+
4714+ rtR0MemObjDelete(&pMemLnx->Core);
4715+ return rc;
4716+}
4717+
4718+
4719+DECLHIDDEN(int) rtR0MemObjNativeReserveKernel(PPRTR0MEMOBJINTERNAL ppMem, void *pvFixed, size_t cb, size_t uAlignment)
4720+{
4721+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
4722+ const size_t cPages = cb >> PAGE_SHIFT;
4723+ struct page *pDummyPage;
4724+ struct page **papPages;
4725+
4726+ /* check for unsupported stuff. */
4727+ AssertMsgReturn(pvFixed == (void *)-1, ("%p\n", pvFixed), VERR_NOT_SUPPORTED);
4728+ if (uAlignment > PAGE_SIZE)
4729+ return VERR_NOT_SUPPORTED;
4730+
4731+ /*
4732+ * Allocate a dummy page and create a page pointer array for vmap such that
4733+ * the dummy page is mapped all over the reserved area.
4734+ */
4735+ pDummyPage = alloc_page(GFP_HIGHUSER);
4736+ if (!pDummyPage)
4737+ return VERR_NO_MEMORY;
4738+ papPages = RTMemAlloc(sizeof(*papPages) * cPages);
4739+ if (papPages)
4740+ {
4741+ void *pv;
4742+ size_t iPage = cPages;
4743+ while (iPage-- > 0)
4744+ papPages[iPage] = pDummyPage;
4745+# ifdef VM_MAP
4746+ pv = vmap(papPages, cPages, VM_MAP, PAGE_KERNEL_RO);
4747+# else
4748+ pv = vmap(papPages, cPages, VM_ALLOC, PAGE_KERNEL_RO);
4749+# endif
4750+ RTMemFree(papPages);
4751+ if (pv)
4752+ {
4753+ PRTR0MEMOBJLNX pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(sizeof(*pMemLnx), RTR0MEMOBJTYPE_RES_VIRT, pv, cb);
4754+ if (pMemLnx)
4755+ {
4756+ pMemLnx->Core.u.ResVirt.R0Process = NIL_RTR0PROCESS;
4757+ pMemLnx->cPages = 1;
4758+ pMemLnx->apPages[0] = pDummyPage;
4759+ *ppMem = &pMemLnx->Core;
4760+ return VINF_SUCCESS;
4761+ }
4762+ vunmap(pv);
4763+ }
4764+ }
4765+ __free_page(pDummyPage);
4766+ return VERR_NO_MEMORY;
4767+
4768+#else /* < 2.4.22 */
4769+ /*
4770+ * Could probably use ioremap here, but the caller is in a better position than us
4771+ * to select some safe physical memory.
4772+ */
4773+ return VERR_NOT_SUPPORTED;
4774+#endif
4775+}
4776+
4777+
4778+/**
4779+ * Worker for rtR0MemObjNativeReserveUser and rtR0MemObjNativeMapUser that creates
4780+ * an empty user space mapping.
4781+ *
4782+ * The caller takes care of acquiring the mmap_sem of the task.
4783+ *
4784+ * @returns Pointer to the mapping.
4785+ * (void *)-1 on failure.
4786+ * @param R3PtrFixed (RTR3PTR)-1 if anywhere, otherwise a specific location.
4787+ * @param cb The size of the mapping.
4788+ * @param uAlignment The alignment of the mapping.
4789+ * @param pTask The Linux task to create this mapping in.
4790+ * @param fProt The RTMEM_PROT_* mask.
4791+ */
4792+static void *rtR0MemObjLinuxDoMmap(RTR3PTR R3PtrFixed, size_t cb, size_t uAlignment, struct task_struct *pTask, unsigned fProt)
4793+{
4794+ unsigned fLnxProt;
4795+ unsigned long ulAddr;
4796+
4797+ /*
4798+ * Convert from IPRT protection to mman.h PROT_ and call do_mmap.
4799+ */
4800+ fProt &= (RTMEM_PROT_NONE | RTMEM_PROT_READ | RTMEM_PROT_WRITE | RTMEM_PROT_EXEC);
4801+ if (fProt == RTMEM_PROT_NONE)
4802+ fLnxProt = PROT_NONE;
4803+ else
4804+ {
4805+ fLnxProt = 0;
4806+ if (fProt & RTMEM_PROT_READ)
4807+ fLnxProt |= PROT_READ;
4808+ if (fProt & RTMEM_PROT_WRITE)
4809+ fLnxProt |= PROT_WRITE;
4810+ if (fProt & RTMEM_PROT_EXEC)
4811+ fLnxProt |= PROT_EXEC;
4812+ }
4813+
4814+ if (R3PtrFixed != (RTR3PTR)-1)
4815+ ulAddr = do_mmap(NULL, R3PtrFixed, cb, fLnxProt, MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, 0);
4816+ else
4817+ {
4818+ ulAddr = do_mmap(NULL, 0, cb, fLnxProt, MAP_SHARED | MAP_ANONYMOUS, 0);
4819+ if ( !(ulAddr & ~PAGE_MASK)
4820+ && (ulAddr & (uAlignment - 1)))
4821+ {
4822+ /** @todo implement uAlignment properly... We'll probably need to make some dummy mappings to fill
4823+ * up alignment gaps. This is of course complicated by fragmentation (which we might have caused
4824+ * ourselves) and further by there being two mmap strategies (top / bottom). */
4825+ /* For now, just ignore uAlignment requirements... */
4826+ }
4827+ }
4828+ if (ulAddr & ~PAGE_MASK) /* ~PAGE_MASK == PAGE_OFFSET_MASK */
4829+ return (void *)-1;
4830+ return (void *)ulAddr;
4831+}
4832+
4833+
4834+DECLHIDDEN(int) rtR0MemObjNativeReserveUser(PPRTR0MEMOBJINTERNAL ppMem, RTR3PTR R3PtrFixed, size_t cb, size_t uAlignment, RTR0PROCESS R0Process)
4835+{
4836+ PRTR0MEMOBJLNX pMemLnx;
4837+ void *pv;
4838+ struct task_struct *pTask = rtR0ProcessToLinuxTask(R0Process);
4839+ if (!pTask)
4840+ return VERR_NOT_SUPPORTED;
4841+
4842+ /*
4843+ * Check that the specified alignment is supported.
4844+ */
4845+ if (uAlignment > PAGE_SIZE)
4846+ return VERR_NOT_SUPPORTED;
4847+
4848+ /*
4849+ * Let rtR0MemObjLinuxDoMmap do the difficult bits.
4850+ */
4851+ down_write(&pTask->mm->mmap_sem);
4852+ pv = rtR0MemObjLinuxDoMmap(R3PtrFixed, cb, uAlignment, pTask, RTMEM_PROT_NONE);
4853+ up_write(&pTask->mm->mmap_sem);
4854+ if (pv == (void *)-1)
4855+ return VERR_NO_MEMORY;
4856+
4857+ pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(sizeof(*pMemLnx), RTR0MEMOBJTYPE_RES_VIRT, pv, cb);
4858+ if (!pMemLnx)
4859+ {
4860+ down_write(&pTask->mm->mmap_sem);
4861+ MY_DO_MUNMAP(pTask->mm, (unsigned long)pv, cb);
4862+ up_write(&pTask->mm->mmap_sem);
4863+ return VERR_NO_MEMORY;
4864+ }
4865+
4866+ pMemLnx->Core.u.ResVirt.R0Process = R0Process;
4867+ *ppMem = &pMemLnx->Core;
4868+ return VINF_SUCCESS;
4869+}
4870+
4871+
4872+DECLHIDDEN(int) rtR0MemObjNativeMapKernel(PPRTR0MEMOBJINTERNAL ppMem, RTR0MEMOBJ pMemToMap,
4873+ void *pvFixed, size_t uAlignment,
4874+ unsigned fProt, size_t offSub, size_t cbSub)
4875+{
4876+ int rc = VERR_NO_MEMORY;
4877+ PRTR0MEMOBJLNX pMemLnxToMap = (PRTR0MEMOBJLNX)pMemToMap;
4878+ PRTR0MEMOBJLNX pMemLnx;
4879+
4880+ /* Fail if requested to do something we can't. */
4881+ AssertMsgReturn(!offSub && !cbSub, ("%#x %#x\n", offSub, cbSub), VERR_NOT_SUPPORTED);
4882+ AssertMsgReturn(pvFixed == (void *)-1, ("%p\n", pvFixed), VERR_NOT_SUPPORTED);
4883+ if (uAlignment > PAGE_SIZE)
4884+ return VERR_NOT_SUPPORTED;
4885+
4886+ /*
4887+ * Create the IPRT memory object.
4888+ */
4889+ pMemLnx = (PRTR0MEMOBJLNX)rtR0MemObjNew(sizeof(*pMemLnx), RTR0MEMOBJTYPE_MAPPING, NULL, pMemLnxToMap->Core.cb);
4890+ if (pMemLnx)
4891+ {
4892+ if (pMemLnxToMap->cPages)
4893+ {
4894+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 4, 22)
4895+ /*
4896+ * Use vmap - 2.4.22 and later.
4897+ */
4898+ pgprot_t fPg = rtR0MemObjLinuxConvertProt(fProt, true /* kernel */);
4899+# ifdef VM_MAP
4900+ pMemLnx->Core.pv = vmap(&pMemLnxToMap->apPages[0], pMemLnxToMap->cPages, VM_MAP, fPg);
4901+# else
4902+ pMemLnx->Core.pv = vmap(&pMemLnxToMap->apPages[0], pMemLnxToMap->cPages, VM_ALLOC, fPg);
4903+# endif
4904+ if (pMemLnx->Core.pv)
4905+ {
4906+ pMemLnx->fMappedToRing0 = true;
4907+ rc = VINF_SUCCESS;
4908+ }
4909+ else
4910+ rc = VERR_MAP_FAILED;
4911+
4912+#else /* < 2.4.22 */
4913+ /*
4914+ * Only option here is to share mappings if possible and forget about fProt.
4915+ */
4916+ if (rtR0MemObjIsRing3(pMemToMap))
4917+ rc = VERR_NOT_SUPPORTED;
4918+ else
4919+ {
4920+ rc = VINF_SUCCESS;
4921+ if (!pMemLnxToMap->Core.pv)
4922+ rc = rtR0MemObjLinuxVMap(pMemLnxToMap, !!(fProt & RTMEM_PROT_EXEC));
4923+ if (RT_SUCCESS(rc))
4924+ {
4925+ Assert(pMemLnxToMap->Core.pv);
4926+ pMemLnx->Core.pv = pMemLnxToMap->Core.pv;
4927+ }
4928+ }
4929+#endif
4930+ }
4931+ else
4932+ {
4933+ /*
4934+ * MMIO / physical memory.
4935+ */
4936+ Assert(pMemLnxToMap->Core.enmType == RTR0MEMOBJTYPE_PHYS && !pMemLnxToMap->Core.u.Phys.fAllocated);
4937+ pMemLnx->Core.pv = pMemLnxToMap->Core.u.Phys.uCachePolicy == RTMEM_CACHE_POLICY_MMIO
4938+ ? ioremap_nocache(pMemLnxToMap->Core.u.Phys.PhysBase, pMemLnxToMap->Core.cb)
4939+ : ioremap(pMemLnxToMap->Core.u.Phys.PhysBase, pMemLnxToMap->Core.cb);
4940+ if (pMemLnx->Core.pv)
4941+ {
4942+ /** @todo fix protection. */
4943+ rc = VINF_SUCCESS;
4944+ }
4945+ }
4946+ if (RT_SUCCESS(rc))
4947+ {
4948+ pMemLnx->Core.u.Mapping.R0Process = NIL_RTR0PROCESS;
4949+ *ppMem = &pMemLnx->Core;
4950+ return VINF_SUCCESS;
4951+ }
4952+ rtR0MemObjDelete(&pMemLnx->Core);
4953+ }
4954+
4955+ return rc;
4956+}
4957+
4958+
4959+#ifdef VBOX_USE_PAE_HACK
4960+/**
4961+ * Replace the PFN of a PTE with the address of the actual page.
4962+ *
4963+ * The caller maps a reserved dummy page at the address with the desired access
4964+ * and flags.
4965+ *
4966+ * This hack is required for older Linux kernels which don't provide
4967+ * remap_pfn_range().
4968+ *
4969+ * @returns 0 on success, -ENOMEM on failure.
4970+ * @param mm The memory context.
4971+ * @param ulAddr The mapping address.
4972+ * @param Phys The physical address of the page to map.
4973+ */
4974+static int rtR0MemObjLinuxFixPte(struct mm_struct *mm, unsigned long ulAddr, RTHCPHYS Phys)
4975+{
4976+ int rc = -ENOMEM;
4977+ pgd_t *pgd;
4978+
4979+ spin_lock(&mm->page_table_lock);
4980+
4981+ pgd = pgd_offset(mm, ulAddr);
4982+ if (!pgd_none(*pgd) && !pgd_bad(*pgd))
4983+ {
4984+ pmd_t *pmd = pmd_offset(pgd, ulAddr);
4985+ if (!pmd_none(*pmd))
4986+ {
4987+ pte_t *ptep = pte_offset_map(pmd, ulAddr);
4988+ if (ptep)
4989+ {
4990+ pte_t pte = *ptep;
4991+ pte.pte_high &= 0xfff00000;
4992+ pte.pte_high |= ((Phys >> 32) & 0x000fffff);
4993+ pte.pte_low &= 0x00000fff;
4994+ pte.pte_low |= (Phys & 0xfffff000);
4995+ set_pte(ptep, pte);
4996+ pte_unmap(ptep);
4997+ rc = 0;
4998+ }
4999+ }
5000+ }
The diff has been truncated for viewing.
