Internet links operate at high speeds, and past trends predict that these speeds will continue to increase rapidly. Routers and intrusion detection devices that operate at up to OC-768 speeds (40 Gb/s) are currently being developed. While the main bottlenecks (e.g., lookups, classification, and quality of service) in a traditional router are well understood, what are the corresponding functions that should be hardwired in the brave new world of security and measurement? Ideally, we wish to abstract out functions that are common to several security and measurement applications and find efficient
algorithms for these functions, especially algorithms with a compact hardware implementation.

Toward this goal, this paper isolates and provides solutions for an important problem that occurs in various networking applications: counting the number of active
flows among packets received on a link during a specified period of time. A flow is defined by a set of header fields; two packets belong to distinct flows if they have different values for the specified header fields that define the flow. For example, if we define a flow by a source and destination
IP address, we can count the number of distinct source-destination IP address pairs seen on a link over a given time period.

Our algorithms measure the number of active flows using a very small amount of memory that can easily be
stored in on-chip SRAM or even processor registers. By contrast, the naive algorithms described below would require massive amounts of memory, necessitating the use of slow DRAM. For example, a naive method to count source-destination pairs would be to keep a counter together with a hash table that stores all the distinct 64-bit source-destination address pairs seen thus far. When a packet arrives with source and destination addresses, say S, D, we search the hash table for S, D; if there is no match, the counter is incremented and S, D is added to the hash table. Unfortunately, given that backbone links can have up to a million flows today, this naive scheme would require at least 64 Mb of high-speed memory (a million 64-bit entries). Such a large SRAM is expensive or not feasible for a modern router.

There are more efficient general-purpose algorithms for counting the number of distinct values in a multiset. In this paper, we not only present a general-purpose counting algorithm, the multiresolution bitmap, that has better accuracy than the best known prior algorithm, probabilistic counting, but also introduce counting algorithms that further improve performance by taking advantage of particularities
of the specific counting application. Our adaptive bitmap, using the fact that the number of active flows does not change very rapidly, can count the number of distinct flows on a link that contains anywhere from 0 to 100 million flows with an average error of less than 1% using only 2 KB of memory. Our triggered bitmap, which is optimized for running multiple concurrent instances of the counting problem, many of which have small counts, is suitable for detecting port scans and uses even less memory than running an adaptive bitmap on each instance.

A flow is defined by an identifier given by the values of certain header fields. The problem we wish to solve is counting the number of distinct flow identifiers (flow IDs) seen in a specified measurement
interval. For example, an intrusion detection system looking for port scans could count, for each active source address, the flows defined by destination IP and suspect any source IP that opens more than three flows in 12 s of scanning. Also, while many applications define flows at the granularity of TCP connections, one may want to use other definitions. For example, when detecting DoS attacks we may wish to count the number of distinct sources, not the number of TCP connections. Thus, in this paper, we use the term flow in this more generic way.

As we have seen, a naive solution using a hash table of flow IDs is accurate but takes too much memory. In high-speed routers, it is not only the cost of large, fast memories that is a problem but also their power consumption and the board space they take up on line cards. Thus, we seek solutions that use a small amount of memory and have high accuracy. We want to find algorithms where the tradeoff between memory usage and accuracy is favorable. Also, since at high speeds the per-packet processing time is limited, it is important that the algorithms use only one or two memory accesses and are simple enough to be implemented in hardware.

Why is information about the number of flows useful? We describe four possible categories of use.

Detecting port scans: Intrusion detection systems warn of port scans when a source opens too many connections within a given time. The widely deployed Snort intrusion detection system (IDS) uses the naive approach of storing a record for each active
connection. This is an obvious waste, since most of the connections are not part of a port scan. Even for actual port scans, if the IDS only reports the number of connections, we do not need to keep a record for each connection. Since the number of sources can be very high, it is desirable to find algorithms
that count the number of connections of each source using little memory. Further, if an algorithm can distinguish quickly between suspected port scanners and normal traffic, the IDS need not perform expensive operations (e.g., logging) on most of the traffic, thus becoming more scalable in terms of
memory usage and speed. This is particularly important in the context of the recent race to provide wire-speed intrusion detection.

Detecting denial of service (DoS) attacks: FlowScan by Plonka is a popular tool for visualizing network traffic. It uses the number of active flows (see Fig. 1) to detect ongoing denial of service attacks. While this works well at the edge of the network (i.e., the link between a large university campus and the rest of the Internet), it does not scale to the core. Also, it relies on massive intermediate data (NetFlow) to compute compact results; could we obtain the useful information more directly? Mahajan et al. propose a mechanism that allows backbone routers to limit the effect of (distributed) DoS attacks. While the mechanism assumes that these routers can detect an ongoing attack, it does not give a concrete algorithm for doing so. Estan and Varghese present algorithms that can detect destination addresses or prefixes that receive large amounts of traffic. To differentiate between legitimate traffic and an attack, we can use the fact that DoS tools use fake source addresses chosen at random. If for each suspected victim we count the number of sources of packets that
come from some networks known to be sparsely populated, a large count is a strong indication that a DoS attack is in progress.

General measurement: Counting the number of active connections and the number of connections associated with each source and destination IP address is part of the CoralReef
traffic analysis suite. Other ways of counting distinct values in given header fields can also provide useful data. One could measure the number of sources using a protocol version or variant to get an accurate image of protocol deployment. Alternatively, by counting the number of connections associated with each of the protocols generating significant traffic, we can compute the average connection length for each protocol, thus getting a better view of its behavior. Dimensioning the various caches in routers (packet classification caches, multicast route caches for source-group (S-G) state, and ARP caches) also benefits from prior measurements of typical workload.

Estimating the spreading rate of a worm: From August 1 to August 12, 2001, while trying to track the Code Red worm, collecting packet headers for Code Red traffic on a /8 network produced 0.5 GB per hour of compressed data. To determine the rate at which the worm was spreading, it was necessary to count the number of distinct Code Red sources passing through the link. This was actually done using a large log and a hash table, which was expensive in time and also inaccurate (because of losses in the log).

Thus, while counting
the number of flows is usually insufficient by itself, it can provide a useful building block for complex tasks.

This paper extends an earlier conference version. The most important additions are a discussion of hardware implementation of the bitmap and probabilistic counting, and a discussion of
more recent related work.

The networking problem of counting the number of distinct flows has a well-studied equivalent in the database community: counting the number of distinct database records (or distinct values of an attribute). Thus, the major piece of related work is a seminal algorithm, probabilistic counting, due to Flajolet and Martin, introduced in the context of databases. We use probabilistic counting as a baseline against which to compare our algorithms. Whang et al. address the same problem and propose an algorithm equivalent to the simplest algorithm we describe (direct bitmap).

The insight behind probabilistic counting is to compute a metric of how uncommon a certain record is and keep track of the most uncommon records seen. If the algorithm sees very uncommon records, it concludes that the number of records is large. More precisely, for each record, the algorithm computes a
hash function that maps it to an L-bit string. It then counts the number of consecutive zeroes starting from the least significant position of the hash result and sets the corresponding bit in a bitmap of size L. If the algorithm sees records that hash to values ending in zero, one, and two zeroes (the first three bits in the bitmap are set, and the rest
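
The bitmap update step of probabilistic counting described in this passage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bitmap width `L = 32`, the use of SHA-256 as a stand-in hash function, and all function names are hypothetical choices made for the example. The sketch covers only the per-record update and the inspection of the lowest unset bit; it does not show how a final count is derived from the bitmap.

```python
import hashlib

L = 32  # bitmap width in bits (assumed value, for illustration only)


def hash_record(record: bytes) -> int:
    """Map a record to an L-bit integer (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(record).digest()
    return int.from_bytes(digest, "big") & ((1 << L) - 1)


def trailing_zeros(value: int, width: int = L) -> int:
    """Count consecutive zero bits starting at the least significant bit."""
    if value == 0:
        return width - 1  # all-zero hash: clamp to the last bitmap position
    # value & -value isolates the lowest set bit; its position is the count.
    return (value & -value).bit_length() - 1


def update(bitmap: int, record: bytes) -> int:
    """Set the bitmap bit indexed by the trailing-zero count of the hash."""
    return bitmap | (1 << trailing_zeros(hash_record(record)))


def lowest_unset_bit(bitmap: int) -> int:
    """Index of the first zero bit in the bitmap, scanning from bit 0."""
    index = 0
    while bitmap & (1 << index):
        index += 1
    return index


def build_bitmap(records) -> int:
    """Fold a stream of records (e.g., flow IDs) into a single bitmap."""
    bitmap = 0
    for record in records:
        bitmap = update(bitmap, record)
    return bitmap
```

A duplicate record hashes to the same value and therefore sets a bit that is already set, so repeated packets of one flow never change the bitmap. The state kept per counting instance is a single L-bit word, which is what makes this family of techniques attractive when memory is scarce.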