There is currently a proliferation of organizational networked computing systems. Every type of organization, be it a commercial company, a university, a bank, a government agency or a hospital, relies heavily on one or more networks interconnecting multiple computing nodes. A failure of an organization's networked computing system, or even of only a portion of it, might cause significant damage, up to completely shutting down all operations. Additionally, all data of the organization exists somewhere on its networked computing system, including all confidential data comprising its “crown jewels” such as prices, details of customers, purchase orders, employees' salaries, technical formulas, etc. Loss of such data or leaks of such data to unauthorized outside entities might be disastrous for the organization.
As almost all organizational networks are connected to the Internet through at least one computing node, they are subject to attacks by computer hackers or by hostile adversaries. Newspapers frequently report incidents in which websites crashed, sensitive data was stolen or service to customers was denied, where the failures were the results of hostile penetration into an organization's networked computing system.
As a result, many organizations invest considerable effort and money in preventive means designed to protect their computing networks against potential threats. There are many defensive products offered in the market claiming to provide protection against one or more known modes of attack, and many organizations arm themselves to the teeth with multiple products of this kind.
However, it is difficult to tell how effective such products really are in achieving their stated goals of blocking hostile attacks, and consequently most CISOs (Chief Information Security Officers) will admit (maybe only off the record) that they don't really know how well they can withstand an attack from a given adversary. The only way to really know how strong and secure a system is, is by trying to attack it as a real adversary would. This is known as red-teaming or penetration testing (pen testing, in short), and is a very common approach that is even required by regulation in some developed countries.
Penetration testing requires highly talented people to staff the red team. Those people should be familiar with every publicly known vulnerability and attack method and should also have a thorough knowledge of networking techniques and of multiple operating system implementations. Such people are hard to find, and therefore many organizations give up on establishing their own red teams and resort to hiring external expert consultants to carry out that role (or give up penetration testing completely). But external consultants are expensive and are therefore typically called in only for brief periods separated by long intervals in which no such testing is done. This makes the penetration testing ineffective, as vulnerabilities to new attacks, which appear almost daily, are discovered only months after becoming serious threats to the organization.
Additionally, even rich organizations that can afford to hire talented experts as in-house red teams do not achieve good protection. Testing for vulnerabilities of a large network containing many types of computers, operating systems, network routers and other devices is both a very complex and a very tedious process. The process is prone to human errors such as failing to test for certain threats or misinterpreting the damage caused by certain attacks. Also, because a process of full testing against all threats is quite long, the organization might again end up with an overly long discovery period after a new threat appears.
Because of the above difficulties, several vendors are proposing automated penetration testing systems. Such systems automatically discover and report vulnerabilities of a networked system, potential damages that might be caused to the networked system, and potential trajectories of attack that may be employed by an attacker.
Within a computer network, a “broadcast domain” is a logical division of the computer network, in which all network nodes can reach each other by broadcasting at the data link layer. In other words, each network node in a broadcast domain can transmit a data link broadcast message that is addressed to all other network nodes within its broadcast domain, and all those other network nodes in its domain are expected to receive the broadcast message.
As stated above, a broadcast domain is inherently tied to the data link layer, which is layer 2 of the OSI network layers model. This implies that, in terms of current networking technologies, any computers connected to the same Ethernet repeater or layer-2-switch are members of the same broadcast domain. However, layer 3 devices, such as routers and layer-3-switches, form boundaries between broadcast domains.
Multiple broadcast domains may be connected to a larger network through routers or layer-3-switches. Network nodes that are members of the same broadcast domain communicate with each other at layer 2 of the network and address each other using MAC (Media Access Control) addresses. A MAC address of a network node is an identifier assigned to the network interface of the node and is typically stored in hardware.
However, layer 2 messages do not cross boundaries between broadcast domains, and therefore network nodes that are members of different broadcast domains communicate with each other at layer 3 of the network and address each other using IP addresses.
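The definition above can be illustrated with a minimal sketch. It assumes the layer-2 topology is already known and given as a list of links, which is exactly the information a penetration testing system usually lacks; the function name `broadcast_domains` and all device names are hypothetical. Broadcast domains are computed as the connected components that remain once every link touching a layer 3 device is treated as a boundary.

```python
from collections import defaultdict


def broadcast_domains(links, layer3_devices):
    """Group devices into broadcast domains (illustrative model only).

    links: iterable of (a, b) device-name pairs, each a direct link.
    layer3_devices: names of routers / layer-3-switches, which do not
        forward layer 2 broadcasts and therefore act as boundaries.
    Returns a list of sets, one set of device names per broadcast domain.
    """
    graph = defaultdict(set)
    nodes = set()
    for a, b in links:
        for name in (a, b):
            if name not in layer3_devices:
                nodes.add(name)
        # A link that touches a layer 3 device does not join two domains.
        if a in layer3_devices or b in layer3_devices:
            continue
        graph[a].add(b)
        graph[b].add(a)

    domains = []
    seen = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk over layer 2 links only.
        component = set()
        stack = [start]
        while stack:
            name = stack.pop()
            if name in component:
                continue
            component.add(name)
            stack.extend(graph[name] - component)
        seen |= component
        domains.append(component)
    return domains


if __name__ == "__main__":
    # Hypothetical topology: two switches separated by one router.
    example_links = [
        ("pc1", "sw1"), ("pc2", "sw1"), ("sw1", "router"),
        ("router", "sw2"), ("pc3", "sw2"),
    ]
    for domain in broadcast_domains(example_links, {"router"}):
        print(sorted(domain))
```

In this toy topology, pc1 and pc2 end up in one domain and pc3 in another, because the only path between them passes through the router.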
Penetration testing systems need to know which network nodes of the tested networked system share a common broadcast domain. This is required in order to correctly assess the effectiveness of certain cyber-attacks which might be used by hostile attackers against the tested networked system.
In order to understand why this is so, the following example examines the well-known ARP Spoofing cyber-attack.
The ARP (Address Resolution Protocol) protocol is a network protocol used for discovering the link layer address associated with a given IPv4 address. Suppose that a first node needs to communicate with a second node in its broadcast domain, but it only knows the IP address of the second node, not its MAC address. As the MAC address is essential for sending a message to the second node, the following sequence of operations will take place:
a. The first node will look into its cached ARP table and search for the known IP address of the second node.
b. If an entry for that IP address is found, the entry contains the MAC address of the second node.
c. Otherwise, the first node will send out an ARP request message. An ARP request message is a layer 2 broadcast message that is received and read by all other nodes in the broadcast domain of the sending node. The ARP request contains the known IP address of the second node, plus both MAC address and IP address of the sending node (the first node in this example).
d. When the second node receives the ARP request message and identifies the IP address included in the message to be its own IP address, it responds by sending out an ARP reply message. An ARP reply message is also a layer 2 message, but unlike the ARP request message, it is a unicast message addressed only to the requesting node and not a broadcast message. The ARP reply message contains the MAC address that is the answer to the request (the MAC address of the second node in this example). Additionally, the ARP reply includes the IP address for which the address mapping was requested (the IP address of the second node in this example) and also both MAC address and IP address of the node requesting the reply, which addresses are taken from the ARP request message.
e. When the first node receives the ARP reply message and identifies it to be a reply for its ARP request, it retrieves the provided MAC address from the message (the MAC address of the second node in this example).
f. The first node then inserts a new entry into its cached ARP table, the entry linking the MAC and IP addresses of the second node to each other.
g. The first node uses the MAC address from the newly cached entry for addressing the second node. The cached entry is kept in the first node for future communication with the second node.
h. Optionally, the second node may also add an entry to its cached ARP table, the entry linking the MAC and IP addresses of the first node to each other.
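Steps a through h can be sketched as a small in-memory simulation. This is an illustrative model under simplifying assumptions, not a real network stack: the `Node` class, the `resolve` function, the dictionary used as a reply message and all addresses are hypothetical, and the broadcast is modeled as a loop over the other nodes of the domain.

```python
class Node:
    """Toy network node holding an IP address, a MAC address and an ARP cache."""

    def __init__(self, name, ip, mac):
        self.name = name
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # maps IP address -> MAC address

    def handle_arp_request(self, target_ip, sender_ip, sender_mac):
        """Step d: reply only if the request asks for this node's own IP."""
        if target_ip != self.ip:
            return None
        # Step h (optional): remember the requester's address mapping.
        self.arp_cache[sender_ip] = sender_mac
        return {
            "sender_ip": self.ip,      # the IP whose mapping was requested
            "sender_mac": self.mac,    # the answer to the request
            "target_ip": sender_ip,    # copied from the request
            "target_mac": sender_mac,  # copied from the request
        }


def resolve(requester, target_ip, broadcast_domain):
    """Return the MAC address for target_ip, or None if nobody answers."""
    # Steps a and b: consult the cached ARP table first.
    if target_ip in requester.arp_cache:
        return requester.arp_cache[target_ip]
    # Step c: the request reaches every other node in the broadcast domain.
    for node in broadcast_domain:
        if node is requester:
            continue
        reply = node.handle_arp_request(target_ip, requester.ip, requester.mac)
        if reply is not None:
            # Steps e and f: extract the MAC address and cache the mapping.
            requester.arp_cache[reply["sender_ip"]] = reply["sender_mac"]
            # Step g: the caller now addresses the target by this MAC.
            return reply["sender_mac"]
    return None
```

A call such as `resolve(first, second.ip, [first, second, third])` returns the MAC address of `second` and leaves a cache entry in `first`, so a repeated call is answered from the cache without a new broadcast.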
The ARP protocol does not include authentication of the ARP reply message, and is therefore vulnerable to a cyber-attack known as ARP Spoofing. In order to understand how such an attack is carried out, consider a broadcast domain in which the first and second nodes of the above example reside, and which also includes a third node that is already compromised by the attacker. In other words, the third node is under the control of the attacker, who can make it behave in ways desirable to the attacker.
When the first node sends out the ARP request message, the message is also received by the third node, as it is a broadcast message. Under the attacker's control, the third node responds to the ARP request before the second node (which is the intended destination of the message) does so. The third node responds to the ARP request with a fake ARP reply message. The fake message is a seemingly valid ARP reply, but with a fake MAC address as an answer. The fake MAC address may be the MAC address of the third node (the node generating the fake message), or the MAC address of another node that is also under the control of the attacker.
When the first node receives the fake reply, it creates an entry in its cached ARP table that associates the IP address of the second node with the fake MAC address. From now on, all communication intended by the first node for the second node will be received by the node having a MAC address equal to the fake MAC address, and not by the second node.
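The cache poisoning described above can be illustrated with a toy model. The sketch assumes that the first node caches whichever matching reply arrives first, which mirrors the race described in the text; real ARP implementations differ in how they treat later or unsolicited replies. The function names and all addresses are hypothetical.

```python
def accept_first_reply(arp_cache, requested_ip, replies):
    """Cache the MAC address from the first reply that answers requested_ip.

    replies: list of (claimed_ip, claimed_mac) tuples in arrival order.
    ARP carries no authentication, so the claimed MAC is taken at face value.
    """
    for claimed_ip, claimed_mac in replies:
        if claimed_ip == requested_ip:
            arp_cache[requested_ip] = claimed_mac
            return claimed_mac
    return None


def frame_destination(arp_cache, dest_ip):
    """MAC address that frames for dest_ip will actually be sent to."""
    return arp_cache.get(dest_ip)


# Hypothetical addresses used only for illustration.
SECOND_NODE_IP = "10.0.0.2"
SECOND_NODE_MAC = "02:00:00:00:00:02"
THIRD_NODE_MAC = "02:00:00:00:00:03"  # node controlled by the attacker

if __name__ == "__main__":
    first_node_cache = {}
    # The fake reply from the third node arrives before the genuine one.
    arrivals = [
        (SECOND_NODE_IP, THIRD_NODE_MAC),
        (SECOND_NODE_IP, SECOND_NODE_MAC),
    ]
    accept_first_reply(first_node_cache, SECOND_NODE_IP, arrivals)
    # Traffic meant for the second node now goes to the third node.
    print(frame_destination(first_node_cache, SECOND_NODE_IP))
```

The point of the sketch is that nothing in the reply lets the first node distinguish the genuine mapping from the fake one; only arrival order decides which MAC address is cached.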
Once ARP Spoofing is successfully employed, it may allow the attacker to intercept data frames on a network, modify the traffic, or stop all traffic to a certain node. Often the attack is used as an opening for other attacks, such as denial-of-service, man-in-the-middle, or session-hijacking attacks. For example, if the second node is a gateway of the broadcast domain, used by the first node when browsing the Internet, the attacker may provide the first node with poisoned web pages that will compromise the first node and bring it under the control of the attacker.
The ARP Spoofing example demonstrates why identifying which nodes share a common broadcast domain is important for a penetration testing system. If the penetration testing system can determine that (i) a first node uses the ARP protocol for finding MAC addresses in its local network, (ii) the first node uses a second node in its local network as a gateway for browsing the Internet, and (iii) there is a third node in the broadcast domain that was already determined to be compromised or already determined to be compromisable during the current penetration testing campaign, then the penetration testing system may correctly conclude that there is a way to compromise the first node.
However, no reliable conclusion can be reached without knowing whether the third node is located in the same broadcast domain as the first node. If the third node is in the same broadcast domain, then ARP Spoofing can be employed by the attacker to redirect to the third node all communication that the first node sends to the gateway. As the communication includes Internet browsing requests, this can be used in turn to compromise the first node using any known browsing vulnerability. But if the third node is not located in the same broadcast domain as the first node (and there is no other node in the first node's broadcast domain that is already compromised or already known to be compromisable by the attacker), then no ARP Spoofing is possible and consequently no browsing vulnerability can be used against the first node.
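The reasoning of conditions (i) to (iii), together with the broadcast domain requirement, can be written as a small predicate. This is a hedged sketch of the decision logic only: the `NodeFacts` record, its field names and the domain labels are hypothetical, and it assumes the broadcast domain of each node has already been determined by some means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeFacts:
    """Facts a penetration testing system is assumed to hold about a node."""
    name: str
    broadcast_domain: str            # hypothetical domain label
    uses_arp: bool = True            # condition (i)
    gateway: Optional[str] = None    # condition (ii): name of the gateway node
    compromised_or_compromisable: bool = False  # condition (iii)


def arp_spoofing_can_compromise(target, nodes):
    """Return True if ARP Spoofing offers a way to compromise `target`."""
    # (i) the target must resolve MAC addresses with ARP.
    if not target.uses_arp or target.gateway is None:
        return False
    by_name = {node.name: node for node in nodes}
    gateway = by_name.get(target.gateway)
    # (ii) the gateway must be in the target's own broadcast domain.
    if gateway is None or gateway.broadcast_domain != target.broadcast_domain:
        return False
    # (iii) some other node in the SAME broadcast domain must be
    # compromised or known to be compromisable.
    return any(
        node.compromised_or_compromisable
        and node.name != target.name
        and node.broadcast_domain == target.broadcast_domain
        for node in nodes
    )
```

With identical facts about the three nodes, the predicate flips from true to false when only the broadcast domain label of the third node changes, which is why the mapping must be known accurately.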
Therefore, it is advantageous for a penetration testing system to have a way of finding out which network nodes share a common broadcast domain.
It should be emphasized that the ARP protocol example described above is only one example, and similar examples can be provided for other protocols such as LLMNR (Link-Local Multicast Name Resolution) or NBNS (NetBIOS Name Service). The common denominator for all the above protocols is that they all provide address translation or host name resolution services and operate within a common broadcast domain. However, the problem is not limited to such network protocols, and there are other protocols that are similarly vulnerable to hacking by a false reply. By hacking any of those protocols using a false reply message, an attacker can redirect network traffic to an incorrect destination, and then use this achievement for compromising one or more network nodes of the same broadcast domain.
It should also be emphasized that the above problem is relevant to all kinds of penetration testing systems: actual attack penetration testing systems, simulated penetration testing systems and reconnaissance agent penetration testing systems (see the corresponding definitions in the Definitions section). Obviously, it is relevant when validation of vulnerabilities is achieved by simulation or evaluation, as evaluating the applicability of such a vulnerability requires knowing whether there is an already compromised node in the broadcast domain under discussion. But even when validation of vulnerabilities is achieved by actually attacking the tested networked system, the problem is still relevant: without accurately knowing the broadcast domain mapping of the nodes, the test might waste much time attempting many attacks that cannot succeed.
Prior Art Solutions
The following naïve solution to the above problem is known in the prior art.
Large organizational networks are typically composed of multiple sub-networks, where each sub-network corresponds to a specific portion of the organizational network. Typically, a sub-network corresponds to some physical portion of the organizational network. For example, each floor in a building may be assigned its own sub-network. The individual sub-networks are combined into the full organizational network by layer-3 devices such as routers. Consequently, each sub-network becomes a separate broadcast domain.
Each sub-network has its own prefix, which comprises the most significant bits of all the IP addresses that are available for network nodes within the sub-network. Appending 0-bits to the prefix until it reaches the length of an IP address results in the sub-network ID. For example, a sub-network of an IPv4 network may have a network prefix of 192.168.1.0/24. This means the left-most 24 bits of the specified address (192.168.1 or 11000000.10101000.00000001 in binary notation) are the prefix for the IP addresses of all member nodes. This sub-network has an ID of 192.168.1.0 (11000000.10101000.00000001.00000000), which is the lowest address in the sub-network's address range.
One can obtain the ID of a sub-network from the IP address of any of its member nodes by ANDing the IP address of the node with the sub-network mask, which is a sequence of 1-bits having the same length as the prefix, followed by a sequence of 0-bits that brings the length of the mask to the length of an IP address. For the above example, the sub-network mask is 255.255.255.0 (11111111.11111111.11111111.00000000). Starting from an IP address of 192.168.1.105 (11000000.10101000.00000001.01101001) and ANDing it with the mask, one gets 192.168.1.0 (11000000.10101000.00000001.00000000) as the sub-network ID.
It can be seen that, for every member node of a given sub-network, ANDing the node's IP address with the sub-network mask yields the same sub-network ID. With the network architecture described above, each sub-network corresponds to a separate broadcast domain, and each broadcast domain corresponds to a separate sub-network. Therefore, it seems possible to determine whether two given network nodes share a common broadcast domain by generating the sub-network ID for both nodes and checking whether the two IDs are equal.
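The naïve check can be sketched in a few lines using Python's standard `ipaddress` module. The function names `subnet_id` and `naive_same_broadcast_domain` are hypothetical, and the sketch assumes that both nodes use the same sub-network mask.

```python
import ipaddress


def subnet_id(ip, mask):
    """AND an IPv4 address with a sub-network mask, both in dotted notation."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


def naive_same_broadcast_domain(ip_a, ip_b, mask):
    """Naive layer 3 test: equal sub-network IDs are taken to mean that
    the two nodes share a broadcast domain."""
    return subnet_id(ip_a, mask) == subnet_id(ip_b, mask)


if __name__ == "__main__":
    # The worked example from the text.
    print(subnet_id("192.168.1.105", "255.255.255.0"))  # prints 192.168.1.0
```

The comparison uses only layer 3 information, so its answer is only as good as the assumption that sub-networks and broadcast domains coincide.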
However, there are certain circumstances in which the above naïve solution either is not applicable or does not produce correct results:
A. Two sub-networks that do not share a broadcast domain might nevertheless include overlapping IP addresses. This might happen, for example, when two organizations, each having a sub-network with local IP addresses in the same range, merge into a larger common organization. Instead of going to the trouble of changing IP addresses for one of the sub-networks, an administrator may decide to keep all the existing addresses and avoid conflicts by using NAT (Network Address Translation) for translating IP addresses on the fly.
Applying the naïve solution in such a case might result in concluding that a node from the first sub-network and a node from the second sub-network share a common broadcast domain, even though this is not the case.
B. A network node may not have an IP address at all. This might happen, for example, when a dedicated server (e.g. a database server) provides high-bandwidth services to one or more other network nodes using a dedicated fast layer-2 protocol. Because such a node lacks an IP address, the naïve solution cannot even be applied to it. However, in spite of not using layer-3 IP addressing, the communication channels of such a dedicated server might still be used for compromising its client nodes, which do have IP addresses when connecting to the rest of the network.
C. An organizational network might not follow the assumptions described above. For example, one floor which originally was a single broadcast domain may later be split into two separate broadcast domains (without changing IP addresses) by adding a router between two portions of the floor, in order to improve performance when the number of member nodes gets too high. Applying the naïve solution in such a case might result in concluding that the floor still constitutes a single broadcast domain, even though this is no longer the case.
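Case C can be made concrete with a small, hypothetical inventory. In the sketch below each node is recorded with its IP address and prefix length plus a label for the broadcast domain it actually belongs to after the floor was split; the labels, node names and addresses are invented for illustration.

```python
import ipaddress

# Hypothetical inventory after a floor was split by a router without
# renumbering: node name -> (IP address with prefix, actual domain label).
INVENTORY = {
    "desk-east": ("192.168.1.20/24", "floor3-east"),
    "desk-west": ("192.168.1.77/24", "floor3-west"),
}


def naive_same_domain(iface_a, iface_b):
    """Layer 3 inference: same sub-network taken to mean same domain."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    return net_a == net_b


def actual_same_domain(name_a, name_b, inventory=INVENTORY):
    """Ground truth taken from the recorded layer 2 domain labels."""
    return inventory[name_a][1] == inventory[name_b][1]


if __name__ == "__main__":
    east_ip = INVENTORY["desk-east"][0]
    west_ip = INVENTORY["desk-west"][0]
    print("naive :", naive_same_domain(east_ip, west_ip))
    print("actual:", actual_same_domain("desk-east", "desk-west"))
```

The two answers disagree: the addresses still belong to one sub-network, while the nodes no longer share a broadcast domain.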
The root reason the naïve solution cannot be satisfactory is that the information sought belongs to layer 2 of the network, as broadcast domains are inherently layer-2 concepts. However, the naïve solution attempts to achieve the goal using IP addresses, which are inherently layer-3 concepts.
Another solution known in the prior art is one in which network nodes exchange dedicated messages from which it is possible to deduce whether the communicating nodes share a common broadcast domain or not. By “dedicated” it is meant that the sole purpose of sending such messages is to determine broadcast domain connectivity. However, while such a solution may work correctly, it is highly undesirable for penetration testing. The solution might cause two network nodes that under normal conditions never communicate with each other to start communicating, thus triggering alarms by security applications in the networked system.
Therefore, it is desirable to have a better solution that reliably determines whether two nodes share a common broadcast domain or not.