  • Essay / Identification and Analysis of Tor Traffic

    IDENTIFICATION AND ANALYSIS OF TRAFFIC
    Tor is free software that enables anonymous communication on the Internet. The Tor network is built on onion routing. According to Deng, Qian, Chen, and Su (2017), “Tor is known as the second generation of onion routing, which is currently the most popular and widely used anonymous communication system.” Identifying anonymous traffic plays a vital role today because it helps prevent the misuse of technology. A user's Internet activity cannot easily be traced when it passes through the Tor network, so users can browse the Internet and send messages without disclosing who they are. In this way the network protects user privacy.
    According to Cuzzocrea, Martinelli, Mercaldo, and Vercelli (2017), Tor is increasingly used for non-legal purposes, that is, to access censored information, to organize political activities, or to circumvent laws against criticism of heads of state. Tor has, for example, been used by criminal enterprises, hacktivist groups, and law enforcement agencies at cross purposes. The Tor network consists of a group of volunteer-operated relays connected by a series of virtual tunnels. The main idea behind Tor's design is to make users harder to track rather than to erase their traces completely. Several machine learning techniques can be applied to determine whether a host is generating Tor-related traffic, and the suitability of each technique can be assessed on the same task. According to Oda, Obukata, Yamada, Hiyama, Barolli, and Takizawa (2016), “Compared to other anonymizers, Tor is more popular and has more visibility in the academic and hacker communities.”
Anonymous Tor traffic can also be identified with a method called the gravity clustering algorithm. In gravity clustering analysis, each vector in the dataset is treated as an object in the feature space. Objects are then moved according to gravitational attraction and Newton's second law of motion. The method determines the number of clusters automatically and can adapt to previously unseen network traffic. Gravity clustering analysis is reported to recognize Tor traffic better than traditional clustering methods such as k-means, EM, and DBSCAN.
Tor is increasingly used for non-legal activities, such as accessing censored information, organizing political activities, or circumventing laws against criticism of heads of state. It has, for example, been used by criminal enterprises, hacktivist groups, and law enforcement agencies at cross purposes, sometimes simultaneously. Additionally, agencies within the US government support Tor in various ways.
The Internet, and TCP/IP in particular, was not originally designed with anonymity in mind. One way to provide anonymity is to build an overlay network that runs on top of TCP/IP. The overlay takes control of message routing and hides the hosts' IP addresses, which strengthens anonymity. One widely used anonymity system is The Onion Router (Tor), developed by the Tor Project. Tor is a distributed, low-latency network that adds a layer of encryption for each network hop and builds a randomized path for each connection. The path between client and server cannot be traced without traffic analysis, and no single node on the path can link the messages sent by a client to those received by the server. However, many researchers find that analyzing how Tor works is difficult because of its strong security features.
Several studies have examined the Tor network. Experiments on the live Tor network are difficult because it is not a predictable or controllable environment. A variety of network conditions can bias the results, so experiments are hard to repeat. Collecting user data is also problematic because it can expose users to privacy risks. Alternative approaches, such as emulation and simulation, have therefore been developed.
Research on anonymity technologies began in the early 1980s with David Chaum's paper on untraceable electronic mail. However, it was not until 2000 that anonymity and privacy-enhancing technologies began to attract a broad research community. In 2004, the design of a practical relay network called Tor was published. Its low latency makes it well suited to common Internet communication applications. Tor has since become the most successful public anonymous communication service on the Internet.
ANALYSIS
Tor was not designed to erase user data on the website side, but to make it difficult for sites to trace any user activity. This is achieved first by encrypting the user's identity and data, and then by creating a pseudo-identity for the user. According to Kiran, Vignesh, Shenoy, Venugopal, Prabhu, and Prasad (2017), “Client obscurity is achieved by routing traffic through three randomly chosen relays viz. Entry Guard Relay, Middle Relay and Exit Relay and providing layered encryption of data at each level.” The selection of these relays is random and periodic: random in that any three relays may be chosen regardless of their attributes, and periodic in that a new circuit is selected from time to time. Packets destined for the server are encrypted three times, once with each of the session keys exchanged with the three relays.
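The triple encryption described above can be illustrated with a small sketch. This is not Tor's actual cell format or cipher: a toy XOR keystream derived from SHA-256 stands in for the real encryption, and the fixed-width next-hop header, the session keys, and the names `seal`, `peel`, and `build_onion` are all assumptions made for illustration. The point it shows is only that the client adds one layer per relay, and each relay can remove exactly one layer and learn exactly one next hop.

```python
import hashlib

HOP_LEN = 16  # fixed width of the next-hop header in this toy format


def xor_stream(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 based keystream (toy cipher, NOT Tor's real one)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, next_hop: str, inner: bytes) -> bytes:
    """Add one layer: prepend the next hop's name, then encrypt with this relay's key."""
    header = next_hop.encode().ljust(HOP_LEN, b"\0")
    return xor_stream(key, header + inner)


def peel(key: bytes, cell: bytes) -> tuple[str, bytes]:
    """Remove one layer: decrypt, read the next hop, return what is left."""
    plain = xor_stream(key, cell)
    next_hop = plain[:HOP_LEN].rstrip(b"\0").decode()
    return next_hop, plain[HOP_LEN:]


def build_onion(payload: bytes, hops: list[tuple[bytes, str]]) -> bytes:
    """hops lists (session_key, next_hop) in path order: guard, middle, exit."""
    cell = payload
    for key, next_hop in reversed(hops):  # the exit relay's layer is innermost
        cell = seal(key, next_hop, cell)
    return cell


# Hypothetical session keys and hop names, for illustration only.
HOPS = [(b"guard-key", "middle"), (b"middle-key", "exit"), (b"exit-key", "server")]
onion = build_onion(b"GET / HTTP/1.1", HOPS)

cell = onion
for key, _ in HOPS:
    next_hop, cell = peel(key, cell)
    print(f"forward to {next_hop}: {len(cell)} bytes remain")
```

Because each layer covers the header of the layer inside it, a relay that holds only its own session key sees where to forward the cell but cannot read the payload or the hops beyond the next one.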
The packet is then sent, and each relay decrypts one layer with its own session key before forwarding the result to the next relay. When the exit relay receives the packet, it forwards it to the server, which sees the exit relay's IP address as the user's IP address. Peeling one layer at each hop restores the original packet by the time it leaves the circuit.
To select the relays in a circuit, Tor uses two algorithms: 1) the entry guard selection algorithm and 2) the non-entry relay selection algorithm. The first algorithm classifies relays by their data transfer capacity, commonly called bandwidth, and by their uptime. Bandwidth was chosen as a classification parameter mainly to improve the speed of Tor circuits. The earlier arbitrary selection was replaced by a requirement that guards be fast and stable. Fast guards are those whose offered bandwidth exceeds the median bandwidth of all relays, while stable guards are those whose uptime exceeds the median uptime of all relays. Uptime is a measure of stability that describes how long a system has been running and available. Using uptime as a parameter ensures that an attacker cannot simply create new relays and start receiving traffic immediately. According to the algorithm, an entry guard must be both fast and stable.
Although this change made circuits more stable, it weakened anonymity at the entry position, since only a limited set of relays now qualified to serve as entry guards. Furthermore, the periodic selection of new circuits was constrained by the rule that a new guard could be chosen only when the old one was unreachable. Guards that were unreachable were dropped and replaced. In several respects, then, the selection of entry guards was confined to a select group.
The second algorithm aims to improve anonymity for the non-entry relays. It recognized that the first algorithm fell short in this respect. The scheme of always selecting the best relays was therefore abandoned, and new selection criteria were adopted. Consistency in the choice of relays was of paramount importance. This algorithm ensured that fast and stable relays were not the only relays chosen, but that they were chosen more frequently. Emphasis was placed on choosing relays considered stable. Additionally, Tor designates certain ports as long-lived; if the traffic on a path uses one of these ports, Tor improves stability by pruning the list of available routers to those considered stable.
Onion routing is implemented by encryption in the application layer of the protocol stack, nested like the layers of an onion. Tor encrypts the data, including the next destination IP address, and sends it through a virtual circuit of randomly chosen Tor relays. Each relay decrypts one layer of encryption to reveal only the next relay in the circuit, so that it can pass the remaining encrypted data on. The final relay decrypts the innermost layer and sends the original data to its destination without learning the source IP address. Because the routing of the communication is partly concealed at every hop, this technique eliminates any single point at which the communicating peers can be identified. According to Johnson, McLaughlin, and Thompson (2010), “Tor is an overlay protocol and uses an underlying Transmission Control Protocol (TCP)/Internet Protocol (IP) layer to manage the transport, delivery, and routing of data.” The small amount of centralized control that exists in any Tor network comes from central directory servers.
These maintain the state of the network and collect and evaluate information such as which nodes are suitable for use as exit nodes, their uptime, and any bandwidth limits imposed by the node operators. This information allows Tor to choose a path for a particular connection based on the user's needs. Traffic to and from a directory server uses a different port from payload traffic and can be easily separated. Three types of nodes are typically found in a Tor network. Exit nodes – which send unencrypted traffic to its destination. Entry nodes – which accept unencrypted traffic,..