Tor

Etymology

Onion Routing

At the heart of Tor is onion routing. To achieve privacy, messages are encrypted in multiple layers, much like those of an onion. When this onionized message traverses the network, each intermediary router peels off one layer at a time then forwards the result to the next router, until the final router extracts the core (the original message) which it transfers to the destination.

Tor

Before Tor became known as such, it was a nameless implementation of onion routing. Similar projects started sprouting up, so the original was named The Onion Routing to separate it from the rest¹. Despite Tor’s name stemming from an abbreviation, it is capitalized as Tor (not TOR).

My (Totally Real And True) Conspiracy Theory

Note

Skip this section if you want to learn something

I think (it would be really funny if) Tor and onion routing as a whole were inspired by Shrek. Here’s why:

Shrek, as a character, is a recluse who seeks privacy in the comfort of his swamp; however, he is oft disturbed by a deluge of visiting villagers and inscrutable squatters. Parallels between his character and that of those who seek privacy online are obvious.
Shrek famously stated: “Ogres are like onions. […] We both have layers” ². It is here that privacy and onions form an association, potentially sparking the idea of onion routing in the minds of its creators.
The timelines add up: Shrek the picture book (1990) predates onion routing (1995) ¹ by five years, while Shrek the movie (2001) premiered one year before the deployment of Tor (2002) ¹.

Terminology

Circuit: A path through the Tor network
Relay: A node in a circuit
- Entry: The first relay in a circuit
  - Guard: A small set of trusted entry relays used by a client for a fixed period of time (unless using a bridge)
  - Bridge: A relay that is not publicly listed (and can thus act as an entry without being identified as such)
- Exit: The last relay in a circuit
Relay Cell: The encrypted message at a specific relay
- Create Cell: A cell for creating a circuit, contains the first half of a Diffie-Hellman handshake
- Extend Cell: A cell for extending a circuit; similar to a create cell but also contains the address of the relay to extend to
Onion Key: A public encryption key published by every relay; the relay keeps the matching private key to decrypt handshakes addressed to it
Onion Sites/Services: Sites/services only available via the Tor network

Background

A Note On Notation

For the sake of this example, I will use a simplified model of the cell structure defined in ³:

| Circuit ID | Command | Body |

For clarity, commands will be referred to by their identifiers ⁴ rather than their true values.
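As a rough illustration, the simplified cell above can be modelled as a small record. This is only a sketch of the three-field model used in this note, not the real wire format from ³; the `Cell` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Simplified cell: | Circuit ID | Command | Body | (hypothetical model)."""
    circuit_id: int
    command: str  # identifier such as "CREATE", not the true wire value
    body: bytes

    def __str__(self) -> str:
        # Render the cell in the same pipe-separated layout used in this note.
        return f"| {self.circuit_id} | {self.command} | {self.body.hex()} |"


cell = Cell(circuit_id=1, command="CREATE", body=b"\x01\x02")
print(cell)
```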

Constructing A Circuit

Suppose Alice wants to create a circuit.

First, she chooses an exit node, followed by a chain of relays that constitute a path ⁵ such that no path constraints are violated ⁶. By doing so, she will have obtained the onion keys and addresses of each relay on the path. For $n$ relays, let $\{k_{relay_{1}}, k_{relay_{2}}, \dots, k_{relay_{n}}\}$ be the keys for each relay.

Second, Alice establishes a TLS connection with the entry. Then she creates a create cell with a unique circuit ID ( $c_{1}$ ), chosen according to ⁷, whose payload is the first half of a Diffie-Hellman handshake ( $g^{X}$ ) encrypted using a public-key encryption function $E(\cdot)$ with the first relay’s onion key (i.e. $E_{k_{relay_{1}}}(g^{X})$ ). This cell is as follows:

| $c_{1}$ | CREATE | $E_{k_{relay_{1}}}(g^{X})$ |

Alice sends the cell to the first relay over the TLS channel. The relay decrypts the payload using the corresponding decryption function $D(\cdot)$ , and responds with its half of the handshake ( $g^{Y}$ ), as well as a hash of the shared key ( $H(g^{XY})$ ) where $H(\cdot)$ is a cryptographic hash function. This cell is constructed like so:

| $c_{1}$ | CREATED | $g^{Y}$ , $H (g^{X Y})$ |

Once Alice receives the response, both parties will have established the shared key $k_{1} = g^{XY}$ . Henceforth, the circuit ID and shared key are used for all communications between Alice and relay 1.
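The exchange with relay 1 can be sketched in a few lines. This is a toy under loud assumptions: the group parameters `P` and `G` are far too small for real use, SHA-256 stands in for $H(\cdot)$, and the public-key encryption $E(\cdot)$ of Alice's half is left out entirely to keep the focus on the Diffie-Hellman part.

```python
import hashlib
import secrets

# Toy group parameters (assumption): NOT secure, chosen only for readability.
P = 2**127 - 1  # a Mersenne prime
G = 3


def H(value: int) -> bytes:
    """Hash of a group element, standing in for H(.) in the text."""
    return hashlib.sha256(value.to_bytes(16, "big")).digest()


# Alice's half of the handshake: g^X (sent in the CREATE cell).
x = secrets.randbelow(P - 2) + 1
g_x = pow(G, x, P)

# Relay 1's half: g^Y, plus a hash of the shared key g^XY (the CREATED cell body).
y = secrets.randbelow(P - 2) + 1
g_y = pow(G, y, P)
relay_key = pow(g_x, y, P)
created_body = (g_y, H(relay_key))

# Alice derives the same key and checks it against the relay's hash.
alice_key = pow(created_body[0], x, P)
assert H(alice_key) == created_body[1]
k1 = alice_key  # shared key k_1 = g^XY
```

The hash check is what lets Alice confirm that the party answering actually derived the same key.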

To extend the circuit, Alice creates a relay extend cell using the same circuit ID, but with a new handshake part ( $g^{X_{2}}$ ), encrypted with the second relay’s onion key.

| $c_{1}$ | EXTEND | $E_{k_{relay_{2}}}(g^{X_{2}})$ |

Alice sends the new cell to the first relay. Relay 1, seeing the cell is an extend cell, copies the payload directly into a create cell, but replaces the circuit ID with a new one ( $c_{2}$ ) before sending the cell to relay 2:

| $c_{2}$ | CREATE | $E_{k_{relay_{2}}}(g^{X_{2}})$ |

Relay 2 responds to relay 1 with its half of the handshake ( $g^{Y_{2}}$ ) and a hash of the negotiated key ( $H (g^{X_{2} Y_{2}})$ ):

| $c_{2}$ | CREATED | $g^{Y_{2}}$ , $H (g^{X_{2} Y_{2}})$ |

Relay 1 then copies the payload into a new cell with the original circuit ID, and sends it to Alice:

| $c_{1}$ | EXTENDED | $g^{Y_{2}}$ , $H (g^{X_{2} Y_{2}})$ |

Now Alice has negotiated a shared key $k_{2} = g^{X_{2} Y_{2}}$ with relay 2 such that relay 1 cannot discover the key.
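The relaying step can be sketched as follows, using the simplified cell model from this note. The `Relay` class, its circuit-ID counter, and the placeholder reply bytes are all hypothetical; a real relay would decrypt the payload and complete the handshake rather than echo a fixed string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    circuit_id: int
    command: str
    body: bytes


@dataclass
class Relay:
    name: str
    next_hop: Optional["Relay"] = None
    circuit_map: dict = field(default_factory=dict)  # inbound ID -> outbound ID
    next_out_id: int = 100  # arbitrary starting point for new circuit IDs

    def handle(self, cell: Cell) -> Cell:
        if cell.command == "CREATE":
            # Placeholder for "decrypt, finish the handshake, reply with g^Y and H(...)".
            return Cell(cell.circuit_id, "CREATED", b"reply-from-" + self.name.encode())
        if cell.command == "EXTEND":
            # Copy the payload into a CREATE cell under a fresh circuit ID.
            out_id = self.next_out_id
            self.next_out_id += 1
            self.circuit_map[cell.circuit_id] = out_id
            reply = self.next_hop.handle(Cell(out_id, "CREATE", cell.body))
            # Copy the reply payload back under the original circuit ID.
            return Cell(cell.circuit_id, "EXTENDED", reply.body)
        raise ValueError(f"unsupported command: {cell.command}")


relay2 = Relay("relay2")
relay1 = Relay("relay1", next_hop=relay2)
extended = relay1.handle(Cell(1, "EXTEND", b"opaque-encrypted-handshake"))
```

Relay 1 only ever shuffles opaque bytes between the two circuit IDs, which is why it learns nothing about the key negotiated with relay 2.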

This process then repeats for all other relays to be added to the circuit. Circuits typically consist of at least three relays, while many onion services use six ³.

Summary

Step Number	Adding Relay 1	Adding Relay 2	…	Adding Relay n
1	Alice → $relay_{1}$ : $(c_{1}, E_{k_{relay_{1}}}(g^{X}))$	Alice → $relay_{1}$ : $(c_{1}, E_{k_{relay_{2}}}(g^{X_{2}}))$	…	Alice → $relay_{1}$ : $(c_{1}, E_{k_{relay_{n}}}(g^{X_{n}}))$
2	$relay_{1}$ → Alice: $(c_{1}, g^{Y}, H(g^{XY}))$	$relay_{1}$ → $relay_{2}$ : $(c_{2}, E_{k_{relay_{2}}}(g^{X_{2}}))$	…	$relay_{1}$ → $relay_{2}$ : $(c_{2}, E_{k_{relay_{n}}}(g^{X_{n}}))$
3		$relay_{2}$ → $relay_{1}$ : $(c_{2}, g^{Y_{2}}, H(g^{X_{2} Y_{2}}))$	…	$relay_{2}$ → $relay_{3}$ : $(c_{3}, E_{k_{relay_{n}}}(g^{X_{n}}))$
4		$relay_{1}$ → Alice: $(c_{1}, g^{Y_{2}}, H(g^{X_{2} Y_{2}}))$	…	$relay_{3}$ → $relay_{4}$ : $(c_{4}, E_{k_{relay_{n}}}(g^{X_{n}}))$
…			…	…
n-1				$relay_{n-2}$ → $relay_{n-1}$ : $(c_{n-1}, E_{k_{relay_{n}}}(g^{X_{n}}))$
n				$relay_{n-1}$ → $relay_{n}$ : $(c_{n}, E_{k_{relay_{n}}}(g^{X_{n}}))$
n+1				$relay_{n}$ → $relay_{n-1}$ : $(c_{n}, g^{Y_{n}}, H(g^{X_{n} Y_{n}}))$
…				…
2n				$relay_{1}$ → Alice: $(c_{1}, g^{Y_{n}}, H(g^{X_{n} Y_{n}}))$

Using A Circuit

Suppose Alice now has a complete circuit, and thus the keys $\{k_{1}, k_{2}, \dots, k_{n}\}$ with all $n$ relays. To send a message $M$ through the circuit, she creates a relay cell, $C$ , like so:

$C = E_{k_{1}, k_{2}, \dots, k_{n}} (M) = E_{k_{1}} (E_{k_{2}} (\dots (E_{k_{n}} (M))))$

and sends it to the entry. The entry peels the first layer, illustrated as follows:

$C_{1} = D_{k_{1}} (C) = D_{k_{1}} (E_{k_{1}, k_{2}, \dots, k_{n}} (M)) = D_{k_{1}} (E_{k_{1}} (E_{k_{2}} (\dots (E_{k_{n}} (M))))) = E_{k_{2}} (\dots (E_{k_{n}} (M))) = E_{k_{2}, \dots, k_{n}} (M)$

The entry then sets $C_{1}$'s origin to itself, then sends $C_{1}$ to relay 2. Relays $2, \dots, n - 1$ repeat the same process. Relay $n$ , the exit, finally peels $C_{n} = D_{k_{n}} (E_{k_{n}} (M)) = M$ , and sends $M$ to the destination.
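The wrap-and-peel process can be sketched with a toy cipher. Everything here is an assumption made for illustration: the keystream is built from SHA-256 in a counter construction and is NOT a vetted cipher, and `wrap` and `xor_layer` are made-up names. Because XOR layers commute, this toy does not enforce the peel order the way a real construction would.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """E_k and D_k are the same operation for an XOR stream cipher."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(keys: list, message: bytes) -> bytes:
    """C = E_k1(E_k2(...E_kn(M))): apply the innermost layer (k_n) first."""
    cell = message
    for key in reversed(keys):
        cell = xor_layer(key, cell)
    return cell


keys = [b"k1", b"k2", b"k3"]
message = b"hello"
cell = wrap(keys, message)

# The entry peels k1, leaving E_{k2, k3}(M).
after_entry = xor_layer(keys[0], cell)

# Each relay in turn peels its own layer; the exit recovers M.
peeled = cell
for key in keys:
    peeled = xor_layer(key, peeled)
```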

Note

Almost all traffic these days is TLS-encrypted⁸, so the exit does not actually see $M$ itself, but instead, $E_{k_{T L S}} (M)$ . The only information the exit knows is the message’s destination, which is necessary for forwarding the message.

Throughout this process, relay $i$ only knows of relays $i - 1$ and $i + 1$ , hence only the entry knows the sender and only the exit knows the receiver.

Summary

The process of constructing and using a two-hop circuit is visualized in ³.

Attacks

De-anonymization is an obvious attack on an anonymization network.

Per the construction above, key recovery means breaking Diffie-Hellman (for which the best known approach is solving the discrete log problem), and meaningful inter-relay man-in-the-middle attacks require breaking secure cryptosystems. Both are considered infeasible. Hence, most de-anonymization techniques focus on weaker links: the traffic before the entry and after the exit, or human error.

1. Traffic Correlation

If an attacker controls a circuit (i.e. controls both entry and exit on the same circuit), they can see both the source and destination of the message.

Circuit Confirmation Attack

Since the relays themselves will not know which circuits they are a part of, an attacker will first have to confirm that both the entry and exit node under their control are part of the same circuit. One way for them to do this is by perturbing packets at the entry in some predictable way, then observing the same pattern at the exit node.

Correlating Traffic

Once an attacker has confirmed their control over a circuit, they must correlate traffic entering the entry relay and exiting the exit relay. This can be achieved via timing attacks or traffic analysis.
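A very rough sketch of the timing side of this: bucket packet arrival times at each end into fixed windows and compare the resulting count vectors. The timestamps, bin width, and the choice of Pearson correlation are all assumptions made for illustration; published attacks are considerably more sophisticated.

```python
from math import sqrt


def bin_counts(timestamps, bin_width, n_bins):
    """Count packets per time window of bin_width seconds."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Made-up observations (seconds since the start of the capture).
entry_times = [0.1, 0.2, 0.3, 2.1, 2.2, 4.5]
exit_same = [t + 0.3 for t in entry_times]   # same flow, delayed by the circuit
exit_other = [1.1, 1.5, 3.2, 3.3, 3.4, 3.9]  # an unrelated flow

entry_bins = bin_counts(entry_times, 1.0, 6)
score_same = pearson(entry_bins, bin_counts(exit_same, 1.0, 6))
score_other = pearson(entry_bins, bin_counts(exit_other, 1.0, 6))
```

The flow whose bursts line up with the entry's bursts scores much higher than the unrelated one.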

2. Website Fingerprinting

A set of methods to uniquely identify destination websites based on metadata and/or patterns in communication traffic observed between the client and entry relay. Packet sequences, lengths, order, timing information, and other seemingly innocuous features can uniquely identify a site.

Examples

kNN⁹ (k-nearest neighbours): Leverages features extracted from packet sequences to distinguish web pages
CUMUL¹⁰ (CUMULative representation): Support vector machine (SVM) using cumulated packet size to represent load behaviour
kFP¹¹ (k-fingerprinting): Random forests and kNN trained on fingerprints of clearnet traffic between specific web pages in order to classify encrypted traffic
DF¹² (Deep Fingerprinting): A high-precision deep Convolutional Neural Network (CNN) classifier
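To give a feel for how such classifiers work, here is a toy that borrows the cumulative-size idea from CUMUL but swaps the SVM for a plain nearest-neighbour lookup to keep it short. The traces, the number of sample points, and the function names are all made up; none of the cited attacks is reproduced here.

```python
from itertools import accumulate


def cumul_features(trace, n_points=4):
    """Sample the running sum of signed packet sizes at n_points positions."""
    cum = list(accumulate(trace))
    step = (len(cum) - 1) / (n_points - 1)
    return [cum[round(i * step)] for i in range(n_points)]


def nearest_label(train, trace):
    """Return the label of the training trace with the closest features."""
    feats = cumul_features(trace)

    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(cumul_features(item[1]), feats))

    return min(train, key=dist)[0]


# Positive = outgoing bytes, negative = incoming bytes (made-up traces).
train = [
    ("site-a", [500, -1500, -1500, -1500, 500, -1500]),
    ("site-b", [500, -600, 500, -600, 500, -600]),
]
unknown = [500, -1400, -1500, -1500, 400, -1500]
guess = nearest_label(train, unknown)
```

Even though the payloads are encrypted, the shape of the cumulative curve is enough to separate these two made-up sites.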

3. Browser Fingerprinting

A method to uniquely identify a user based on their browser and device setup. Attributes such as OS, graphics card, screen dimensions, language, order of fonts installed, HTTP headers, time zone, and browser plugins can identify users with 99% accuracy in some cases¹³.

Note

Even if the client makes small changes (installing new fonts, moving time zone, etc.), they are still highly identifiable.
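One way to see why small changes do not help much: a tracker does not need an exact match, only the closest one. The sketch below scores an observed attribute set against known profiles by the fraction of matching attributes. The profiles, attribute names, and scoring rule are invented for illustration and are not taken from ¹³.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of attributes (over the union of keys) with identical values."""
    keys = a.keys() | b.keys()
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def best_match(known: dict, observed: dict) -> str:
    """Name of the known profile most similar to the observed attributes."""
    return max(known, key=lambda name: similarity(known[name], observed))


known = {
    "user-1": {"os": "Linux", "screen": "2560x1440", "timezone": "UTC-8",
               "language": "en-CA", "fonts": "DejaVu,Fira Code"},
    "user-2": {"os": "Windows", "screen": "1920x1080", "timezone": "UTC+1",
               "language": "de-DE", "fonts": "Arial,Calibri"},
}

# user-1 after moving time zone and installing a font.
observed = {"os": "Linux", "screen": "2560x1440", "timezone": "UTC-5",
            "language": "en-CA", "fonts": "DejaVu,Fira Code,Inter"}

match = best_match(known, observed)
```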

4. Canvas Fingerprinting

A method to uniquely identify a user by asking them (their browser, that is) to draw an image on a canvas (hidden in the DOM), then reading that image back.

Note

The resulting image is VERY distinctive across different computers (anti-aliasing, how they draw colours, etc.).

Defences

1. Traffic Correlation

Use guard nodes, and follow the path constraints⁶: do not choose the same router twice for the same path, do not choose any router in the same family as another router in the same path, and do not choose more than one router in a given network range.
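These constraints are easy to express as a check over a candidate path. The sketch below is a simplification with several assumptions: `RelayInfo` is a made-up record, a family is modelled as a one-sided set of nicknames, and "network range" is taken to mean a /16 IPv4 prefix.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayInfo:
    nickname: str
    address: str
    family: frozenset = frozenset()  # nicknames this relay declares as family


def violates_constraints(path, prefix_len=16) -> bool:
    """True if any pair of relays on the path breaks a constraint."""
    for i, a in enumerate(path):
        for b in path[i + 1:]:
            if a.nickname == b.nickname:
                return True  # same router chosen twice
            if a.nickname in b.family or b.nickname in a.family:
                return True  # same family
            net_a = ipaddress.ip_network(f"{a.address}/{prefix_len}", strict=False)
            if ipaddress.ip_address(b.address) in net_a:
                return True  # same network range (assumed /16)
    return False


a = RelayInfo("a", "192.0.2.1")
b = RelayInfo("b", "198.51.100.7")
c = RelayInfo("c", "203.0.113.9")
d = RelayInfo("d", "192.0.77.5")                         # same /16 as a
e = RelayInfo("e", "100.64.0.1", family=frozenset({"b"}))  # family of b
```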

2. Website Fingerprinting

Website fingerprinting is an active research topic within the Tor community and many defences have been presented over the years, thus there are more than I can conceivably compile here. Instead, I’ll include some techniques that I came across, categorized into two main classes: Randomization and Regularization. Others will be placed into an Other category.

1. Randomization

These defences use randomness such that no two traces from the same webpage have the same pattern.

Adaptive padding¹⁴: introduce dummy packets into traffic to mask traffic bursts and their corresponding features
WTF-PAD¹⁵ (Website Traffic Fingerprinting Protection with Adaptive Defence): a generalization of adaptive padding
FRONT¹⁶: randomize the shape of distributions used for sampling the timing and number of dummy packets added, and place dummy packets near the front of a trace
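A loose sketch in the spirit of FRONT: pick a random number of dummy packets and a random "width" for the timing distribution on every page load, then draw send times that cluster near the start of the trace. A Rayleigh distribution is used here as one front-loaded choice; that choice, the parameter values, and the function name are assumptions of this sketch rather than details taken from ¹⁶.

```python
import math
import random


def front_dummy_times(rng, max_dummies=50, w_min=1.0, w_max=14.0):
    """Return sorted send times (seconds) for a random number of dummy packets."""
    n = rng.randint(1, max_dummies)  # how many dummies to add this time
    w = rng.uniform(w_min, w_max)    # how spread out they are this time
    # Rayleigh(sigma=w) via inverse transform: w * sqrt(-2 ln(1 - U)).
    return sorted(
        w * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)
    )


times = front_dummy_times(random.Random(7))
other = front_dummy_times(random.Random(8))
```

Because both the count and the width are re-sampled per load, two visits to the same page get differently shaped padding.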

2. Regularization

These defences fit traces into deterministic patterns. That is, traces from different pages become indistinguishable.

Padding: pad packets such that all have equal size (ex. by making all packets the maximum size)
BuFLO¹⁷ (Buffered Fixed-Length Obfuscation): send packets of a fixed size at fixed intervals, using dummy packets to both fill in and (potentially) extend the transmission
Walkie-Talkie¹⁸: transform packet sequences of monitored sensitive pages and benign non-sensitive pages such that the packet sequences are identical (in terms of timing, length, direction, and ordering)
Tamaraw¹⁹: an extension of BuFLO which sets packet size at 750 bytes rather than the MTU, and treats incoming and outgoing traffic differently (i.e. outgoing fixed at higher interval)
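The BuFLO idea can be sketched as a scheduler that ignores the real packet boundaries and emits fixed-size cells on a fixed clock, padding with dummies until a minimum duration is reached. The parameter values and the `buflo_schedule` name are placeholders chosen for this example, not the values from ¹⁷.

```python
def buflo_schedule(packet_sizes, cell_size=1500, interval_ms=20, min_duration_ms=1000):
    """Return (send_time_ms, size, is_real) tuples for a fixed-rate schedule."""
    # Each real packet is split into ceil(size / cell_size) fixed-size cells.
    n_real = sum(-(-size // cell_size) for size in packet_sizes)
    # Keep sending (dummy cells if needed) until the minimum duration is reached.
    n_min = -(-min_duration_ms // interval_ms)
    n_total = max(n_real, n_min)
    return [(i * interval_ms, cell_size, i < n_real) for i in range(n_total)]


schedule = buflo_schedule([300, 3200])
observed = [(t, size) for t, size, _ in schedule]  # what an eavesdropper sees
```

An observer only sees identical cells at identical spacing; whether a given cell carried real data is hidden, at the cost of a lot of extra bandwidth.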

3. Other

Traffic morphing²⁰: load a web page using a packet size distribution from a different page
Decoy pages²¹: load a decoy page simultaneously with the real page to hide the real packet sequence
TrafficSilver²²: split traffic over several “sub-circuits” (i.e. circuits containing distinct entry nodes) in a random manner
Surakav²³: train a generator that is able to generate various reference traces, then sample reference traces from the trained generator and send bursts of data based on the reference trace

3. Browser Fingerprinting

Give standardized answers for everything (implemented in Tor Browser). For example, always return 1920x1080 for the screen size, UTC for the time zone, and Comic Sans as the only font (jk), etc.
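A small sketch of why this works: if every browser reports the same values, any fingerprint derived from those values is the same for everyone. The `STANDARD` table reuses the example values from this note, and the hashing scheme and function names are invented for illustration; this is not how Tor Browser is implemented internally.

```python
import hashlib
import json

# Example standardized answers (taken from the text above, not from Tor Browser's source).
STANDARD = {"screen": "1920x1080", "timezone": "UTC"}


def reported(real: dict) -> dict:
    """What the browser tells a site: the standard answers, whatever the real values are."""
    return dict(STANDARD)


def fingerprint(attrs: dict) -> str:
    """Hash a canonical encoding of the reported attributes."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


alice = {"screen": "2560x1440", "timezone": "America/Vancouver"}
bob = {"screen": "1366x768", "timezone": "Europe/Berlin"}

raw_differs = fingerprint(alice) != fingerprint(bob)
standardized_same = fingerprint(reported(alice)) == fingerprint(reported(bob))
```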

4. Canvas Fingerprinting

Block canvas image extraction (implemented in Tor Browser).

cbarkr
