$Id: path-spec.txt,v 1.9 2003/11/12 05:58:13 nickm Exp $ MIX3:path Type III (Mixminion) Path Generation Nick Mathewson Peter Palfrader Status of this Document This draft describes a proposed system for path generation and specification for Type III (Mixminion) clients. It is not a final version. It has not yet been submitted to any standards body. This document supersedes the older "path-spec.txt" document, and also replaces the old Appendix A.2 from "dir-spec.txt". This document refers to concepts introduced in minion-spec.txt and dir-spec.txt; you may want to re-read those before proceeding. Abstract This document describes criteria for path validity, a way to specify paths, and a way to generate paths through the the Type III network. Table of Contents Status of this Document Abstract Table of contents 1. Introduction 2. Path validity 2.1. Terminology 2.2. Determining validity 2.3. Relay compatibility 2.4. Exit compatibility 3. A standard path specification format 3.1. Syntax 3.2. Semantics 3.3. Specification validity 3.4. Generating recommended valid paths 1. Introduction When a Type III client generates a packet or a SURB, it must choose a sequence of Mixes (or "path") through which the packet will travel. While dir-spec.txt describes a mechanism for clients to learn about reliable and trustworthy mixes, it does not provide a mechanism for clients to choose mixes, or to build paths using those mixes. This document contains two parts: first, a description of which paths a client should use; second, a particular user interface and path selection algorithm to generate good paths. The algorithm is the same one implemented in the reference implementation. All software that generates paths SHOULD generate paths that are "valid" and "recommended" as described in section 2 below, unless the user specifically requests otherwise; and SHOULD warn the user whenever an invalid path is requested. Implementations MAY allow users to specify paths of their own. If they do, implementations SHOULD at least warn users who generate paths that would not be generated by the standard algorithm. Implementations that allow path selection SHOULD allow partial path selection as well. 2. Path validity 2.1. Terminology A "path" is a sequence of Type III server descriptors. A path may be either a single undivided sequence (in the case of SURB and reply packet paths); or a sequence divided into two legs (in the case of forward packet paths). A path is "valid" if the following holds: All messages send along a valid path _will_ be delivered if all the mix on the path are running and have capabilities and features as advertised in the most recently downloaded directory. [For example, paths that route messages between incompatible mixes, or that exit via an inappropriate method, are _NOT_ valid.] Validity can be time-dependent: because server descriptors and packet keys have a limited lifetime, every path has a time before which it will not work, and a time after which it will not work. Validity can also depend on exit address and exit message size: some paths are suitable for delivering SMTP messages; others are suitable for delivering large fragmented SMTP messages with server-side reconstruction, and so on. A path is "recommended" if: - It is valid. - Every mix on the path is recommended. - No pair of adjacent mixes on that path is listed as a broken link in the directory. [XXX add this to directories] - No pair of adjacent mixes on that path has the same nickname. Below we describe in more detail how to determine whether a path is "valid" as described above. 2.2. Determining validity For a path to be valid, all of the following must hold: - If the path has two legs, neither leg is empty. - For every pair of adjacent server descriptors on the path (), S1 must describe a mix that can relay packets to S2. [Discussed in 2.3 below.] - The client can relay a packet to the mix described by the first server descriptor on the path. - [XXX describe a maximum length requirement. This will be ugly, since the maximum length of a path depends on the size of the routing info fields along the path.] For a path to be valid from time START through time END, all of the following must hold: - Every server descriptor on the path has a Valid-After date no later than START, and a Valid-After date no later than END. For a path to be valid for delivering packets to a SURB or exit address TARGET, all of the following must hold: - If TARGET is a SURB, then the last server descriptor on the path must describe a server that can relay packets to the first hop of SURB. - Otherwise, the last server descriptor on the path must describe a server that can deliver packets to the exit address. [Discussed in 2.4 below.] 2.3. Relay compatibility In order to determine whether a path is valid, we need to know whether the mixes on that path can relay to one another. We determine this as follows. Given server descriptors SD1 and SD2 for two mixes M1 and M2 respectively, M1 can relay packets to M2 iff all of the following hold: - SD1 has an Outgoing/MMTP section with a recognized version number. - SD2 has an Incoming/MMTP section with a recognized version number. - One of the protocols listed in SD1's Outgoing/MMTP "Protocols" entry is also listed in SD2's Incoming/MMTP "Protocols" entry. [XXX Disregard Allow and Deny? I'm not sure how to make them work now that we use hostname-based delivery. Anyway, they're not implemented yet. -NM] - If SD1's Incoming/MMTP section has an "IP" entry and no "Hostname" entry, then SD2's Incoming/MMTP section has an "IP" entry. [XXX This rule is deprecated, and will go away after 0.0.7, when nobody has an "IP" entry and everybody has a "Hostname" entry. -NM] 2.4. Exit compatibility In order to determine whether a non-reply path is valid, we need to know whether the last mix on the path can deliver to the target address, and whether last mix on the path is willing to deliver messages as large as we intend to send. We determine address compatibility depending on the exit type of the address: - If the message is intended for SMTP delivery, then the descriptor must have a "Delivery/SMTP" section with a recognized version. Furthermore, if the user has specified a "From:" header, the descriptor must have an "Allow-From" entry of "yes". - [XXX note also that if we're using any user-specified headers at all, the exit server must have Version at least 0.0.5. This requirement will go away once there are no more mixes without exit header support. -NM] - If the message is intended for MBOX delivery, then the descriptor must have a "Delivery/MBOX" section with a recognized version. Furthermore, if the user has specified a "From:" header, the descriptor must have an "Allow-From" entry of "yes". - If we are sending a dummy packet with DROP exit type, any exit server is valid. - If there is a different exit type, there may be additional restrictions on allowable exit servers. We determine size-compatibility as follows: - If the message is to be fragmented and reassembled by the server, then exit server descriptor must have a "Delivery/Fragmented" section with a recognized version, and a "Maximum-Fragments" entry no larger than the number of fragments in the message. - If the message is intended for forward delivery via SMTP or MBOX, then the corresponding Delivery/SMTP or Delivery/MBOX section of the descriptor must contain a "Maximum-Size" entry no smaller than the size of the message (in kilobytes, before compression.) 3. A standard path specification format 3.1. Syntax A path specification is string consisting of one or two leg specifications, separated by a single colon. A leg specification is a list of of one or more path components, separated by commas. Each path component specifies either a node by nickname, a single random hop, a given number of random hops, or a normal distributed number of random nodes. Whitespace before and after separators is ignored. The format of nicknames is defined in dir-spec. To specify a single random hop, "?" is used. To specify a number of random nodes "*n" is used, where n is the number of random hops. To specify a normal distributed number of random nodes "~n" is used. Here is a grammar: PathSpec ::= LegSpec | LegSpec OptSpace ":" OptSpace LegSpec LegSpec ::= PathComponent | LegSpec OptSpace "," OptSpace PathComponent PathComponent ::= ByNickname | RandomHop | RandomHops | GaussianHops ByNickname ::= RandomHop ::= "?" RandomHops ::= "*" [0-9]+ GaussianHops ::= "~" [0-9]+ Space ::= " " | "\t" OptSpace ::= | OptSpace Space 3.2. Semantics A path specification may be "satisfied" by zero, one, or more paths. A path satisfies a path specification iff all the following hold: - If the PathSpec has two legs, then: - The path has two legs. - The first leg of the path satisfies the first leg of the PathSpec. - The second leg of the path satisfies the second leg of the PathSpec. - If the PathSpec has one leg, then: - The path has one leg. - The path's leg satisfies the PathSpec's leg. - If the PathSpec has N PathComponents, then the path can partitioned into N subsequences of server descriptors, such that the Nth PathComponent is satisfied by the Nth subsequence of server descriptors. PathComponents are satisfied by sequences of server descriptors as follows: - A ByNickname component is satisfied by a sequence containing a single server descriptor whose nickname is equal to the specified nickname, ignoring case. - A RandomHop component is satisfied by any sequence containing a single server descriptor. - A RandomHops component is satisfied by any N-element sequence of server descriptors, where N is the decimal integer in the RandomHops component. - A GaussianHops component is satisfied by any non-empty sequence of server descriptors. 3.3. Specification validity [This section is helpful information for implementors; it can be inferred from the other sections. Implementations often need to check whether a path specification can be satisfied at all--for example, when validating a user's configuration options. That is to say, they need to know whether there is any valid path that satisfies the PathSpec. In general, a PathSpec is satisfiable by a valid path if: - For every ByNickname component, a server is known with that nickname. - No pair of adjacent ByNickname components are the nicknames of two mixes M1 and M2 such that M1 cannot relay to M2. - If the last component is a ByNickname component, then the Mix with that nickname can indeed deliver messages to the selected exit address. - [XXX Not too long.] [XXX there are additional ways that a PathSpec may be unsatisfiable. For example, we might not know about any descriptors at all. Or we might have "A,?,B" where no server can receive from A and deliver to B. Ignoring that for now.] [XXX Also, mention time constraint.] 3.3. Generating recommended valid paths Clients SHOULD proceed as follows when generating paths. It generates valid paths that satisfy the provided PathSpec. It tries to generate a recommended path if possible. Note that this procedure generates a _set_ of paths. This is necessary for server-side message reassembly, which requires that all of the paths used for a fragmented message have the same exit hop. PROCEDURE: Generate a set of paths Inputs: DIR (The most recent Type III directory) SPEC (A PathSpec) START (The start of the interval over which the path must be valid.) END (The end of the interval over which the path must be valid.) ADDR (The exit address) MSG (The message to deliver) N (The number of paths to generate) Outputs: A valid path. 0. Remove any RandomHops components that are zero hops long (i.e. "*0"). If PATH has only a single component, and that component is a GaussianHops component "~N", and the path(s) to be generated are for forward messages (not replies or SURBs), replace the contents of PATH with two components: "?" and "~M", where M = N-1. (This way, a user can say ~5 for a forward path, and be sure of having at least one hop in each leg as required.) 1. If the last component of PATH is a ByNickname component, then verify that the corresponding server: - Has a descriptor that is valid at least from START to END. - Can deliver messages to ADDR. - Is willing to deliver a message as long as MSG (if the message is forward). - Is willing to de-fragment a message with as many fragments as MSG (if using server-side reassembly). 2. If the last component of SPEC is not a ByNickname component, and if the message calls for server-side fragment reassembly, then pick a last hop at random from among those satisfying the criteria of step 1 above that are also Recommended. (If the second-to-last component of PATH is a ByNickname component, restrict the candidates to those to which the second-to-last server can relay.) 3. Repeat N times, in order to generate N paths: 4. Expand SPEC so that every component is a RandomHop or a ByNickname. We do this as follows: - Expand each "*K" RandomHops component into a sequence of K RandomHop components. - Expand each "~K" GaussianHops component into a sequence of _approximately_ K RandomHop components, choosing from a Gaussian distribution with mean K and standard deviation 1.5. A GaussianHops component expands into at least 1 RandomHop component. Call the resulting expanded path specifier "ESPEC". 5. If we selected a random exit hop in step 2 above, replace the last hop in the path with that node. 6. From last to first, choose a server descriptor to satisfy each component in ESPEC. - For ByNickname components, choose a server descriptor that has corresponding nickname, and that is valid from START to END. - For RandomHop components, choose a server descriptor at random from those that are valid from START to end, that can relay to the next hop in the sequence (which will already have been selected), and to which the previous hop in the sequence (if it will be selected by a ByNickname) can route. If there are restrictions on which hops the client can relay packets to, the first hop is additionally restricted to such paths. 7. If a two-legged forward path is needed, and SPEC is not two-legged, divide the path into two legs at the middle, rounding up the first leg and rounding down the second. ====== [XXX: I think that Allow Deny lines should also allow giving host names since we now have FWD_HOST routing types. Clients should not have to lookup the ip addresses of nodes, in fact I think they SHOULD NOT do a lookup if they don't deliver directly to it. -PP] [XXX: Agreed. But I'd like to call the new ones AllowHost and DenyHost. Or maybe they should take server nicknames instead of hostnames. -NM] [XXX: I don't think that checking if a client can talk to a hop leaks any info. An adversary can always learn that we want to send a message once we actually send it. However an active adversary can easily limit our choice of first hops, so this should receive some consideration. -PP] [XXX: Checking offline is harmless. Checking online is, as you say, potentially unsafe. Right now, Mixminion tries to send the packets it generates -- and when it cannot do so, it queues them and tries them later. The only way to lose packets this way is if the directory says you ought to be able to use server FOO but really you can't. -NM]