Introduction
P2PHTTP is a system for dynamically maintaining, at HTTP origin servers, lists of mirrors of HTTP content. Currently it is in draft status and we don't know when we'll get around to implementing it. By mirror we mean a caching proxy willing to share the content with the world. P2PHTTP adds functionality to HTTP to allow origin servers to dynamically adjust the lists as caching proxies download the content from them (or from other mirrors). This makes it possible for origin servers to forward requests to proxies should they receive more requests than their bandwidth allows them to handle. A trust model is used to quickly detect invalid mirrors and remove them from the lists.
Think of it as a distributed P2P network designed to seamlessly integrate with HTTP. Note, however, that it does not make HTTP fault-tolerant: if the authoritative server fails, clients will be unable to obtain its contents, even if they are available in many caches.
To participate in the P2PHTTP network all you need is to install the P2PHTTP program, which acts as an accelerating proxy for your website or as a caching proxy for your web browsers (or both). In the first case you'll probably want to run your web server on a non-standard port and let the P2PHTTP proxy handle all normal HTTP connections.
Currently P2PHTTP is still in the planning stage. We'll probably implement it as a branch of HTTPEyes soon (as soon as we consider our design solid and stable) and unleash it unto the world.
It should be noted that P2PHTTP works only with cacheable contents. It serves no useful purpose with dynamic contents (unless they are somehow made cacheable).
How it works
P2PHTTP is based on the use of web proxies that talk P2PHTTP, a superset of HTTP 1.1 maintaining full backwards compatibility.
When a standard web client (a browser or a standard proxy) talks to a P2PHTTP server, the response is exactly the same as a standard HTTP server would provide. However, when two P2PHTTP servers talk to each other, an extended exchange takes place.
P2PHTTP as a caching proxy
Let's first consider the case when P2PHTTP runs as a caching proxy. In this case it can receive requests for any URL from a certain set of trusted addresses: it will act as a standard caching proxy, downloading content, caching it and passing it back to the client.
When a connection to the proxy is established from an untrusted address, P2PHTTP will only accept requests for which the origin server is a P2PHTTP server and the content is cacheable. The purpose of this is to give the trusted addresses full control over the proxy while allowing this P2PHTTP server to act as a mirror of others.
Let's call Clint the web client (the browser or proxy requesting some content), Sergio the P2PHTTP server acting as a caching proxy and Oriana the origin server.
Requests for new content from trusted addresses
Let's, for a moment, assume Clint is one of the trusted addresses and the content can't be found in the cache. Sergio will contact Oriana using HTTP 1.1 and include the following headers in the request (which, according to section 7.1 of RFC 2068, should be ignored by HTTP servers):
- P2PHTTP-Source
- A HOST[:PORT] entry specifying how Sergio can be contacted. If this header is not present, it will default to the source IP address of the current connection. The optional PORT portion defaults to port 80. If Sergio has multiple addresses, it can include this header multiple times.
- P2PHTTP-Lifetime
- A number of seconds. If the proxy does not receive any requests for this content during an interval of this length, it will remove it from its cache. This serves as an inexact estimate of how long the content will be cached: Sergio is allowed to remove it from the cache at any time.
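As an illustration, the request Sergio sends to Oriana might be assembled as follows. This is only a minimal sketch: the function name build_request and the example host names are hypothetical, and only the two header names come from the description above.

```python
def build_request(path, origin_host, sources, lifetime_seconds):
    """Return the raw HTTP/1.1 request a P2PHTTP proxy could send to an origin.

    `sources` is a list of HOST[:PORT] strings; each one becomes its own
    P2PHTTP-Source header, since the header may be repeated.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {origin_host}",
    ]
    for source in sources:
        lines.append(f"P2PHTTP-Source: {source}")
    lines.append(f"P2PHTTP-Lifetime: {lifetime_seconds}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical hosts, used only for the example.
request = build_request("/big.iso", "oriana.example", ["sergio.example:8080"], 3600)
print(request)
```

A plain HTTP server receiving this request should simply ignore the two extra headers and answer as usual.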
If Oriana is an HTTP server (not a P2PHTTP server), it will reply with the content (or an error code) and nothing special will happen: Sergio caches it and returns it to Clint (or passes the error).
Let's consider the other case, when Oriana talks P2PHTTP. If Oriana is not currently serving many clients, it will return a normal response with the contents to Sergio (just like any server would). If, on the other hand, Oriana finds itself too busy to serve Sergio, it will send a 305 Use Proxy response including multiple Location headers, a Content-SHA1 header and a Content-Length header.
In both cases, a P2PHTTP-Update-Interval header specifying a number N will be sent in the response: Sergio will never contact Oriana more than once every N seconds with regard to this URL. This is used by Oriana to set a limit on the number of messages it receives from Sergio (for this specific content). This value may be 0 to indicate that no limit should be used.
When a 305 Use Proxy is received by Sergio, it will use the proxies listed in the Location headers to obtain the content. When the Content-Length is bigger than a certain size, Sergio will divide the content into multiple parts and request one from each mirror. However, Sergio will start downloading from just one mirror and gradually add more if the total throughput is lower than a threshold set by the admin and the limit on concurrent downloads has not been reached.
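The draft does not say how the parts are requested. One possible reading, sketched below, is to split the content into contiguous byte ranges and ask each mirror for one of them with a standard HTTP Range header. The use of Range, the function names and the min_split_size parameter are all assumptions made for the example; the gradual addition of mirrors is not modeled.

```python
def split_into_ranges(content_length, parts):
    """Split [0, content_length) into `parts` contiguous inclusive byte ranges."""
    if content_length <= 0 or parts <= 0:
        return []
    parts = min(parts, content_length)  # never create empty ranges
    base, extra = divmod(content_length, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def assign_ranges(content_length, mirrors, min_split_size):
    """Map each mirror to the Range header value it would be asked for.

    Content no bigger than `min_split_size` is fetched whole from the
    first mirror only.
    """
    parts = len(mirrors) if content_length > min_split_size else 1
    ranges = split_into_ranges(content_length, parts)
    return {m: f"bytes={a}-{b}" for m, (a, b) in zip(mirrors, ranges)}
```

For instance, assign_ranges(10, ["a:80", "b:80"], 5) would ask the first mirror for bytes 0-4 and the second for bytes 5-9.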
If one of the mirrors returned by Oriana fails or if Sergio succeeds in downloading one of the portions of the content, Sergio sends a request to Oriana with method P2PHTTP. The syntax of this method is similar to that of a GET, but it can include one or more of the following headers:
- Mirror-Err:
- A HOST:PORT portion identifying a mirror from which Sergio has received an error.
- Mirror-Ok:
- A HOST:PORT portion identifying a mirror from which Sergio has successfully downloaded an entire portion of the content from this URL (or the entire content) and where no error has been detected.
The purpose of this is to allow Oriana to adjust the level of trust it places on each mirror.
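A report of this kind could look roughly like the request built below. It is a sketch only: build_report and the host names are hypothetical, and we assume the request line mirrors that of a GET, with P2PHTTP in place of the method name.

```python
def build_report(path, origin_host, ok_mirrors=(), failed_mirrors=()):
    """Return a raw P2PHTTP-method request reporting mirror results.

    Each mirror is a HOST:PORT string and gets its own Mirror-Ok or
    Mirror-Err header.
    """
    lines = [f"P2PHTTP {path} HTTP/1.1", f"Host: {origin_host}"]
    lines += [f"Mirror-Ok: {mirror}" for mirror in ok_mirrors]
    lines += [f"Mirror-Err: {mirror}" for mirror in failed_mirrors]
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical mirrors, used only for the example.
report = build_report(
    "/big.iso",
    "oriana.example",
    ok_mirrors=["mirror1.example:80"],
    failed_mirrors=["mirror2.example:8080"],
)
print(report)
```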
Requests for new content from untrusted addresses
If a request is received from an untrusted address, Sergio will need to make sure that Oriana is a P2PHTTP server to know whether or not to allow the request.
If Sergio's cache has entries with Oriana as the origin server marked as obtained through P2PHTTP, the request will be allowed. Otherwise, Sergio will send an empty request with the P2PHTTP method (as described above) and check whether or not Oriana returns an error. This will indicate if Oriana is an HTTP server or a P2PHTTP server; the request will be allowed only in the latter case.
If Oriana is indeed a P2PHTTP server, Sergio will proceed just as if the request had been received from a trusted address. However, before passing the content back to Clint, Sergio will make sure it is indeed cacheable.
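The admission rule for trusted and untrusted addresses can be summarized in a few lines. In this sketch, may_serve and its parameters are hypothetical names; in particular probe_origin stands for whatever code sends the empty P2PHTTP request and reports whether the origin answered without an error.

```python
def may_serve(client_trusted, origin, p2p_origins, probe_origin):
    """Decide whether the proxy should handle a request for `origin`.

    p2p_origins:  set of origins for which the cache already holds entries
                  marked as obtained through P2PHTTP.
    probe_origin: callable taking the origin and returning True when an
                  empty P2PHTTP request to it did not produce an error.
    """
    if client_trusted:
        return True   # trusted addresses may request any URL
    if origin in p2p_origins:
        return True   # the cache already shows the origin talks P2PHTTP
    # Otherwise ask the origin itself; plain HTTP servers are refused.
    return bool(probe_origin(origin))
```

The cacheability check on the returned content is a separate step and is not shown here.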
Requests for old content
When Sergio receives a request for content on the cache, its cache lifetime will be renewed.
Sergio will attempt to make sure the content is current by sending a request to Oriana with the If-None-Match header (along with the P2PHTTP headers described above), unless forbidden by the P2PHTTP-Update-Interval header (in which case the content will be considered fresh). The content will then get sent to the client.
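The interplay between the lifetime renewal and the update interval might be sketched as follows. The CacheEntry class, its field names and the use of a stored ETag are assumptions for the example; the draft only names the headers involved.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    etag: str
    lifetime: float             # seconds, as announced in P2PHTTP-Lifetime
    update_interval: float      # seconds, from P2PHTTP-Update-Interval (0 = no limit)
    last_request: float         # time of the last client request for this content
    last_origin_contact: float  # time we last contacted the origin about it

    def touch(self, now):
        """Renew the cache lifetime: the idle clock restarts at `now`."""
        self.last_request = now

    def expired(self, now):
        return now - self.last_request > self.lifetime

    def may_contact_origin(self, now):
        return now - self.last_origin_contact >= self.update_interval


def revalidation_headers(entry, now):
    """Return the conditional headers to send, or None to serve from the cache."""
    entry.touch(now)
    if not entry.may_contact_origin(now):
        return None  # too soon: the cached content is considered fresh
    entry.last_origin_contact = now
    return {"If-None-Match": entry.etag}
```

When revalidation_headers returns None, the proxy answers from its cache without contacting the origin.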
Note that in this case it is entirely irrelevant whether Clint is trusted or not.
P2PHTTP as an origin server
Let's call our P2PHTTP server Sergio and the client Clint.
Structures used by the server
Sergio keeps a list of mirrors. The following information is kept for each:
- A trust value between 0 and 1, indicating the level of trust placed in the mirror.
- The source IP address from the connection in which we learned about the mirror.
- A servers list of HOST:PORT pairs that can be used to connect to the mirror.
- A timestamp of the last communication with the mirror.
- The redirect timestamp of the last time we redirected a client to this mirror.
The purpose of the source value is to only allow one entry from each IP address, thus making it difficult for an attacker to perform a DoS attack by adding multiple invalid mirrors for the content.
For each URL for which Sergio is an origin server, it will keep a list of mirrors that have it on their caches. The following information is kept for each:
- A timestamp indicating the time when the mirror obtained the contents.
- A lifetime estimating for how long the contents will be cached in the mirror.
Sergio will also keep a list of P2PHTTP clients it has redirected to mirrors. Each entry in the list records:
- A timestamp indicating the time when the last request from this client was received.
- The client's IP address.
- A last-report timestamp, of the last time the client sent a request with method P2PHTTP having Mirror-Ok or Mirror-Err headers.
- A list with the timestamps of all the requests for mirrors that this client has issued within the last minute. This is used to make sure that no client can obtain more than a certain number of mirrors for a specific URL every minute.
All these lists start empty: new entries are added as the result of certain requests (see below).
The lists of mirrors and of clients have certain maximum sizes set by the administrator. Once this size is reached, the oldest entry will be removed when a new one needs to be added.
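To make the structures above more concrete, here is one way they might be laid out. All class and field names are our own choices for the sketch. We also assume that "oldest" means the entry with the smallest timestamp, and for brevity the per-minute limit is tracked per client rather than per client and URL.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    source_ip: str                   # address we learned about the mirror from
    servers: list                    # HOST:PORT strings
    trust: float                     # between 0 and 1
    timestamp: float                 # last communication with the mirror
    redirect_timestamp: float = 0.0  # last time a client was redirected to it


@dataclass
class UrlMirrorEntry:
    timestamp: float  # when the mirror obtained the contents
    lifetime: float   # estimate of how long they stay cached


@dataclass
class Client:
    ip: str
    timestamp: float          # last request received from this client
    last_report: float = 0.0  # last Mirror-Ok / Mirror-Err report
    recent_mirror_requests: list = field(default_factory=list)

    def allow_mirror_request(self, now, max_per_minute):
        """Record a request for mirrors unless the per-minute limit is hit."""
        self.recent_mirror_requests = [
            t for t in self.recent_mirror_requests if now - t < 60
        ]
        if len(self.recent_mirror_requests) >= max_per_minute:
            return False
        self.recent_mirror_requests.append(now)
        return True


class BoundedTable:
    """Keyed table that drops the entry with the oldest timestamp when full."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.entries = {}

    def add(self, key, entry):
        if key not in self.entries and len(self.entries) >= self.max_size:
            oldest = min(self.entries, key=lambda k: self.entries[k].timestamp)
            del self.entries[oldest]
        self.entries[key] = entry
```

Keying the mirror table by source IP address gives the one-entry-per-address property described above for free.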
There is also a maximum count for the sum of the lengths of the lists of mirrors for each URL. Once reached, in order to add new mirrors to URLs, P2PHTTP will remove the entries having the lowest values (ctime - timestamp) * 1 / (1 - trust), where ctime is the current time.
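The ranking used for this removal can be written down directly. The sketch below computes (ctime - timestamp) / (1 - trust), which is the same value as the expression above, and returns the entries with the lowest values. Treating a trust of exactly 1 as an infinite value (so that such entries are never picked first) is our assumption, since the expression is undefined there; the function names are hypothetical.

```python
import math


def eviction_score(ctime, timestamp, trust):
    """Value used to rank per-URL mirror entries; lowest values go first."""
    if trust >= 1.0:
        return math.inf  # assumption: avoids dividing by zero
    return (ctime - timestamp) / (1.0 - trust)


def entries_to_evict(entries, ctime, count):
    """Return the names of the `count` entries with the lowest scores.

    `entries` is a list of (name, timestamp, trust) tuples.
    """
    ranked = sorted(entries, key=lambda e: eviction_score(ctime, e[1], e[2]))
    return [name for name, _, _ in ranked[:count]]
```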
Normal content transfers
First let's suppose that Sergio is not very busy right now or that the request does not require content to be transferred (for example, it includes the If-None-Match header and the content does match the hash provided). A server considers itself busy when the sum of throughput of all active downloads is greater than max-normal-bandwidth, a certain admin-specified value.
A new HTTP header, P2PHTTP-Mirrors, is defined: if present, Sergio will act as if it were busy. This can be used by the client to explicitly request mirrors.
If Sergio receives a request for a URL it doesn't master, it acts as a P2PHTTP caching proxy, according to the description provided above.
When the request sent by Clint does not include the P2PHTTP-Lifetime header, Sergio notices that Clint is a standard HTTP client and behaves exactly as any web server would (serving content or returning an error).
Let's consider the opposite situation: Clint has included a P2PHTTP-Lifetime header (and, optionally, a P2PHTTP-Source header). Sergio will:
- Transfer the entire content, if required, to Clint. If a network error is detected, this entire sequence is aborted.
- Add the mirror to the global list of mirrors unless an entry by the same IP address already exists there, in which case it will suffice to refresh its timestamp and make sure the servers match those specified in the current request. The initial trust value will be the average over all current mirrors (or 1/2 if this is the first mirror).
- Add an entry to the list of mirrors for the URL. The lifetime will be the maximum of the value of the P2PHTTP-Lifetime header and a value set by the administrator. If an entry for the mirror already existed, we merely update its lifetime and timestamp.
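The bookkeeping in the last two steps might look like the sketch below. The function names and the dictionary layout are hypothetical; the rules for the initial trust and for the lifetime follow the description above.

```python
def initial_trust(existing_trusts):
    """Average trust over all current mirrors, or 1/2 for the first mirror."""
    if not existing_trusts:
        return 0.5
    return sum(existing_trusts) / len(existing_trusts)


def register_mirror(mirrors, source_ip, servers, now):
    """Add or refresh the global entry for the mirror at `source_ip`.

    `mirrors` maps an IP address to a dict with the keys "trust",
    "servers" and "timestamp".
    """
    entry = mirrors.get(source_ip)
    if entry is None:
        trust = initial_trust([m["trust"] for m in mirrors.values()])
        entry = {"trust": trust, "servers": list(servers), "timestamp": now}
        mirrors[source_ip] = entry
    else:
        # Same IP address as before: refresh instead of adding a second entry.
        entry["timestamp"] = now
        entry["servers"] = list(servers)
    return entry


def effective_lifetime(header_lifetime, admin_lifetime):
    """Lifetime recorded in the per-URL list, as described above."""
    return max(header_lifetime, admin_lifetime)
```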
Busy operation
Let's now consider what happens when Sergio is busy and the request requires content to be transferred. There are a few cases:
If Clint is an HTTP client (not a P2PHTTP client), Sergio will simply send a 305 Use Proxy response. In the Location header it will include the URL for one of the HOST:PORT entries of a mirror.
Note that in the previous case, Sergio will forward a standard HTTP client to an untrusted server. Since Clint is an HTTP client, no security measure can be taken to make sure that the untrusted server does not alter the content. Should this be important to you, you can configure the P2PHTTP proxy to only redirect P2PHTTP, not standard HTTP, clients. In one such setup, Sergio would reply with an error to the HTTP client.
Finally, if Clint is a P2PHTTP client, Sergio will send it a 305 Use Proxy response and will include multiple Location headers that Clint can use. It will also include Content-SHA1 and Content-Length headers with the SHA1 sum and the size of the content respectively.
We use Content-SHA1 instead of Content-MD5 because, according to RFC 2616, the integrity check that Content-MD5 provides "is good for detecting accidental modification of the entity-body in transit, but is not proof against malicious attacks". The Content-SHA1 header is not a simple message integrity check: it plays a vital role in the security of our system.
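For illustration, the headers of such a response could be produced as shown below. This is a sketch under several assumptions: the function name is hypothetical, the Location values are written as http:// URLs built from HOST:PORT entries, and the SHA1 sum is given in hexadecimal (the draft does not fix an encoding). Note that Content-Length here describes the content being mirrored, as stated above, not a body sent with the response.

```python
def build_busy_response(mirror_urls, sha1_hex, content_length, update_interval):
    """Return the status line and headers of a 305 answer for a P2PHTTP client."""
    lines = ["HTTP/1.1 305 Use Proxy"]
    lines += [f"Location: {url}" for url in mirror_urls]
    lines.append(f"Content-SHA1: {sha1_hex}")
    lines.append(f"Content-Length: {content_length}")
    lines.append(f"P2PHTTP-Update-Interval: {update_interval}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical mirrors and digest, used only for the example.
response = build_busy_response(
    ["http://mirror1.example:80/", "http://mirror2.example:8080/"],
    "0123456789abcdef0123456789abcdef01234567",
    1048576,
    30,
)
print(response)
```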
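On the receiving side, a client that has reassembled the content from one or more mirrors would check it against the two headers before using it. The following sketch uses Python's hashlib; the function name and the hexadecimal encoding of the header value are assumptions.

```python
import hashlib


def verify_content(parts, expected_sha1, expected_length):
    """Check downloaded content against Content-SHA1 and Content-Length.

    `parts` is the list of byte strings, in order, that make up the content.
    """
    digest = hashlib.sha1()
    total = 0
    for part in parts:
        digest.update(part)
        total += len(part)
    # Hex digests are compared case-insensitively.
    return total == expected_length and digest.hexdigest() == expected_sha1.lower()
```

If the check fails, the content should be discarded rather than cached or passed to the client, since at least one mirror has returned bad data.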
Selecting mirrors
When Sergio needs to redirect a client to a mirror for a specific URL, it will pick the one from the URL's list having the greatest value seconds / (1 - trust), where seconds is the number of seconds elapsed since the redirect-timestamp.
If the greatest value is higher than a certain threshold, the mirror will be used. Otherwise, Sergio will pick a mirror from the global list of mirrors with the same algorithm (the one having the greatest such value). If this value is still lower than the threshold, an error will be returned (as Sergio doesn't know of any mirrors it can redirect the client to).
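The two-stage selection can be sketched as follows. The names are hypothetical, mirrors are represented as plain dictionaries, and a trust of exactly 1 is treated as an infinite value to avoid dividing by zero, which is our assumption. Updating the redirect timestamp of the chosen mirror follows from its definition above.

```python
import math


def selection_score(now, redirect_timestamp, trust):
    """seconds / (1 - trust), where seconds is the time since the last redirect."""
    if trust >= 1.0:
        return math.inf  # assumption: avoids dividing by zero
    return (now - redirect_timestamp) / (1.0 - trust)


def pick_mirror(url_mirrors, global_mirrors, now, threshold):
    """Pick a mirror from the URL's list, falling back to the global list.

    Each mirror is a dict with the keys "trust" and "redirect_timestamp".
    Returns None when no mirror scores above the threshold.
    """
    def score(mirror):
        return selection_score(now, mirror["redirect_timestamp"], mirror["trust"])

    for candidates in (url_mirrors, global_mirrors):
        if not candidates:
            continue
        best = max(candidates, key=score)
        if score(best) > threshold:
            best["redirect_timestamp"] = now  # we are redirecting a client to it
            return best
    return None
```

Because the score grows with the time since the last redirect, a mirror that was just used drops to the bottom of the ranking, which spreads clients over the available mirrors.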
The P2PHTTP HTTP Method
When Sergio receives a P2PHTTP method it will modify the trust of the mirrors based on the headers it receives.
Let reports be the total number of reports and seconds be the number of seconds since the last report performed by this client (or 60, if the client is not found in the list of clients).
For each Mirror-Ok header, set the trust of the specified mirror to:
trust + (1 - trust) * max(60, seconds) / (2 * 60 * reports)
For each Mirror-Err header, set the trust to:
trust * (1 - max(60, seconds) / (2 * 60 * reports))
If at least one of the above headers was included, the client's last-report timestamp is reset.
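The two update rules translate directly into code. The sketch below follows the formulas as written, with one addition of our own: since max(60, seconds) grows without bound, the raw result can leave the range between 0 and 1 when a client has not reported for a long time, so the sketch clamps it. The clamp and the function names are assumptions, not part of the draft, and reports is assumed to be at least 1.

```python
def clamp(value):
    """Keep a trust value inside [0, 1] (an assumption of this sketch)."""
    return max(0.0, min(1.0, value))


def report_weight(seconds, reports):
    """Shared factor of both rules: max(60, seconds) / (2 * 60 * reports)."""
    return max(60, seconds) / (2 * 60 * reports)


def trust_after_ok(trust, seconds, reports):
    """New trust of a mirror named in a Mirror-Ok header."""
    return clamp(trust + (1 - trust) * report_weight(seconds, reports))


def trust_after_err(trust, seconds, reports):
    """New trust of a mirror named in a Mirror-Err header."""
    return clamp(trust * (1 - report_weight(seconds, reports)))
```

With seconds at 60 and a single report, a successful download moves the trust halfway towards 1 and an error halves it; more reports in the same request reduce the effect of each one.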
Why it might work
From a technical standpoint, assuming we get at least a decent number of people using P2PHTTP, it is very likely going to work. HTTP already supports a wide array of features for managing caches and content-expiration. The P2PHTTP protocol merely adds some optional functionality that allows the originating servers to keep a list of all the mirrors, in a manner similar to what BitTorrent does.
Unlike BitTorrent, however, this would integrate seamlessly with HTTP. Note that standard HTTP clients can automatically be pointed to caching proxies and take advantage of the P2PHTTP network (though some incentives for using a P2PHTTP caching proxy still exist).
The P2PHTTP system is at least as reliable as standard HTTP: only when HTTP would fail (when the web server is getting far more hits than its bandwidth allows it to handle) do the extensions kick in and save it from dying a horrible death.
End users don't really need to install anything: it suffices for an ISP to install the caching proxy and all its users will already be able to take advantage of the P2PHTTP network.
There is one important assumption, though: that we will get a decent number of people using P2PHTTP. ...
Things to consider
Fault Tolerance
P2PHTTP could be extended to add fault tolerance, to a certain extent.
P2PHTTP still depends on a central server to keep the list of mirrors. Should this server fail, the entire network will be unable to deliver the content, even if there are many mirrors of it.
The only way to achieve this would probably involve a distributed querying system to find where to fetch it from. However, this would very likely not integrate well with HTTP, which would take the idea too far away from the original spirit of P2PHTTP.
We consider fault tolerance in P2PHTTP a goal secondary to seamless integration with HTTP. As such, we will only add fault-tolerance functionality that doesn't require us to extend our protocol beyond backwards compatibility with HTTP.
Schemes such as using a new method for URLs clearly fall beyond the scope of P2PHTTP.
It might be interesting to consider designing a distributed search system based on ICP (RFCs 2186 and 2187).
305 Use Proxy is ignored by HTTP clients
Yes, many HTTP clients can't handle the 305 Use Proxy response.
This isn't problematic: at worst, they will receive this response when normally they would receive a 503 Service Unavailable error (or the unofficial but popular 509 Bandwidth Limit Exceeded). So nothing is lost.
However, clients supporting the 305 Use Proxy responses would greatly gain from P2PHTTP.
Frequently Asked Questions
But this requires us to run open proxies!
Not really, it only requires you to allow your proxies to send content already on their cache to any client as well as cacheable content from P2PHTTP servers.
Last update: 2006-12-04 (Rev 9821)