Parity Pieces: Avoiding the BitTorrent End Game
By Kaetemi on Sunday 14 January 2007, 21:33 - Articles - Permalink
Originally uploaded at http://www.kaetemi.be/bt/paritypieces.pdf in 2007. First published on this blog during archive cleanup in August 2010 to provide some food for thought.
This document is a draft suggesting an extension for the BitTorrent protocol. It is not complete, might contain errors, will be changed, and should for that reason not be implemented yet in non-development releases of clients.
Introduction
Parity is a method of creating redundant data, usually used to recover lost
data. By XORing a few arrays of bits of the same length, for example, you get a
new array of the same length. If you then remove one of the original arrays and
XOR all the other arrays including the newly created array, you will get the
array you removed as result.
Using this system, it is possible to avoid the End Game in BitTorrent, by
downloading parity pieces as soon as possible, which allow you to reconstruct
the last pieces as soon as enough pieces have been downloaded.
Description
A parity piece contains different parity blocks, which correspond to torrent pieces. If the parity piece contains 4 parity blocks, ABCD, and the file contains 23 pieces, the parity blocks would cover the following regions of the file: ABCDABCDABCDABCDABCDABC. In this example region A contains 6 pieces. As soon as all except one of the pieces of a region are downloaded (this would be 5 in region A), the missing piece is created using the corresponding parity block. The created missing block should be checked in the same way as any downloaded block, recreated from the beginning if an on-the-fly creation fails the hash check, and just downloaded when a full calculation from scratch fails the hash check.
Simplified Example
Imagine that you are downloading a file of 9 bytes, each piece being 2 bytes
length, and the parity piece containing 2 parity blocks.
This is the file your are downloading: (AB AB
A, 3 pieces in region A, 2 pieces in region
B)
01000010 01010100 00100000 01010000
01100001 01110010 01101001 01110100
01111001
Region A contains 5 bytes, but 3 pieces of 2 bytes length, so
the parity block would require 6 bytes. The missing bits in this case are
considered to be 0, and the parity block for region A would be
calculated from:
01000010 01010100 01100001 01110010 01111001 00000000
resulting in:
01011010 00100110
Region B contains 4 bytes, 2 pieces of 2 bytes, so the parity block requires
only those 4 bytes. This block is then calculated from:
00100000 01010000 01101001 01110100
resulting in:
01001001 00100100
With this we have a parity piece, which is all these parity blocks together:
(AB)
01011010 00100110 01001001 00100100
Once you have downloaded the first 2 pieces of the file, there is only 1 piece
of region B that you do not have yet. As soon as this happens, you take all the
pieces you have of this region, together with the parity block for region B,
and XOR them, which should result in the missing piece:
01101001 01110100
As soon as either the third or the last piece is downloaded, the same can be
done for region A, and we have the complete file again.
Multi File Support
It is often the case with torrents containing multiple files, that some users wish to download only one of these files, which is possible to do with many clients. In order to allow the use of parity blocks for single files, there are parity pieces for each file. These parity pieces cover all pieces of the file, from the first piece containing the file until and including the last piece containing the file. This means that there can be multiple parity pieces, each containing multiple parity blocks.
Compatibility
When no parity information is specified in the .torrent file, the following
settings must be used: for each file in the torrent, 1 parity piece containing
1 parity block, meaning that each file has it's own and only one region.
A seeder should only announce that it has parity blocks if it knows that the
client supports parity pieces.
Implementation
The amount of parity blocks for a parity piece is recommended to be about
2%-5% of the pieces covered by that parity piece, and is defined in the
.torrent file, which also contains it's hash values. The client should keep and
share the parity pieces if they are valid or if the used .torrent file does not
contain a way to check them.
In order not to kill performance when a missing piece needs to be created, a
client should be calculating the missing piece of each region on the fly, which
can be done by XORing all pieces, as soon as they come in and are validated, to
some place in memory corresponding to their region, and XOR that value with the
parity block corresponding to that region as soon as the missing piece is
needed. Network implementation is to be discussed and specified.

