Skip to main content

Build Your Own BitTorrent Client

This challenge is to build your own BitTorrent client. A BitTorrent client is an application that enables you to download and share files using the BitTorrent protocol. 

What Are The Benefits Of The BitTorrent Protocol For Downloads?

When downloading a file using the BitTorrent protocol, instead of downloading a file from a single server, the client downloads small pieces of the file from multiple other clients (called peers) at once. The peers who are sharing those pieces are also downloading other pieces from other peers. This peer-to-peer method distributes the workload, often leading to faster download speeds and reduced strain on a single server.

Whilst BitTorrent gamed some notoriety for it’s use to share pirated content, it has many other entirely legal uses. For example, BitTorrent is used to provide faster and more efficient downloading of large files, such as software, videos, and game updates It’s used by the likes of gaming companies such as Blizzard Entertainment and Wargaming; broadcasting organisations such as CBC, governments such as the British Government and big tech firms like Meta and Twitter.

BitTorrent was developed by Bram Cohen in 2001 and first shared at CodeCon conference in 2002. The first client was written in Python. He chose Python because he viewed it as the best language to just get stuff done. For something like BitTorrent, which is inherently IO-bound that makes a lot of sense.

The Challenge - Building A BitTorrent Client

In this Coding Challenge the goal is to build a BitTorrent client and then use it to download a file. What file you chose to download is up to you, but a good starting point is probably your favourite Linux distribution or open source software. Many of them are available via BitTorrent. For example I tested mine against the Libre Office and Debian torrents.

Step Zero

Like all good programming languages Coding Challenges is zero indexed! In this step you’re going to set your environment up ready to begin developing and testing your solution.

We’re building a BitTorrent client in this project. It gets a little confusing when talking about BitTorrent, because we normally talk about network clients (i.e. a web browser) and servers (i.e. a web server), but BitTorrent is a peer-to-peer protocol, so there aren’t clients or servers there are peers. So when I use client in this project I’m meaning the software you are running locally on your machine to use to connect to the peer-to-peer network to download a torrent from it. That client will also be a peer in the network.

Except there are also trackers, which are servers and our client connects to them. Confused yet? 😀

Once you’ve done that, it’s worth grabbing a cup of your favourite beverage and reading through the BitTorrent specification.

Step 1

In this step your goal is to write the code to encode and decode Bencoding. Bencoding is the encoding used in the BitTorrent protocol for torrent files and other related metadata.

As this functionality will be embedded within the application my approach is to use Test-Driven Development (TDD) to create the Encode and Decode functions in a Bencode module. I find TDD very useful for writing code to handle network protocols and file formats. If you’d like to know more about how I do TDD hit reply and let me know! Thanks.

Here’s the specification for Bencoding:

  • Strings are length prefixed using base 10 strings. The length is separated from the string by a colon. For example the string ‘coding’ would be encoded as 6:coding.
  • Integers are prefixed with the character ‘i’. The prefix is followed by a number in base 10 which is terminated with the character ‘e’. For example the integer 100 would be encoded as i100e.
  • Lists are prefixed with the character ‘l’. The prefix is followed by a list of elements. The elements are other Bencoded values. The list is terminated with the character ‘e’. For example the list ["Coding", "Challenges"] would be encoded as l6:Coding10:Challengese.
  • Dictionaries are prefixed with the character ‘d’. The prefix is followed by a string of Bencoded key value pairs where the key is a Bencoded string and the value is a Bencoded element. The keys must be in lexicographical order. For example the dictionary: {"Coding Challenges": {"website:": "codingchallenges.fyi", "Rating": "Awesome"}} would be encoded as d17:Coding Challengesd6:Rating7:Awesome8:website:20:codingchallenges.fyiee .

Step 2

In this step your goal is to create a program that will open a torrent file passed as an argument. The torrent file could be specified using a local file path or URL. If it’s a URL, download the file and open it, otherwise just open it.

Once opened the torrent file should be decoded using your Bencoding Decode function. Your focus for this step is to extract the URL from the announce key in the dictionary and the dictionary of values under the info key. You can find details of the torrent file format in Wikipedia, as well as the The BitTorrent Protocol Specification.

Step 3

In this step your goal is to contact the tracker and get a list of peers from it. To do that you’ll need to assemble a valid HTTP GET request and make it. The response body will contain a list of peers that can be contacted to download the file(s) that make up the torrent.

The GET request is composed of the URL in the announce key from step one, followed by and list of key value pairs:

  • info_hash - the 20 byte SHA-1 hash of the info dictionary, URL encoded.
  • peer_id - a 20 byte string identifying this peer, usually composed of a string in the format such as the byte: -ES1001- encoding the client software and version followed by 12 random bytes for the specific instance, URL encoded.
  • port - the client’s port that is listening for incoming peer requests, commonly used values are in the range 6881 to 6889.
  • uploaded - the number of bytes uploaded, for now set to zero.
  • downloaded - the number of bytes downloaded, for now set to zero.
  • left - the number of bytes left to download, set to the file size taken from the length key in the info dictionary.
  • compact - whether to send the compact peer list or not. Set this to 1.

If your request is correct you’ll get a HTTP status code of 200 and the response body that consists of a string that encodes a list of peers. There are six bytes per peer. The first four bytes are the host (in network byte order), the last two bytes are the port, again in network byte order.

Decode the response and print out the peer list as IP and port to complete this step. For my solution that looked like this:

% ./ccbt test.torrent
ccBitTorrent
Announce URL: http://tracker.documentfoundation.org:6969/announce?compact=1&downloaded=0&info_hash=j%90%1C%9F2%BA%80%AC%27%0E%D5%C1%DD%F6%18%3A%2B%D5l%FE&left=300224677&peer_id=-CC0101-%7F0%9C%AD%81c%0Et%E2%83%CF%F9&port=6881&uploaded=0
200 OK
Got 37 peers
Peer 0 is ip: 45.226.6.202 port: 61680
Peer 1 is ip: 185.198.240.233 port: 11188
Peer 2 is ip: 163.252.219.175 port: 11188
...

Step 4

In this step your goal is to build the encode and decode for the peer protocol. You’ll find the specification of the peer protocol in the BitTorrent specification, under the heading: peer protocol.

There are three broad types of message:

  1. Handshake.
  2. Keep-Alive.
  3. Length-prefixed messages.

Handshake

The handshake must be the first message transmitted by the client. It is 49 plus the length of the protocol string bytes long.

The handshake message is composed as so:

<protocol string length><protocol string><reserved><info_hash><peer_id>

Where:

  • protocol string length: string length of <protocol string>, as a single raw byte.
  • protocol string: string identifier of the protocol.
  • reserved: eight (8) reserved bytes. All current implementations use all zeroes.
  • info_hash: 20-byte SHA1 hash of the info key in the metainfo file. This is the same info_hash that is transmitted in tracker requests.
  • peer_id: 20-byte string used as a unique ID for the client. This is the same peer_id that is transmitted in tracker requests.

In version 1.0 of the BitTorrent protocol, the protocol string length is 19, and protocol string is "BitTorrent protocol".

Keep-Alive

Probably the simplest message to send and receive. Keep-alive is sent as a length prefixed message where the length is zero. The length is encoded in four bytes.

Length-Prefixed Messages

The remaining messages in the peer protocol take the form:

<length prefix><message ID><payload>

The length prefix is a four byte value in network byte order. The message ID is a single decimal byte. The payload is message dependent.

The message ID is one of:

  • 0 - choke
  • 1 - unchoke
  • 2 - interested
  • 3 - not interested
  • 4 - have
  • 5 - bitfield
  • 6 - request
  • 7 - piece
  • 8 - cancel

For the sake of brevity and to avoid repetition I won’t detail the payload values, please see the specification for them and ensure you pay particular attention to: bitfield, request, and piece which will be key to downloading a file.

Again, this is something I’d tackle with TDD, you might prefer to write your tests after development, or not at all. My only suggestion is you experiment with automated tests if they’re new to you, projects like this are a great place to try out new things.

Step 5

In this step your goal is to connect to all the peers, perform the handshake and get their bitfield. You probably want to do this concurrently to maximise the download speed. The bitfield of a peer tells you what pieces of a file it has. You’ll never guess how it’s encoded!

Yes, you’re right, it’s a bitfield, I wonder what gave it away? 😇

In the bitfield each byte refers to eight of the pieces and each bit within the byte refers to a single piece. So the high bit of the first byte refers to piece zero and the high bit of the second byte to piece 8 and so on. If a bit is set, the peer has that piece, if it is not set, the peer does not have the piece.

Once you have the bitfield store it for each peer, you’ll need it to know whether or not the peer can provide you with that piece. To develop this I used TDD and then tested the handshake and bitfield against a live tracker and peers. I dumped out the bitfield for each tracker to file. The file looked like this:

Peer: 1
Has piece 0
Has piece 1
Has piece 3
Has piece 4

Here the peer doesn’t have piece 2. In reality I found most peers had the low numbered pieces, but I figured you didn’t want to read several thousand lines to see that!

Step 6

In this step your goal is to create a mechanise to track all the pieces. You need to know what they are and whether or not you have them yet or still need to download them. You could simply stick the numbers zero to number of pieces in an array of bools and work from that or as you’re probably building something that’s going to run multiple concurrent peer connections use a queue (suitable protected for concurrent access of course).

If you use a queue, put details of each piece into the queue, have each worker (one worker per peer) take and piece from the queue and attempt to fetch it from its peer. There are several reasons this might fail:

  1. The peer doesn’t have the piece. Do be a good citizen and check the peer’s bitfield before asking for it.
  2. The peer refuses to provide it / the connection is lost.
  3. The peer provides it and the hash doesn’t match.

In each case, with a queue you can simply put the work back on the queue and fetch the next piece to be download from the queue.

In point 3 above I mentioned the hash. In step 2 we decoded the torrent file, in it there is a key pieces within the info dictionary. The value stored under that key is a string of hashes, one for each piece. Each hash is 20 bytes long and corresponds to the equivalent indexed piece. Once downloaded a piece should have its SHA-1 hashed and should then rejected if the hash does not match the hash from the torrent file. You’ll need to implement this in the following step.

Step 7

In this step your goal is to download the pieces, to do so you will need to send request messages and handle all the responses and other messages a peer sends you (see the possibilities in step 4).

Having established the handshake the next step is to unchoke a peer and send an interested message. After that send it requests for pieces.

When a BitTorrent client send a request for a piece it wants from a peer, it uses the request message Id specifying the index of the piece and the zero based offset within the piece and number of bytes that it wants. In other words the request is for a small part of the piece, known as a block. The specification has changed for block size limits, with some debate over what it now is, so a safe bet is 16KB. However some peers might allow more.

The peer will respond with a piece message containing the data, make sure you handle that message and store the block ready to reassemble the blocks into the piece. Be sure to handle the keep-alive and have messages, updating the bitfield when a peer tells you it now has a new piece.

Once you have reassembled a piece from it’s blocks, calculate the SHA-1 hash for the piece and verify it matches the hash extracted from the torrent file.

To complete this step print out the id of each piece you successfully download whose hash matches.

Step 8

In this step your goal is to assemble the file, save it and shutdown the client. You’ve basically done the hard work now so all that’s left is to ensure your client runs until all the pieces have been downloaded.

When you get the pieces write them to the target file. The target filename was provided in the info dictionary of the torrent file. Each piece will be written to the file in an offset based on it’s id and the piece length.

Once you’ve done all that, congratulations, you’ve built a working BitTorrent client and used it to download a file.

Go install/enjoy your new Linux distro, open source or whatever it was you downloaded.

Going Further

If you’ve enjoyed this Coding Challenge and want to take it further, here are some ideas to explore.

  • Add support for seeding the file to others.
  • Add support for the Magnet URL Scheme.
  • Build a tracker.
  • Build a long running client that can peer on multiple files at a time.

Help Others by Sharing Your Solutions!

If you think your solution is an example other developers can learn from please share it, put it on GitHub, GitLab or elsewhere. Then let me know - ping me a message on the Discord Server, via Twitter or LinkedIn or just post about it there and tag me. Alternately please add a link to it in the Coding Challenges Shared Solutions Github repo.

Get The Challenges By Email

If you would like to receive the coding challenges by email, you can subscribe to the weekly newsletter on SubStack here: