3 Simple Ways to Make BitTorrent Sites Easier to Search

Wait, why do I care about this?

Especially if you work on a torrent-related website, you’ll want to read this. Why? Well, two reasons:

It’s too hard to find BitTorrent files. Search engines help make it easier for new audiences to try out BitTorrent.
Centralization is bad for the BitTorrent community. Search engines help make it more feasible to distribute many files in many different places.

Furthermore, we know that people want traffic to their site, not just a bunch of bandwidth sucked away through direct links to torrents. So we also think it’s important to display the URL of the site hosting the torrent. In TowerSeek.org’s case, the URL is prominently added to the title, and we print the “comment” in the search listing, so you can advertise your site.

We want to make the community better, as well as give you more traffic. Like my momma said, “Respect your friends.”

OK, so how do we make BitTorrent sites easier to search?

There are three key methods to make this happen:

Make torrents accessible
Make it easy to get descriptive information
Consolidate torrent links

Let’s cover each one:

#1 Make torrents accessible
This is pretty obvious, but our bots won’t be able to find and index your torrent if they’re blocked through registration pages, or by a robots.txt file. One of my favorite sites, http://bt.etree.org, blocks our MonkeyCrawl bot, so we can’t get the internal filenames within the torrent, nor scrape it to figure out how fast it is. Aargh!

So even if you put your torrents in an out-of-the-way place and submit them directly to a search engine, make sure bots can crawl your site and that nothing is hidden behind registration. We need access to both the links and the torrent files themselves.
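If you want to check whether your own robots.txt shuts a crawler out, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the example.org URLs are made up for illustration. The user-agent string "MonkeyCrawl" is taken from the bot name above, and it is an assumption that the bot identifies itself with exactly that token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: MonkeyCrawl may fetch everything except /private/,
# while every other bot is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: MonkeyCrawl
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_bot_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org is a placeholder host, not a real torrent site.
    for url in ("http://example.org/torrents/show.torrent",
                "http://example.org/private/show.torrent"):
        print(url, can_bot_fetch(ROBOTS_TXT, "MonkeyCrawl", url))
```

Running the same check against the "Disallow" rules you actually serve is a quick way to see whether torrent files, and not just the pages linking to them, are reachable.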

#2 Make it easy to get descriptive information
One of the problems with media in general is that it’s very difficult to get “metadata” about a file. If a torrent is named 100.torrent and contains the files 100.zip and 101.zip, it’s very hard to tell what it actually is just by examining it. Instead, put rich data in the filename, and remember to separate words with spaces, periods, or dashes so we can pick out keywords! Our favorites are names like EFF.Argues.Against.DMCA.Video.DIVX.avi.torrent, since they give us plenty of keywords to match against.

Examples of bad filenames:

100.torrent
blah.torrent
CantTokenizeThisMCHammer.torrent
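To see why these names are a problem, here is a rough sketch of keyword extraction from a torrent filename. It is not TowerSeek.org's actual tokenizer. The separator set (spaces, periods, dashes) and the lowercasing are assumptions made for illustration, and the function name is hypothetical.

```python
import re

# Assumed separators: runs of spaces, periods, or dashes.
SEPARATORS = re.compile(r"[ .\-]+")
SUFFIX = ".torrent"


def keywords(torrent_name):
    """Split a torrent filename into lowercase keyword tokens."""
    stem = torrent_name
    if stem.lower().endswith(SUFFIX):
        stem = stem[:-len(SUFFIX)]  # drop the ".torrent" extension
    return [token.lower() for token in SEPARATORS.split(stem) if token]


if __name__ == "__main__":
    for name in ("EFF.Argues.Against.DMCA.Video.DIVX.avi.torrent",
                 "100.torrent",
                 "CantTokenizeThisMCHammer.torrent"):
        print(name, "->", keywords(name))
```

Under these assumptions the descriptive name yields seven separate keywords, while each of the bad names collapses into a single token that nobody is likely to search for.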

Make the underlying filenames descriptive as well. The more keywords to match, the better. We prefer spaces, periods, or dashes to underscores. We can also parse the “comment” field of the torrent, so that’s an option if you want to include longer text.
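For the curious, a torrent file is a bencoded dictionary, and the comment lives under an optional top-level "comment" key. The sketch below shows one way a reader could pull it out with a tiny hand-written decoder. It assumes well-formed input, the sample torrent bytes are invented, and it is not the parser TowerSeek.org uses.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def torrent_comment(raw):
    """Return the torrent's comment as text, or '' if it has none."""
    meta, _ = bdecode(raw)
    return meta.get(b"comment", b"").decode("utf-8", errors="replace")


# Invented sample metainfo; a real torrent carries many more keys.
COMMENT = b"Hosted at http://example.org"
SAMPLE = b"d7:comment%d:%s4:infod4:name7:100.zipee" % (len(COMMENT), COMMENT)

if __name__ == "__main__":
    print(torrent_comment(SAMPLE))
```

Whatever text you put in that field is what a search engine can show next to your listing, so it is a handy place for your site's URL.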

Philosophy question: If a torrent is posted on the Internet and no one can find it, does it cause any downloads?

#3 Consolidate torrent links
At least right now, torrents are sparsely scattered around the Internet, so it’s very painful to crawl millions of pages just to find a handful of torrents. Because of this, we’ve built two cool features into TowerSeek.org. The first is the idea of “hotspots”: our crawler remembers specific websites and pages that have multiple links to torrents, and revisits them on a daily basis to look for more files. The second is an RSS parser that notices torrent files embedded as enclosures and parses those feeds like any other page. Kudos to LegalTorrents.com for inspiring us to implement it.
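If you publish a feed, this is roughly what enclosure handling can look like on the crawler side. It is a sketch built on Python's standard xml.etree.ElementTree, not TowerSeek.org's parser. The sample feed and its example.org URLs are invented, and treating either the "application/x-bittorrent" type or a ".torrent" suffix as a match is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 feed with one torrent enclosure and one audio enclosure.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example torrents</title>
    <item>
      <title>EFF Argues Against DMCA</title>
      <enclosure url="http://example.org/EFF.Argues.Against.DMCA.Video.DIVX.avi.torrent"
                 length="12345" type="application/x-bittorrent"/>
    </item>
    <item>
      <title>Podcast episode</title>
      <enclosure url="http://example.org/show.mp3" length="999" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
"""


def torrent_enclosures(feed_xml):
    """Return the URLs of all enclosures in the feed that look like torrents."""
    root = ET.fromstring(feed_xml)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url", "")
        mime = enclosure.get("type", "")
        if mime == "application/x-bittorrent" or url.lower().endswith(".torrent"):
            urls.append(url)
    return urls


if __name__ == "__main__":
    for url in torrent_enclosures(SAMPLE_FEED):
        print(url)
```

A single feed fetch hands over every new torrent at once, which is exactly why enclosures are so friendly to crawlers.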

Does this work well? In practice, yes. But in testing we have occasionally seen sites that put each torrent on its own page and link each of those pages from the main page. This sucks because we have to descend an extra level from each hotspot. So put direct links to all your torrents on one page, and it makes everyone’s job much easier.
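Here is a small illustration of the difference, using Python's standard html.parser to count direct torrent links on a page. The two-link threshold for calling a page a hotspot, the class and function names, and the sample pages are all assumptions for the sake of the example. How TowerSeek.org really scores hotspots isn't described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class TorrentLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .torrent."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.torrent_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".torrent"):
            self.torrent_links.append(urljoin(self.base_url, href))


def find_torrent_links(html, base_url):
    """Return every direct torrent link found in the page."""
    collector = TorrentLinkCollector(base_url)
    collector.feed(html)
    return collector.torrent_links


def is_hotspot(html, base_url, min_links=2):
    """Assumed rule: a page with at least min_links direct links is a hotspot."""
    return len(find_torrent_links(html, base_url)) >= min_links


# Invented pages: one links straight to torrents, one links to detail pages.
DIRECT_PAGE = '<a href="files/a.torrent">A</a> <a href="/files/b.torrent">B</a>'
INDIRECT_PAGE = '<a href="details?id=1">A</a> <a href="details?id=2">B</a>'

if __name__ == "__main__":
    base = "http://example.org/list.html"
    print(find_torrent_links(DIRECT_PAGE, base), is_hotspot(DIRECT_PAGE, base))
    print(find_torrent_links(INDIRECT_PAGE, base), is_hotspot(INDIRECT_PAGE, base))
```

The direct page gives up both torrents in one fetch. The indirect page gives up none, so a crawler has to request every detail page before it learns anything.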

Another thing people do is put all their torrents in a giant forum, and make each torrent a new message posting. This is also bad, since you don’t want bots running around visiting each page of each thread.
