Algorithms, simple functions that hover between mathematical problems and computer programs, are everywhere.
Whether dealing with lost packets when using Wi-Fi, or getting your
credit-card details securely to an online store, most consumer
technology couldn't work without some ingenious solutions to common
problems.
And when Google decides to change its search algorithm - as it did with Hummingbird last week, it
can make or break whole companies.
Here are just five which you couldn't live without.
Pagerank – how Google calculates search results
The broad strokes of how Google's search algorithm works have been
public for over fifteen years, though the exact way it organises search
results remains the company's most closely guarded secret.
PageRank is the system at the heart of it all, and the key invention
behind Google's rapid dominance of internet search. Ten years ago, when
competitors relied on human-maintained indexes of webpages, it allowed
the company to assess the value of websites automatically - a massive
advantage as the web grew exponentially.
The algorithm works by looking at every link to and from every page
on the internet. A link to a page is, in effect, a vote for that page's
validity, because it means that someone thought that whatever was on
that page was worth sharing. So the more inbound links a page has, the
higher its PageRank is.
But then it adds a second measure: links from pages which have a high
PageRank themselves confer a higher PageRank. So being linked to by
Stanford, the Google creators' alma mater, is significantly more valuable than being linked to by
Stanford's, the map shop.
On top of PageRank are
myriad tweaks and
adjustments to
improve the results further. Some of those, such as the decision to
punish 'link farms', vast networks of sites which link to each other in
an effort to boost their PageRank , are directly connected to the core
algorithm.
Others, like the company's attempts to leverage what it knows about
users based on their previous searches to deliver personalised results,
were bolted on later.
But as the demands we make of search have got ever more intensive,
the company has been forced to adapt. The most recent update to the
algorithm, Hummingbird is "a new engine built on both existing and new
parts", according to Danny Sullivan of the search blog
Search Engine Land.
Hummingbird was built to deal with the fact that increasing familiarity with search, as well as
the rise of voice control, mean that we now ask Google actual questions, rather than just typing in relevant words.
In the future, new problems will require further tweaks, but the position of PageRank at the heart of it all seems secure.
Public key cryptography - keeping credit card data secure
Public key cryptography is the name for a broad collection of
algorithms which lie at the heart of nearly every form of security
online. Using
what is perhaps best described as 'magic maths', public key cryptography lets people encode data with a key which cannot then decode it.
If Alice has a piece of information which she needs to get to Bob
without anyone else seeing it – maybe a credit card number which she's
using to buy a computer with, or perhaps
evidence of state wrongdoing which she's leaking to a national newspaper – she has to encrypt it.
Way back in history, the only way to do this would be to use a shared
secret: a cipher which both Alice and Bob know, but no-one else does.
That's how encryption all the way up to the second world war worked.
But the obvious problem is that Alice and Bob can't use open channels
to agree on their cipher. That's fine if they can meet in person to
swap codes, but less effective if Alice is a consumer and Bob a
multinational corporation.
Public key cryptography means that Bob can tell the world his public
key, and let them know that anything encoded with that will be readable
by him and only him. Alice sees the public key, locks up her credit card
data using it, and then sends that packet on the way.
Only Bob, using a second, private, key can decrypt the data and read the number.
Unfortunately, doing all of this every time is pretty hard on a
computer's processor, so another step, where both Alice and Bob use
their keys together to generate a shared secret, is frequently added in
most practical uses of public key encryption. That shared secret can
then be used in old-fashioned symmetrical ciphers. But it all comes back
to the public and private keys.
Correcting errors
CDs are temperamental beasts. When they were introduced, they were
hailed as being a resilient replacement for vinyl and cassettes, but as
anyone who has tried to play a
carelessly preserved album found in their car's glovebox can attest to, that's only half true.
Still, it could a lot worse. When you're storing data using
microscopic pits on a sheet of metal covered with a thin layer of
plastic, it's quite a feat to read every single pit correctly, a feat
only compounded by later generations of optical media, which shrink the
pits still further.
Error correction lies at the heart of that reliability, thanks to the
use of a cunning algorithm which lets CDs be readable even with quite a
lot of damage to the data stored on them.
In hugely simplified form, imagine the data on a CD as a grid of 1s and 0s. Like this:
101
010
000
If there's an error reading the CD, one of those 0s may turn into a
1. Without error correction, there's no way to tell if that's happened,
and no way to fix it.
The simplest way to correct errors is to add another load of data.
Each row and column is counted up, and if there's an even number of 1s,
another 1 is added on the end. If there isn't, a 0 is added instead:
1011
0100
0001
000
Now, if one of the 0s is misread, the player can check with the error
correction codes, and tell that there's been a mistake – and even,
assuming there's not too much damage, what the real value should be.
Suppose the bottom right 0 is read as a 1: by reading the error
correction code for that row, the player can tell that there's one too
many 1s. It can then cross-check with the error correction codes for
each column, and spot that there are also too many 1s in one column as
well.
Now the player knows where the error is, and can carry on with those sweet tunes.
Error correction isn't just used by disc drives, though. Nearly every
electronic device which gets data from one place to another will have
some error correction on it, from WiFi to DSL. Even ISBN numbers on the
back of books have error correction: the final digit serves the role.
Protecting passwords
Sometimes, it's really important to check that the file you have been
given is exactly the one you expected. Maybe you are worried it's been
tampered with, or just want to check that a large download finished
without corruption. One way to do that is to look at its hash.
There isn't really one algorithm which can be used to make 'hashes'
of data: any process which can take information and spit out something
which fulfils a few criteria will do.
A good cryptographic hash function will give the same output every
time it's given the same input; the hash will change if the message
changes; it would be nearly impossible to work backwards from the hash
to the message; and it would be nearly impossible for two messages to
have the same hashes.
But there's a much more important use for hashing data than just checking files: password protection.
As is painfully obvious these days,
not many organisations can guarantee they won't lose your data. That's particularly problematic if it's your password they've lost, because — well,
you don't have a different password for every service, do you?
These days, companies shouldn't be keeping passwords in plaintext at
all. Instead, when a user types in their password for the first time,
the site should hash the password, and only keep that.
Every time they come back to log in, it can take another hash, and
compare it to the one on file. If they match, the password's correct.
But now, if the site's hacked, the only thing which gets lost is a table
full of hashes which can't be reverse engineered into passwords, so
everyone is happy.
(Technically, site owners should be salting their hashes - an
oh-so-cute term which means adding a little bit of extra data into the
password hash to prevent it being
reverse engineered.)
Perlin noise: generating landscapes in games
Games are just big bags of algorithms. There's art, music, writing,
direction, design and playtesting too, but a lot of algorithms.
Take procedurally generated terrain, the favoured way of filling in
the vast expanses of games like Minecraft or Dwarf Fortress. It's not
enough just to generate random noise, and apply it to a landscape,
because if you do, you end up with something which is
too random:
all noise, no pattern. Instead, you want terrain which demonstrates the
same fractal nature as the real world, with mountains, hills, boulders
and pebbles all having effects on different scales.
That's what Perlin noise can do. It's a simple enough algorithm:
generate some random noise at a load of different frequencies, smooth it out, and then add them together. But when you do, you go from this:
To this:
Or consider trying to get enemies to take an intelligent route
through a map. The obvious way to do it is easy enough: consider every
possible route, then take the shortest one from A to B. But that's so
computationally intensive that it's unusable in most situations.
Instead algorithms like the A* search can be used.
How it works is tricky to explain (essentially, it finds a path by
always taking the step where the number of steps already taken plus the
number of steps in a straight line to the destination is lowest), but it
is
entrancing in action.
•
In August 2011, a rogue algorithm lost its owners $440m on the stock market before it was eventually shut down.
This article originally appeared on
guardian.co.uk