Dissecting the Leaf of TrustAfter implementing the group matrices I figured it would be nice to see the group matrix of a much larger group. (If you are a mathematician, the Web of Trust is a large directed graph where the vertices are called "keys" and the edges "signatures". There are four different types of signatures, just think of them as four colors of the edges. The group matrix is the adjacency matrix of this graph. You probably want to take a look at the FAQ, especially the part about MSD.) I generated a PNG image with the keys sorted in MSD order and expected little more than random noise. When I first saw the result I thought I had done something wrong, but a little bit of thinking revealed that the resulting leaf-like shape was perfectly natural, almost unavoidable. Writing my master's thesis on a mathematical departement (but not about this) has its advantages, but I did not manage to find anyone who had seen anything like it. If you know if it has been analyzed before, please send me a mail. Scroll down for some images analyzing the mail addresses of the OpenPGP keys, but let's try to dissect this leaf first. About the imagesAn image of all signatures in the Web of Trust, scaled
61 times. The keys are sorted in MSD order, with the key with
best MSD being the top row and leftmost column. Before we start talking about the technical details of the images, remember that what they really represent is relations between real human beings. They may look good regardless, but keeping in mind that each and every dot (well, almost) is an actual meeting between two humans gives the patterns a new dimension. If you own an OpenPGP key which was available here at the time this was written, then some of these dots were made by you. The Web of Trust information used on this page was taken from the .wot file from 2004-12-10.
A dot in column Since the Web of Trust contained approximately 24000 keys when these images were made (December 2004), most images on this page should really be 24000x24000 pixels large. That is very large and unpractical. If you really want one of these, this is an unscaled image of the whole leaf. Beware that most programs eats something between several hundred MiB and some GiB of memory and then crashes when trying to unpack it, while most web browsers just give up and claim it contains errors. The rest of the images are downscaled when needed to more practical sizes. To avoid suspicion of patterns created by the downscaling, only integer downscalings are made. Since the images would be far too dark if the pixels were just averaged, a floating point average was first calculated, and that value was gamma corrected with a high gamma value. If you click on the image, or on any other image on this page, you will come to a page with some more technical information on the image and a HTML hack that lets you see the name and MSD for individual keys. The links below the images takes you to similar pages with larger images. If you follow a link on the image above, you can easily check that it is sorted in MSD order by moving your mouse over the image while watching your status bar. Order mattersThis too is
an image of all the signatures in the Web of Trust, just like
the one above, but the order of the keys is
random. Not all group matrices look like leaves. The group matrices look very different depending on the order of the keys. There are a large number of ways to arrange 24000 keys in a list. According to Stirling's approximation, the number would have around 94702 digits. A large number indeed. Most key orders gives a group matrix that look like the random one to the right or above. We shall explore some more key orders that comes natural, but much more may be hidden in the other ways to order the keys. The only difference between this random image and the one above is the order of the keys. Because of the lack of other order, one detail stands out - there are both vertical and horizontal colored stripes, but the vertical ones are clearly more visible. This is because a particular human being tends to be somewhat consistent in what signature types she uses, and while signature types on a particular key are also correlated, that correlation is much weaker. The same patterns can be seen on the other images too, but they are much harder to spot amongst the other patterns. Hop metric of the leaf The Leaf of Trust again, but MSDs are marked every
hop with yellow lines and every 1/10th hop with dark cyan
lines. There are two natural ways to measure distances in this graph. Counting the number of keys is an obvious method, but since the keys are sorted by MSD we can also look at the MSD distances, at least for purely vertical or purely horizontal distances. MSD is measured in hops, so this gives us a way to speak of the distance between two lines or two columns as a number of hops. In this graph every integer MSD is marked by a yellow line and cyan lines are placed every 1/10th hop. You can also follow one of the links below the image or on the image itself to (hopefully be able to) explore the distances with your mouse pointer. There are a number of interesting things that can be seen here.
One way to understand the 1-hop line is to pick a key, any
key, and call it Alice's key. If Alice's key has MSD
Alice has a horizontal line that contains signatures on her key and a vertical line that contains signatures she has made. Those lines intersect at a point in the diagonal, Alice's diagonal point. If you follow that line down from the diagonal more than one hop it can not contain any more signatures. If she had signed a key further down, that key would have a higer MSD and be sorted higher up, above the 1-hop line. The fact that keys tend to cluster on and close to the 1-hop line isn't very strange either. Keys will often end up with one particular signature that lowers its MSD considerably, and will therefore get an MSD of just below one hop more than the MSD of the signing key. There will therefore be lots of dots exactly one hop below the diagonal. For similar reasons, and also by plain geometrical symmetry, every point in the 1-hop line is exactly one hop to the left of the diagonal.
The (-1)-hop line is just a mirror image of the 1-hop line.
Remember, the rows and columns of these images represent
human beings, and the dots represents meetings between human
beings. If Alice meets Bob, there is a good chance that Bob
simultaneosly meets Alice. In practice, if Alice signs Bob's
key, Bob will probably sign Alice's key too. If Alice has a
lower MSD than Bob, Alice's signature on Bob's key will be
visible as a dot a certain distance, no more than
I said that Bob will probably sign Alice's key, and in practice, for different reasons, signatures are sometimes one-way. The fact that there are signatures above and to the right of the (-1)-hop line is a proof of this. If all signatures were cross-signatures, the top/right side of the leaf would be an exact (but with possibly different colors) mirror copy of the left/bottom side. Since all signatures are not cross-signatures the leaf is not symmetrical, and there can therefore be signatures further away. For example near the (-2)-hop line, which seems to attract signatures in a similar way as the (-1)-hop line.
The dark crosses cannot be explained with simple theory like
this. We'll have to look deeper into the Web of Trust to
find their cause. It turns out that the two keys 0xB3B2A12C
ct magazine CERTIFICATE <pgpCA@ct.heise.de> and 0xBB1D9F6D
ct magazine CERTIFICATE <pgpCA@ct.heise.de> are
pretty high ranked and have made so many signatures on keys
that would otherwise have a quite high MSD. The keys seems
to be part of an initiative
by a German magazine to promote cryptography and
OpenPGP. Those two keys are the only ones I can find leaving
obvious visible traces in the leaf, so the initiative seems
to be a success or at least have made quite an impact. This
has lead to lots of keys with an MSD Two hopsThe above images just show direct links between keys. It is possible to find longer paths in the unscaled image, but it isn't something one would like to do by hand. Instead, we can look at this image. Instead of showing signatures it shows all paths that are exactly two hops long. (If you are a mathematician, think of the square of an adjacency matrix with only zeroes in the diagonal.) The color is chosen by averaging the signature type of the two signatures and rounding downwards, then selecting color as above. Notice how we now also see the 2-hop line, but just as before everything below that line is black. The fact that the 1-hop and (-1)-hop lines are still visible is interesting. One might wonder if they come from the probable scenario that if Alice and Bob have cross-signed and Bob and Carol have done the same, then there is a good chance that Alice and Carol have also cross-signed. Looking at the image below proves that this is not the whole explanation. In this image only the shortest paths of length two are marked. The 1-hop and (-1)-hop lines are clearly fainter, but still clearly visible. Adjusting to hop metricSince we have both a key metric and a hop metric it makes sense to also draw the images so that one hop instead of one key gets a specific distance in the image. In other words, we stretch the image until the distances between the MSD marks are equal. If we remove the grid this stretched leaf is easily seen, but its shape is quite boring. Since the integer-hop lines always intersects the MSD grid crossings they must now all be straight lines. Notice how we now clearly can see the (-3)-hop line, which was barely guessable before. Faking a web of trustAll the other images on this page are generated from the actual OpenPGP Web of Trust, but it might be interesting to see if we can reproduce the leaf shape with a random Web of Trust. It looks like we can produce something similar. This image was made from a Web of Trust, sorted in MSD order, generated from these rules:
Top 1000As can be seen in the above images, the top left corner is quite dense with signatures. This is simply a zoom of that corner. Unfortunately, it doesn't look very interesting. The hop distance from top to bottom row is less than one hop, which places the 1-hop and (-1)-hop lines, and therefore also the dark side of the leaf, outside the picture. Sorting them by local MSD instead of global MSD doesn't make it less boring, but if we also stretch it like we did before to make the pixel distance proportional to hop distance something happens. The two lighter squares are interesting, especially since they seem to overlap in the middle. (A stretched image with global MSDs look somewhat similar but is more boring.) Using the user identity when sorting
This far we have used no information besides the structure
of the Web of Trust when sorting the keys. But the keys also
have another piece of information attached to them which
discloses some aspects of their owner's relations with each
other, completely independant of the signatures. Each key
has a primary User ID, usually in the form
Please remember that when creating these images I used no
information about domains or anything similar. The sorting
did not treat the Germany is clearly an important part of the Web of Trust, both containing many keys and being dense in signatures. This is probably in large thanks to the magazine initiative found above, but I don't doubt there are more factors.
Notice the very dark border around Germany? The top and left edge says that Germany has very little contact with Canada, which makes sense geographically. However, the right and bottom edge says that the social contact between Germany and Sweden is almost non-existant, despite the mere 70 km between the two countries. In fact, there are only 187 signatures between the countries, of which 19 are to or from my key. Separating top domainsKeys sorted first by top domains and then in
MSD order within the domains, as suggested by Jan-Åke Larsson. Instead of just sorting by either MSD or User ID, we can sort by a mix of them. Unfortunately, to do this we must part somewhat from the dumb and simple sorting policy, but the result is well worth it. Here the keys are sorted by email address like before, but then the top domains are identified and within each top domain the keys are sorted by the MSD those keys has in the whole Web of Trust.
Each leaf along the diagonal has the similar features as the
whole leaf analyzed before, for the same reasons. The 1-hop
and (-1) hop lines are there, and even the (-2)-hop line is
visible on some of the leaves. Why the dark side of the
leaves is still completely dark is not very strange when you
realize that a signature within the top domain limits the
MSD difference to
Notice how the two dark crosses only appear in the
While I was able to predict that the squares along the diagonal would look somewhat similar to the whole leaf before producing the image, the fact that the other squares, which represents signatures between top domains, also share most of the same features is strange. The only visible difference seems to be that the diagonal doesn't need to be straight. .org
We can also look at only the keys with a specific top
domain. Here we have all This image is very similar to the leaf for the whole Web of Trust. But the MSD values these keys are sorted by are not produced in the same way - for the whole Web of Trust we calculated the MSDs in the same set of keys that we graphed. Now the MSDs are calculated in a much larger superset. That does not seem to make much difference.
We can instead calculate distances among just the
These
keys are first sorted by how large set of keys can reach the
particular key and then in local MSD order as above inside
those groups. Since it might be odd that a non-reachable key is sorted first we can make reachability highest priority and sort by MSD only among keys that are identically reachable. If you follow the links you see that it was implemented by rewarding large penalty MSDs for each key that could not reach the key. No key can be reached by all the other, but the lone keys at the top can be reached by slightly more than the large leaf in the middle. Sweden
You don't need a very large set of keys to produce a leaf
shape. These are just the Swedish keys. Or, more accurately,
the keys whose primary User ID has an email address that
ends in
Of course I have to sort the Swedish keys by email address
too. The most visible square is Top domain MSDs used incorrectlyThese are all keys but sorted not by their MSD
in this set, but by their local MSDs in the smaller top domain
sets. This was one of those things I just had to do - sort the keys by their local MSD from only those keys inside their own top domain which can reach them. I have only faint ideas of what this image means, but those Germans are clearly visible here too. Future workSome ideas:
| |
Jörgen Cederlöf <jc+wotsap@lysator.liu.se> |