 analysis of the strong set in the PGP web of trust

author Henk P. Penning Sun May 5 14:09:08 2019 UTC

definitions

• The PGP web of trust can be viewed as a directed graph where the points are the PGP keys, and the arrows (directed lines) are the signatures.
• If there is a path from key A to key B, the distance from A to B is the length of the shortest path from A to B.
• The strong set is the largest set of keys such that for any two keys in the set, there is a path from one to the other.
• The mean shortest distance (MSD) of a key is the average distance to that key.
• The avarage mean shortest distance (AMSD) is the average of the MSD's of all keys.

basics

The size of the strong set is growing ...

... and dived down in 2004-12 ; a massive cleanup removed a lot of revoked/expired keys. ... and the average distance between the points is shrinking! A possible reason : the average degree (number of signatures per key) is rising. population change

The population of the strong set changes over time, as keys enter and leave the set.

In the graph below, for a given date

• a blue line represents the size of the strong set
• a red line represents the number of keys that disappeared from the strong set
About half of the keys in the first set (april 2003) have disappeared since then. distribution of degree

The plot below shows the distribution of degree (the number of signatures per key).

The distribution hardly changes with time. The fraction of keys with only one signature is remarkably stable at some 25%. The graph below shows last month's distribution of degree (number of signatures per key) using a log/log scale.

• the red points show the raw degree/count scatter graph
• the blue points show the average degree per count (level)
• the smooth green line follows the blue points
The graph is very typical for a social network exhibiting the small world phenomenon, a scale-free network. distribution of distance

The graph below shows the distribution of the distances between all PGP keys in the strong set.

Since the strong set grows, the number of paths grows too. If you look closely, you can see that the most common distance decreases. In april 2003 it was just below 6, now it is creeping to 5.5. normalized distribution of distance

The plot below shows the same data, but the counts are normalized (divided) by the number of paths in the strong set :
#paths = |G| × (|G| - 1), where |G| is the number of points in the graph

The plot is interesting, because it is so boring! The shape of the distribution curves is very stable; over time, the pattern shifts slowly toward smaller distances. key ranking by mean shortest distance

Some people are interested in key ranking ... See the top 50 and top 1000.   Following the footsie idea of Matthew Wilcox, here is the footsie for the strong set : leaving out keys

What happens to the size of the largest strongly connect component (scc) if we delete points ? The graph will break up and the size of the biggest component will go down. How this happens, depends on which points we remove. Below we will see what happens when we remove core, fringe or random points.

The plots below show what happens if we remove k core keys : the keys with low msd's.

• the red line shows the size of the remaining largest strongly connected component
• the green line shows the number of strongly connected components
• the blue line shows (an estimate for) the average mean shortest distance in the largest strongly connected component ; the msd is computed for 10% of the points
To see how the size of the scc (red line) goes to zero, the second plot shows the same data on a logscale.

The graph crumbles gracefully until about a third of the keys are removed. After that, only splinters remain. As expected, the amsd shoots up, to over 20. Note that the top-1000 aren't very special. The crumbling is evenly distributed over the entire range, until there is nothing but graph dust, probably local clusters.  The plots below show what happens if we remove k fringe keys : the keys with high msd's. To get a better view of the number of components (green line), the second plot shows the same data on a logscale.

The graph remains very much intact, all the way down. The number of components remains low and very stable.
At first, the amsd decreases more than average because the the largest connected component becomes 'rounder'.  What happens if we remove random keys ?

The scc of the graph remains big, almost all the way down. The amsd (blue line) remains very stable, between 5 and 6.
This plot shows the fractal nature of the strong set. A random subgraph has the same structure as the entire graph.   