To study the large-scale structure of the Web graph, we performed
the following experiments on 500 million Web pages downloaded in 2002.
In the experiments, Web pages were treated as the nodes of a graph,
and the hyperlinks between the pages as directed edges between those nodes.
1. We ran an algorithm that finds all strongly connected components (SCCs)
of the Web graph. We found that the largest SCC contains 150 million pages,
while the second-largest SCC contains only 10,000 pages.
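The experiment above does not specify which SCC algorithm was used. As one possibility, a minimal sketch of Kosaraju's algorithm (two depth-first passes, one over the graph and one over its transpose) is shown below on a hypothetical toy graph; the real experiment would of course run an external-memory variant over billions of edges.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm: DFS to get a finishing order, then DFS on the transpose."""
    # First pass: record vertices in order of DFS completion.
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)
    for u in graph:
        if u not in visited:
            dfs(u)
    # Build the transpose graph (every edge reversed).
    transpose = defaultdict(list)
    for u, nbrs in graph.items():
        for v in nbrs:
            transpose[v].append(u)
    # Second pass: explore the transpose in reverse finishing order;
    # each exploration collects exactly one SCC.
    assigned, components = set(), []
    for u in reversed(order):
        if u not in assigned:
            stack, comp = [u], []
            assigned.add(u)
            while stack:
                x = stack.pop()
                comp.append(x)
                for v in transpose[x]:
                    if v not in assigned:
                        assigned.add(v)
                        stack.append(v)
            components.append(sorted(comp))
    return components

# Hypothetical toy "web": pages A, B, C form a cycle (one SCC); D is reachable
# from the cycle but has no link back, so it forms its own trivial SCC.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(strongly_connected_components(toy))  # [['A', 'B', 'C'], ['D']]
```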
2. We randomly selected 10,000 Web pages from the dataset,
followed links from each page in a breadth-first manner,
and measured how many pages were reachable from each starting page.
From 6,000 of the pages we could reach 350 million pages;
from the remaining 4,000 pages we could reach fewer than 10,000 pages.
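The forward-reachability measurement in step 2 can be sketched as a plain breadth-first search that counts visited nodes; the toy graph below is an illustrative assumption, not the experimental dataset.

```python
from collections import deque

def reachable_count(graph, start):
    """Count pages reachable from `start` by following links forward (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

# Hypothetical toy graph: a core cycle A-B-C, plus a page D with no outlinks.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(reachable_count(toy, "A"))  # 4: A, B, C, D
print(reachable_count(toy, "D"))  # 1: D only, since D has no outlinks
```

In the experiment, a bimodal outcome like "350 million or under 10,000" would show up as two sharply separated values of this count across the 10,000 sampled start pages.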
3. We repeated the experiment for the same 10,000 Web pages,
but this time we followed links in the reverse direction.
From 7,000 of the pages we could reach 300 million pages;
from the remaining 3,000 pages we could reach fewer than 1,000 pages.
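Following links in the reverse direction, as in step 3, is equivalent to a forward BFS on the transposed graph, i.e. counting the pages that can reach the start page. A minimal sketch, again on an assumed toy graph:

```python
from collections import deque

def transpose(graph):
    """Reverse every edge, so forward BFS on the result follows links backward."""
    rev = {u: [] for u in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            rev.setdefault(v, []).append(u)
    return rev

def backward_reachable_count(graph, start):
    """Count pages that can reach `start`: forward BFS on the transposed graph."""
    rev = transpose(graph)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in rev.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

# Same hypothetical toy graph: D is linked to by C, which sits on the A-B-C cycle,
# so every page can reach D even though D reaches nothing but itself.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(backward_reachable_count(toy, "D"))  # 4: A, B, C, and D itself
print(backward_reachable_count(toy, "A"))  # 3: the cycle A, B, C
```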
Based on these results, draw the general structure of the Web in as much
detail as you can. In particular, indicate how many pages belong
to each part of the graph structure that you draw.