Evolution of the Web from 2000 to 2007 – average web object size quintuples since 2000

Summary:

In a comparative survey of data traces from 2000 and 2007, University of Twente researchers found that the nature of the Web has changed from a static one-way medium to a dynamic platform for interactive services such as photo and video sharing portals.

The Web has changed dramatically over the past seven years. During that time the Web has moved from a static one-way medium toward a dynamic platform for interactive services such as photo and video sharing portals. In a comparative survey of data traces served over the Web from 2000 and 2007, University of Twente researchers found that the nature of web sites has changed (Sadre and Haverkort, 2008).

Where in 2000 traffic volume was dominated by the traditional Web formats of images and HTML, in 2007 it was dominated by videos and binary software updates and downloads. Due to this shift toward larger video clips and binary downloads, the mean response size has increased from 12,294 to 68,275 bytes, a factor of about 5.5 (see Figure 1). The caching efficiency of the Web has decreased, while the share of requests for dynamic documents has increased from 21.2% to 37.1%. These changes show how the Web has shifted from static information to dynamic rich media.

Figure 1: Growth of the Mean Object Size

Mean Web Object Size Quintuples Since 2000

Table 1 shows how response size statistics have changed between the 2000 and 2007 traces. The mean response size grew 455%, from 12,294 bytes to 68,275.2 bytes, making it roughly 5.5 times larger. The median response size grew only modestly, from 2,410 bytes to 2,780 bytes (about 15%). Both traces show a heavy-tailed distribution, with the 2007 trace even more heavily tailed. This effect can be seen in the order-of-magnitude increase in the squared coefficient of variation (SCV) of the response size distribution, from 320.9 to 3,425.1, as well as in the growth of the largest documents observed. The largest document observed in a 2004 trace had a size of 193 Mbytes, whereas the largest file in the 2007 trace had a size of around 2 Gbytes, which is consistent with the heavier tail of the 2007 trace.

Table 1. Response size statistics (in bytes)

          2000 trace      2007 trace
min       17              85
max       0.228 · 10^9    2.147 · 10^9
mean      12294.0         68275.2
median    2410            2780
SCV       320.9           3425.1
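
The SCV is the variance of the response sizes divided by the square of their mean, so a value far above 1 signals a heavy tail. The following Python sketch shows one way such statistics could be computed, and re-derives the growth figures quoted above from the Table 1 means. The function name and the sample sizes are made up for illustration and are not data from either trace.

```python
from statistics import mean, median, pvariance

def response_size_stats(sizes):
    """Return min, max, mean, median and SCV for response sizes in bytes."""
    m = mean(sizes)
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": m,
        "median": median(sizes),
        # Squared coefficient of variation: variance divided by squared mean.
        "scv": pvariance(sizes) / (m * m),
    }

# Hypothetical sizes for illustration only; one large object creates a tail.
sample_sizes = [500, 1200, 2400, 2800, 3100, 250000]
stats = response_size_stats(sample_sizes)

# Growth of the mean response size, using the two means from Table 1.
mean_2000, mean_2007 = 12294.0, 68275.2
growth_factor = mean_2007 / mean_2000        # about 5.55
growth_percent = (growth_factor - 1) * 100   # about 455
print(f"factor {growth_factor:.2f}, growth {growth_percent:.0f}%")
```

Note that a single 250,000-byte object among small ones pulls the mean far above the median, which is the same pattern the two traces show on a much larger scale.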

Popular Object Type Evolution

Table 2 shows how the distribution of object types has diversified over the years. In 2000 the top five object types accounted for fully 98.1% of all requests, while in 2007 the top five accounted for only 88.2%. Table 2 also shows a shift in popularity away from the GIF format and toward the JPEG and PNG formats. GIFs fell from 53.2% of all requests in 2000 to 28.5% in 2007, while JPEGs rose from 24.9% to one-third (33.3%) and PNGs entered the top five at 3.5%. The table also points to more interactive applications, with JavaScript rising from 1.1% of all requests in 2000 to 6.9% in 2007. CSS use is also on the rise: 2.7% of HTML documents used CSS in 2000, compared with 15.6% in 2007.

Table 2. Number of requests by type for the 2000 trace (left) and 2007 trace (right) as percentage of the trace size

2000 trace                            2007 trace
type                        requests  type                        requests
image/gif                   53.2%     image/jpeg                  33.3%
image/jpeg                  24.9%     image/gif                   28.5%
text/html                   18.4%     text/html                   16.0%
application/x-javascript    1.1%      application/x-javascript    6.9%
text/plain                  0.5%      image/png                   3.5%
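
The top-five totals and the per-type shifts can be checked directly from Table 2. This is a small Python sketch of that arithmetic; the variable and function names are chosen here for illustration.

```python
# Request shares (percent of all requests) copied from Table 2.
requests_2000 = {
    "image/gif": 53.2,
    "image/jpeg": 24.9,
    "text/html": 18.4,
    "application/x-javascript": 1.1,
    "text/plain": 0.5,
}
requests_2007 = {
    "image/jpeg": 33.3,
    "image/gif": 28.5,
    "text/html": 16.0,
    "application/x-javascript": 6.9,
    "image/png": 3.5,
}

def top_share(shares):
    """Sum the listed shares, rounded to one decimal like the table."""
    return round(sum(shares.values()), 1)

top5_2000 = top_share(requests_2000)
top5_2007 = top_share(requests_2007)

# Change in percentage points for types listed in both traces.
changes = {
    mime: round(requests_2007[mime] - requests_2000[mime], 1)
    for mime in requests_2000
    if mime in requests_2007
}

print(top5_2000, top5_2007)
print(changes)
```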

Traffic Volume and Size

Table 3 shows that while traffic volume in 2000 was dominated by the traditional Web formats of images and HTML, in 2007 it was dominated by videos and binary software updates and downloads. Table 3 also shows a dramatic increase in the average size of objects. Even JPEG images and HTML documents have grown by 30% or more, from 10 to 13 Kbytes and from 9 to 13 Kbytes respectively.

Table 3. Fraction of traffic volume and average response size (in Kbytes) by type for the 2000 trace (left) and the 2007 trace (right)

2000 trace                                2007 trace
type                      volume  size    type                      volume  size
image/jpeg                21.5%   10      application/octet-s       34.6%   1766
image/gif                 15.5%   4       image/jpeg                6.6%    13
text/html                 14.6%   9       application/x-otrkey      6.6%    240610
application/msword        9.0%    4147    text/plain                6.1%    231
application/octet-stream  8.4%    672     video/x-msvideo           6.0%    109533
application/zip           8.1%    1322    video/x-flv               5.9%    10954
video/mpeg                6.8%    861     video/flv                 5.4%    6730
application/vnd.ms-excel  2.5%    3637    video/x-ms-wmv            3.2%    42636
text/plain                2.2%    49      text/html                 3.1%    13
audio/mpeg                2.1%    3360    application/zip           2.5%    9632
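
As a rough check on the size increases, the Python sketch below computes the growth in average response size for the types that appear under exactly the same name in both halves of Table 3. The names used in the code are illustrative, and the truncated 2007 entry is left out rather than guessed at.

```python
# Average response sizes in Kbytes, copied from Table 3, for the types
# listed under the same name in both traces.
avg_size_2000 = {
    "image/jpeg": 10,
    "text/html": 9,
    "text/plain": 49,
    "application/zip": 1322,
}
avg_size_2007 = {
    "image/jpeg": 13,
    "text/html": 13,
    "text/plain": 231,
    "application/zip": 9632,
}

def percent_increase(old, new):
    """Relative growth from old to new, in percent, to one decimal."""
    return round((new - old) / old * 100, 1)

size_growth = {
    mime: percent_increase(avg_size_2000[mime], avg_size_2007[mime])
    for mime in avg_size_2000
}

for mime, growth in size_growth.items():
    print(f"{mime}: +{growth}%")
```

JPEG images grow by 30% and HTML documents by about 44%, while plain text and zip archives grow several times over.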

Average Page Size and Number of Requests

The number of requests per client has more than doubled, from 7,074 requests in 2000 to 14,670 requests in 2007. The average page size has also doubled from 2000 to 2007, regardless of which threshold delay was used to delimit pages in the proxy traces.
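
A proxy trace records individual requests, not pages, so pages have to be inferred. One common approach, sketched below in Python, starts a new page whenever a client has been idle for longer than a threshold delay. The function name, the timestamps, and the threshold values are assumptions for illustration; the source does not state which thresholds the researchers used.

```python
def split_into_pages(timestamps, threshold=1.0):
    """Group one client's request times (in seconds) into pages.

    A new page starts whenever the gap since the previous request
    exceeds `threshold` seconds (an assumed, tunable value).
    """
    pages = []
    for t in sorted(timestamps):
        if pages and t - pages[-1][-1] <= threshold:
            pages[-1].append(t)   # still within the current page
        else:
            pages.append([t])     # idle gap too long: start a new page
    return pages

# Hypothetical request times for a single client.
request_times = [0.0, 0.2, 0.5, 4.0, 4.1, 9.5]

pages_short = split_into_pages(request_times, threshold=1.0)
pages_long = split_into_pages(request_times, threshold=5.0)
print(len(pages_short), len(pages_long))
```

A longer threshold merges more requests into each page, which is why a finding that holds across thresholds is more convincing than one tied to a single value.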

Caching and Dynamic Pages

Caching efficiency decreased dramatically from 2000 to 2007, with the cache hit rate falling from 54.3% to 23%. This is due in part to the more dynamic nature of the Web. In 2000 only 21.2% of all requested URLs were for dynamic documents (forms, CGI requests, etc.), compared with 37.1% in 2007.
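
To make the two percentages concrete, here is a minimal Python sketch that computes a hit rate and a dynamic-request share from a tiny made-up log. The rule for spotting dynamic URLs (a query string, or "cgi" in the path) is a simple heuristic assumed for this example, not the classification used by the researchers.

```python
from urllib.parse import urlsplit

def looks_dynamic(url):
    """Heuristic (assumed): a query string or 'cgi' in the path."""
    parts = urlsplit(url)
    return bool(parts.query) or "cgi" in parts.path.lower()

def percentage(flags):
    """Percentage of True values in an iterable of booleans."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0

# Hypothetical log entries: (requested URL, served from cache?).
log = [
    ("http://example.com/index.html", True),
    ("http://example.com/cgi-bin/search", False),
    ("http://example.com/photo.jpg", True),
    ("http://example.com/view?id=7", False),
]

hit_rate = percentage(hit for _, hit in log)
dynamic_share = percentage(looks_dynamic(url) for url, _ in log)
print(f"hits {hit_rate:.1f}%, dynamic {dynamic_share:.1f}%")
```

In this toy log the dynamic requests are exactly the cache misses, which mirrors the link the article draws between more dynamic content and lower caching efficiency.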

Further Reading

Average Web Page Triples Since 2003
Within the last five years, the size of the average web page has more than tripled, and the number of external objects has nearly doubled. While broadband users have experienced somewhat faster response times, dial-up users have been left behind.
Sadre, R., and B. Haverkort, 2008. “Changes in the Web from 2000 to 2007,” DSOM 2008, 136-148. In a comparison of data traces from 2000 and 2007, University of Twente researchers found that the nature of data distributed over the World Wide Web has changed. Object size and object type distribution show that the Web has moved from a static one-way delivery platform to a more dynamic and interactive medium.
