
SharpSpider: A Continuous, Parallel and Distributed Spider
Design and implementation of software to navigate the Web autonomously
Search engines have become so indispensable that they rank second only to e-mail as the most popular online activity. To respond to queries in a timely fashion, search engines make use of large indices of word occurrences on Web pages to cross-reference websites to keywords. Such indices are maintained by spiders, a special kind of computer program that browses the Web autonomously. However, due to a variety of technological limitations, a single spider has proven insufficient to maintain a search engine's index. Hence, in this book, we review several alternatives to split a spider's work into multiple processes, and define a methodology to preserve an up-to-date index of the Web.
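The indexing idea described above, mapping each keyword to the set of pages that contain it, can be sketched as a minimal inverted index. The page contents and function names below are illustrative, not taken from SharpSpider:

```python
# Minimal inverted-index sketch: for each word, record which pages
# contain it. A real spider would fetch and parse pages; here the
# page texts are hard-coded examples.
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

pages = {
    "http://a.example/": "web spiders browse the web",
    "http://b.example/": "search engines index the web",
}
index = build_index(pages)
print(sorted(index["web"]))  # URLs of pages containing "web"
```

A query for a keyword then reduces to a set lookup, which is what lets a search engine answer in a timely fashion instead of scanning pages at query time.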
SharpSpider, our prototype spider, has been evaluated using the resources of PlanetLab, a globally distributed platform for developing and deploying planetary-scale services. Despite the utilisation of very modest equipment, we have performed large crawls of the Web, distributing the workload amongst various computers spread across different continents. The statistics derived from our research offer valuable insight into the nature of educational Web resources.
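One common way to distribute crawl workload amongst several computers, shown here only as an illustrative sketch (the book surveys several alternatives, and this is not necessarily SharpSpider's scheme), is to hash each URL's host so that all pages of a site are handled by the same crawler:

```python
# Hedged sketch of hash-based crawl partitioning: assign each URL to
# one of n crawler processes by hashing its hostname, so every page
# of a given site lands on the same crawler.
import hashlib
from urllib.parse import urlparse

def assign_crawler(url, n_crawlers):
    """Deterministically map a URL to a crawler index in [0, n_crawlers)."""
    host = urlparse(url).netloc
    digest = hashlib.sha1(host.encode()).hexdigest()
    return int(digest, 16) % n_crawlers

# Pages of the same host always map to the same crawler:
print(assign_crawler("http://a.example/p1", 4) ==
      assign_crawler("http://a.example/p2", 4))
```

Keeping a whole site on one crawler simplifies politeness (per-host request rate limits) and avoids two machines fetching the same pages.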