Internet search tools fall into two camps: search engines, such as HotBot and AltaVista, and online directories, such as Yahoo and Lycos. The difference between the two lies in how they compile their site listings. Of course, there are exceptions to every rule. Some search utilities, such as Ask Jeeves, combine the search engine and directory approaches into a single package, hoping to provide users with the best of both worlds.
In directory-based search services, the Web site listings are compiled manually. For example, the ever-popular Yahoo dedicates staff to accepting site suggestions from users, reviewing and categorizing them, and adding them to a specific directory on the Yahoo site.
You can usually submit your Web site simply by filling out an online form. On Yahoo, for example, you'll find submission information at www.yahoo.com/docs/info/include.html. Because human intervention is necessary to process, verify, and review submission requests, expect a delay before your site secures a spot in a directory-based search service.
On the flip side, search engines completely automate the compilation process, removing the human component entirely.
A software robot, called a spider or crawler, automatically fetches sites all over the Web, reading pages and following associated links. By design, a spider will return to a site periodically to check for new pages and changes to existing pages.
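The fetch-and-follow behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine's spider works: the `TOY_WEB` dictionary, its example URLs, and the `crawl` function are all hypothetical, and the "Web" here is an in-memory table rather than real network requests.

```python
from collections import deque

# Hypothetical miniature "Web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("welcome to the home page", ["a.example/news", "b.example/"]),
    "a.example/news": ("site news and updates", ["a.example/"]),
    "b.example/": ("a directory of search tools", ["a.example/news", "c.example/"]),
    "c.example/": ("a page with no outgoing links", []),
}

def crawl(start_url, web):
    """Breadth-first crawl: fetch a page, record its text, queue its links."""
    seen = {start_url}          # URLs already queued, so no page is fetched twice
    queue = deque([start_url])
    fetched = {}                # URL -> page text, in the order pages were read
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:        # dead link: nothing to fetch
            continue
        text, links = page
        fetched[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

pages = crawl("a.example/", TOY_WEB)
print(list(pages))
```

A real spider would also schedule return visits to pick up new and changed pages; in this sketch that would amount to calling `crawl` again later and comparing the results.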
Results from spidering are recorded in the search engine’s index or catalog. Given the wealth of information available on the Internet, it is not surprising that indexes grow to very large sizes. For example, the AltaVista index has recently been increased to top out at 350 million pages. This may seem like a mammoth number, but by all estimates it still represents less than 35 percent of all pages on the Web, which implies a Web of more than one billion pages.
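One common way to organize such an index is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure purely for illustration; the text does not say how any specific engine stores its index, and `build_index` and the sample pages are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical spidering results: URL -> page text.
spidered = {
    "a.example/": "Search engines index the Web",
    "b.example/": "A directory of search tools",
}

index = build_index(spidered)
print(sorted(index["search"]))     # pages containing "search"
print(sorted(index["directory"]))  # pages containing "directory"
```

Because the index stores an entry for every distinct word on every page, its size grows with the number of pages spidered, which is why indexes become so large.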
Because of the depth and breadth of information being indexed, there is usually a delay, sometimes up to several weeks, between the time a site has been “spidered” and when it appears in a search index. Until this two-step process has been completed, a site remains unavailable to search queries.
Finally, the heart of each search engine is an algorithm that matches keyword queries against the information in the index, ranking results in the order the algorithm deems most relevant.
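As a rough illustration of matching and ranking, the sketch below scores each page by how often the query words occur in it and lists the highest scores first. Counting word occurrences is an assumption chosen for simplicity; actual search engines use their own, far more elaborate relevance measures. The `search` function and the sample pages are hypothetical.

```python
from collections import Counter

def search(query, pages):
    """Return URLs ranked by how many times the query words occur in each page."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:                  # pages with no matching word are left out
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _score, url in scored]

# Hypothetical indexed pages: URL -> page text.
pages = {
    "a.example/": "search engines index the web",
    "b.example/": "a directory of search tools and search tips",
    "c.example/": "cooking recipes",
}

print(search("search tools", pages))
```

Swapping in a different scoring rule changes the order of the results even when the pages and the query stay the same.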
Because the spiders, resulting indexes, and search algorithms of each search engine differ, so do the search results and rankings across the various search engines. This explains why a top 10 site in HotBot may not appear near the top of AltaVista when the same keyword search is entered.
In addition, many, but not all, search utilities also reference metatags (invisible HTML tags within documents that describe their content) as a way to control how content is indexed. As a result, proper use of metatags throughout a site can also boost search engine ranking.
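To make the idea concrete, the sketch below reads the `description` and `keywords` metatags from a page using Python's standard `html.parser` module. The sample page, the `MetaTagParser` class, and the `extract_meta` helper are hypothetical, and which metatags a given search utility actually honors is not something this example can show.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect the name/content pairs of <meta> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content

# Hypothetical page: the metatags sit in <head> and are not displayed to visitors.
SAMPLE = """<html><head>
<title>Acme Widgets</title>
<meta name="description" content="Hand-made widgets shipped worldwide.">
<meta name="keywords" content="widgets, gadgets, hand-made">
</head><body><p>Welcome.</p></body></html>"""

def extract_meta(html):
    """Return a dict of metatag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

for name, content in extract_meta(SAMPLE).items():
    print(f"{name}: {content}")
```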