Manila Bulletin Online
Mon Feb 13, 2006
OPENING PAGES
How Google collects and ranks results

By Peachy Limpin

Months after I wrote the series on Smokey Mountain, I still get inquiries from people interested in helping the parish and its projects. Two weeks ago, a stranger called me at work asking for a copy of the articles. He was particularly interested in the e-church, which incidentally had its groundbreaking ceremony last month. I told him the articles were available on the web, but he said he couldn’t find them there, which took me by surprise. Since he was so eager to get a copy, I had the articles photocopied for him to pick up, which he did.

Still puzzled by his inability to find the story online, I did my own Google search. Lo and behold, the article he was looking for was fourth in the search list. My guess was that he typed the words smokey mountain without enclosing them in quotation marks, so the highest-ranked pages were for vacation spots in the US.

So how does Google collect and rank thousands, if not millions, of search results for a query?

The Google Newsletter for Librarians provides a simplified answer. Before the search results are displayed, a Google search goes through several steps.

First, Google’s ‘spider’, called Googlebot, crawls billions of web pages. The ‘spider’, which is really a program, asks a web server to return a specified web page, scans that page for hyperlinks that in turn lead to new documents to fetch, and assigns each page it retrieves a number for reference.
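To make the crawling step concrete, here is a minimal Python sketch of a breadth-first crawler. It is not Googlebot’s code, which is not public. The `fetch` argument is a hypothetical stand-in for asking a web server for a page; in the example it simply looks pages up in a small made-up dictionary, `toy_web`, so the sketch runs without a network connection. Links are treated as plain page names (a real crawler would resolve relative links, for instance with `urllib.parse.urljoin`), and pages are numbered in the order they are retrieved.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that numbers each page it manages to retrieve.

    `fetch(url)` stands in for asking a web server for a page: it returns
    the page's HTML, or None if the page cannot be retrieved.
    """
    doc_ids = {}                 # URL -> document number
    queue = deque([start_url])   # pages waiting to be fetched
    seen = {start_url}           # never queue the same URL twice
    while queue and len(doc_ids) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue             # not retrieved, so no number is assigned
        doc_ids[url] = len(doc_ids) + 1
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:    # hyperlinks lead to new documents
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return doc_ids


# A made-up three-page "web"; missing.html is linked but does not exist.
toy_web = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="c.html">C</a>',
    "c.html": '<a href="a.html">A</a> <a href="missing.html">?</a>',
}
print(crawl("a.html", toy_web.get))  # {'a.html': 1, 'b.html': 2, 'c.html': 3}
```

The queue makes the crawl visit pages in the order they were discovered, and the `max_pages` cap keeps a toy crawl from running forever on a large site.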

The initial crawl produces a great many documents, which are then indexed. To answer a search for ‘smokey mountain’, for example, Google’s servers must already have read the complete text of every document and built an index that lists, for every word, each document containing it. Looking up ‘smokey’ and ‘mountain’ in this index gives one list of document numbers per word. For example, the word ‘smokey’ might appear in documents 3, 12, 16, 54, 78, 89, and 98, while ‘mountain’ might appear in documents 2, 3, 14, 16, 38, 47, 54, 66, 89, and 95. Comparing the two lists shows that ‘smokey’ and ‘mountain’ both appear in documents 3, 16, 54, and 89.
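The look-up-and-compare step can be sketched in a few lines of Python, assuming the index is a simple dictionary from each word to a sorted list of document numbers. The helper names `build_index` and `intersect` are my own, for illustration only. Because both lists are sorted, a single pass down the two lists side by side is enough to find the shared numbers; the example uses the figures given above.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of document numbers containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def intersect(p1, p2):
    """Walk two sorted lists of document numbers and keep the shared ones."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])   # the number is on both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1                 # advance whichever list is behind
        else:
            j += 1
    return result


# Lists of document numbers from the example above
smokey = [3, 12, 16, 54, 78, 89, 98]
mountain = [2, 3, 14, 16, 38, 47, 54, 66, 89, 95]
print(intersect(smokey, mountain))  # [3, 16, 54, 89]
```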

When Google has finished intersecting the posting lists (the lists of document numbers for each word), it ranks the pages by relevance. Google is known for its PageRank algorithm, which ranks a web page based on the number of links it receives from other pages and the quality of the linking sites. For example, if the story I wrote on Smokey Mountain had five or six links from websites such as www.adb.org or www.un.org, it would be valued more than a page with the same number of links from less dependable sites.
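Here is a simplified, textbook version of PageRank in Python, computed by repeatedly passing each page’s score along its outgoing links. The damping factor of 0.85 is the value suggested in the original PageRank paper; Google’s actual ranking formula is not public, and the page names in the example (`hub`, `obscure`, `story_a`, `story_b`) are hypothetical. Each story gets exactly one inbound link: `story_a` from a page that several others link to, `story_b` from a page nobody links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to. A page's score
    is split equally among the pages it links to, so a link from a highly
    ranked page is worth more than a link from an obscure one.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus a share of the score of each page linking to it.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {
    "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"],
    "hub": ["story_a"],       # a well-linked page links to story_a
    "obscure": ["story_b"],   # a page nobody links to links to story_b
}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page:8} {scores[page]:.3f}")
```

Both stories have one link each, yet `story_a` ends up with the higher score because its link comes from a page that is itself well linked, which is the "quality of the linking sites" idea in miniature.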

Besides PageRank, Google, in our example, will also rank a document higher when it contains the words ‘smokey’ and ‘mountain’ next to each other, when the words are in the title (which applies to my article), or when the words appear several times throughout the page. Once the documents are scored, Google takes a few sentences from each document and highlights the words the searcher is looking for. Google then returns the ranked results and the sentences to the searcher as a results page. It does all of this for each search request in under a second.
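A toy version of this scoring, plus the highlighted snippet, might look like the Python below. The weights (1 point per occurrence, 3 for a title match, 5 when the words sit side by side, 10 times the PageRank score) are invented for illustration; Google does not publish how it combines these signals. Matches in the snippet are wrapped in ** as a stand-in for the bold text on a results page, and the two sample documents are made up.

```python
import re


def words_of(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def score(doc, terms, pagerank=0.0):
    """Toy relevance score; all weights here are made up for illustration."""
    body = words_of(doc["body"])
    title = words_of(doc["title"])
    s = 10 * pagerank                     # link-based importance
    for t in terms:
        s += body.count(t)                # each occurrence in the page
        if t in title:
            s += 3                        # word appears in the title
    n = len(terms)
    for i in range(len(body) - n + 1):
        if body[i:i + n] == terms:
            s += 5                        # words appear next to each other
    return s


def snippet(doc, terms, width=60):
    """Cut a window of text around the first match and mark matches with **."""
    body = doc["body"]
    lower = body.lower()
    hits = [lower.find(t) for t in terms if t in lower]
    start = max(min(hits) - width // 2, 0) if hits else 0
    text = body[start:start + width]
    for t in terms:
        text = re.sub(rf"(?i)\b({re.escape(t)})\b", r"**\1**", text)
    return text


terms = ["smokey", "mountain"]
docs = [
    {"title": "Mountain vacations",
     "body": "Smokey skies over the mountain trails."},
    {"title": "Smokey Mountain e-church",
     "body": "The parish at Smokey Mountain is building an e-church."},
]
for doc in sorted(docs, key=lambda d: score(d, terms), reverse=True):
    print(score(doc, terms), doc["title"], "|", snippet(doc, terms))
```

The e-church page wins here because it scores on all three signals (title match, adjacent words, repeated words), while the vacation page only has the words scattered through it.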

How? Google has hundreds of computers, each storing an index of web pages. When a search is made, the task is distributed among these computers for faster searching and document matching.
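The idea of splitting one search across many machines can be sketched in Python, with threads standing in for computers. This is a simplification built on assumptions: in this sketch each hypothetical `Shard` holds the index for only a slice of the documents (one common way to divide the work), every shard is asked at the same time, and the partial answers are merged. It illustrates the general split-and-merge pattern, not Google’s actual setup.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One machine's slice of the index: word -> set of document numbers."""

    def __init__(self, docs):
        self.index = {}
        for doc_id, text in docs.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(doc_id)

    def search(self, terms):
        """Return the documents on this shard that contain every term."""
        sets = [self.index.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


def make_shards(docs, n_shards):
    """Deal the documents out to the shards by document number."""
    parts = [{} for _ in range(n_shards)]
    for doc_id, text in docs.items():
        parts[doc_id % n_shards][doc_id] = text
    return [Shard(p) for p in parts]


def distributed_search(shards, query):
    """Send the query to every shard at once, then merge the partial answers."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: s.search(terms), shards)
        return sorted(set().union(*partials))


docs = {
    1: "smokey mountain parish",
    2: "smokey bear",
    3: "mountain resort",
    4: "smokey mountain e-church",
    5: "great smoky mountains",
}
shards = make_shards(docs, 3)
print(distributed_search(shards, "smokey mountain"))  # [1, 4]
```

Because each shard only searches its own slice, adding machines lets each one do less work per query; the price is the final merge step.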

Tip: when searching for a phrase, put the words in quotation marks so the search engine returns more relevant results. A Google search for smokey mountain turned up more than 2.5 million hits, while a search for “smokey mountain” trimmed the results by more than a million, and further limiting the search to websites in the Philippines listed 19,200 documents.
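Why do quotation marks help? A search engine can record not just which documents contain a word but where in each document the word appears. The minimal Python sketch below, with hypothetical helper names and three made-up documents, treats a quoted query as a phrase: the words must appear side by side and in order. An unquoted query only needs every word to appear somewhere in the document.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map word -> {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def search(index, query):
    """Quoted words must appear side by side and in order; unquoted words
    only have to appear somewhere in the document."""
    phrase = query.startswith('"') and query.endswith('"')
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    candidates = set(index[terms[0]])       # documents with the first word
    for t in terms[1:]:
        candidates &= set(index[t])         # ...that also have the others
    if not phrase:
        return sorted(candidates)
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        # Word k of the phrase must sit exactly k places after the first word.
        if any(all(p + k in index[t][doc_id] for k, t in enumerate(terms))
               for p in starts):
            hits.append(doc_id)
    return hits


docs = {
    1: "Smokey Mountain in Tondo, Manila",
    2: "Smokey trails in the Mountain State",
    3: "Great Smoky Mountains vacation",
}
idx = build_positional_index(docs)
print(search(idx, "smokey mountain"))    # [1, 2]
print(search(idx, '"smokey mountain"'))  # [1]
```

Document 2 contains both words, so it matches the plain search, but only document 1 has them next to each other, which is why the quoted search returns fewer, more relevant results.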

000  000  000  000  000  000

The 6th annual E-Services Philippines will be held this coming Thursday and Friday at EDSA Shangri-la in Mandaluyong City. One of the highlights of the two-day event is the CEO Forum, which will feature presentations by 23 international experts and executives on major outsourcing industry issues and trends. Coinciding with it is the C-Level Summit, where 20 senior executives from international ICT companies will meet with local executives in a networking conference.

The data transcription track, one of the breakout sessions of E-Services Philippines, will be held at the Waterfront Hotel in Cebu City on February 20 to promote regional ICT hubs in the country.

000  000  000  000  000  000

The Innovations Expo scheduled this week at the Hiyas Convention Center in Malolos City, Bulacan has been moved to March 2 and 3. The Tarlac City leg, at the Diwa ng Tarlac Convention Center, will push through as scheduled on February 23 and 24.

The list of exhibitors continues to grow and now includes the Central Luzon Industry and Energy Research and Development Consortium, the Phil-Sino Center for Agricultural Technology of the Department of Agriculture, Digitel, DLSU College of Computer Studies, DOST Philippine Council for Advanced Science and Technology Research and Development, Filipino Inventors Society, Manila Bulletin, Media G8way, PUP College of Computer Management and College of Engineering, and Smart.

The Expo will then be held in Ilocos Norte during the second half of the year. By then, the Expo, now in its second year, will have reached 40,000 students.

Organized by ConvergeX Asia Expositions Management, the Expo, which has been making the rounds of major cities in Luzon, presents the latest local technologies, inventions, and research to students in the provinces. It also gives students, the future innovators, an opportunity to meet and interact with inventors, science and technology researchers, and industry practitioners.


(For feedback, comments, or suggestions, email me at openingpagemb@yahoo.com)

Copyright © 2001-2005, Manila Bulletin. All Rights Reserved.

designed and developed by
Alchemy Solutions