People Search Project

The goal of this project was to find possible addresses of Ph.D. students who graduated from the School of Library and Information Science (SLIS), Indiana University, Bloomington.

The internet is a massive repository of information. It hosts people's work in the form of publications, projects, homepages, and more. Several search engines are currently available to help search for this information. In this project, we used the Google search engine to accomplish our goal.


Methodology:
The algorithm to meet this goal was written in Perl. Its pseudo-code is outlined below:

Input
a) List of Ph.D. graduates' names (First Name # Last Name)
b) List of U.S. state names (Abbrev. # Name)
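
Both input lists use a "#" separator between two fields. As a rough illustration only (in Python, not the project's original Perl), lines in this format could be read as shown below. The function name parse_hash_delimited and the sample names are made up for the example and do not come from the project.

```python
def parse_hash_delimited(lines):
    """Split each non-empty 'left # right' line into a (left, right) tuple."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, sep, right = line.partition("#")
        if not sep:
            raise ValueError(f"missing '#' separator: {line!r}")
        pairs.append((left.strip(), right.strip()))
    return pairs


# Sample data is invented for illustration only.
names = parse_hash_delimited(["Jane # Doe", "John # Smith"])
states = parse_hash_delimited(["IN # Indiana", "NY # New York"])
```

Here names holds (first name, last name) pairs and states holds (abbreviation, name) pairs, matching the two input formats listed above.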

Parsing approach
+ The Google search engine accepts a search query only if it appears to come from the Google search web page. To satisfy this, the code sets the referring site to Google so that the query appears to originate from the Google web page.
+ The search query terms are the person's first name and last name.
+ To acquire the information, searches were performed on the following:
    a) Google phone directory:
        + The initial search was performed in the Google phone directory, where people's addresses are listed along with their phone numbers.
        + Each person's name was queried against each U.S. state, and the search hit pages were filtered for the required data: the person's name, address, and contact number.

    b) Homepage Search:
       + The person's full name was used to query the Google search engine.
       + The search result hit was taken as the homepage.
       + Two conditions were checked: i) with frames, ii) without frames.
       + If frames exist, the directory URL was extracted and the existence of the web page was confirmed by trying multiple filename combinations (e.g., resume.html, resume.htm).
       + If frames do not exist, the existence of the page was checked directly.
       + Finally, the address information was extracted from the homepage.
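
The request-building steps above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the project's Perl code: the search URL, the Referer value, the "First Last ST" query format, and all function names are hypothetical. The sketch only constructs requests and URLs; it does not contact any server.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.google.com/search"  # assumed endpoint
REFERER = "https://www.google.com/"           # assumed referring page


def build_search_request(query):
    """Return a Request whose Referer header points at the search page."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(url, headers={"Referer": REFERER})


def phone_directory_queries(first, last, state_abbrevs):
    """Build one query per state: 'First Last ST' (hypothetical format)."""
    return [f"{first} {last} {abbrev}" for abbrev in state_abbrevs]


def candidate_page_urls(dir_url, stems=("resume",), extensions=(".html", ".htm")):
    """Combine a directory URL with each stem/extension pair to probe."""
    base = dir_url if dir_url.endswith("/") else dir_url + "/"
    return [
        urllib.parse.urljoin(base, stem + ext)
        for stem in stems
        for ext in extensions
    ]


# Names and URLs below are invented for illustration only.
request = build_search_request("Jane Doe")
queries = phone_directory_queries("Jane", "Doe", ["IN", "NY"])
candidates = candidate_page_urls("http://example.org/~jdoe")
```

Each URL in candidates would then be checked for existence, which corresponds to the frames case; in the no-frames case the homepage URL itself would be checked directly.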

Output
a) Search results are stored in individual data files, one per person, named using a combination of the person's first and last name. The data obtained by each approach is clearly indicated in the file (sample person's result file).
b) A web page, generated by the script, that shows the results for all the names used in the search.
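
One possible way to lay out the per-person result files is sketched below in Python. The file name pattern (first_last.txt), the bracketed section labels, and the function names are assumptions made for illustration; the original does not specify the exact format.

```python
import os
import tempfile


def result_filename(first, last):
    """Build a per-person file name from first and last name (format assumed)."""
    return f"{first}_{last}.txt".lower()


def write_results(directory, first, last, results_by_approach):
    """Write one labeled section per search approach and return the file path."""
    path = os.path.join(directory, result_filename(first, last))
    with open(path, "w", encoding="utf-8") as handle:
        for approach, entries in results_by_approach.items():
            handle.write(f"[{approach}]\n")  # marks which approach produced the data
            for entry in entries:
                handle.write(f"{entry}\n")
    return path


# Placeholder entries only; no real search results are shown here.
out_dir = tempfile.mkdtemp()
saved = write_results(out_dir, "Jane", "Doe", {
    "Phone directory": ["placeholder address line"],
    "Homepage": ["placeholder address line"],
})
```

Labeling each section by approach keeps the source of every address visible in the file, as the output description requires.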


Authors:
Ketan Mane, SLIS, Ph.D. student
Dr. Kiduk Yang, SLIS, Assistant Professor

Last Updated: 5th Jan 2004