Using WebSites: Enter a website URL and collect all phone and fax data found on that website.
Quick Start:
- Select "WebSites/Dirs/Groups" as a source in the New Extraction Settings dialog
- Enter the website URL in the "Initial Address" box
- Select the Spidering Depth or enter the number of pages to be processed (optional). A setting of "0" processes and looks for data in the whole website. A setting of "1" processes only the index (home) page and its associated files under the root directory.
- Click OK
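The depth setting in the steps above can be pictured with a small sketch. This is not the program's own code: the `SITE` link map and the `pages_to_process` function are hypothetical, and the rule for depths greater than 1 (start page plus pages up to n - 1 links away) is an assumption made only for illustration.

```python
from collections import deque

# Hypothetical in-memory site: page -> pages it links to (no network access).
SITE = {
    "/": ["/about", "/contact"],
    "/about": ["/team"],
    "/contact": [],
    "/team": ["/"],
}


def pages_to_process(start, depth):
    """Return the pages visited breadth-first under an assumed depth rule.

    Assumption: depth 0 means no limit; depth n >= 1 means the start page
    plus pages up to n - 1 links away from it.
    """
    seen = [start]
    queue = deque([(start, 1)])  # (page, level); the start page is level 1
    while queue:
        page, level = queue.popleft()
        if depth != 0 and level >= depth:
            continue  # do not follow links out of the deepest allowed level
        for link in SITE.get(page, []):
            if link not in seen:
                seen.append(link)
                queue.append((link, level + 1))
    return seen
```

Under these assumptions, a depth of 0 reaches every page of the sample site, while a depth of 1 stops at the start page.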
Note: The program retrieves the HTML/text pages of the website according to the depth you specified and extracts all phone and fax numbers found in those pages. By default, the program stays within the current domain.
If you want the spider to retrieve files from external sites that are linked from the starting site specified in the "General" tab, enable "Follow External URLs" in the "External Site" tab.
By default, the program follows external sites only once. That is:
- The program processes the initial address and all external sites found in the initial address.
- If you want the spider to follow external sites without limit, select "Unlimited" in the "Spider External URLs Links" combo box. Remember that you must then stop the extraction manually, because in this mode the program can spider the entire Internet.
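The difference between staying in the current domain, following external sites once, and following them without limit can be sketched as a simple decision function. This is only an illustration, not the program's implementation: the names `same_domain` and `may_follow` are hypothetical, and the sketch assumes that every link leaving the starting host counts as one external hop.

```python
from urllib.parse import urlparse


def same_domain(url, start_url):
    """True when both URLs share the same host name."""
    return urlparse(url).hostname == urlparse(start_url).hostname


def may_follow(link, start_url, hops, follow_external=False, max_hops=1):
    """Decide whether a link should be queued for processing.

    hops is the number of external hops already taken to reach the page
    that contains the link. max_hops=None models the "Unlimited" choice.
    Returns (allowed, hops_for_link).
    """
    if same_domain(link, start_url):
        return True, hops  # links inside the starting domain are always followed
    if not follow_external:
        return False, hops  # default behavior: stay within the current domain
    if max_hops is not None and hops >= max_hops:
        return False, hops  # external budget used up ("only once" by default)
    return True, hops + 1
```

With `max_hops=None` nothing in this sketch ever ends the crawl, which mirrors the warning above that an unlimited extraction has to be stopped manually.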
Using Directories/Groups: Choose Yahoo, Google, Dmoz, or another directory and get all phone and fax contact data from there.
Quick Start:
Let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
Follow these steps:
- Select "WebSites/Dirs/Groups" as a source in the New Extraction Settings dialog
- Enter the URL in the "Initial Address" box
- Set Spidering Depth to 0 and check the "Stay within Full URL" option.
- Click OK
Or, let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
plus all lower-level folders, such as
http://directory.google.com/Top/Computers/Software/Freeware/windows
http://directory.google.com/Top/Computers/Software/Freeware/windows/browser
http://directory.google.com/Top/Computers/Software/Freeware/linux
etc.
Follow these steps:
- Select "WebSites/Dirs/Groups" as a source in the New Extraction Settings dialog
- Enter the URL in the "Initial Address" box
- Set Spidering Depth to 0 and check the "Stay within Full URL" option.
- Go to the "External Site" tab and check the "Follow External URLs" option.
- Click OK
Note: With the steps above, the program downloads the http://directory.google.com/Top/Computers/Software/Freeware/ page and, optionally, all lower-level pages, and builds a list of URLs of the companies listed there. Finally, it extracts all data found in those sites.
(Remember: this setting tells the spider to process, follow, and visit all URLs found while processing the "Initial Address" of the "General" tab.)
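The "Stay within Full URL" option used in both procedures can be pictured as a prefix test on each URL the spider finds. The sketch below is an assumption about what the option means, not the program's actual logic, and `within_full_url` is a hypothetical name.

```python
def within_full_url(url, initial_address):
    """True when url is the initial address or lies below it.

    Assumption: "Stay within Full URL" keeps only URLs whose path starts
    with the full Initial Address. The comparison is a plain,
    case-sensitive string prefix check.
    """
    base = initial_address.rstrip("/") + "/"
    return (url.rstrip("/") + "/").startswith(base)


START = "http://directory.google.com/Top/Computers/Software/Freeware/"

# Lower-level folders of the initial address pass the test.
print(within_full_url(START + "windows/browser", START))
# A parent folder does not.
print(within_full_url("http://directory.google.com/Top/Computers/Software/", START))
```

Normalizing the trailing slash on both sides keeps a sibling folder with a similar name from matching by accident.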