

NRS Tutorial: Site Search

Step 1: Create a collection

In the admin interface, under "Crawler|collection list", use the add collection form: select 'site' for type and enter an arbitrary name, say 'LoopIP'. Click 'Add'.

Step 2: Edit the collection

Click on your new collection name. You can now edit the properties of the collection. Specify the URL of the site you wish to crawl, for example: http://www.loopip.com/

Review some of the property settings:

  • stay on path: deselect this option unless your start URL points to a directory you wish to crawl exclusively.
  • stay on site: deselecting this option lets the crawler follow links pointing off the site. Not recommended.
  • follow cgi: select this option if you have dynamically generated URLs that contain '?'.
  • index forms: select this option to discover search forms on the site and present them as part of search results.
  • robot check: if your site has a robots.txt file that excludes all bots, you can skip the robots.txt check here.
  • coverage: you can control how deeply the crawler crawls. Specify a distance value that indicates how many crawl hops to allow from the start URL. You can also specify the maximum number of pages to crawl.
  • politeness: the default politeness is a 30-second wait between pages fetched from the same domain. To speed up the crawl, specify a smaller value.
  • history: as pages change over time you can indicate how many versions of the page to keep in the database. This allows differencing features on pages, showing what has changed over time.
  • excludes: specify URL patterns to exclude from the crawl.
  • includes: specify URL patterns to include in the crawl. You can specify other domains that are allowed, even though you have 'stay on site' or 'stay on path' turned on.
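
How the scope settings above might combine can be sketched in a few lines of Python. This is only an illustration, not the Net Research Server implementation: the function name `in_scope`, the shell-style wildcard patterns, and the precedence (excludes first, then follow cgi, then includes, then stay on site and stay on path) are all assumptions made for the example.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def in_scope(url, start_url, stay_on_site=True, stay_on_path=False,
             follow_cgi=False, includes=(), excludes=()):
    """Hypothetical scope check: would a crawler with these settings fetch `url`?"""
    target, start = urlsplit(url), urlsplit(start_url)

    # excludes: assumed to win over every other setting.
    if any(fnmatch(url, pattern) for pattern in excludes):
        return False
    # follow cgi: when off, URLs with a '?' query string are skipped.
    if target.query and not follow_cgi:
        return False
    # includes: a match admits the URL even if it is off-site or off-path.
    if any(fnmatch(url, pattern) for pattern in includes):
        return True
    # stay on site: the host must match the start URL's host.
    if stay_on_site and target.netloc != start.netloc:
        return False
    # stay on path: the URL must live under the start URL's directory.
    if stay_on_path:
        start_dir = start.path.rsplit("/", 1)[0] + "/"
        if target.netloc != start.netloc or not target.path.startswith(start_dir):
            return False
    return True


start = "http://www.loopip.com/"
print(in_scope("http://www.loopip.com/support/", start))
print(in_scope("http://example.com/page", start))
print(in_scope("http://example.com/page", start, includes=["http://example.com/*"]))
```

With the default settings the first call accepts the on-site page and the second rejects the off-site one; the third shows an include pattern admitting another domain while 'stay on site' remains on.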

Step 3: Start the Crawl

Click on the crawler tab, and click the 'Start' button next to start crawl. The crawl will start in a minute or so. By repeatedly clicking on the crawler tab you can find out the status of the crawl. You can stop it at any time. By default the crawler is set to run every night at midnight for 5 hours.
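
As a rough planning aid, the politeness setting from Step 2 and the length of the nightly run put an upper bound on how many pages can be fetched from a single domain per night. The helper below is a hypothetical back-of-the-envelope calculation, assuming the crawler waits the full politeness interval between pages and ignoring download time.

```python
def pages_per_window(window_hours, politeness_seconds):
    """Upper bound on pages fetched from one domain during one crawl window."""
    return int(window_hours * 3600 // politeness_seconds)


# Default settings: a 5-hour nightly run with a 30-second politeness.
print(pages_per_window(5, 30))  # 600
# Lowering the politeness to 5 seconds raises the bound.
print(pages_per_window(5, 5))   # 3600
```

If the site has more pages than this bound, expect the crawl to span several nights unless you lower the politeness value.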

Step 4: Index full-text

Click on the crawler tab, and click on the 'Start' button next to reindex full text, if it has not already started. The full-text indexing process can take anywhere from an hour or so to several days if you have millions of documents.

Step 5: Create a template

After finishing the full-text indexing process you need to create a template in order to search against the index. Under "Crawler|Collection list|create template", select your collection and click 'Add'. Now click on the 'Templates' tab and you'll see your new template. Click on it and try some searches.


Copyright © 2008 LoopIP LLC. All rights reserved | Terms | Privacy