NRS Tutorial: Site Search
Step 1: Create a collection
In the admin interface, under "Crawler|Collection list",
use the add collection form: select 'site' as the
type and enter any name, for example 'LoopIP'.
Click 'Add'.
Step 2: Edit the collection
Click your new collection name. You can now edit
the properties of the collection. Specify the
URL of the site you wish to crawl, for example:
http://www.loopip.com/
Review some of the property settings:
- stay on path: deselect this option unless your
start URL points to a directory you wish to
crawl exclusively.
- stay on site: deselecting this option lets the
crawler follow links that point off the site.
This is not recommended.
- follow cgi: select this option if your site has
dynamically generated URLs containing '?'.
- index forms: select this option to discover search
forms on the site and present them as part of
search results.
- robot check: if your site has a robots.txt file
that excludes all bots, you can skip the robots.txt
check here.
- coverage: you can control how deeply the crawler goes.
Specify a distance value that sets how many link hops
from the start URL to allow. You can also specify
the maximum number of pages to crawl.
- politeness: the default politeness delay is 30 seconds
per page on the same domain. To speed up the crawl,
specify a smaller value.
- history: because pages change over time, you can set
how many versions of each page to keep in the database.
Keeping older versions enables page differencing,
which shows what has changed over time.
- excludes: specify URL patterns to exclude from
the crawl.
- includes: specify URL patterns to include in the
crawl. You can use this to allow other domains even
when 'stay on site' or 'stay on path' is turned on.
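The scope and coverage settings above can be pictured as a breadth-first crawl that filters each discovered link. This is a minimal sketch, not the NRS implementation: the function names `in_scope` and `crawl_order`, the use of regular expressions for include/exclude patterns, the order in which the rules are checked (excludes first, then the '?' test for 'follow cgi', then includes overriding 'stay on site' and 'stay on path'), and the in-memory `links` map standing in for real page fetches are all assumptions. Python's standard `urllib.robotparser` stands in for the robot check.

```python
import re
import urllib.robotparser
from collections import deque
from urllib.parse import urlparse


def in_scope(url, start_url, stay_on_site=True, stay_on_path=False,
             follow_cgi=False, includes=(), excludes=()):
    """Hypothetical model of the collection's scope settings (not NRS code)."""
    # Assumption: an exclude pattern always wins.
    if any(re.search(p, url) for p in excludes):
        return False
    # Dynamically generated URLs ('?' in them) need 'follow cgi'.
    if "?" in url and not follow_cgi:
        return False
    # Include patterns admit URLs that 'stay on site'/'stay on path' would reject.
    if any(re.search(p, url) for p in includes):
        return True
    start, cand = urlparse(start_url), urlparse(url)
    if stay_on_site and cand.netloc != start.netloc:
        return False
    if stay_on_path:
        # Directory of the start URL, e.g. '/docs/' for http://host/docs/index.html
        base = start.path.rsplit("/", 1)[0] + "/"
        if cand.netloc != start.netloc or not (cand.path or "/").startswith(base):
            return False
    return True


def crawl_order(start_url, links, max_distance=3, max_pages=100,
                robots_txt=None, **scope):
    """Breadth-first crawl over `links` (URL -> list of linked URLs), which
    stands in for fetching pages. Returns the URLs in visit order.
    robots_txt=None models the robots.txt check being skipped."""
    robots = None
    if robots_txt is not None:
        robots = urllib.robotparser.RobotFileParser()
        robots.parse(robots_txt.splitlines())

    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, hops from the start URL)
    order = []
    while queue and len(order) < max_pages:  # 'coverage' page limit
        url, hops = queue.popleft()
        if robots is not None and not robots.can_fetch("*", url):
            continue  # robots.txt disallows this URL; skip it and its links
        order.append(url)
        if hops == max_distance:
            continue  # 'coverage' distance reached: don't follow links further
        for nxt in links.get(url, []):
            if nxt not in seen and in_scope(nxt, start_url, **scope):
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return order
```

Lowering `max_distance` or `max_pages` shrinks the crawl in the way the coverage settings describe, and passing a robots.txt that disallows everything (`"User-agent: *\nDisallow: /"`) yields an empty crawl, which is the situation the robot check option is meant to work around.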
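The history setting can be pictured as a bounded list of snapshots kept for each page. The sketch below is an assumption about how such a feature might work, not how NRS stores versions: the `PageHistory` class is hypothetical, and Python's standard `difflib.ndiff` is used to show which lines were removed or added between the two newest versions.

```python
import difflib
from collections import deque


class PageHistory:
    """Hypothetical model of the 'history' setting: keep the last N versions
    of one page and report what changed between the two newest."""

    def __init__(self, versions=3):
        # A bounded deque discards the oldest version automatically.
        self.snapshots = deque(maxlen=versions)

    def store(self, text):
        self.snapshots.append(text)

    def changes(self):
        """Removed ('- ') and added ('+ ') lines between the two newest versions."""
        if len(self.snapshots) < 2:
            return []
        old, new = self.snapshots[-2], self.snapshots[-1]
        diff = difflib.ndiff(old.splitlines(), new.splitlines())
        # Keep only removed/added lines; drop unchanged ('  ') and hint ('? ') lines.
        return [line for line in diff if line.startswith(("- ", "+ "))]
```

Setting the number of versions higher keeps more snapshots for comparison at the cost of database space; with `versions=2` only the current and previous copy survive.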
Step 3: Start the crawl
Click the crawler tab, then click the 'Start' button
next to 'start crawl'. The crawl begins within a
minute or so. To check its status, click the crawler
tab again; you can stop the crawl at any time. By
default, the crawler runs every night at midnight
for 5 hours.
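The default schedule and the default politeness delay together limit how much of a single site one nightly run can cover. At 30 seconds per page, a 5-hour window allows at most 5 × 3600 / 30 = 600 pages from one domain, assuming one fetch per politeness interval and ignoring download time. The helpers below are hypothetical (neither is an NRS function): one checks whether a time falls inside the nightly window, the other computes that page bound.

```python
from datetime import time


def in_crawl_window(now, start=time(0, 0), hours=5):
    """True if `now` (a datetime.time) falls in the nightly crawl window.
    Modulo arithmetic also handles windows that wrap past midnight."""
    minutes_since_start = ((now.hour * 60 + now.minute)
                           - (start.hour * 60 + start.minute)) % (24 * 60)
    return minutes_since_start < hours * 60


def max_pages_per_window(hours=5, politeness_seconds=30):
    """Rough upper bound on pages fetched from ONE domain in one run,
    assuming one fetch per politeness interval and ignoring download time."""
    return int(hours * 3600 // politeness_seconds)


print(max_pages_per_window())                      # 600
print(max_pages_per_window(politeness_seconds=5))  # 3600
```

Lowering the politeness delay to 5 seconds raises the bound to 3,600 pages per run, which is why a smaller value speeds up the crawl.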
Step 4: Index the full text
Click the crawler tab, then click the 'Start' button
next to 'reindex full text' if indexing has not
already started. Full-text indexing can take anywhere
from an hour or so to several days if you have
millions of documents.
Step 5: Create a template
Once full-text indexing has finished, you need to
create a template in order to search the index.
Under "Crawler|Collection list|create template",
select your collection and click 'Add'. Then click
the 'Templates' tab, where your new template appears.
Click it and try some searches.