NRS Tutorial: Creating a parse rule for a search engine
This tutorial will lead you through creating a search
template and populating it with search engines.
The template will then let you metasearch all
the engines.
Step 1: Create search template
In admin, click on the 'Templates' tab and on the
add template form, select the 'search' type, and
enter a name, say 'mysearch'. Click 'Add'.
Step 2: Edit search template
Click 'edit' next to the 'mysearch' template. We are
now going to add a couple of search engines that
are already in the system. Click 'search list' and select
the checkbox next to:
Next, click 'Add' at the bottom of the page, and then
click 'Back to Template' at the bottom of the page.
You now have 4 searches on your template.
Step 3: Add a new search
We will now add a new search to the template that
does not exist in the system. Where it says 'add
new search', enter the URL: http://www.kartoo.com/en/kartoo.html
Click 'Save'. The URL now appears in orange, indicating
that it has not been properly set up yet. NRS will crawl
the URL immediately to retrieve all form definitions
found on the page. Click 'edit' next to it.
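To make the idea of "retrieving all form definitions" concrete, here is a minimal Python sketch that collects the forms on a page. It is not how NRS is implemented; the names FormCollector and find_forms and the sample HTML are assumptions made up for this illustration.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects each <form> with its action, method, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) are skipped
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_forms(html):
    """Return a list of form descriptions found in the given HTML text."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms


# Hypothetical page with two forms, similar to what a crawl might find.
sample = """
<form action="/search" method="GET"><input name="q"><input type="submit"></form>
<form action="/login" method="post"><input name="user"><input name="pw"></form>
"""
for number, form in enumerate(find_forms(sample), start=1):
    print(number, form["action"], form["method"], form["fields"])
```

A page with several forms yields several entries, which is why the search property window may offer more than one form to choose from.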
Step 4: Specifying search properties
- For name enter: Kartoo Metasearch
- For description enter: Innovative metasearch
engine using Flash to graphically manage search results.
- In the drop-down box that says 'All Searches',
pick the first item. Sometimes many items
are listed here, representing all the forms found
on the URL. You need to select the one you wish
to use. To help you choose, you can click the 'view' link
to find out more about the forms.
Click 'Save'.
NRS features algorithms to automatically extract search
results from the search results page. Sometimes
NRS cannot determine the search results and a
parse rule must be written. A parse rule is also
needed if extra metadata is desired
for each search result. For example, some search
engines return metadata fields such as size, date,
and category.
Step 5: Test the search
Click the 'test' link. A new window opens up with your
template and a list of search engines. Deselect
all search engines except Kartoo. Enter a search
term, for example 'java', and click 'search'. You
will now get a search result page. NRS successfully
retrieved the search results automatically. On
the form that says 'Top results from Kartoo Metasearch',
click 'search'. This brings you to the original
Kartoo search result page. We are now going to
write a parse rule to also retrieve the list of
search engines each search result came from.
Step 6: Specify a parse rule
Bring back the search property window and enter the
following into the parse rule field:
<c label="." hide>
<title link>
<desc>
<domain link>
<sources prefix="(" suffix=")">
Parse rules operate by breaking down the search result
page into a list of text lines and links.
The first line looks for the search result count and
identifies it with the "." character.
The hide attribute says to not display this item.
The second line says the next search result item is
the title and is a link. "title" is
a reserved word indicating to NRS that it is the
title.
The third line says the next search result item is
the description and is text.
The fourth line says the next search result item is
the domain and is a link.
The fifth line says the next search result item is
a list of sources and can be identified with a
prefix of "(" and a suffix of ")".
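To picture the "list of text lines and links" that a parse rule works against, the following Python sketch flattens a small HTML fragment into such a list. It only illustrates the idea and is not NRS's own code; ItemSplitter, split_items, and the sample markup are hypothetical names and data chosen for this example.

```python
from html.parser import HTMLParser


class ItemSplitter(HTMLParser):
    """Flattens HTML into ("text", value) and ("link", value) items in page order."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_link = False
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Join the pieces of link text and collapse runs of whitespace.
            text = " ".join("".join(self._link_text).split())
            if text:
                self.items.append(("link", text))
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self._link_text.append(data)
        else:
            text = " ".join(data.split())
            if text:  # ignore whitespace-only runs between tags
                self.items.append(("text", text))


def split_items(html):
    """Return the page as an ordered list of text and link items."""
    splitter = ItemSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.items


# Hypothetical markup for a single search result.
sample = (
    '<p>1.</p><a href="http://example.com/java">Java home</a>'
    '<p>All about Java.</p><a href="http://example.com">example.com</a>'
    '<p>(Engine A, Engine B)</p>'
)
for item in split_items(sample):
    print(item)
```

Each line of the parse rule then describes one item in this list: a text line, a link, a text line, a link, and a final text line wrapped in parentheses.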
You can test the parse rule by clicking 'Save' and
the 'test' link again. Notice now the search results
also return the metadata items of domain and sources.
The trick in writing parse rules is to help NRS uniquely
identify a search result. In this case, first
identifying a text line with a '.' and requiring that the
next item always be a link already narrows
down the list of options. Using the label, prefix, and
suffix attributes is important to prevent false
positives.
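The narrowing effect of label, prefix, and suffix can be sketched in Python as follows. This is a simplified model under stated assumptions, not NRS's actual matching algorithm: the rule is written as a hypothetical list of dictionaries, a label is assumed to mean "the item contains this string", and the items are invented sample data.

```python
# Hypothetical in-memory form of the five-line parse rule from Step 6.
RULE = [
    {"name": "c", "kind": "text", "label": ".", "hide": True},
    {"name": "title", "kind": "link"},
    {"name": "desc", "kind": "text"},
    {"name": "domain", "kind": "link"},
    {"name": "sources", "kind": "text", "prefix": "(", "suffix": ")"},
]


def entry_matches(entry, item):
    """Check one rule entry against one (kind, value) item."""
    kind, value = item
    if kind != entry["kind"]:
        return False
    if "label" in entry and entry["label"] not in value:
        return False
    if not value.startswith(entry.get("prefix", "")):
        return False
    return value.endswith(entry.get("suffix", ""))


def extract_results(items, rule=RULE):
    """Slide over the items and collect every run that matches the whole rule."""
    results = []
    pos = 0
    while pos + len(rule) <= len(items):
        window = items[pos:pos + len(rule)]
        if all(entry_matches(e, it) for e, it in zip(rule, window)):
            record = {}
            for entry, (_, value) in zip(rule, window):
                if entry.get("hide"):
                    continue  # matched, but not shown in the output
                start = len(entry.get("prefix", ""))
                end = len(value) - len(entry.get("suffix", ""))
                record[entry["name"]] = value[start:end]
            results.append(record)
            pos += len(rule)
        else:
            pos += 1
    return results


# Invented items standing in for a results page split into text and links.
items = [
    ("text", "Results for java"),
    ("text", "1."),
    ("link", "Java home"),
    ("text", "All about Java."),
    ("link", "example.com"),
    ("text", "(Engine A, Engine B)"),
    ("text", "2."),
    ("link", "Java tutorial"),
    ("text", "Learn Java step by step"),
    ("link", "example.org"),
    ("text", "(Engine C)"),
]
for record in extract_results(items):
    print(record)
```

In this model, removing the prefix and suffix from the last entry would let any text line count as the sources item, which is the kind of false positive the attributes are meant to prevent.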
For more info on parse rules, consult the help system
in NRS and have a look at the parse rules in the
demo app.