
FAQ for Net Research Server

The Questions

  1. Background
    1. What is Net Research Server?
    2. How and why was Net Research Server created?
    3. OK, so how does Net Research Server compare to other search engines?
    4. Where can I get Net Research Server?
  2. General Technical Questions
    1. "Why can't I ...? Why won't ... work?" What to do in case of problems
    2. What's the best hardware/operating system/... How do I get the most out of my Net Research Server?
    3. Why isn't there a binary for my platform?
  3. Starting Net Research Server
    1. Why does Net Research Server no longer start up?
    2. How can I fix a corrupt database?
    3. How can Net Research Server run on port 80?
    4. How can Net Research Server start on boot?
    5. How can I stop Net Research Server?
  4. Error Log Messages
    1. Where is the log file?
    2. How can I get more log information?
    3. What are the common errors found in the log?
  5. ODP Configuration Questions
    1. How do I specify when to import ODP?
    2. How do I import a subset of ODP?
    3. How do I exclude ODP categories during import?
    4. How do I import the Adult ODP category?
  6. Crawler Configuration Questions
    1. What is a collection?
    2. How do I crawl and index a subset of ODP?
    3. How do I start the crawler?
  7. Mail Configuration Questions
    1. Why does Net Research Server have a mail engine?
    2. How do I set up SMTP and POP3?
    3. How do I set up virtual mail domains?
  8. Webserver Configuration Questions
    1. How can the webserver run on a particular domain/IP/port?
    2. How many webserver threads should I use?
    3. Where are the web logs stored?
    4. How do I set the default page?
  9. Rendering Configuration Questions
    1. How are HTML pages rendered?
    2. Which XSLT engine is used?
    3. How do I modify an XSL template?
    4. How do I access a page as XML data?
  10. Database Configuration Questions
    1. How can I back up my databases?
    2. How can I rebuild my databases?
  11. Access Control Questions
    1. How can I control access to admin?
    2. How can I control access to my application?
    3. How do I change user login and signup?
  12. Template Configuration Questions
    1. What is a template?
    2. How can I add a search engine to my template?
    3. How can I add a news source to my template?
    4. How do I aggregate XML/RSS feeds?

The Answers
A. Background
  1. What is Net Research Server?

    Net Research Server

    • is a powerful, flexible application server
    • implements web crawling, indexing, and search
    • imports the Open Directory Project or a custom directory
    • is highly configurable
    • comes in binary form and installs with no dependencies
    • provides cross-platform support for Windows and Linux
    • creates applications to provide web users with features including:

      News aggregation
      Allows you to easily aggregate any news source found on a web page. News sources are added by specifying the URL of the page on which the news is found, and a parse rule for the news item title, description, and any other metadata.
      Search engine aggregation
      Allows you to create meta search pages that provide search results from multiple search engines found on the web. Search engines are added by specifying the URL on which the search engine form is found, and then specifying a search result parse rule. The parse rule lets you specify which metadata to extract from the page.
      Open Directory content
      Allows you to import the Open Directory Project's RDF dump, which provides over 500,000 categories and over 5 million site listings. You can customize which areas of ODP to import, browse, search, crawl, and index. Alternatively, use your own directory and combine it with extra metadata for a rich search experience.
      Applications
      Net Research Server can organize content into applications. Applications are built entirely within an HTML application editor and provide a customizable navigation interface to all the content pages in the application. The pages are organized into tabs, each with a drop-down menu listing all of its pages.
      Personalization
      Users can sign up for an application, log in, create news subscriptions by e-mail, create custom news or search pages with selected sources, monitor search results from search engines, monitor web pages for changes, organize content into folders, and send/receive mail.
      Wiki
      Import a Wiki or create your own. Incorporate Wiki answers and wiki metadata into search results.
      Tagging
      Users can import their bookmarks and submit more. By tagging the bookmarks you can find them easily using tag lists, tag clouds, or search.
      Customizability
      You can customize the entire application server HTML by modifying XSL templates. The application editor lets you build custom applications. Templates let you add your own content. All content is available as XML.

  2. How and why was Net Research Server created?

    Net Research Server was created to provide innovative ways to bring together web content from various sources into a rich and unified interface. As the web grows, information overload becomes more and more of a problem. NRS addresses this with its application metaphor, which lets you build a portal featuring all the information in one place. NRS is written in C++ and is available for Linux and Windows.

  3. OK, so how does Net Research Server compare to other search engines?

    NRS is highly scalable. It crawls and indexes over 10 million pages per index, and federated search lets you search across multiple indexes in the same amount of time.

    NRS is unique in bringing together many search technologies into one unified platform, and providing a rich interface to create research solutions and portals.

  4. Where can I get Net Research Server?

    The latest version is always found on the download page. We have free versions of NRS restricted to 10,000 documents. You can upgrade the document limit easily within NRS.

B. General Technical Questions
  1. "Why can't I ...? Why won't ... work?" What to do in case of problems

    If you are having trouble with NRS, you should take the following steps:

    • Check the log

    NRS generates a log file that you can find in the application directory. You can also access the log through the admin interface on the webserver tab, by clicking on the "system log" link.

    • Check this FAQ

    This Frequently Asked Questions list is regularly updated, so check the latest version.

    • Contact support

    LoopIP has three levels of support:

    • free support: free support includes help by e-mail and phone in getting NRS up and running, and general configuration and usage questions.
    • hourly support: hourly support is provided by first purchasing support hours within NRS. This kind of support offers hands-on help in configuration, installation, customization of templates, and solution development.
    • support plan: if you have purchased an NRS maintenance agreement, priority support is provided for the length of the agreement, along with priority patches and bug fixes.

    All e-mail support inquiries should be made to support

  2. What's the best hardware/operating system/... How do I get the most out of my Net Research Server?

    The Windows platform provides the following advantages:

    • faster XSLT engine: if your Windows installation has the Microsoft XML 4.0+ parser installed, it is used over the internal Sablotron engine and provides significantly faster page rendering performance.
    • threading architecture: NRS uses multiple threads to deal with concurrent user requests. The Windows architecture provides better performance when large numbers of threads are in use. This issue has been fixed under Linux in 2.5+ kernels.

    The Linux platform provides the following advantages:

    • no aggressive filesystem caching: Windows server installations use filesystem caching options by default that impact database performance. These settings can be modified in the Windows registry.


    Generally you will not notice much difference between the two platforms.

    NRS is disk intensive and benefits from a fast striped RAID disk array. NRS operates with as little as 512MB of RAM, but requires up to 2GB of RAM when indexing over 10 million full-text documents.

    It is also possible under Linux to use raw disk partitions and software RAID to further increase performance.

    Dual CPU machines provide better responsiveness during update cycles, as NRS can crawl and index in the background while still serving pages. Also, because of the multithreaded, non-blocking architecture of NRS, scalability improves with more CPUs.

    NRS requires 7GB of disk space to import the ODP directory and keep it up to date, and a further 12GB per million documents crawled and indexed.
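    The disk figures above can be turned into a rough sizing helper. This is a minimal Python sketch that simply applies the two numbers quoted in this FAQ (7GB for ODP, 12GB per million documents); the function name is hypothetical, and actual usage will vary with your content.

```python
# Rough disk-space estimate using the figures quoted in this FAQ.
ODP_GB = 7.0                # importing ODP and keeping it up to date
GB_PER_MILLION_DOCS = 12.0  # per million documents crawled and indexed


def estimate_disk_gb(documents: int, import_odp: bool = True) -> float:
    """Return an approximate disk requirement in gigabytes."""
    if documents < 0:
        raise ValueError("documents must be non-negative")
    total = GB_PER_MILLION_DOCS * documents / 1_000_000
    if import_odp:
        total += ODP_GB
    return total


if __name__ == "__main__":
    # ODP plus a 10 million document full-text index.
    print(estimate_disk_gb(10_000_000))  # 127.0
```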

    Using Flash solid state storage (SSD) disk drives, it is possible to increase search performance from a couple of searches a second to 50 or more searches a second.

  3. Why isn't there a binary for my platform?

    Binaries are provided for the Windows platform, and for the i386 Linux 2.4+ platform. Contact us for additional platform support.

C. Starting Net Research Server
  1. Why does Net Research Server no longer start up?

    By default the NRS webserver binds to "localhost" (127.0.0.1) on port 2012. To use an address other than localhost:2012, you must purchase an NRS upgrade by registering and making a purchase within the NRS admin.

    Check the log file to see whether an error occurred while starting the webserver. If so, you can start NRS with the command-line option -address "host:port", where host is an IP address or fully qualified domain name and port is a port number such as 2012 or 80.

    Under Linux, only the root user can start services on port numbers below 1024. If you want to start NRS on port 80 under Linux, make sure you are the root user (use the su command to become root, or use the sudo tool) and that no other service, such as Apache, is listening on the same port.

    It is also possible you have run out of disk space.

  2. How can I fix a corrupt database?

    In the unlikely event of database corruption it is possible to delete the databases in question:

    • deleting settings.db will require resetting all system flags, but leaves everything else intact.
    • deleting template.db will remove all templates, collection definitions, metadata definitions, and cookies.
    • deleting dir.db and dirindex.db will remove the directory.
    • deleting form.db will delete search engine definitions and require resetting some search engine elements on templates.
    • deleting web.db and webindex.db will delete full-text documents and search indexes, as well as alert pages.
    • deleting mail.db will delete user mail accounts and mailbox content.
    • deleting agent.db will delete all user alerts.
    • deleting tag.db will delete all user bookmarks.
    • deleting wiki.db and wikiindex.db will delete all wiki content.

  3. How can Net Research Server run on port 80?

    Make sure no other service, such as IIS or Apache, is running on port 80. Under Linux, you need to be the root user to start a service on a port number below 1024. Most webservers also bind by default to all IPs on port 80, causing a conflict; you can configure IIS or Apache to use only particular IPs and let NRS use the others.

    For Apache, use the "Listen" directive in the httpd.conf file so that Apache listens only on the IP addresses and ports it needs. For example: Listen mywebsite.com:80

    For Microsoft Internet Information Server (IIS), if you have IIS 5.0 read the knowledge base entry that explains how to disable socket pooling. If you have IIS 6.0, read the knowledge base entry that explains how to list the IP addresses that IIS will use.

  4. How can Net Research Server start on boot?

    Under Linux, you can add a line to /etc/rc.d/rc.local:
    For example: /home/nrs/nrsd -address "127.0.0.1:80" &

    Under Windows, NRS is installed as a service by default and therefore starts automatically. Use the Service Manager to stop and start the service. You can remove NRS as a service with the command-line option -removeservice, and reinstall it as a service with -installservice.

  5. How can I stop Net Research Server?

    There are several ways to stop NRS:

    • In admin, click the "Quit Server" button.
    • Under Linux, find the name of the executable with ps -A, then stop the process with killall -QUIT processname.
    • Under Windows, stop the service using the Service Manager, or, if NRS is running in console mode, press Ctrl-C.
D. Error Log Messages
  1. Where is the log file?

    The log file is found in the application directory. It is also accessible from the admin interface under the Webserver tab by clicking on the "system log" link.

  2. How can I get more log information?

    You can start NRS with the command-line option -verbose, or turn on verbose logging in the admin webserver tab. The log then includes extra information such as each URL request received, each URL crawled, and each URL indexed. This information can help our support team diagnose problems.

    If you have crawl/indexing problems you can determine exactly what is happening with this option.

  3. What are the common errors found in the log?

    Look in the log for any line beginning with "Err" for serious errors. These can include:

    • no licence found: make sure your license file is in the application directory.
    • invalid license: your license file is not valid. Contact support to fix your license.
    • web index update failed: serious errors occurred during full-text indexing.
    • directory update failed: NRS was unable to download the RDF dump, or the RDF dump was invalid. By default, NRS downloads the RDF dumps from http://rdf.dmoz.org/rdf/. Make sure the following files exist:

      http://rdf.dmoz.org/rdf/structure.rdf.u8.gz
      http://rdf.dmoz.org/rdf/content.rdf.u8.gz

      If these files are missing you can change the dmoz rdf path in admin, or using the command-line option: -dmozurl "http://xxx.com/yyy/zzz/"
    • bad address: NRS could not start the web server with your given address. Change it using the command-line option: -address "hostname:port"
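    The "Err" convention above lends itself to a quick triage script. This is a minimal Python sketch that relies only on what the FAQ states, that serious errors appear on lines beginning with "Err"; the helper names, the log path you pass in, and the sample lines are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def serious_errors(lines: Iterable[str]) -> List[str]:
    """Keep only log lines that begin with "Err" (serious errors)."""
    return [line.rstrip("\r\n") for line in lines if line.startswith("Err")]


def scan_log(path: Path) -> List[str]:
    """Read a log file and return its serious-error lines."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return serious_errors(f)


if __name__ == "__main__":
    # Hypothetical sample lines; point scan_log at the real log file
    # in your NRS application directory instead.
    sample = ["Info webserver started\n", "Err bad address\n"]
    print(serious_errors(sample))  # ['Err bad address']
```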

E. ODP Configuration Questions
  1. How do I specify when to import ODP?

    You can specify the date of the next update in this format: mm/dd/yyyy. You can then also specify the hour at which to update, and the number of days until the next update.

    ODP updates happen in the background, and once ODP has been imported a database swap is made. This database swap results in a few seconds of downtime.

    To disable ODP updates, in the "update date" field, specify "disable".
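    The scheduling fields above can be modelled as follows. This Python sketch assumes the first update falls on the given date at the given hour and then repeats every interval_days; the function name is hypothetical, and accepting "disable" in any letter case is an assumption rather than documented NRS behaviour.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def odp_update_times(update_date: str, hour: int, interval_days: int,
                     count: int = 3) -> Optional[List[datetime]]:
    """Return the next `count` update times, or None if updates are disabled.

    update_date is "mm/dd/yyyy" or "disable".
    """
    if update_date.strip().lower() == "disable":
        return None
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0-23")
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    first = datetime.strptime(update_date.strip(), "%m/%d/%Y").replace(hour=hour)
    return [first + timedelta(days=interval_days * i) for i in range(count)]


if __name__ == "__main__":
    # Prints 2008-03-01 02:00:00, then the same time one and two weeks later.
    for when in odp_update_times("03/01/2008", hour=2, interval_days=7):
        print(when)
```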

  2. How do I import a subset of ODP?

    You can specify a new default subset with the command-line option -dmozroot "newrootlist" or through the admin interface. Specify one or more category IDs that will form the new root of the default directory template. The next ODP import will then import only this set of content. You can see the new content subset on the directory template, or on any new template of type "directory" that you create.

  3. How do I exclude ODP categories during import?

    You can specify a directory filter to exclude particular categories by name. For example, to exclude the United States regional category, you would specify

    Regional/North America/United States

    in the category filter.

    If you wanted to remove all categories except "Computers" you would specify:

    Adult,Arts,Business,Games,Health,Home,Kids and Teens,News,Recreation,Reference,Regional,Science,Shopping,Society,Sports,World
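    To see how such a filter behaves, here is a minimal Python sketch. It assumes the filter is a comma-separated list of category paths and that each entry excludes the named category and everything beneath it; those matching rules and the helper names are assumptions. TOP_LEVEL is the list of top-level categories shown above plus "Computers".

```python
from typing import List

# Top-level ODP categories (the list above plus "Computers").
TOP_LEVEL = [
    "Adult", "Arts", "Business", "Computers", "Games", "Health", "Home",
    "Kids and Teens", "News", "Recreation", "Reference", "Regional",
    "Science", "Shopping", "Society", "Sports", "World",
]


def parse_filter(text: str) -> List[str]:
    """Split a comma-separated filter into category paths."""
    return [part.strip() for part in text.split(",") if part.strip()]


def is_excluded(category: str, filters: List[str]) -> bool:
    """Assumed rule: an entry excludes that category and all its subcategories."""
    return any(category == f or category.startswith(f + "/") for f in filters)


def keep_only(wanted: str) -> str:
    """Build a filter excluding every top-level category except `wanted`."""
    return ",".join(c for c in TOP_LEVEL if c != wanted)


if __name__ == "__main__":
    filters = parse_filter(keep_only("Computers"))
    print(is_excluded("Computers/Internet", filters))  # False
    print(is_excluded("Arts/Music", filters))          # True
```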


  4. How do I import the Adult ODP category?

    By default the adult section of the ODP is not imported. To enable, specify the command-line option: -pornfilter 0

    or modify the porn filter setting in admin.

F. Crawler Configuration Questions
  1. What is a collection?

    A collection is a way to crawl and index ODP categories, a website, or a user library. In a collection you specify either the URL or the list of ODP category IDs to crawl and index. Crawler settings control how much of a site to crawl, URL patterns to include or exclude, staying on path, staying on site, robot behavior, politeness, number of pages to crawl per site, crawl depth, and more. You can also specify indexing settings such as rank boost and popularity ranking through page link analysis.

    Collection definitions can be imported and exported from the admin interface.

    To test a collection, it must first be crawled, then indexed. You can then create a template for it, to search against it.

  2. How do I crawl and index a subset of ODP?

    Create a collection with any name using the new collection form under "Crawler|Collection List". Select the collection to edit it. Under "Categories to Index", deselect all, then specify a category ID using the 'category picker'. Review the crawling and indexing settings, then click "Save". On the crawler tab, click the "Start" button to start the crawl. When the crawl is finished (you can also stop the crawler at any point), click "Start reindex full text". You can monitor progress on the crawler tab. When indexing is finished, create a corresponding template to test your collection: click "Crawler|Collection List|Create Template" to create one automatically. Then select your new template under the templates tab and run a search.

  3. How do I start the crawler?

    The crawler can be started manually with the "Crawler|Start crawl" button. You can also set up a schedule by specifying the next crawl date, how long to crawl for, and how many days to wait before the next crawl. Specifying a date value of "disable" disables the crawl schedule.

G. Mail Configuration Questions
  1. Why does Net Research Server have a mail engine?

    Giving each user a mail account provides a way to organize all of their mail in folders. NRS generates mail for alerts placed on search results, page changes, and new subscriptions. NRS can also discover newsletters in the full-text ODP index that users can sign up for automatically. NRS also sends mail for alerts addressed to external mail accounts.

  2. How do I set up SMTP and POP3?

    Under the mail tab, you can specify the SMTP and POP3 addresses in the form of "hostname:port". Typically you use port 25 for SMTP and 110 for POP3. The SMTP address must correspond to a valid MX record in your DNS records, and also have a valid reverse DNS record, for mail to be properly sent out to other mail gateways.

  3. How do I set up virtual mail domains?

    When you add a mail template to an application, you can edit the properties of the mail template and specify a new domain. All users who have signed up with this application will receive e-mail addresses with the given domain.

H. Webserver Configuration Questions
  1. How can the webserver run on a particular domain/IP/port?

    You can use the command-line option: -address "hostname:port"

    Or specify the address in the admin interface under "Webserver|Address".

    The "hostname" can be any domain/hostname/ip that can be bound on the particular machine.

    The port can be any value from 1 to 65535. Under Linux, using a port number below 1024 requires root privileges. Also make sure no other service on the machine has bound the same address; for example, both Apache and IIS bind by default to all IPs on port 80.
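    Before passing an address to -address, you can check whether it can be bound. This Python sketch tries a plain IPv4 TCP bind, the way a server would; the helper name and its return strings are hypothetical, and the root check reflects the traditional Linux rule described above (some systems grant the right to bind low ports differently).

```python
import errno
import os
import socket


def check_address(host: str, port: int) -> str:
    """Try to bind host:port over IPv4 and report the result."""
    if not 1 <= port <= 65535:
        return "invalid port"
    # Traditional Unix rule: ports below 1024 need root.
    if port < 1024 and hasattr(os, "geteuid") and os.geteuid() != 0:
        return "needs root"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "in use"  # e.g. Apache or IIS already bound it
            return f"cannot bind: {exc}"
    return "available"


if __name__ == "__main__":
    print(check_address("127.0.0.1", 2012))
```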

  2. How many webserver threads should I use?

    Each concurrent user request requires a separate thread. A template page can sometimes take around 30 seconds to load fully, as determined by the gather timeout value. For example, when conducting a metasearch, search results are streamed back to the user as they return from each search engine; if a search engine takes longer than the default timeout of 30 seconds to respond, the page takes 30 seconds to load. With 100 threads, for example, 100 users can request a page simultaneously, and all other requests are placed in a queue. The listen queue size can be modified with the "socket listen backlog" setting, and the thread count with the "threads" setting; both are found on the webserver tab.
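    The thread count can be estimated with Little's law (concurrent requests = arrival rate x time per request). This Python sketch assumes the worst case, in which every request takes the full gather timeout; the function names are hypothetical, and real load will vary.

```python
import math


def threads_needed(requests_per_second: float, seconds_per_request: float) -> int:
    """Little's law: busy threads = arrival rate x time each request takes."""
    if requests_per_second < 0 or seconds_per_request < 0:
        raise ValueError("arguments must be non-negative")
    return math.ceil(requests_per_second * seconds_per_request)


def max_throughput(threads: int, seconds_per_request: float) -> float:
    """Requests per second a pool can sustain before requests start queuing."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return threads / seconds_per_request


if __name__ == "__main__":
    # Worst case: every request waits out the 30 second gather timeout.
    print(threads_needed(2, 30))   # 60
    print(max_throughput(90, 30))  # 3.0
```

    For example, 2 requests per second that each take 30 seconds keep about 60 threads busy, and 90 threads can sustain at most 3 such requests per second before new requests wait in the listen queue. Adding some headroom above the estimate is a common precaution.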

  3. Where are the web logs stored?

    By default log files for all user requests are placed in the application directory. The log files are of the format:

    DATE TIME TIMETAKEN IP METHOD PAGE QUERY STATUS USERAGENT REFERER COOKIE

    To better organize your logs, create a new directory and specify it under "Webserver|Web log directory".
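    A minimal Python sketch for reading these logs. The FAQ does not specify the delimiter, so this assumes fields are separated by whitespace and that no field contains a space (for example, that spaces in user agents are encoded); check a real log line first. The function name and the sample line are hypothetical.

```python
from typing import Dict, Union

FIELDS = ["date", "time", "timetaken", "ip", "method", "page",
          "query", "status", "useragent", "referer", "cookie"]


def parse_web_log_line(line: str) -> Dict[str, Union[str, int]]:
    """Split one web log line into named fields (assumes no embedded spaces)."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record: Dict[str, Union[str, int]] = dict(zip(FIELDS, parts))
    record["status"] = int(parts[FIELDS.index("status")])  # assumed numeric HTTP status
    return record


if __name__ == "__main__":
    # Hypothetical line; the date/time format and "-" placeholders are assumptions.
    sample = "2008-03-01 12:00:01 120 192.168.0.5 GET /news q=nrs 200 Mozilla/5.0 - -"
    print(parse_web_log_line(sample)["page"])  # /news
```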

  4. How do I set the default page?

    If a request specifies no URL path, NRS returns the admin page. To change this, modify "Templates|Default Template Settings|Default template" and specify a new default page or path.

I. Rendering Configuration Questions
  1. How are HTML pages rendered?

    HTML pages are rendered by performing an XSLT transform on an XML stream using an XSL template. Each template type has an associated XSL template, which can be overridden. For streaming templates, such as templates of type "search", the XSLT transform runs several times per page: first to render the page header, then one or more times to render the middle of the page, and finally to render the footer.

  2. Which XSLT engine is used?

    NRS incorporates the Sablotron XSLT engine from Ginger Alliance. Under Windows, if the Microsoft XML 4.0+ parser is installed it is automatically detected and used. The Microsoft engine is generally faster and preferable.

  3. How do I modify an XSL template?

    There are default XSL templates for each type of template. You can access the default XSL template by selecting a template from the template list and appending ".xsl" to its URL. Edit the XSL file with your favorite editor, then upload it under "Templates|Default template settings" to modify the default templates.

    Each template can also have its own XSL template. You can take the default XSL file, modify it, and upload it on the template's property page, or use the "group set" interface to modify multiple templates at the same time. The uploaded file can either be an entirely new XSL file or contain just the <xsl:template> blocks you wish to override in the default XSL template.

  4. How do I access a page as XML data?

    Any page can be retrieved as XML by appending ".xml" to the URL path (before the "?" that starts the query string). Using Internet Explorer 5.0+, you can view the XML and its structure. The major sections are:

    • /results/query: reflects the URL query variables.
    • /results/sysinfo: reflects NRS system settings and licensing
    • /results/user: reflects some user account information
    • /results/title: reflects a page title
    • /results/attribution: reflects ODP attribution information
    • /results/cat: reflects an ODP category given by the "p" query
    • /results/dirsearchresults: reflects a directory search
    • /results/searches: reflects meta search results
    • /results/snippets: reflects news results
    • /results/xml: reflects XML/RSS aggregation
    • and more
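    Fetching the XML form of a page only requires changing the URL path. This Python sketch inserts the suffix before the query string and reads /results/title with the standard ElementTree module; the template path /news is hypothetical, and it assumes /results/title holds plain text.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def with_suffix(url: str, suffix: str = ".xml") -> str:
    """Append `suffix` to the URL path, before any '?' query string."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path + suffix))


def page_title(xml_text: str) -> Optional[str]:
    """Return the text of /results/title, or None if it is absent."""
    root = ET.fromstring(xml_text)
    if root.tag != "results":
        raise ValueError(f"expected <results> root, got <{root.tag}>")
    return root.findtext("title")


if __name__ == "__main__":
    print(with_suffix("http://localhost:2012/news?p=Computers"))
    # http://localhost:2012/news.xml?p=Computers
    print(page_title("<results><title>Demo</title></results>"))  # Demo
```

    The same helper with ".xsl" gives the URL of a template's XSL file. To fetch the XML from a running server, you could pass the resulting URL to urllib.request.urlopen.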

J. Database Configuration Questions
  1. How can I back up my databases?

    When NRS is not running, you can back up all databases manually; while it is running, you can use the admin interface to back up all databases to a directory. During the backup process, databases are compacted, which improves their performance.

    You can also schedule a daily backup: specify a directory, under which NRS will create a directory for each day of the week.
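    A manual backup, taken while NRS is stopped, can be as simple as copying the database files. This Python sketch assumes the *.db files live in the application directory and names each daily folder after the weekday; both the layout and the folder names are assumptions, not necessarily the naming NRS uses for its own daily backups. Unlike the admin backup, it does not compact the databases.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import List

# Hypothetical folder names; locale-independent, unlike strftime("%A").
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def backup_databases(app_dir: Path, backup_root: Path, day: date) -> List[str]:
    """Copy every *.db file in app_dir into backup_root/<weekday>.

    Run this only while NRS is stopped so no database is being written.
    """
    target = backup_root / WEEKDAYS[day.weekday()]
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for db in sorted(app_dir.glob("*.db")):
        shutil.copy2(db, target / db.name)  # overwrites last week's copy
        copied.append(db.name)
    return copied


if __name__ == "__main__":
    # Demo on a throwaway directory instead of a real installation.
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "nrs"
        app.mkdir()
        (app / "template.db").write_bytes(b"demo")
        print(backup_databases(app, Path(tmp) / "backup", date(2008, 3, 3)))
        # ['template.db'], copied into backup/Monday
```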

  2. How can I rebuild my databases?

    If you need to rebuild your NRS installation, you can, for example, export all templates to a file, delete all databases, and then reimport your templates. Alternatively, delete only the databases you need to rebuild and keep the others: keep settings.db to preserve all system configuration, template.db to keep your templates, agent.db to keep your user alerts, mail.db to keep your mail accounts and mailboxes, dir.db and dirindex.db to keep your ODP catalog, form.db to keep search form definitions, and web.db and webindex.db to keep your full-text index.

    Generally, any of these databases can be safely deleted without compromising your NRS installation. The only database interdependencies are between dir.db and dirindex.db, and between web.db and webindex.db, so these two pairs must always be manipulated together.
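    The pairing rule can be enforced with a small helper before deleting anything. This Python sketch encodes only the two pairs named above; the corruption section of this FAQ also lists wiki.db and wikiindex.db together, so you may want to add that pair. The function name is hypothetical.

```python
from typing import Iterable, Set

# Databases that must always be deleted or kept together.
PAIRS = [("dir.db", "dirindex.db"), ("web.db", "webindex.db")]


def expand_deletions(requested: Iterable[str]) -> Set[str]:
    """Return `requested` plus the partner of any paired database in it."""
    result = set(requested)
    for a, b in PAIRS:
        if a in result or b in result:
            result.update((a, b))
    return result


if __name__ == "__main__":
    print(sorted(expand_deletions({"webindex.db", "mail.db"})))
    # ['mail.db', 'web.db', 'webindex.db']
```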

K. Access Control Questions
  1. How can I control access to admin?

    Use the command-line option: -adminpeer "accesslist"

    Or specify the access list under "Webserver|Admin peer ip".

    The access list is a comma-separated list of IP addresses or usernames.
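    A sketch of how such a list could be evaluated, in Python. It assumes entries that parse as IP addresses are compared with the peer's address and every other entry is treated as a username, and it denies access when nothing matches, including for an empty list. NRS's actual matching rules (for example, any support for wildcards or subnets) may differ, and the function names are hypothetical.

```python
import ipaddress
from typing import List, Optional


def parse_access_list(text: str) -> List[str]:
    """Split a comma-separated access list into entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


def is_admin_allowed(entries: List[str], peer_ip: str,
                     username: Optional[str] = None) -> bool:
    """Allow if the peer IP or the logged-in username appears in the list."""
    peer = ipaddress.ip_address(peer_ip)
    for entry in entries:
        try:
            allowed_ip = ipaddress.ip_address(entry)
        except ValueError:
            # Not an IP address, so treat the entry as a username.
            if username is not None and entry == username:
                return True
            continue
        if allowed_ip == peer:
            return True
    return False


if __name__ == "__main__":
    acl = parse_access_list("127.0.0.1, 192.168.0.10, admin")
    print(is_admin_allowed(acl, "192.168.0.10"))  # True
    print(is_admin_allowed(acl, "10.0.0.1"))      # False
```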

  2. How can I control access to my application?

    Templates in an application can be marked as secure. If the user is not known, the user is redirected to a template (usually mail or library) in the application that has a login/signup option. An application can also have a template of type "application" that allows editing the application. Access to this template is controlled in the property page of the template.

  3. How do I change user login and signup?

    Templates of type "mail" or "Library" feature login and signup functionality. Any application containing one of these templates on its tab bar will redirect the user to this template if the page they are accessing is marked as secure. To modify the login and signup pages, you need to edit the corresponding XSL template and look for the "drawsignup" and "drawllogin" sections.

L. Template Configuration Questions
  1. What is a template?

    A template is a page that can be accessed through the NRS webserver. Templates come in many types, such as "news", "search", "directory", "application", and "mail", and each type offers different functionality. Templates are commonly organized into applications by giving them an app name and a group name in their settings. A template is rendered into HTML by generating an XML stream for the page and its query parameters and performing an XSLT transform on it with the corresponding XSL template. A template can be accessed as XML, HTML, or XSL through the NRS webserver.

  2. How can I add a search engine to my template?

    Create a template of type "search", edit it, and enter the URL of a search engine, for example http://www.google.com, in the "add new search" field. Click "Save". You can now edit the new search and specify its name, description, and, optionally, a parse rule. A parse rule helps NRS retrieve the search results from the page, and also enables extraction of metadata associated with search results, such as date, size, and cached link.

  3. How can I add a news source to my template?

    Create a template of type "news", edit it, and enter the URL of a news page on the web in the "add new snippet/news" field. Click "save", then edit your news item and specify its parameters. You must specify a parse rule, the simplest of which might be: <title link><desc>. If this does not work, read up on the parse rule syntax in the admin help and refer to the demo news template examples.

  4. How do I aggregate XML/RSS feeds?

    NRS offers powerful aggregation and caching of XML/RSS feeds. You specify all the URLs to the feeds in a template, and for each you can also define an XSLT identity transform to optionally standardize the XML of each source into the same vocabulary.
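    Python's standard library has no XSLT engine, so this sketch performs the same kind of standardization with ElementTree instead: it maps RSS 2.0 items and Atom entries onto one {title, link} vocabulary and merges them. It illustrates the idea only; it is not how NRS implements aggregation, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def normalize_feed(xml_text: str) -> List[Dict[str, str]]:
    """Return [{'title': ..., 'link': ...}] for an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        for item in root.iter("item"):
            items.append({"title": item.findtext("title", ""),
                          "link": item.findtext("link", "")})
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")  # first link element, if any
            items.append({"title": entry.findtext(ATOM + "title", ""),
                          "link": link.get("href", "") if link is not None else ""})
    else:
        raise ValueError(f"unsupported feed root: {root.tag}")
    return items


def aggregate(feeds: List[str]) -> List[Dict[str, str]]:
    """Normalize each feed and concatenate the results in order."""
    merged: List[Dict[str, str]] = []
    for text in feeds:
        merged.extend(normalize_feed(text))
    return merged
```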



 
Copyright © 2008 LoopIP LLC. All rights reserved.