Limits Tab

Maximum depth specifies how many pages down from the starting page to continue spidering. A value of 1 retrieves only the initial page; a value of 2 retrieves the initial page and the pages linked directly from it.
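The depth rule can be illustrated with a small breadth-first walk over an in-memory link graph. This is only a sketch of the counting convention described above, not the spider's actual code; the function name pages_within_depth and the LINKS table are hypothetical and exist only for this example.

```python
from collections import deque

def pages_within_depth(start, links, max_depth):
    """Return the pages reached from `start` without exceeding `max_depth`.

    Depth 1 is the starting page itself; depth 2 adds the pages it links
    to directly, and so on. `links` maps each page to the pages it links to.
    """
    if max_depth < 1:
        return []
    seen = {start}
    order = []
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # at the limit: keep the page but do not follow its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order

# Hypothetical link graph used only for illustration.
LINKS = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "b.html": ["index.html"],
    "c.html": [],
}
```

With this graph, a maximum depth of 1 yields only index.html, while a maximum depth of 2 also yields a.html and b.html.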

Stay on host / stay on domain filter: If this filter is turned on, pages that originate from a different host or domain are ignored. If you use Stay on host, pages originating from a host other than that of the starting page are ignored and not included in the output. For example, if the starting page is http://www.slashdot.org, then items coming from http://adfu.slashdot.org and http://www.cnn.com are ignored. If you use Stay on domain, pages originating from a different domain are ignored and not included in the output, but pages from a different server on the same domain are included. For example, if the starting page is http://www.slashdot.org, then items coming from http://adfu.slashdot.org are also included, but pages coming from http://www.cnn.com are ignored.
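The difference between the two modes can be sketched as two URL comparisons. This is an illustrative approximation, not the spider's implementation: the helper names are hypothetical, and the domain is assumed to be the last two labels of the host name, which is wrong for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lower-cased host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()

def domain_of(url):
    """Return the last two labels of the host name.

    Simplifying assumption for this sketch: 'adfu.slashdot.org' becomes
    'slashdot.org'. Multi-part suffixes such as '.co.uk' are not handled.
    """
    labels = host_of(url).split(".")
    return ".".join(labels[-2:])

def same_host(start_url, url):
    """Stay on host: keep the page only if the host names are identical."""
    return host_of(start_url) == host_of(url)

def same_domain(start_url, url):
    """Stay on domain: keep the page if both URLs share the same domain."""
    return domain_of(start_url) == domain_of(url)

START = "http://www.slashdot.org"
```

Under these assumptions, same_host(START, "http://adfu.slashdot.org") is false while same_domain(START, "http://adfu.slashdot.org") is true, and both checks reject http://www.cnn.com.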

URL pattern filter: If this filter is turned on, pages whose URL does not match this regular expression pattern are ignored. For example, if you are spidering www.bmj.com and want to retrieve only the current articles, which are in the directory www.bmj.com/current, then specifying a URL pattern value of .*www.bmj.com/current.* retrieves only files in that subdirectory on the server. Beginner tip: For a brief tutorial on the basic syntax of regular expressions, see regular expressions.
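To try out a pattern before using it, the check can be approximated with Python's re module. This is a sketch only: the spider's regular expression engine and matching mode are not documented here, so re.search is an assumption, and url_allowed is a hypothetical helper. Because the example pattern begins and ends with .*, it behaves the same under anchored and unanchored matching.

```python
import re

# The pattern from the example above, used unchanged.
URL_PATTERN = re.compile(r".*www.bmj.com/current.*")

# A stricter variant: escaping the dots makes them match only a literal '.'.
STRICT_PATTERN = re.compile(r".*www\.bmj\.com/current.*")

def url_allowed(url, pattern=URL_PATTERN):
    """Return True if the URL matches the pattern and would be retrieved."""
    return pattern.search(url) is not None
```

Note that an unescaped dot matches any character, so the original pattern also accepts a URL such as http://wwwxbmj.com/current/, whereas the escaped variant does not.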

Inclusion/exclusion list filter: Lets you specify a channel-specific inclusion/exclusion list. This channel-specific exclusion list is appended to the shared global exclusion list when this particular channel is spidered. Click the Edit channel-specific inclusion/exclusion list button to add and edit the entries. For details on inclusion/exclusion lists and how to edit them, see Inclusion/Exclusion List Dialog.