Spidering Tab

Retrieve files by following links is an important setting that controls the order in which links are followed: either breadth first or depth first.

Beginner tip: Depth first is usually faster, but may miss a few pages. Breadth first is usually slower and stores more links in the 'todo' queue of links at any given moment. If you are using depth first and notice that some expected pages are missing from the output, switch to breadth first.
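
The difference between the two orders can be sketched in a few lines of Python. This is only an illustration, not Plucker's actual parser code: the link graph `site`, the function name `crawl_order`, and the page names are all made up for the example. The only thing that changes between the two modes is which end of the 'todo' list the next link is taken from.

```python
from collections import deque

def crawl_order(links, start, breadth_first=True):
    """Return the order in which pages would be visited.

    `links` is a hypothetical in-memory link graph: page -> linked pages.
    """
    todo = deque([start])   # links waiting to be fetched
    seen = {start}          # never queue the same page twice
    order = []
    while todo:
        # Breadth first takes the oldest queued link; depth first the newest.
        page = todo.popleft() if breadth_first else todo.pop()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return order

# Hypothetical three-level site used only for illustration.
site = {
    "home": ["news", "sports"],
    "news": ["story1", "story2"],
    "sports": ["scores"],
}

print(crawl_order(site, "home", breadth_first=True))
print(crawl_order(site, "home", breadth_first=False))
```

In this toy graph, breadth first visits every page linked from "home" before any story page, while depth first follows one branch to its end before returning to the other.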

Progress message detail level controls how many messages the parser shows as it spiders a channel. Set it to minimum for almost no messages, medium for errors only, or maximum for all possible messages. Maximum is useful for debugging malformed HTML or tracking down a possible parser bug.

Power user tip: If you use a progress dialog instead of a console window to watch progress, then in 'minimum' mode the progress dialog shows no messages at all, whereas the console window still shows any error messages. There is a technical reason for this (you can read the source code of the progress dialog class hierarchy if you want the details).

Truncate URLs displayed in the progress window to [] letters limits the length of the file URLs shown in the progress dialog/console window. Truncation replaces the middle part of the displayed URL with a series of dots. For example, a truncated URL might be displayed as:
Processing http://www.businessweek.com/technology/cont....5047.html
This truncation only affects how the progress information is displayed; it doesn't change any of the actual spidering.
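
A middle truncation of this kind could look like the Python sketch below. The exact rule Plucker uses to decide which characters to drop is not described here, so the even split between the front and the back of the URL, the four-dot filler, and the function name `truncate_middle` are assumptions made for illustration.

```python
def truncate_middle(url, max_len, dots="...."):
    """Shorten `url` to at most `max_len` characters by replacing its middle with dots.

    Assumption: the kept characters are split evenly between the start
    and the end of the URL, with any odd character going to the start.
    """
    if len(url) <= max_len:
        return url                      # short enough, show it unchanged
    if max_len <= len(dots):
        return dots[:max_len]           # no room for anything but dots
    keep = max_len - len(dots)          # characters of the URL that survive
    head = (keep + 1) // 2
    tail = keep - head
    return url[:head] + dots + (url[-tail:] if tail else "")

# Hypothetical URL; only the displayed text is shortened, not the URL that is fetched.
print(truncate_middle("http://example.com/a/very/long/path/page.html", 20))
```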

Specify a referrer URL allows a 'referrer' string to be sent to the server when accessing the start page of the channel. The 'referrer' is the URL of the page that was downloaded before the start page was downloaded.
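
At the HTTP level, the referrer travels in a request header whose name is spelled "Referer". The Python sketch below shows the idea using the standard library; it is not how Plucker itself builds its requests, and the function name `build_start_request` and the example.com URLs are placeholders. The request is only constructed here, not sent.

```python
import urllib.request

def build_start_request(start_url, referrer=None):
    """Build (but do not send) a request for a channel's start page."""
    headers = {}
    if referrer:
        # The HTTP header name is spelled 'Referer' (one 'r' in the middle).
        headers["Referer"] = referrer
    return urllib.request.Request(start_url, headers=headers)

req = build_start_request(
    "http://www.example.com/news/index.html",   # hypothetical start page
    referrer="http://www.example.com/",         # hypothetical referring page
)
print(req.get_header("Referer"))
```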

When retrieving files from this channel's web server, identify as: specifies the 'user-agent' that the spider identifies itself as when it requests files to download. Webmasters often use the 'user-agent' to decide which version of a page to deliver, so that it looks best on a certain type of browser. You can either select This browser type and choose one of the listed browsers, or select This custom user-agent string and enter a string into the text control.

Power user tip: Strings for 'user-agent' are usually of the format Mozilla/3.0 (compatible; Plucker 1.2) but they can be anything at all.
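
The choice between a listed browser and a custom string can be pictured with the Python sketch below. The preset table, its single entry, and the function names are hypothetical; the real list of browsers is whatever the dialog offers, and this is not Plucker's own code. The request is built but never sent.

```python
import urllib.request

# Hypothetical preset table; the key and its contents are illustrative only.
BROWSER_PRESETS = {
    "plucker": "Mozilla/3.0 (compatible; Plucker 1.2)",
}

def pick_user_agent(preset=None, custom=None):
    """Return the custom string if one was entered, otherwise the chosen preset."""
    if custom:
        return custom
    return BROWSER_PRESETS[preset]

def build_request(url, user_agent):
    """Build (but do not send) a request that identifies itself as `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.example.com/", pick_user_agent(preset="plucker"))
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header("User-agent"))
```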

Execute a command before spidering this channel executes the specified operating system command immediately before this channel starts spidering.

Execute a command after plucking channel executes the specified operating system command immediately after this channel has finished spidering.

Power user tip: These are 'synchronous executions'. That is, the program flow of Plucker Desktop is halted until the applications called by these commands have terminated. This gives the command time to finish whatever it needs to do.
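
The effect of synchronous execution can be sketched in Python with `subprocess.run`, which does not return until the command has exited. This is an illustration of the behaviour described above, not Plucker Desktop's implementation; `run_hook`, `spider_channel`, and the stand-in commands are names invented for the example.

```python
import subprocess
import sys

def run_hook(command):
    """Run a command and wait for it to finish; return its exit code.

    A string is passed to the operating system shell; a list is run directly.
    An empty command means no hook is configured.
    """
    if not command:
        return None
    return subprocess.run(command, shell=isinstance(command, str)).returncode

def spider_channel(before, after, spider):
    """Run the 'before' command, then the spider, then the 'after' command."""
    log = []
    log.append(("before", run_hook(before)))   # blocks until the command exits
    spider()                                   # starts only after 'before' is done
    log.append(("after", run_hook(after)))     # blocks again
    return log

# Stand-in commands that simply start the Python interpreter and exit.
before_cmd = [sys.executable, "-c", "pass"]
after_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
pages = []
print(spider_channel(before_cmd, after_cmd, lambda: pages.append("spidered")))
```

Because each call blocks, a long-running command delays everything that follows it, so commands used here should be ones that terminate on their own.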