public class PubListCrawler extends ACrawler
Constructor and Description |
---|
PubListCrawler(PDatabase pdb, java.lang.String url, int transLev, boolean refPubMode): Sets up the publication list crawling. |
Modifier and Type | Method and Description |
---|---|
protected void | crawl(): Builds the URL with the start field, downloads the page, starts a PubListHTMLCrawler, and stores the grabbed publications, then repeats these steps as long as the page contained new data. |
java.util.List<Publication> | getPublications() |
Methods inherited from class ACrawler: getTime, interrupt, launch, run, scheduleCrawlers, waitForCrawlers
Methods inherited from class java.lang.Thread: activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
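The loop that crawl() is described as running (build the URL with a start field, download, parse, store, repeat while new data appears) can be sketched roughly as below. This is only an illustration, not the actual implementation: the class name PagedCrawlSketch, the helpers buildPageUrl and crawlAll, the "start=" query syntax, the fixed page size, and the fetchPage function standing in for the download and PubListHTMLCrawler steps are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; none of these names belong to the PubListCrawler API.
class PagedCrawlSketch {

    // Appends the paging offset to a base URL that does not yet carry a "start" field.
    static String buildPageUrl(String baseUrl, int start) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "start=" + start;
    }

    // Fetches page after page until a page yields no previously unseen entry.
    // fetchPage stands in for "download the page and parse it" (an assumption).
    static List<String> crawlAll(String baseUrl, int pageSize,
                                 Function<String, List<String>> fetchPage) {
        Set<String> seen = new LinkedHashSet<>();
        int start = 0;
        boolean foundNew = true;
        while (foundNew) {
            List<String> page = fetchPage.apply(buildPageUrl(baseUrl, start));
            // addAll returns true only if at least one entry was not stored before.
            foundNew = seen.addAll(page);
            start += pageSize;
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Fake three-entry result list, served two entries per page.
        List<String> all = List.of("pub-A", "pub-B", "pub-C");
        Function<String, List<String>> fakeFetch = url -> {
            int start = Integer.parseInt(url.substring(url.indexOf("start=") + 6));
            int from = Math.min(start, all.size());
            int to = Math.min(start + 2, all.size());
            return all.subList(from, to);
        };
        System.out.println(crawlAll("http://example.org/results?q=test", 2, fakeFetch));
    }
}
```

Stopping on "no new data" rather than on a known page count means the loop also ends if the site keeps returning its last page for out-of-range offsets, which is one plausible reason for the design described above.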
public PubListCrawler(PDatabase pdb, java.lang.String url, int transLev, boolean refPubMode)
Parameters:
pdb - PDatabase object that contains the information needed for database-specific crawling.
url - Base URL of the result list. It should not contain a "start" field.
transLev - 0: only the search results are grabbed; 1: referrer publications are grabbed as well; 2: referrers of referrers are grabbed as well.
refPubMode - If true, the list is handled as a list of referring publications, which may require different patterns.
public java.util.List<Publication> getPublications()
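One way to read the transLev levels documented for the constructor is as a depth limit on following referrers: level 0 keeps the search results only, and each further level adds the publications that refer to the previous level. The sketch below illustrates that reading with a level-by-level traversal. The class TransLevSketch, the grab method, and the in-memory referrers map are hypothetical stand-ins; the real crawler obtains referrers by crawling, and its internal logic may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the referrers map stands in for data a real crawl would fetch.
class TransLevSketch {

    // Returns the search results plus referrers up to transLev levels away.
    static List<String> grab(List<String> searchResults, int transLev,
                             Map<String, List<String>> referrers) {
        Set<String> grabbed = new LinkedHashSet<>(searchResults);
        List<String> frontier = new ArrayList<>(grabbed);
        for (int level = 1; level <= transLev; level++) {
            List<String> next = new ArrayList<>();
            for (String pub : frontier) {
                for (String ref : referrers.getOrDefault(pub, List.of())) {
                    // Only publications not grabbed before move on to the next level.
                    if (grabbed.add(ref)) {
                        next.add(ref);
                    }
                }
            }
            frontier = next;
        }
        return new ArrayList<>(grabbed);
    }

    public static void main(String[] args) {
        Map<String, List<String>> referrers = Map.of(
                "P", List.of("R1"),
                "R1", List.of("R2"),
                "R2", List.of("R3"));
        for (int transLev = 0; transLev <= 2; transLev++) {
            System.out.println(transLev + ": " + grab(List.of("P"), transLev, referrers));
        }
    }
}
```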