public class PubListHTMLCrawler extends ACrawler
Constructor | Description |
---|---|
PubListHTMLCrawler(PDatabase pdb, java.lang.String html, int transLev, boolean refPubMode) | Sets up the publication list crawling. |
Modifier and Type | Method | Description |
---|---|---|
protected void | crawl() | Crops the list block and parses its content. |
java.util.List<Publication> | getPublications() | |
static void | setThreadLimit(int threadLimit) | |
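The summary describes crawl() only as cropping the list block and then parsing its content. The following is a minimal, self-contained sketch of that two-step idea using java.util.regex; it is not the class's actual implementation. The class name CropAndParseSketch, the `<ul class="publist">` block marker and the one-`<li>`-per-publication item pattern are all hypothetical stand-ins for the database-specific patterns the real crawler presumably takes from its PDatabase object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CropAndParseSketch {

    // Hypothetical pattern for the list block (assumption; the real crawler
    // uses database-specific patterns). Reluctant .*? stops at the first </ul>.
    private static final Pattern LIST_BLOCK =
            Pattern.compile("<ul class=\"publist\">(.*?)</ul>", Pattern.DOTALL);

    // Hypothetical pattern for a single publication entry (assumption).
    private static final Pattern ITEM =
            Pattern.compile("<li>(.*?)</li>", Pattern.DOTALL);

    /** Step 1: crop the list block out of the surrounding HTML ("" if absent). */
    static String cropListBlock(String html) {
        Matcher m = LIST_BLOCK.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    /** Step 2: parse each entry of the cropped block into a trimmed string. */
    static List<String> parseItems(String block) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(block);
        while (m.find()) {
            items.add(m.group(1).trim());
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>header</p>"
                + "<ul class=\"publist\"><li> Paper A </li><li>Paper B</li></ul>"
                + "<ul><li>not a publication</li></ul></body></html>";
        // Only entries inside the cropped block are parsed.
        System.out.println(parseItems(cropListBlock(html))); // prints [Paper A, Paper B]
    }
}
```

Cropping first keeps the item pattern from matching list entries elsewhere on the page, such as the second `<ul>` in the sample input.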
Methods inherited from class ACrawler: getTime, interrupt, launch, run, scheduleCrawlers, waitForCrawlers
Methods inherited from class java.lang.Thread: activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
public PubListHTMLCrawler(PDatabase pdb, java.lang.String html, int transLev, boolean refPubMode)
Parameters:
pdb - PDatabase object that contains the information needed for database-specific crawling.
html - The HTML block containing the publication list.
transLev - 0: only the search results are grabbed; 1: referrer publications are grabbed as well; 2: referrers of referrers are grabbed as well.
refPubMode - If true, the list is handled as a list of referring publications, which may require different patterns.

public static void setThreadLimit(int threadLimit)
public java.util.List<Publication> getPublications()
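The transLev constructor parameter sets how far the crawl follows referrers: 0 keeps only the search results, 1 adds their referrer publications, and 2 adds the referrers of those referrers. The sketch below is a hypothetical illustration of those levels only. It models the referrer relation as an in-memory map and expands it breadth-first; the class name TransLevSketch, the map and the breadth-first strategy are assumptions, not how PubListHTMLCrawler actually fetches and parses referrer pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransLevSketch {

    /**
     * Hypothetical illustration: collects publications reachable from the
     * search results by following "referred by" links at most transLev times.
     */
    static Set<String> collect(List<String> searchResults,
                               Map<String, List<String>> referrers,
                               int transLev) {
        // Insertion order is kept so the output lists level 0 first.
        Set<String> grabbed = new LinkedHashSet<>(searchResults);
        List<String> frontier = new ArrayList<>(searchResults);
        for (int level = 1; level <= transLev; level++) {
            List<String> next = new ArrayList<>();
            for (String pub : frontier) {
                for (String ref : referrers.getOrDefault(pub, Collections.emptyList())) {
                    if (grabbed.add(ref)) { // skip publications already grabbed
                        next.add(ref);
                    }
                }
            }
            frontier = next; // only newly found referrers are expanded next
        }
        return grabbed;
    }

    public static void main(String[] args) {
        // Hypothetical chain: B refers to A, C refers to B, D refers to C.
        Map<String, List<String>> referrers = new HashMap<>();
        referrers.put("A", Arrays.asList("B"));
        referrers.put("B", Arrays.asList("C"));
        referrers.put("C", Arrays.asList("D"));
        for (int lev = 0; lev <= 2; lev++) {
            System.out.println(lev + ": " + collect(Arrays.asList("A"), referrers, lev));
        }
        // 0: [A]
        // 1: [A, B]
        // 2: [A, B, C]
    }
}
```

Tracking already grabbed publications keeps a citation cycle from being expanded twice, and each extra level only expands the referrers found at the previous level.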