java.lang.Object org.apache.nutch.parse.ParserFactory
public final class ParserFactory
Creates and caches Parser plugins.
Field Summary | |
---|---|
static String | DEFAULT_PLUGIN - Wildcard for default plugins. |
static org.apache.commons.logging.Log | LOG |
Constructor Summary | |
---|---|
ParserFactory(Configuration conf) | |
Method Summary | |
---|---|
protected List<Extension> | getExtensions(String contentType) - Finds the best-suited parse plugin for a given contentType. |
Parser | getParserById(String id) - Returns a Parser instance with the specified extension ID (id). |
Parser[] | getParsers(String contentType, String url) - Returns an array of Parsers for a given content type. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final org.apache.commons.logging.Log LOG
public static final String DEFAULT_PLUGIN
Constructor Detail |
---|
public ParserFactory(Configuration conf)
Method Detail |
---|
public Parser[] getParsers(String contentType, String url) throws ParserNotFound
Returns an array of Parsers for a given content type. The function consults the ParserFactory's internal list of parse plugins to determine the list of pluginIds, then gets the appropriate extension points to instantiate as Parsers.
Parameters:
contentType - The contentType to return the Array of Parsers for.
url - The url for the content, which may allow us to get the type from the file suffix.
Returns:
An Array of Parsers for the given contentType. If there were plugins mapped to a contentType via the parse-plugins.xml file, but never enabled via the plugin.includes Nutch conf, then those plugins won't be part of this array, i.e., they will be skipped. So, if the ordered list of parsing plugins for text/plain was [parse-text, parse-html, parse-rtf], and only parse-html and parse-rtf were enabled via plugin.includes, then this ordered Array would consist of two Parser instances, [parse-html, parse-rtf].
Throws:
ParserNotFound
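The skipping rule described for getParsers can be illustrated with a small standalone sketch. It does not use any Nutch classes: the class EnabledPluginFilter and the method filterEnabled are hypothetical names chosen for illustration, and the two collections stand in for the ordering from parse-plugins.xml and the set enabled via plugin.includes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EnabledPluginFilter {

    // Hypothetical helper, not part of Nutch: keeps only the plugin IDs that
    // are enabled, preserving the order in which they were mapped.
    static List<String> filterEnabled(List<String> orderedPluginIds, Set<String> enabled) {
        List<String> result = new ArrayList<>();
        for (String id : orderedPluginIds) {
            if (enabled.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Ordered list mapped to text/plain, as in the example above.
        List<String> mapped = Arrays.asList("parse-text", "parse-html", "parse-rtf");
        // Plugins assumed to be enabled via plugin.includes.
        Set<String> enabled = new HashSet<>(Arrays.asList("parse-html", "parse-rtf"));

        System.out.println(filterEnabled(mapped, enabled)); // prints [parse-html, parse-rtf]
    }
}
```

The order of the result follows the mapped list, not the enabled set, which matches the "ordered Array" wording in the description.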
public Parser getParserById(String id) throws ParserNotFound
Returns a Parser instance with the specified extension ID, id. If the Parser instance isn't found, the function throws a ParserNotFound exception. If the function finds the Parser in the internal PARSER_CACHE, it returns the already instantiated Parser. Otherwise, it instantiates the Parser itself and caches it in the internal PARSER_CACHE.
Parameters:
id - The string extension ID (e.g., "org.apache.nutch.parse.rss.RSSParser", "org.apache.nutch.parse.rtf.RTFParseFactory") of the Parser implementation to return.
Returns:
The Parser implementation specified by the parameter id.
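Throws:
ParserNotFound - If the Parser is not found (i.e., not registered with the extension point), or if there is a PluginRuntimeException while instantiating the Parser.
protected List<Extension> getExtensions(String contentType)
Finds the best-suited parse plugin for a given contentType.
Parameters:
contentType - Content-Type for which we seek a parse plugin.
Returns:
A list of extensions to be used for this contentType, or null if there are none.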
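The look-up-then-cache behavior described for getParserById can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the Nutch implementation: ParserCacheSketch, its instantiator function, and the use of NoSuchElementException in place of ParserNotFound are all hypothetical stand-ins.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ParserCacheSketch<P> {

    // Stand-in for the internal PARSER_CACHE: extension ID -> instance.
    private final Map<String, P> cache = new ConcurrentHashMap<>();

    // Hypothetical factory that creates an instance for an extension ID,
    // or returns null when nothing is registered under that ID.
    private final Function<String, P> instantiator;

    ParserCacheSketch(Function<String, P> instantiator) {
        this.instantiator = instantiator;
    }

    P getById(String id) {
        // Returns the cached instance if present; otherwise instantiates
        // and caches it. A null result from the factory is not cached.
        P parser = cache.computeIfAbsent(id, instantiator);
        if (parser == null) {
            // Stand-in for ParserNotFound in this sketch.
            throw new NoSuchElementException("No parser registered for id: " + id);
        }
        return parser;
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        ParserCacheSketch<Object> parsers = new ParserCacheSketch<Object>(id -> {
            created.incrementAndGet();
            return new Object();
        });

        Object first = parsers.getById("org.apache.nutch.parse.rss.RSSParser");
        Object second = parsers.getById("org.apache.nutch.parse.rss.RSSParser");

        System.out.println(first == second); // prints true: second call hits the cache
        System.out.println(created.get());   // prints 1: instantiated only once
    }
}
```

Using computeIfAbsent keeps the check and the insertion in one step, so concurrent callers asking for the same ID do not each create their own instance. Whether the real PARSER_CACHE offers that guarantee is not stated in this page.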