org.apache.nutch.parse.zip
Class ZipParser

java.lang.Object
  extended by org.apache.nutch.parse.zip.ZipParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class ZipParser
extends Object
implements Parser

Nutch parse plugin for ZIP files (content type: application/zip). The class is based on the MSPowerPointParser class by Stephan Strittmatter.

Author:
Rohit Kulkarni & Ashish Vaidya
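
As a rough illustration of what handling application/zip content involves, the sketch below walks the entries of an in-memory archive using only java.util.zip from the JDK. It is not the plugin's actual implementation: the class ZipEntryLister and its methods listEntryNames and sampleZip are hypothetical names introduced here, and the sketch only collects entry names rather than handing each entry to another parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Nutch.
final class ZipEntryLister {

    // Returns the names of all non-directory entries, in archive order.
    static List<String> listEntryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
                zin.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a small two-entry archive in memory so the sketch needs no files.
    static byte[] sampleZip() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(listEntryNames(sampleZip()));
    }
}
```

In a real plugin the bytes would presumably come from the Content object passed to getParse; that wiring is left out here because it depends on Nutch classes not shown on this page.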

Field Summary
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
ZipParser()
          Creates a new instance of ZipParser
 
Method Summary
 Configuration getConf()
           
 ParseResult getParse(Content content)
           This method parses the given content and returns a map of <key, parse> pairs.
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ZipParser

public ZipParser()
Creates a new instance of ZipParser

Method Detail

getParse

public ParseResult getParse(Content content)
Description copied from interface: Parser

Parses the given content and returns a map of <key, parse> pairs. Each Parse instance is persisted under its key.

Note: Meta-redirects should be followed only when they come from the original URL. For example:
Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If that page contains a meta-redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

Specified by:
getParse in interface Parser
Parameters:
content - Content to be parsed
Returns:
a map containing <key, parse> pairs
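
The meta-redirect rule described above can be sketched as a lookup keyed by the original URL. This is a minimal sketch under stated assumptions, not Nutch code: RedirectRule and ParseStub are hypothetical stand-ins, with ParseStub replacing a Parse whose ParseStatus indicates a redirect, and a plain java.util.Map replacing ParseResult.

```java
import java.util.Map;

// Hypothetical illustration of the redirect rule; none of these types are Nutch classes.
final class RedirectRule {

    // Stand-in for a Parse plus the redirect information carried by its ParseStatus.
    static final class ParseStub {
        final boolean redirect;
        final String target;

        ParseStub(boolean redirect, String target) {
            this.redirect = redirect;
            this.target = target;
        }
    }

    // Follow a redirect only if the entry keyed by the original URL reports one.
    static boolean shouldFollow(String originalUrl, Map<String, ParseStub> result) {
        ParseStub parse = result.get(originalUrl);
        return parse != null && parse.redirect;
    }

    public static void main(String[] args) {
        Map<String, ParseStub> result = Map.of(
            "foo.bar.com/redirect.html",
            new ParseStub(true, "foo.bar.com/target.html"));

        System.out.println(shouldFollow("foo.bar.com/redirect.html", result)); // true
        System.out.println(shouldFollow("foo.bar.com/other.html", result));    // false
    }
}
```

A redirect reported under any other key (for instance, a sub-document extracted from an archive) is ignored by this check, which matches the intent of the note.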

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable


Copyright © 2006 The Apache Software Foundation