Index (Nutch 1.1 API)

A B C D E F G H I J K L M N O P Q R S T U V W X Z _

A

AbstractFetchSchedule - Class in org.apache.nutch.crawl: This class provides common methods for implementations of FetchSchedule.
AbstractFetchSchedule() - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
AbstractFetchSchedule(Configuration) - Constructor for class org.apache.nutch.crawl.AbstractFetchSchedule
accept() - Method in class org.apache.nutch.urlfilter.api.RegexRule: Returns whether this rule is used for filtering in or filtering out.
acceptLanguage - Variable in class org.apache.nutch.protocol.http.api.HttpBase: The "Accept-Language" request header value.
ACCESS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus: Access denied - authorization required, but missing/incorrect.
ACRONYM - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
ACTION - Static variable in interface org.apache.nutch.indexer.field.Fields
AdaptiveFetchSchedule - Class in org.apache.nutch.crawl: This class implements an adaptive re-fetch algorithm.
AdaptiveFetchSchedule() - Constructor for class org.apache.nutch.crawl.AdaptiveFetchSchedule
add(Token) - Method in class org.apache.nutch.analysis.lang.NGramProfile: Add ngrams from a token to this profile
add(StringBuffer) - Method in class org.apache.nutch.analysis.lang.NGramProfile: Add ngrams from a single word to this profile
add(Inlink) - Method in class org.apache.nutch.crawl.Inlinks
add(Inlinks) - Method in class org.apache.nutch.crawl.Inlinks
add(NutchDocument, Field) - Static method in class org.apache.nutch.indexer.lucene.LuceneWriter: Deprecated. Use NutchDocument.add(String, String) instead and set index-level metadata for field information.
add(String, String) - Method in class org.apache.nutch.indexer.NutchDocument
add(String, String) - Method in class org.apache.nutch.metadata.Metadata: Add a metadata name/value mapping.
add(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
add(Summary.Fragment) - Method in class org.apache.nutch.searcher.Summary: Adds a fragment to a summary.
add_escapes(String) - Method in exception org.apache.nutch.analysis.ParseException: Converts raw characters to their escaped versions when the raw versions cannot be used as part of an ASCII string literal.
addAttribute(String, String) - Method in class org.apache.nutch.plugin.Extension: Adds an attribute; used only until model creation at plugin system start-up.
addClassToConf(Configuration, Class<? extends NutchIndexWriter>) - Static method in class org.apache.nutch.indexer.NutchIndexWriterFactory
addClue(String, String, int) - Method in class org.apache.nutch.util.EncodingDetector
addClue(String, String) - Method in class org.apache.nutch.util.EncodingDetector
addDependency(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds a dependency
addExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an exported library with a path relative to the plugin directory.
addExtension(Extension) - Method in class org.apache.nutch.plugin.ExtensionPoint: Installs a corresponding extension to this extension point.
addExtension(Extension) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an extension.
addExtensionPoint(ExtensionPoint) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds an extension point.
addFieldOptions(String, LuceneWriter.STORE, LuceneWriter.INDEX, LuceneWriter.VECTOR, Configuration) - Static method in class org.apache.nutch.indexer.lucene.LuceneWriter
addFieldOptions(String, LuceneWriter.STORE, LuceneWriter.INDEX, Configuration) - Static method in class org.apache.nutch.indexer.lucene.LuceneWriter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
addIndexBackendOptions(Configuration) - Method in interface org.apache.nutch.indexer.IndexingFilter: Adds index-level configuration options.
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
addIndexBackendOptions(Configuration) - Method in class org.creativecommons.nutch.CCIndexingFilter
addMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper: Add metadata.
addNotExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor: Adds a non-exported library with a path relative to the plugin directory.
addPatternBackward(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Adds any necessary nodes to the trie so that the given String can be decoded in reverse and the first character is represented by a terminal node.
addPatternForward(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Adds any necessary nodes to the trie so that the given String can be decoded and the last character is represented by a terminal node.
addProhibitedPhrase(String[]) - Method in class org.apache.nutch.searcher.Query: Add a prohibited phrase in the default field.
addProhibitedPhrase(String[], String) - Method in class org.apache.nutch.searcher.Query: Add a prohibited phrase in the specified field.
addProhibitedTerm(String) - Method in class org.apache.nutch.searcher.Query: Add a prohibited term in the default field.
addProhibitedTerm(String, String) - Method in class org.apache.nutch.searcher.Query: Add a prohibited term in the specified field.
addRequiredPhrase(String[]) - Method in class org.apache.nutch.searcher.Query: Add a required phrase in the default field.
addRequiredPhrase(String[], String) - Method in class org.apache.nutch.searcher.Query: Add a required phrase in the specified field.
addRequiredTerm(String) - Method in class org.apache.nutch.searcher.Query: Add a required term in the default field.
addRequiredTerm(String, String) - Method in class org.apache.nutch.searcher.Query: Add a required term in the specified field.
addSearchTerm(String, OntResource) - Static method in class org.apache.nutch.ontology.jena.OntologyImpl
addUrlFeatures(NutchDocument, String) - Method in class org.creativecommons.nutch.CCIndexingFilter: Add the features represented by a license URL.
analyze(StringBuilder) - Method in class org.apache.nutch.analysis.lang.NGramProfile: Analyze a piece of text
analyze(Path) - Method in class org.apache.nutch.scoring.webgraph.LinkRank: Runs the complete link analysis job.
AnalyzerFactory - Class in org.apache.nutch.analysis: Creates and caches NutchAnalyzer plugins.
AnalyzerFactory(Configuration) - Constructor for class org.apache.nutch.analysis.AnalyzerFactory
ANCHOR - Static variable in interface org.apache.nutch.indexer.field.Fields
AnchorFields - Class in org.apache.nutch.indexer.field: Creates FieldWritable objects for inbound anchor text.
AnchorFields() - Constructor for class org.apache.nutch.indexer.field.AnchorFields
AnchorFields.Collector - Class in org.apache.nutch.indexer.field: Collects and creates FieldWritable objects from the inlinks.
AnchorFields.Collector() - Constructor for class org.apache.nutch.indexer.field.AnchorFields.Collector
AnchorFields.Extractor - Class in org.apache.nutch.indexer.field: Extracts outlinks to be created as FieldWritable objects.
AnchorFields.Extractor() - Constructor for class org.apache.nutch.indexer.field.AnchorFields.Extractor: Default constructor.
AnchorFields.Extractor(Configuration) - Constructor for class org.apache.nutch.indexer.field.AnchorFields.Extractor: Configurable constructor.
APOSTROPHE - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
append(Node) - Method in class org.apache.nutch.parse.html.DOMBuilder: Append a node to the current container.
append(String) - Method in class org.apache.nutch.parse.msword.WordTextBuffer
APPLICATION_NAME - Static variable in interface org.apache.nutch.metadata.Office
ArcInputFormat - Class in org.apache.nutch.tools.arc: An input format that reads arc files.
ArcInputFormat() - Constructor for class org.apache.nutch.tools.arc.ArcInputFormat
ArcRecordReader - Class in org.apache.nutch.tools.arc: The ArcRecordReader class provides a record reader which reads records from arc files.
ArcRecordReader(Configuration, FileSplit) - Constructor for class org.apache.nutch.tools.arc.ArcRecordReader: Constructor that sets the configuration and file split.
ArcSegmentCreator - Class in org.apache.nutch.tools.arc: The ArcSegmentCreator is a replacement for fetcher that will take arc files as input and produce a nutch segment as output.
ArcSegmentCreator() - Constructor for class org.apache.nutch.tools.arc.ArcSegmentCreator
ArcSegmentCreator(Configuration) - Constructor for class org.apache.nutch.tools.arc.ArcSegmentCreator: Constructor that sets the job configuration.
ATSIGN - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
attrName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
AUTHOR - Static variable in interface org.apache.nutch.metadata.Office
autoDetectClues(Content, boolean) - Method in class org.apache.nutch.util.EncodingDetector
AutomatonURLFilter - Class in org.apache.nutch.urlfilter.automaton: RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
AutomatonURLFilter() - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
AutomatonURLFilter(String) - Constructor for class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
autoResolveContentType(String, String, byte[]) - Method in class org.apache.nutch.util.MimeUtil: A facade interface to trying all the possible mime type resolution strategies available within Tika.
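
The addPatternForward and addPatternBackward entries above describe inserting a string into a trie so that its last (or, in reverse, its first) character ends on a terminal node. A minimal sketch of that idea follows. It is not the Nutch TrieStringMatcher source: the class name TrieSketch, the Node type, and the matchesPrefix/matchesSuffix helpers are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of forward/backward trie insertion.
// This is NOT the org.apache.nutch.util.TrieStringMatcher implementation.
class TrieSketch {
    // One trie node: children keyed by character, plus a terminal flag.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    // Walk the string first-to-last, creating nodes as needed;
    // the node for the last character is marked terminal.
    void addPatternForward(String s) {
        Node n = root;
        for (int i = 0; i < s.length(); i++) {
            n = n.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        n.terminal = true;
    }

    // Walk the string last-to-first, creating nodes as needed;
    // the node for the first character is marked terminal.
    void addPatternBackward(String s) {
        Node n = root;
        for (int i = s.length() - 1; i >= 0; i--) {
            n = n.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        n.terminal = true;
    }

    // True if a pattern added with addPatternForward is a prefix of input.
    boolean matchesPrefix(String input) {
        Node n = root;
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) return false;
            if (n.terminal) return true;
        }
        return false;
    }

    // True if a pattern added with addPatternBackward is a suffix of input.
    boolean matchesSuffix(String input) {
        Node n = root;
        for (int i = input.length() - 1; i >= 0; i--) {
            n = n.children.get(input.charAt(i));
            if (n == null) return false;
            if (n.terminal) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TrieSketch prefixes = new TrieSketch();
        prefixes.addPatternForward("http://");
        System.out.println(prefixes.matchesPrefix("http://example.org/"));

        TrieSketch suffixes = new TrieSketch();
        suffixes.addPatternBackward(".org");
        System.out.println(suffixes.matchesSuffix("example.org"));
    }
}
```

The sketch keeps forward and backward patterns in separate trie instances, since mixing both directions in one trie would make the terminal flags ambiguous. Whether Nutch does the same is not stated in this index.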

B

BasicFields - Class in org.apache.nutch.indexer.field: Creates the basic FieldWritable objects.
BasicFields() - Constructor for class org.apache.nutch.indexer.field.BasicFields
BasicFields.Flipper - Class in org.apache.nutch.indexer.field: Runs the first part of redirect logic.
BasicFields.Flipper() - Constructor for class org.apache.nutch.indexer.field.BasicFields.Flipper
BasicFields.Merger - Class in org.apache.nutch.indexer.field: Merges output of all segments fields collecting only the most recent set of fields for any given url.
BasicFields.Merger() - Constructor for class org.apache.nutch.indexer.field.BasicFields.Merger
BasicFields.Scorer - Class in org.apache.nutch.indexer.field: The Scorer job sets the boost field from the NodeDb score.
BasicFields.Scorer() - Constructor for class org.apache.nutch.indexer.field.BasicFields.Scorer
BasicIndexingFilter - Class in org.apache.nutch.indexer.basic: Adds basic searchable fields to a document.
BasicIndexingFilter() - Constructor for class org.apache.nutch.indexer.basic.BasicIndexingFilter
BasicQueryFilter - Class in org.apache.nutch.searcher.basic: The default query filter.
BasicQueryFilter() - Constructor for class org.apache.nutch.searcher.basic.BasicQueryFilter
BasicSummarizer - Class in org.apache.nutch.summary.basic: Implements hit summarization.
BasicSummarizer() - Constructor for class org.apache.nutch.summary.basic.BasicSummarizer
BLOCKED - Static variable in class org.apache.nutch.protocol.ProtocolStatus: Thread was blocked http.max.delays times during fetching.
BlockedException - Exception in org.apache.nutch.protocol.http.api
BlockedException(String) - Constructor for exception org.apache.nutch.protocol.http.api.BlockedException
BOOST - Static variable in interface org.apache.nutch.indexer.field.Fields
BOOST_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
BOOSTFACTOR - Static variable in interface org.apache.nutch.indexer.field.Fields
BUFFER_SIZE - Static variable in class org.apache.nutch.protocol.http.api.HttpBase

C

C_PLUS_PLUS - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
C_SHARP - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
CACHE - Static variable in interface org.apache.nutch.indexer.field.Fields
Cached - Class in org.apache.nutch.servlet: A servlet that serves raw Content of any mime type.
Cached() - Constructor for class org.apache.nutch.servlet.Cached
CACHING_FORBIDDEN_ALL - Static variable in interface org.apache.nutch.metadata.Nutch: Don't show either original forbidden content or summaries.
CACHING_FORBIDDEN_CONTENT - Static variable in interface org.apache.nutch.metadata.Nutch: Don't show original forbidden content, but show summaries.
CACHING_FORBIDDEN_KEY - Static variable in interface org.apache.nutch.metadata.Nutch: Sites may request that search engines don't provide access to cached documents.
CACHING_FORBIDDEN_NONE - Static variable in interface org.apache.nutch.metadata.Nutch: Show both original forbidden content and summaries (default).
calculate(Content, Parse) - Method in class org.apache.nutch.crawl.MD5Signature
calculate(Content, Parse) - Method in class org.apache.nutch.crawl.Signature
calculate(Content, Parse) - Method in class org.apache.nutch.crawl.TextProfileSignature
calculateLastFetchTime(CrawlDatum) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: Returns the last fetch time of the CrawlDatum.
calculateLastFetchTime(CrawlDatum) - Method in interface org.apache.nutch.crawl.FetchSchedule: Calculates last fetch time of the given CrawlDatum.
CCDeleteUnlicensedTool - Class in org.creativecommons.nutch: Deletes documents in a set of Lucene indexes that do not have a Creative Commons license.
CCDeleteUnlicensedTool(IndexReader[]) - Constructor for class org.creativecommons.nutch.CCDeleteUnlicensedTool: Constructs a duplicate detector for the provided indexes.
CCIndexingFilter - Class in org.creativecommons.nutch: Adds basic searchable fields to a document.
CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
CCParseFilter - Class in org.creativecommons.nutch: Adds metadata identifying the Creative Commons license used, if any.
CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
CCParseFilter.Walker - Class in org.creativecommons.nutch: Walks DOM tree, looking for RDF in comments and licenses in anchors.
CCQueryFilter - Class in org.creativecommons.nutch: Handles "cc:" query clauses, causing them to search the "cc" field indexed by CCIndexingFilter.
CCQueryFilter() - Constructor for class org.creativecommons.nutch.CCQueryFilter
cdata(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of cdata.
CHAR_ENCODING_FOR_CONVERSION - Static variable in interface org.apache.nutch.metadata.Nutch
CHARACTER_COUNT - Static variable in interface org.apache.nutch.metadata.Office
characters(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of character data.
charactersRaw(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: If available, when the disable-output-escaping attribute is used, output raw text without escaping.
CHECK_BLOCKING - Static variable in interface org.apache.nutch.protocol.Protocol: Property name.
CHECK_ROBOTS - Static variable in interface org.apache.nutch.protocol.Protocol: Property name.
checkBlocking - Variable in class org.apache.nutch.protocol.http.api.HttpBase: Plugin should handle host blocking internally.
checkClientTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
checkOutputSpecs(FileSystem, JobConf) - Method in class org.apache.nutch.fetcher.FetcherOutputFormat
checkOutputSpecs(FileSystem, JobConf) - Method in class org.apache.nutch.indexer.DeleteDuplicates
checkOutputSpecs(FileSystem, JobConf) - Method in class org.apache.nutch.parse.ParseOutputFormat
checkRobots - Variable in class org.apache.nutch.protocol.http.api.HttpBase: Plugin should handle robot rules checking internally.
checkServerTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
childLen - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
children - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
childrenList - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
chooseRepr(String, String, boolean) - Static method in class org.apache.nutch.util.URLUtil: Given two URLs, the source and the destination of a redirect, returns the representative URL.
CircularDependencyException - Exception in org.apache.nutch.plugin: CircularDependencyException will be thrown if a circular dependency is detected.
CircularDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
CircularDependencyException(String) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
CJK - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
cleanMimeType(String) - Static method in class org.apache.nutch.util.MimeUtil: Cleans a MimeType name by extracting the actual MimeType from a string of the form:
clear() - Method in class org.apache.nutch.crawl.Inlinks
clear() - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
clear() - Method in class org.apache.nutch.metadata.Metadata: Remove all mappings from metadata.
clearClues() - Method in class org.apache.nutch.util.EncodingDetector: Clears all clues.
Client - Class in org.apache.nutch.protocol.ftp: Client.java encapsulates the functionality Nutch needs to get directory listings and retrieve files from an FTP server.
Client() - Constructor for class org.apache.nutch.protocol.ftp.Client
clone() - Method in class org.apache.nutch.crawl.CrawlDatum
clone() - Method in class org.apache.nutch.searcher.Query.Clause
clone() - Method in class org.apache.nutch.searcher.Query
close() - Method in class org.apache.nutch.crawl.CrawlDbFilter
close() - Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
close() - Method in class org.apache.nutch.crawl.CrawlDbReader
close(Reporter) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
close() - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
close() - Method in class org.apache.nutch.crawl.CrawlDbReducer
close() - Method in class org.apache.nutch.crawl.Generator.Selector
close() - Method in class org.apache.nutch.crawl.Injector.InjectMapper
close() - Method in class org.apache.nutch.crawl.Injector.InjectReducer
close() - Method in class org.apache.nutch.crawl.LinkDb
close() - Method in class org.apache.nutch.crawl.LinkDbFilter
close() - Method in class org.apache.nutch.crawl.LinkDbMerger
close() - Method in class org.apache.nutch.crawl.LinkDbReader
close() - Method in class org.apache.nutch.crawl.URLPartitioner
close() - Method in class org.apache.nutch.fetcher.Fetcher
close() - Method in class org.apache.nutch.fetcher.OldFetcher
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates.HashPartitioner
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates.UrlsReducer
close() - Method in class org.apache.nutch.indexer.field.AnchorFields.Collector
close() - Method in class org.apache.nutch.indexer.field.AnchorFields.Extractor
close() - Method in class org.apache.nutch.indexer.field.BasicFields.Flipper
close() - Method in class org.apache.nutch.indexer.field.BasicFields.Merger
close() - Method in class org.apache.nutch.indexer.field.BasicFields.Scorer
close() - Method in class org.apache.nutch.indexer.field.CustomFields.Collector
close() - Method in class org.apache.nutch.indexer.field.CustomFields.Converter
close() - Method in class org.apache.nutch.indexer.field.FieldIndexer
close() - Method in class org.apache.nutch.indexer.FsDirectory
close() - Method in class org.apache.nutch.indexer.IndexerMapReduce
close() - Method in class org.apache.nutch.indexer.lucene.LuceneWriter
close() - Method in interface org.apache.nutch.indexer.NutchIndexWriter
close() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
close() - Method in class org.apache.nutch.indexer.solr.SolrWriter
close() - Method in class org.apache.nutch.parse.ParseSegment
close() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
close() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
close() - Method in class org.apache.nutch.scoring.webgraph.LinkRank
close() - Method in class org.apache.nutch.scoring.webgraph.Loops.Finalizer
close() - Method in class org.apache.nutch.scoring.webgraph.Loops.Initializer
close() - Method in class org.apache.nutch.scoring.webgraph.Loops.Looper
close() - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
close() - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
close() - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
close() - Method in class org.apache.nutch.searcher.DistributedSearchBean
close() - Method in class org.apache.nutch.searcher.DistributedSegmentBean
close() - Method in class org.apache.nutch.searcher.FetchedSegments
close() - Method in class org.apache.nutch.searcher.IndexSearcher
close() - Method in class org.apache.nutch.searcher.LinkDbInlinks
close() - Method in class org.apache.nutch.searcher.LuceneSearchBean
close() - Method in class org.apache.nutch.searcher.NutchBean
close() - Method in class org.apache.nutch.searcher.SolrSearchBean
close() - Method in class org.apache.nutch.segment.SegmentMerger
close() - Method in class org.apache.nutch.segment.SegmentReader
close() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Closes the record reader resources.
close() - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
close() - Method in class org.apache.nutch.tools.compat.CrawlDbConverter
close() - Method in class org.apache.nutch.tools.compat.ReprUrlFixer
close() - Method in class org.apache.nutch.tools.CrawlDBScanner
close() - Method in class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
close() - Method in interface org.apache.nutch.tools.PruneIndexTool.PruneChecker: Close the checker - this could involve flushing output files or somesuch.
close() - Method in class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
close() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool: Closes the indexes, saving changes.
closeReaders(SequenceFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils: Closes a group of SequenceFile readers.
closeReaders(MapFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils: Closes a group of MapFile readers.
Clusterer - Class in org.apache.nutch.clustering.carrot2: This plugin provides an implementation of OnlineClusterer extension using clustering components of the Carrot2 project (http://www.carrot2.org).
Clusterer() - Constructor for class org.apache.nutch.clustering.carrot2.Clusterer: An empty public constructor for making new instances of the clusterer.
clusterHits(HitDetails[], String[]) - Method in class org.apache.nutch.clustering.carrot2.Clusterer: See OnlineClusterer for documentation.
clusterHits(HitDetails[], String[]) - Method in interface org.apache.nutch.clustering.OnlineClusterer: Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).
COLON - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
CommandRunner - Class in org.apache.nutch.util
CommandRunner() - Constructor for class org.apache.nutch.util.CommandRunner
comment(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Report an XML comment anywhere in the document.
COMMENTS - Static variable in interface org.apache.nutch.metadata.Office
COMMIT_SIZE - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
CommonGrams - Class in org.apache.nutch.analysis: Construct n-grams for frequently occurring terms and phrases while indexing.
CommonGrams(Configuration) - Constructor for class org.apache.nutch.analysis.CommonGrams: The constructor.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.CrawlDatum.Comparator
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.Generator.DecreasingFloatComparator: Compares two FloatWritables in decreasing order.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.crawl.Generator.HashComparator
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.Generator.HashComparator
compare(Object, Object) - Method in class org.apache.nutch.crawl.SignatureComparator
compareTo(CrawlDatum) - Method in class org.apache.nutch.crawl.CrawlDatum: Sort by decreasing score.
compareTo(Object) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
compareTo(Hit) - Method in class org.apache.nutch.searcher.Hit
compareTo(TrieStringMatcher.TrieNode) - Method in class org.apache.nutch.util.TrieStringMatcher.TrieNode
compound(String) - Method in class org.apache.nutch.analysis.NutchAnalysis: Parse a compound term that is interpreted as an implicit phrase query.
COMPUTATION - Static variable in interface org.apache.nutch.indexer.field.Fields
conf - Variable in class org.apache.nutch.analysis.NutchAnalyzer: The current Configuration
conf - Variable in class org.apache.nutch.crawl.Signature
conf - Variable in class org.apache.nutch.plugin.Plugin
conf - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbFilter
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
configure(JobConf) - Method in class org.apache.nutch.crawl.CrawlDbReducer
configure(JobConf) - Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
configure(JobConf) - Method in class org.apache.nutch.crawl.Generator.Selector
configure(JobConf) - Method in class org.apache.nutch.crawl.Injector.InjectMapper
configure(JobConf) - Method in class org.apache.nutch.crawl.Injector.InjectReducer
configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDb
configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDbFilter
configure(JobConf) - Method in class org.apache.nutch.crawl.LinkDbMerger
configure(JobConf) - Method in class org.apache.nutch.crawl.URLPartitioner
configure(JobConf) - Method in class org.apache.nutch.fetcher.Fetcher
configure(JobConf) - Method in class org.apache.nutch.fetcher.OldFetcher
configure(JobConf) - Method in class org.apache.nutch.indexer.DeleteDuplicates
configure(JobConf) - Method in class org.apache.nutch.indexer.DeleteDuplicates.HashPartitioner
configure(JobConf) - Method in class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
configure(JobConf) - Method in class org.apache.nutch.indexer.DeleteDuplicates.UrlsReducer
configure(JobConf) - Method in class org.apache.nutch.indexer.field.AnchorFields.Collector: Configures the jobs.
configure(JobConf) - Method in class org.apache.nutch.indexer.field.AnchorFields.Extractor: Configures the job, sets to ignore empty anchors.
configure(JobConf) - Method in class org.apache.nutch.indexer.field.BasicFields.Flipper: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.indexer.field.BasicFields.Merger: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.indexer.field.BasicFields.Scorer: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.indexer.field.CustomFields.Collector
configure(JobConf) - Method in class org.apache.nutch.indexer.field.CustomFields.Converter
configure(JobConf) - Method in class org.apache.nutch.indexer.field.FieldIndexer
configure(JobConf) - Method in class org.apache.nutch.indexer.IndexerMapReduce
configure(JobConf) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
configure(JobConf) - Method in class org.apache.nutch.parse.ParseSegment
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.Loops.Finalizer: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.Loops.Initializer: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.Loops.Looper: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter: Configures the job, sets the flag for type of content and the topN number if any.
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
configure(JobConf) - Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb: Configures the OutlinkDb job.
configure(JobConf) - Method in class org.apache.nutch.segment.SegmentMerger
configure(JobConf) - Method in class org.apache.nutch.segment.SegmentReader
configure(JobConf) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator: Configures the job.
configure(JobConf) - Method in class org.apache.nutch.tools.compat.CrawlDbConverter
configure(JobConf) - Method in class org.apache.nutch.tools.compat.ReprUrlFixer
configure(JobConf) - Method in class org.apache.nutch.tools.CrawlDBScanner
configure(JobConf) - Method in class org.apache.nutch.tools.FreeGenerator.FG
configure(JobConf) - Method in class org.apache.nutch.util.domain.DomainStatistics
containsKey(Writable) - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
containsValue(Writable) - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
CONTENT - Static variable in interface org.apache.nutch.indexer.field.Fields
Content - Class in org.apache.nutch.protocol
Content() - Constructor for class org.apache.nutch.protocol.Content
Content(String, String, byte[], String, Metadata, Configuration) - Constructor for class org.apache.nutch.protocol.Content
CONTENT_DISPOSITION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_ENCODING - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LANGUAGE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LENGTH - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_MD5 - Static variable in interface org.apache.nutch.metadata.HttpHeaders
CONTENT_REDIR - Static variable in class org.apache.nutch.fetcher.Fetcher
CONTENT_REDIR - Static variable in class org.apache.nutch.fetcher.OldFetcher
CONTENT_TYPE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
ContentAsTextInputFormat - Class in org.apache.nutch.segment: An input format that takes Nutch Content objects and converts them to text while converting newline endings to spaces.
ContentAsTextInputFormat() - Constructor for class org.apache.nutch.segment.ContentAsTextInputFormat
contextDestroyed(ServletContextEvent) - Method in class org.apache.nutch.searcher.NutchBean.NutchBeanConstructor
contextInitialized(ServletContextEvent) - Method in class org.apache.nutch.searcher.NutchBean.NutchBeanConstructor
CONTRIBUTOR - Static variable in interface org.apache.nutch.metadata.DublinCore: An entity responsible for making contributions to the content of the resource.
coord(int, int) - Method in class org.apache.nutch.indexer.NutchSimilarity
COVERAGE - Static variable in interface org.apache.nutch.metadata.DublinCore: The extent or scope of the content of the resource.
Crawl - Class in org.apache.nutch.crawl
Crawl() - Constructor for class org.apache.nutch.crawl.Crawl
CrawlDatum - Class in org.apache.nutch.crawl
CrawlDatum() - Constructor for class org.apache.nutch.crawl.CrawlDatum
CrawlDatum(int, int) - Constructor for class org.apache.nutch.crawl.CrawlDatum
CrawlDatum(int, int, float) - Constructor for class org.apache.nutch.crawl.CrawlDatum
CrawlDatum.Comparator - Class in org.apache.nutch.crawl: A Comparator optimized for CrawlDatum.
CrawlDatum.Comparator() - Constructor for class org.apache.nutch.crawl.CrawlDatum.Comparator
CrawlDb - Class in org.apache.nutch.crawl: This class takes the output of the fetcher and updates the crawldb accordingly.
CrawlDb() - Constructor for class org.apache.nutch.crawl.CrawlDb
CrawlDb(Configuration) - Constructor for class org.apache.nutch.crawl.CrawlDb
CRAWLDB_ADDITIONS_ALLOWED - Static variable in class org.apache.nutch.crawl.CrawlDb
CrawlDbConverter - Class in org.apache.nutch.tools.compat: This tool converts CrawlDb created in old <UTF8, CrawlDatum> format (Nutch versions < 0.9.0) to the new <Text, CrawlDatum> format.
CrawlDbConverter() - Constructor for class org.apache.nutch.tools.compat.CrawlDbConverter
CrawlDbFilter - Class in org.apache.nutch.crawl: This class provides a way to separate the URL normalization and filtering steps from the rest of CrawlDb manipulation code.
CrawlDbFilter() - Constructor for class org.apache.nutch.crawl.CrawlDbFilter
CrawlDbMerger - Class in org.apache.nutch.crawl: This tool merges several CrawlDb-s into one, optionally filtering URLs through the current URLFilters, to skip prohibited pages.
CrawlDbMerger() - Constructor for class org.apache.nutch.crawl.CrawlDbMerger
CrawlDbMerger(Configuration) - Constructor for class org.apache.nutch.crawl.CrawlDbMerger
CrawlDbMerger.Merger - Class in org.apache.nutch.crawl
CrawlDbMerger.Merger() - Constructor for class org.apache.nutch.crawl.CrawlDbMerger.Merger
CrawlDbReader - Class in org.apache.nutch.crawl: Read utility for the CrawlDB.
CrawlDbReader() - Constructor for class org.apache.nutch.crawl.CrawlDbReader
CrawlDbReader.CrawlDatumCsvOutputFormat - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDatumCsvOutputFormat() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat
CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter(DataOutputStream) - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
CrawlDbReader.CrawlDbStatCombiner - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDbStatCombiner() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
CrawlDbReader.CrawlDbStatMapper - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDbStatMapper() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
CrawlDbReader.CrawlDbStatReducer - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDbStatReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
CrawlDbReader.CrawlDbTopNMapper - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDbTopNMapper() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
CrawlDbReader.CrawlDbTopNReducer - Class in org.apache.nutch.crawl
CrawlDbReader.CrawlDbTopNReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
CrawlDbReducer - Class in org.apache.nutch.crawl: Merge new page entries with existing entries.
CrawlDbReducer() - Constructor for class org.apache.nutch.crawl.CrawlDbReducer
CrawlDBScanner - Class in org.apache.nutch.tools: Dumps all the entries matching a regular expression on their URL.
CrawlDBScanner() - Constructor for class org.apache.nutch.tools.CrawlDBScanner
CrawlDBScanner(Configuration) - Constructor for class org.apache.nutch.tools.CrawlDBScanner
create(String, InputStream, String) - Static method in class org.apache.nutch.analysis.lang.NGramProfile: Create a new Language profile from a (preferably quite large) text file
create() - Static method in class org.apache.nutch.util.NutchConfiguration: Create a Configuration for Nutch.
createCrawlConfiguration() - Static method in class org.apache.nutch.util.NutchConfiguration: Create a Configuration for Nutch invoked with the command line crawl command, i.e.
createFields(Path, Path, Path) - Method in class org.apache.nutch.indexer.field.AnchorFields: Creates the FieldsWritable object from the anchors.
createFields(Path, Path[], Path) - Method in class org.apache.nutch.indexer.field.BasicFields: Runs the BasicFields jobs for every segment and aggregates and filters the output to create a final database of FieldWritable objects.
createJob(Configuration, Path) - Static method in class org.apache.nutch.crawl.CrawlDb
createKey() - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
createKey() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Creates a new instance of the Text object for the key.
createLockFile(FileSystem, Path, boolean) - Static method in class org.apache.nutch.util.LockUtil: Create a lock file.
createMergeJob(Configuration, Path, boolean, boolean) - Static method in class org.apache.nutch.crawl.CrawlDbMerger
createMergeJob(Configuration, Path, boolean, boolean) - Static method in class org.apache.nutch.crawl.LinkDbMerger
createOutput(String) - Method in class org.apache.nutch.indexer.FsDirectory
createParseResult(String, Parse) - Static method in class org.apache.nutch.parse.ParseResult: Convenience method for obtaining ParseResult from a single Parse output.
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase: Creates a new RegexRule.
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
createSegments(Path, Path) - Method in class org.apache.nutch.tools.arc.ArcSegmentCreator: Creates the arc files to segments job.
createSocket(String, int, InetAddress, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory: Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createSocket(Socket, String, int, boolean) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
createValue() - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
createValue() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Creates a new instance of the BytesWritable object for the value.
createWebGraph(Path, Path[]) - Method in class org.apache.nutch.scoring.webgraph.WebGraph: Creates the three different WebGraph databases, Outlinks, Inlinks, and Node.
CreativeCommons - Interface in org.apache.nutch.metadata: A collection of Creative Commons property names.
CREATOR - Static variable in interface org.apache.nutch.metadata.DublinCore: An entity primarily responsible for making the content of the resource.
CSV_FORMAT - Static variable in class org.apache.nutch.crawl.CrawlDbReader
curChar - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
CURRENT_NAME - Static variable in class org.apache.nutch.crawl.CrawlDb
CURRENT_NAME - Static variable in class org.apache.nutch.crawl.LinkDb
currentToken - Variable in exception org.apache.nutch.analysis.ParseException: This is the last token that has been consumed successfully.
CustomFields - Class in org.apache.nutch.indexer.field: Creates custom FieldWritable objects from a text file containing field information including field name, value, and optional boost and field type (as needed by FieldWritable objects).
CustomFields() - Constructor for class org.apache.nutch.indexer.field.CustomFields
CustomFields.Collector - Class in org.apache.nutch.indexer.field: Aggregates FieldWritable objects by the same name for the same URL.
CustomFields.Collector() - Constructor for class org.apache.nutch.indexer.field.CustomFields.Collector
CustomFields.Converter - Class in org.apache.nutch.indexer.field: Converts text values into FieldWritable objects.
CustomFields.Converter() - Constructor for class org.apache.nutch.indexer.field.CustomFields.Converter
CustomFields.Converter(Configuration) - Constructor for class org.apache.nutch.indexer.field.CustomFields.Converter
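
The CrawlDBScanner entry above is described as dumping all entries whose URL matches a regular expression. As a rough illustration of that idea only, the following sketch filters a plain list of URL strings with java.util.regex. The class name UrlRegexScanSketch and the method scan are hypothetical, the sketch does not use any Nutch classes, and the choice of a full match (rather than a partial find) is an assumption, not a statement about how CrawlDBScanner behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the Nutch API.
public class UrlRegexScanSketch {

    /**
     * Returns the URLs that match the regular expression in full.
     * Assumption: the whole URL must match, so a pattern such as
     * ".*\\.pdf" selects URLs ending in ".pdf".
     */
    public static List<String> scan(List<String> urls, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String url : urls) {
            if (pattern.matcher(url).matches()) {
                hits.add(url);
            }
        }
        return hits;
    }
}
```

Compiling the pattern once and reusing it for every URL avoids recompiling the expression per entry, which matters when the input is large.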

D

DATE - Static variable in interface org.apache.nutch.metadata.DublinCore: A date associated with an event in the life cycle of the resource.
DateQueryFilter - Class in org.apache.nutch.searcher.more: Handles "date:" query clauses, causing them to search the field "date" indexed by MoreIndexingFilter.java
DateQueryFilter() - Constructor for class org.apache.nutch.searcher.more.DateQueryFilter
datum - Variable in class org.apache.nutch.crawl.Generator.SelectorEntry
debugStream - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager: Debug output.
dedup(Path[]) - Method in class org.apache.nutch.indexer.DeleteDuplicates
dedup(String) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
DEDUPE - Static variable in class org.apache.nutch.searcher.response.SearchServlet
DEFAULT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: Lexical state.
DEFAULT_BOOST - Static variable in class org.apache.nutch.util.domain.DomainSuffix
DEFAULT_DEDUP_FIELD - Static variable in class org.apache.nutch.searcher.QueryParams
DEFAULT_FIELD - Static variable in class org.apache.nutch.searcher.Query.Clause
DEFAULT_MAX_HITS_PER_DUP - Static variable in class org.apache.nutch.searcher.QueryParams
DEFAULT_NUM_HITS - Static variable in class org.apache.nutch.searcher.QueryParams
DEFAULT_PLUGIN - Static variable in class org.apache.nutch.parse.ParserFactory: Wildcard for default plugins.
DEFAULT_REVERSE - Static variable in class org.apache.nutch.searcher.QueryParams
DEFAULT_STATUS - Static variable in class org.apache.nutch.util.domain.DomainSuffix
DefaultFetchSchedule - Class in org.apache.nutch.crawl: This class implements the default re-fetch schedule.
DefaultFetchSchedule() - Constructor for class org.apache.nutch.crawl.DefaultFetchSchedule
defaultInterval - Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
deflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns a deflated copy of the input array.
DeflateUtils - Class in org.apache.nutch.util: A collection of utility methods for working on deflated data.
DeflateUtils() - Constructor for class org.apache.nutch.util.DeflateUtils
DeleteDuplicates - Class in org.apache.nutch.indexer: Delete duplicate documents in a set of Lucene indexes.
DeleteDuplicates() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates
DeleteDuplicates(Configuration) - Constructor for class org.apache.nutch.indexer.DeleteDuplicates
DeleteDuplicates.HashPartitioner - Class in org.apache.nutch.indexer
DeleteDuplicates.HashPartitioner() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.HashPartitioner
DeleteDuplicates.HashReducer - Class in org.apache.nutch.indexer
DeleteDuplicates.HashReducer() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
DeleteDuplicates.IndexDoc - Class in org.apache.nutch.indexer
DeleteDuplicates.IndexDoc() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
DeleteDuplicates.InputFormat - Class in org.apache.nutch.indexer
DeleteDuplicates.InputFormat() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.InputFormat
DeleteDuplicates.InputFormat.DDRecordReader - Class in org.apache.nutch.indexer
DeleteDuplicates.InputFormat.DDRecordReader(FileSplit, JobConf, Text) - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
DeleteDuplicates.UrlsReducer - Class in org.apache.nutch.indexer
DeleteDuplicates.UrlsReducer() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.UrlsReducer
deleteFile(String) - Method in class org.apache.nutch.indexer.FsDirectory
deleteUnlicensed() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool: Delete pages without CC licenses.
DELIMITER_SEARCHTERM - Static variable in class org.apache.nutch.ontology.jena.OntologyImpl
DESCRIPTION - Static variable in interface org.apache.nutch.metadata.DublinCore: An account of the content of the resource.
destroy() - Method in class org.apache.nutch.servlet.Cached
DIGEST - Static variable in interface org.apache.nutch.indexer.field.Fields
DIGEST_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
DIGIT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
DIR_NAME - Static variable in class org.apache.nutch.parse.ParseData
DIR_NAME - Static variable in class org.apache.nutch.parse.ParseText
DIR_NAME - Static variable in class org.apache.nutch.protocol.Content
disable_tracing() - Method in class org.apache.nutch.analysis.NutchAnalysis: Disable tracing.
disconnect() - Method in class org.apache.nutch.protocol.ftp.Client: Closes the connection to the FTP server and restores connection parameters to the default values.
DistributedSearch - Class in org.apache.nutch.searcher: Search/summary servers.
DistributedSearch.IndexServer - Class in org.apache.nutch.searcher
DistributedSearch.IndexServer() - Constructor for class org.apache.nutch.searcher.DistributedSearch.IndexServer
DistributedSearch.SegmentServer - Class in org.apache.nutch.searcher
DistributedSearch.SegmentServer() - Constructor for class org.apache.nutch.searcher.DistributedSearch.SegmentServer
DistributedSearch.Server - Class in org.apache.nutch.searcher: Runs a search/summary server.
DistributedSearch.Server() - Constructor for class org.apache.nutch.searcher.DistributedSearch.Server
DistributedSearchBean - Class in org.apache.nutch.searcher
DistributedSearchBean(Configuration, Path, Path) - Constructor for class org.apache.nutch.searcher.DistributedSearchBean
DistributedSegmentBean - Class in org.apache.nutch.searcher
DistributedSegmentBean(Configuration, Path) - Constructor for class org.apache.nutch.searcher.DistributedSegmentBean
distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in interface org.apache.nutch.scoring.ScoringFilter: Distribute score value from the current page to all its outlinked pages.
distributeScoreToOutlinks(Text, ParseData, Collection<Map.Entry<Text, CrawlDatum>>, CrawlDatum, int) - Method in class org.apache.nutch.scoring.ScoringFilters
DmozParser - Class in org.apache.nutch.tools: Utility that converts DMOZ RDF into a flat file of URLs to be injected.
DmozParser() - Constructor for class org.apache.nutch.tools.DmozParser
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.searcher.OpenSearchServlet
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.searcher.response.SearchServlet: Handles all search requests.
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.servlet.Cached
DomainStatistics - Class in org.apache.nutch.util.domain: Extracts some very basic statistics about domains from the crawldb
DomainStatistics() - Constructor for class org.apache.nutch.util.domain.DomainStatistics
DomainStatistics.DomainStatisticsCombiner - Class in org.apache.nutch.util.domain
DomainStatistics.DomainStatisticsCombiner() - Constructor for class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
DomainStatistics.MyCounter - Enum in org.apache.nutch.util.domain
DomainSuffix - Class in org.apache.nutch.util.domain: This class represents the last part of the host name, which is operated by authorities, not individuals.
DomainSuffix(String, DomainSuffix.Status, float) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
DomainSuffix(String) - Constructor for class org.apache.nutch.util.domain.DomainSuffix
DomainSuffix.Status - Enum in org.apache.nutch.util.domain: Enumeration of the status of the tld.
DomainSuffixes - Class in org.apache.nutch.util.domain: Storage class for DomainSuffix objects. Note: this class is a singleton.
DOMBuilder - Class in org.apache.nutch.parse.html: This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMBuilder(Document, Node) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMBuilder(Document, DocumentFragment) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMBuilder(Document) - Constructor for class org.apache.nutch.parse.html.DOMBuilder: DOMBuilder instance constructor...
DOMContentUtils - Class in org.apache.nutch.parse.html: A collection of methods for extracting content from DOM trees.
DOMContentUtils(Configuration) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils
DOMContentUtils.LinkParams - Class in org.apache.nutch.parse.html
DOMContentUtils.LinkParams(String, String, int) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
DomUtil - Class in org.apache.nutch.util
DomUtil() - Constructor for class org.apache.nutch.util.DomUtil
DONE_NAME - Static variable in class org.apache.nutch.indexer.field.FieldIndexer
DONE_NAME - Static variable in class org.apache.nutch.indexer.Indexer
DONE_NAME - Static variable in class org.apache.nutch.indexer.IndexMerger
doPost(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.searcher.response.SearchServlet: Forwards all responses to doGet.
doPost(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.servlet.Cached
DOT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
DublinCore - Interface in org.apache.nutch.metadata: A collection of Dublin Core metadata names.
DummySSLProtocolSocketFactory - Class in org.apache.nutch.protocol.httpclient
DummySSLProtocolSocketFactory() - Constructor for class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory: Constructor for DummySSLProtocolSocketFactory.
DummyX509TrustManager - Class in org.apache.nutch.protocol.httpclient
DummyX509TrustManager(KeyStore) - Constructor for class org.apache.nutch.protocol.httpclient.DummyX509TrustManager: Constructor for DummyX509TrustManager.
dump(Path, Path) - Method in class org.apache.nutch.segment.SegmentReader
DUMP_DIR - Static variable in class org.apache.nutch.scoring.webgraph.LinkDumper
dumpLinks(Path) - Method in class org.apache.nutch.scoring.webgraph.LinkDumper: Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
dumpNodes(Path, NodeDumper.DumpType, long, Path) - Method in class org.apache.nutch.scoring.webgraph.NodeDumper: Runs the process to dump the top urls out to a text file.
dumpUrl(Path, String) - Method in class org.apache.nutch.scoring.webgraph.LoopReader: Prints loopset for a single url.
dumpUrl(Path, String) - Method in class org.apache.nutch.scoring.webgraph.NodeReader: Prints the content of the Node represented by the url to system out.
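
The deflate(byte[]) entry above is described as returning a deflated copy of the input array. The sketch below shows one way such a round trip can be written with the JDK's java.util.zip classes. DeflateSketch and its two methods are hypothetical names chosen for illustration; the code is not the DeflateUtils implementation, and the compression level and buffer size are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; not the Nutch DeflateUtils class.
public class DeflateSketch {

    /** Returns a deflated copy of the input array; the input is not modified. */
    public static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION); // assumed level
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns an inflated copy of data produced by deflate(byte[]). */
    public static byte[] inflate(byte[] in) {
        Inflater inflater = new Inflater();
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and not finished means the stream is cut short.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Wrapping DataFormatException in an unchecked exception keeps the call sites short; a caller that needs to recover from corrupt input may prefer to let the checked exception propagate instead.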

E

elName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
EmptyRobotRules - Class in org.apache.nutch.protocol
EmptyRobotRules() - Constructor for class org.apache.nutch.protocol.EmptyRobotRules
enable_tracing() - Method in class org.apache.nutch.analysis.NutchAnalysis: Enable tracing.
encode(String) - Static method in class org.apache.nutch.html.Entities
EncodingDetector - Class in org.apache.nutch.util: A simple class for detecting character encodings.
EncodingDetector(Configuration) - Constructor for class org.apache.nutch.util.EncodingDetector
end - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
endCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of a CDATA section.
endDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of the end of a document.
endDTD() - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of DTD declarations.
endElement(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of the end of an element.
endEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Report the end of an entity.
endPrefixMapping(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: End the scope of a prefix-URI mapping.
Entities - Class in org.apache.nutch.html
Entities() - Constructor for class org.apache.nutch.html.Entities
entityReference(String) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of an entityReference.
EOF - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: End of File.
eol - Variable in exception org.apache.nutch.analysis.ParseException: The end of line string for this machine.
equals(Object) - Method in class org.apache.nutch.crawl.CrawlDatum
equals(Object) - Method in class org.apache.nutch.crawl.Inlink
equals(Object) - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
equals(Object) - Method in class org.apache.nutch.fetcher.FetcherOutput
equals(Object) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
equals(Object) - Method in class org.apache.nutch.metadata.Metadata
equals(Object) - Method in class org.apache.nutch.parse.Outlink
equals(Object) - Method in class org.apache.nutch.parse.ParseData
equals(Object) - Method in class org.apache.nutch.parse.ParseStatus
equals(Object) - Method in class org.apache.nutch.parse.ParseText
equals(Object) - Method in class org.apache.nutch.protocol.Content
equals(Object) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
equals(Object) - Method in class org.apache.nutch.protocol.ProtocolStatus
equals(Object) - Method in class org.apache.nutch.searcher.Query.Clause
equals(Object) - Method in class org.apache.nutch.searcher.Query
equals(Object) - Method in class org.apache.nutch.searcher.Query.Phrase
equals(Object) - Method in class org.apache.nutch.searcher.Query.Term
equals(Object) - Method in class org.apache.nutch.searcher.QueryParams
equals(Object) - Method in class org.apache.nutch.searcher.Summary
equals(Object) - Method in class org.apache.nutch.searcher.Summary.Fragment
evaluate() - Method in class org.apache.nutch.util.CommandRunner
EXCEPTION - Static variable in class org.apache.nutch.protocol.ProtocolStatus: Unspecified exception occurred.
exec() - Method in class org.apache.nutch.util.CommandRunner
expectedTokenSequences - Variable in exception org.apache.nutch.analysis.ParseException: Each entry in this array is an array of integers.
Extension - Class in org.apache.nutch.plugin: An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of Publisher.
Extension(PluginDescriptor, String, String, String, Configuration, PluginRepository) - Constructor for class org.apache.nutch.plugin.Extension
ExtensionPoint - Class in org.apache.nutch.plugin: The ExtensionPoint provides meta information of an extension point.
ExtensionPoint(String, String, String) - Constructor for class org.apache.nutch.plugin.ExtensionPoint: Constructor
ExtParser - Class in org.apache.nutch.parse.ext: A wrapper that invokes an external command to do the real parsing job.
ExtParser() - Constructor for class org.apache.nutch.parse.ext.ExtParser
extract(InputStream) - Method in class org.apache.nutch.parse.ms.MSExtractor: Extracts properties and text from an MS Document input stream
extractText(InputStream) - Method in class org.apache.nutch.parse.ms.MSExtractor: Extracts the text content from a Microsoft document input stream.
extractText(InputStream, String, List) - Method in class org.apache.nutch.parse.zip.ZipTextExtractor

F

FAILED - Static variable in class org.apache.nutch.parse.ParseStatus: General failure.
FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus: Content was not retrieved.
FAILED_EXCEPTION - Static variable in class org.apache.nutch.parse.ParseStatus: Parsing failed.
FAILED_INVALID_FORMAT - Static variable in class org.apache.nutch.parse.ParseStatus: Parsing failed.
FAILED_MISSING_CONTENT - Static variable in class org.apache.nutch.parse.ParseStatus: Parsing failed.
FAILED_MISSING_PARTS - Static variable in class org.apache.nutch.parse.ParseStatus: Parsing failed.
FAILED_TRUNCATED - Static variable in class org.apache.nutch.parse.ParseStatus: Parsing failed.
FastSavedException - Exception in org.apache.nutch.parse.msword
FastSavedException(String) - Constructor for exception org.apache.nutch.parse.msword.FastSavedException
Feed - Interface in org.apache.nutch.metadata: A collection of Feed property names extracted by the ROME library.
FEED - Static variable in interface org.apache.nutch.metadata.Feed
FEED_AUTHOR - Static variable in interface org.apache.nutch.metadata.Feed
FEED_PUBLISHED - Static variable in interface org.apache.nutch.metadata.Feed
FEED_TAGS - Static variable in interface org.apache.nutch.metadata.Feed
FEED_UPDATED - Static variable in interface org.apache.nutch.metadata.Feed
FeedParserListenerImpl - Class in org.apache.nutch.parse.rss
FeedParserListenerImpl() - Constructor for class org.apache.nutch.parse.rss.FeedParserListenerImpl: Default Constructor
fetch(Path, int, boolean) - Method in class org.apache.nutch.fetcher.Fetcher
fetch(Path, int) - Method in class org.apache.nutch.fetcher.OldFetcher
FETCH_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
FETCH_STATUS_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
FETCH_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
fetched - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
FetchedSegments - Class in org.apache.nutch.searcher: Implements HitSummarizer and HitContent for a set of fetched segments.
FetchedSegments(Configuration, Path) - Constructor for class org.apache.nutch.searcher.FetchedSegments: Construct given a directory containing fetcher output.
Fetcher - Class in org.apache.nutch.fetcher: A queue-based fetcher.
Fetcher() - Constructor for class org.apache.nutch.fetcher.Fetcher
Fetcher(Configuration) - Constructor for class org.apache.nutch.fetcher.Fetcher
Fetcher.InputFormat - Class in org.apache.nutch.fetcher
Fetcher.InputFormat() - Constructor for class org.apache.nutch.fetcher.Fetcher.InputFormat
FetcherOutput - Class in org.apache.nutch.fetcher
FetcherOutput() - Constructor for class org.apache.nutch.fetcher.FetcherOutput
FetcherOutput(CrawlDatum, Content, ParseImpl) - Constructor for class org.apache.nutch.fetcher.FetcherOutput
FetcherOutputFormat - Class in org.apache.nutch.fetcher: Splits FetcherOutput entries into multiple map files.
FetcherOutputFormat() - Constructor for class org.apache.nutch.fetcher.FetcherOutputFormat
fetchErrors - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
FetchSchedule - Interface in org.apache.nutch.crawl: This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals.
FetchScheduleFactory - Class in org.apache.nutch.crawl: Creates and caches a FetchSchedule implementation.
FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter: The name of the document field we use.
FIELD_FILTER_ORDER - Static variable in class org.apache.nutch.indexer.field.FieldFilters
FIELD_INDEX_PREFIX - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
FIELD_PREFIX - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
FIELD_STORE_PREFIX - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
FIELD_VECTOR_PREFIX - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
FieldFilter - Interface in org.apache.nutch.indexer.field: Filter to manipulate FieldWritable objects for a given url during indexing.
FieldFilters - Class in org.apache.nutch.indexer.field: The FieldFilters class provides a standard way to collect, order, and run all FieldFilter implementations that are active in the plugin system.
FieldFilters(Configuration) - Constructor for class org.apache.nutch.indexer.field.FieldFilters: Configurable constructor.
FieldIndexer - Class in org.apache.nutch.indexer.field
FieldIndexer() - Constructor for class org.apache.nutch.indexer.field.FieldIndexer
FieldIndexer(Configuration) - Constructor for class org.apache.nutch.indexer.field.FieldIndexer
FieldIndexer.LuceneDocumentWrapper - Class in org.apache.nutch.indexer.field
FieldIndexer.LuceneDocumentWrapper(Document) - Constructor for class org.apache.nutch.indexer.field.FieldIndexer.LuceneDocumentWrapper
FieldIndexer.OutputFormat - Class in org.apache.nutch.indexer.field
FieldIndexer.OutputFormat() - Constructor for class org.apache.nutch.indexer.field.FieldIndexer.OutputFormat
FieldQueryFilter - Class in org.apache.nutch.searcher: Translate query fields to search the same-named field, as indexed by an IndexingFilter.
FieldQueryFilter(String) - Constructor for class org.apache.nutch.searcher.FieldQueryFilter: Construct for the named field.
FieldQueryFilter(String, float) - Constructor for class org.apache.nutch.searcher.FieldQueryFilter: Construct for the named field, boosting as specified.
Fields - Interface in org.apache.nutch.indexer.field
FIELDS - Static variable in class org.apache.nutch.searcher.response.SearchServlet
FieldsWritable - Class in org.apache.nutch.indexer.field: A class that holds a grouping of FieldWritable objects.
FieldsWritable() - Constructor for class org.apache.nutch.indexer.field.FieldsWritable
FieldType - Enum in org.apache.nutch.indexer.field: The different types of fields.
FieldWritable - Class in org.apache.nutch.indexer.field: A class that holds a single field of content to be placed into an index.
FieldWritable() - Constructor for class org.apache.nutch.indexer.field.FieldWritable
FieldWritable(String, String, FieldType, float) - Constructor for class org.apache.nutch.indexer.field.FieldWritable
FieldWritable(String, String, FieldType, boolean, boolean, boolean) - Constructor for class org.apache.nutch.indexer.field.FieldWritable
FieldWritable(String, String, FieldType, float, boolean, boolean, boolean) - Constructor for class org.apache.nutch.indexer.field.FieldWritable
File - Class in org.apache.nutch.protocol.file: File.java deals with file: scheme.
File() - Constructor for class org.apache.nutch.protocol.file.File
FileError - Exception in org.apache.nutch.protocol.file: Thrown for File error codes.
FileError(int) - Constructor for exception org.apache.nutch.protocol.file.FileError
FileException - Exception in org.apache.nutch.protocol.file
FileException() - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(String) - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
FileException(Throwable) - Constructor for exception org.apache.nutch.protocol.file.FileException
fileExists(String) - Method in class org.apache.nutch.indexer.FsDirectory
fileLen - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
fileLength(String) - Method in class org.apache.nutch.indexer.FsDirectory
fileModified(String) - Method in class org.apache.nutch.indexer.FsDirectory
FileResponse - Class in org.apache.nutch.protocol.file: FileResponse.java mimics file replies as http response.
FileResponse(URL, CrawlDatum, File, Configuration) - Constructor for class org.apache.nutch.protocol.file.FileResponse
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser: Scan the HTML document looking at possible indications of content language.
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
filter(String, Document, List<FieldWritable>) - Method in interface org.apache.nutch.indexer.field.FieldFilter: Returns the document to which fields are being added or null if we are to stop processing for this url and not add anything to the index.
filter(String, Document, List<FieldWritable>) - Method in class org.apache.nutch.indexer.field.FieldFilters: Runs all FieldFilter extensions.
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in interface org.apache.nutch.indexer.IndexingFilter: Adds fields or otherwise modifies the document that will be indexed for a parse.
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.IndexingFilters: Run all defined filters.
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.microformats.reltag.RelTagParser: Scan the HTML document looking at possible rel-tags
filter(String) - Method in interface org.apache.nutch.net.URLFilter
filter(String) - Method in class org.apache.nutch.net.URLFilters: Run all defined filters.
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in interface org.apache.nutch.parse.HtmlParseFilter: Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.HtmlParseFilters: Run all defined filters.
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.js.JSParseFilter
filter() - Method in class org.apache.nutch.parse.ParseResult: Remove all results where status is not successful (as determined by ParseStatus.isSuccess()).
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.FieldQueryFilter
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.more.DateQueryFilter
filter(Query, BooleanQuery) - Method in interface org.apache.nutch.searcher.QueryFilter: Adds clauses or otherwise modifies the BooleanQuery that will be searched.
filter(Query) - Method in class org.apache.nutch.searcher.QueryFilters: Run all defined filters.
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.RawFieldQueryFilter
filter(String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter(String) - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks) - Method in class org.creativecommons.nutch.CCIndexingFilter
filter(Content, ParseResult, HTMLMetaTags, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter: Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
FilteredStringWriter - Class in org.apache.nutch.parse.mspowerpoint: Writes to optimize ASCII output.
FilteredStringWriter() - Constructor for class org.apache.nutch.parse.mspowerpoint.FilteredStringWriter
FilteredStringWriter(int) - Constructor for class org.apache.nutch.parse.mspowerpoint.FilteredStringWriter
finalize() - Method in class org.apache.nutch.plugin.Plugin
finalize() - Method in class org.apache.nutch.plugin.PluginRepository
finalize() - Method in class org.apache.nutch.protocol.ftp.Ftp
findAuthentication(Metadata) - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
findLoops(Path) - Method in class org.apache.nutch.scoring.webgraph.Loops: Runs the various loop jobs.
forceRefetch(Text, CrawlDatum, boolean) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
forceRefetch(Text, CrawlDatum, boolean) - Method in interface org.apache.nutch.crawl.FetchSchedule: This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
FORMAT - Static variable in interface org.apache.nutch.metadata.DublinCore: Typically, Format may include the media-type or dimensions of the resource.
format - Static variable in class org.apache.nutch.net.protocols.HttpDateFormat
forName(String) - Method in class org.apache.nutch.util.MimeUtil: A facade interface to Tika's underlying MimeTypes.forName(String) method.
FreeGenerator - Class in org.apache.nutch.tools: This tool generates fetchlists (segments to be fetched) from plain text files containing one URL per line.
FreeGenerator() - Constructor for class org.apache.nutch.tools.FreeGenerator
FreeGenerator.FG - Class in org.apache.nutch.tools
FreeGenerator.FG() - Constructor for class org.apache.nutch.tools.FreeGenerator.FG
fromHexString(String) - Static method in class org.apache.nutch.util.StringUtil: Convert a String containing consecutive (no inside whitespace) hexadecimal digits into a corresponding byte array.
FsDirectory - Class in org.apache.nutch.indexer: Reads a Lucene index stored in DFS.
FsDirectory(FileSystem, Path, boolean, Configuration) - Constructor for class org.apache.nutch.indexer.FsDirectory
FSUtils - Class in org.apache.nutch.util: Utility methods for common filesystem operations.
FSUtils() - Constructor for class org.apache.nutch.util.FSUtils
Ftp - Class in org.apache.nutch.protocol.ftp: Ftp.java deals with ftp: scheme.
Ftp() - Constructor for class org.apache.nutch.protocol.ftp.Ftp
FtpError - Exception in org.apache.nutch.protocol.ftp: Thrown for Ftp error codes.
FtpError(int) - Constructor for exception org.apache.nutch.protocol.ftp.FtpError
FtpException - Exception in org.apache.nutch.protocol.ftp: Superclass for important exceptions thrown during FTP talk, that must be handled with care.
FtpException() - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(String) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpException(Throwable) - Constructor for exception org.apache.nutch.protocol.ftp.FtpException
FtpExceptionBadSystResponse - Exception in org.apache.nutch.protocol.ftp: Exception indicating bad reply of SYST command.
FtpExceptionCanNotHaveDataConnection - Exception in org.apache.nutch.protocol.ftp: Exception indicating failure of opening data connection.
FtpExceptionControlClosedByForcedDataClose - Exception in org.apache.nutch.protocol.ftp: Exception indicating control channel is closed by server end, due to forced closure of data channel at client (our) end.
FtpExceptionUnknownForcedDataClose - Exception in org.apache.nutch.protocol.ftp: Exception indicating unrecognizable reply from server after forced closure of data channel by client (our) side.
FtpResponse - Class in org.apache.nutch.protocol.ftp: FtpResponse.java mimics ftp replies as http response.
FtpResponse(URL, CrawlDatum, Ftp, Configuration) - Constructor for class org.apache.nutch.protocol.ftp.FtpResponse

G

generate(Path, Path, int, long, long) - Method in class org.apache.nutch.crawl.Generator
generate(Path, Path, int, long, long, boolean, boolean) - Method in class org.apache.nutch.crawl.Generator: Old signature kept for compatibility; it does not specify whether or not to normalise, and sets the number of segments to 1.
generate(Path, Path, int, long, long, boolean, boolean, boolean, int) - Method in class org.apache.nutch.crawl.Generator: Generate fetchlists in one or more segments.
GENERATE_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
GENERATE_MAX_PER_HOST - Static variable in class org.apache.nutch.crawl.Generator
GENERATE_MAX_PER_HOST_BY_IP - Static variable in class org.apache.nutch.crawl.Generator
GENERATE_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
GENERATE_UPDATE_CRAWLDB - Static variable in class org.apache.nutch.crawl.Generator
generated - Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
generateFileNameForKeyValue(FloatWritable, Generator.SelectorEntry, String) - Method in class org.apache.nutch.crawl.Generator.GeneratorOutputFormat
generateParseException() - Method in class org.apache.nutch.analysis.NutchAnalysis: Generate ParseException.
generateSegmentName() - Static method in class org.apache.nutch.crawl.Generator
generateSegmentName() - Static method in class org.apache.nutch.tools.arc.ArcSegmentCreator: Generates a random name for the segments.
Generator - Class in org.apache.nutch.crawl: Generates a subset of a crawl db to fetch.
Generator() - Constructor for class org.apache.nutch.crawl.Generator
Generator(Configuration) - Constructor for class org.apache.nutch.crawl.Generator
Generator.CrawlDbUpdater - Class in org.apache.nutch.crawl: Update the CrawlDB so that the next generate won't include the same URLs.
Generator.CrawlDbUpdater() - Constructor for class org.apache.nutch.crawl.Generator.CrawlDbUpdater
Generator.DecreasingFloatComparator - Class in org.apache.nutch.crawl
Generator.DecreasingFloatComparator() - Constructor for class org.apache.nutch.crawl.Generator.DecreasingFloatComparator
Generator.GeneratorOutputFormat - Class in org.apache.nutch.crawl
Generator.GeneratorOutputFormat() - Constructor for class org.apache.nutch.crawl.Generator.GeneratorOutputFormat
Generator.HashComparator - Class in org.apache.nutch.crawl: Sort fetch lists by hash of URL.
Generator.HashComparator() - Constructor for class org.apache.nutch.crawl.Generator.HashComparator
Generator.PartitionReducer - Class in org.apache.nutch.crawl
Generator.PartitionReducer() - Constructor for class org.apache.nutch.crawl.Generator.PartitionReducer
Generator.Selector - Class in org.apache.nutch.crawl: Selects entries due for fetch.
Generator.Selector() - Constructor for class org.apache.nutch.crawl.Generator.Selector
Generator.SelectorEntry - Class in org.apache.nutch.crawl
Generator.SelectorEntry() - Constructor for class org.apache.nutch.crawl.Generator.SelectorEntry
Generator.SelectorInverseMapper - Class in org.apache.nutch.crawl
Generator.SelectorInverseMapper() - Constructor for class org.apache.nutch.crawl.Generator.SelectorInverseMapper
GENERATOR_COUNT_MODE - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_COUNT_VALUE_DOMAIN - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_COUNT_VALUE_HOST - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_CUR_TIME - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_DELAY - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_FILTER - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_MAX_COUNT - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_MAX_NUM_SEGMENTS - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_MIN_SCORE - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_NORMALISE - Static variable in class org.apache.nutch.crawl.Generator
GENERATOR_TOP_N - Static variable in class org.apache.nutch.crawl.Generator
generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Use CrawlDatum.getScore().
generatorSortValue(Text, CrawlDatum, float) - Method in interface org.apache.nutch.scoring.ScoringFilter: This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
generatorSortValue(Text, CrawlDatum, float) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a sort value for Generate.
GenericWritableConfigurable - Class in org.apache.nutch.util: A generic Writable wrapper that can inject Configuration to Configurables
GenericWritableConfigurable() - Constructor for class org.apache.nutch.util.GenericWritableConfigurable
get(Configuration) - Static method in class org.apache.nutch.analysis.AnalyzerFactory
get(String) - Method in class org.apache.nutch.analysis.AnalyzerFactory: Returns the appropriate analyzer implementation given a language code.
get(String, String, Configuration) - Method in class org.apache.nutch.crawl.CrawlDbReader
get(Writable) - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
get() - Method in class org.apache.nutch.indexer.field.FieldIndexer.LuceneDocumentWrapper
get(String) - Method in class org.apache.nutch.metadata.Metadata: Get the value associated to a metadata name.
get(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
get(String) - Method in class org.apache.nutch.parse.ParseResult: Retrieve a single parse output.
get(Text) - Method in class org.apache.nutch.parse.ParseResult: Retrieve a single parse output.
get(Configuration) - Static method in class org.apache.nutch.plugin.PluginRepository
get(ServletContext, Configuration) - Static method in class org.apache.nutch.searcher.NutchBean: Returns the cached instance in the servlet context.
get(String) - Method in class org.apache.nutch.searcher.QueryParams
get(FileSplit) - Static method in class org.apache.nutch.segment.SegmentPart: Create SegmentPart from a FileSplit.
get(String) - Static method in class org.apache.nutch.segment.SegmentPart: Create SegmentPart from a full path of a location inside any segment part.
get(Path, Text, Writer, Map<String, List<Writable>>) - Method in class org.apache.nutch.segment.SegmentReader
get(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes: Return the DomainSuffix object for the extension; if the extension is a top-level domain, the returned object will be an instance of TopLevelDomain.
get(ServletContext) - Static method in class org.apache.nutch.util.NutchConfiguration: Create a Configuration for Nutch front-end.
get(Configuration) - Static method in class org.apache.nutch.util.ObjectCache
getAcceptedIssuers() - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
getAcceptLanguage() - Method in class org.apache.nutch.protocol.http.api.HttpBase: Value of "Accept-Language" request header sent by Nutch.
getAnchor() - Method in class org.apache.nutch.crawl.Inlink
getAnchor() - Method in class org.apache.nutch.parse.Outlink
getAnchor() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
getAnchors() - Method in class org.apache.nutch.crawl.Inlinks: Return the set of anchor texts.
getAnchors(Text) - Method in class org.apache.nutch.crawl.LinkDbReader
getAnchors(HitDetails) - Method in interface org.apache.nutch.searcher.HitInlinks: Returns the anchors of a hit document.
getAnchors(HitDetails) - Method in class org.apache.nutch.searcher.LinkDbInlinks
getAnchors(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getArgs() - Method in class org.apache.nutch.parse.ParseStatus
getArgs() - Method in class org.apache.nutch.protocol.ProtocolStatus
getAttribute(String) - Method in class org.apache.nutch.plugin.Extension: Returns an attribute value that is set up in the manifest file and is defined by the extension point XML schema.
getAuthentication(String, Configuration) - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: This method is responsible for providing Basic authentication information.
getBase(Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: If Node contains a BASE tag then its HREF is returned.
getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getBaseUrl() - Method in class org.apache.nutch.protocol.Content: The base url for relative links contained in the content.
getBasicPattern() - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Provides a pattern which can be used by an outside resource to determine if this class can provide credentials based on simple header information.
getBooleanParameter(HttpServletRequest, String) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getBooleanParameter(HttpServletRequest, String, Boolean) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getBoost() - Method in class org.apache.nutch.indexer.field.FieldWritable
getBoost() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getBoost() - Method in class org.apache.nutch.util.domain.DomainSuffix
getChannels() - Method in class org.apache.nutch.parse.rss.FeedParserListenerImpl: Gets a List of RSSChannels that the listener parsed from the RSS document.
getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns a cached classloader for a plugin.
getClauses() - Method in class org.apache.nutch.searcher.Query: Return all clauses.
getClazz() - Method in class org.apache.nutch.plugin.Extension: Returns the full class name of the extension point implementation
getCode() - Method in interface org.apache.nutch.net.protocols.Response: Returns the response code.
getCode(int) - Method in exception org.apache.nutch.protocol.file.FileError
getCode() - Method in class org.apache.nutch.protocol.file.FileResponse: Returns the response code.
getCode(int) - Method in exception org.apache.nutch.protocol.ftp.FtpError
getCode() - Method in class org.apache.nutch.protocol.ftp.FtpResponse: Returns the response code.
getCode() - Method in class org.apache.nutch.protocol.http.HttpResponse
getCode() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getCode() - Method in class org.apache.nutch.protocol.ProtocolStatus
getCommand() - Method in class org.apache.nutch.util.CommandRunner
getComponentCapabilities() - Method in class org.apache.nutch.clustering.carrot2.NutchInputComponent: Returns the capabilities provided by this component.
getConf() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
getConf() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
getConf() - Method in class org.apache.nutch.analysis.lang.LanguageQueryFilter
getConf() - Method in class org.apache.nutch.analysis.NutchAnalyzer
getConf() - Method in class org.apache.nutch.clustering.carrot2.Clusterer: Implementation of Configurable
getConf() - Method in class org.apache.nutch.crawl.Signature
getConf() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
getConf() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagQueryFilter
getConf() - Method in class org.apache.nutch.parse.ext.ExtParser
getConf() - Method in class org.apache.nutch.parse.html.HtmlParser
getConf() - Method in class org.apache.nutch.parse.js.JSParseFilter
getConf() - Method in class org.apache.nutch.parse.ms.MSBaseParser
getConf() - Method in class org.apache.nutch.parse.oo.OOParser
getConf() - Method in class org.apache.nutch.parse.pdf.PdfParser
getConf() - Method in class org.apache.nutch.parse.rss.RSSParser
getConf() - Method in class org.apache.nutch.parse.swf.SWFParser
getConf() - Method in class org.apache.nutch.parse.text.TextParser
getConf() - Method in class org.apache.nutch.parse.zip.ZipParser
getConf() - Method in class org.apache.nutch.protocol.file.File
getConf() - Method in class org.apache.nutch.protocol.ftp.Ftp
getConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getConf() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
getConf() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
getConf() - Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
getConf() - Method in class org.apache.nutch.searcher.FieldQueryFilter
getConf() - Method in class org.apache.nutch.searcher.more.DateQueryFilter
getConf() - Method in class org.apache.nutch.searcher.more.TypeQueryFilter
getConf() - Method in class org.apache.nutch.searcher.Query
getConf() - Method in class org.apache.nutch.searcher.site.SiteQueryFilter
getConf() - Method in class org.apache.nutch.summary.basic.BasicSummarizer
getConf() - Method in class org.apache.nutch.summary.lucene.LuceneSummarizer
getConf() - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
getConf() - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
getConf() - Method in class org.apache.nutch.util.domain.DomainStatistics
getConf() - Method in class org.apache.nutch.util.GenericWritableConfigurable
getConf() - Method in class org.creativecommons.nutch.CCIndexingFilter
getConf() - Method in class org.creativecommons.nutch.CCParseFilter
getConf() - Method in class org.creativecommons.nutch.CCQueryFilter
getContent() - Method in class org.apache.nutch.fetcher.FetcherOutput
getContent() - Method in interface org.apache.nutch.net.protocols.Response: Returns the full content of the response.
getContent() - Method in class org.apache.nutch.protocol.Content: The binary content retrieved.
getContent() - Method in class org.apache.nutch.protocol.file.FileResponse
getContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
getContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
getContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
getContent(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getContent(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
getContent(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent: Returns the content of a hit document.
getContent(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getContentMeta() - Method in class org.apache.nutch.parse.ParseData: The original Metadata retrieved from content
getContentType() - Method in exception org.apache.nutch.parse.ParserNotFound
getContentType() - Method in class org.apache.nutch.protocol.Content: The media type of the retrieved content.
getCopyMap() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getCountryName() - Method in class org.apache.nutch.util.domain.TopLevelDomain: Returns the country name if TLD is Country Code TLD
getCrawlDatum() - Method in class org.apache.nutch.fetcher.FetcherOutput
getCrawlDelay() - Method in class org.apache.nutch.protocol.EmptyRobotRules
getCrawlDelay(HttpBase, URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getCrawlDelay() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Get Crawl-Delay, in milliseconds.
getCrawlDelay() - Method in interface org.apache.nutch.protocol.RobotRules: Get Crawl-Delay, in milliseconds.
getCredentials() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication: Gets the credentials generated by the HttpAuthentication object.
getCredentials() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Gets the Basic credentials generated by this HttpBasicAuthentication object
getCurrentNode() - Method in class org.apache.nutch.parse.html.DOMBuilder: Get the node currently being processed.
getData() - Method in interface org.apache.nutch.parse.Parse: Other data extracted from the page.
getData() - Method in class org.apache.nutch.parse.ParseImpl
getDebugStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getDedupField() - Method in class org.apache.nutch.searcher.QueryParams
getDedupValue() - Method in class org.apache.nutch.searcher.Hit: Return the value of the field that hits should be deduplicated on.
getDefault() - Method in class org.apache.nutch.analysis.AnalyzerFactory: Method used by unit test
getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of plugin ids.
getDescription() - Method in class org.apache.nutch.parse.rss.structs.RSSChannel: Returns a String description of the RSS Channel.
getDescription() - Method in class org.apache.nutch.parse.rss.structs.RSSItem: Gets the Description of this RSS Item
getDescriptionLabels() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
getDescriptionLabels() - Method in interface org.apache.nutch.clustering.HitsCluster
getDescriptor() - Method in class org.apache.nutch.plugin.Extension: Returns the plugin descriptor.
getDescriptor() - Method in class org.apache.nutch.plugin.Plugin: Returns the plugin descriptor
getDetails(Hit) - Method in class org.apache.nutch.searcher.DistributedSearchBean
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.DistributedSearchBean
getDetails(Hit) - Method in interface org.apache.nutch.searcher.HitDetailer: Returns the details for a hit document.
getDetails(Hit[]) - Method in interface org.apache.nutch.searcher.HitDetailer: Returns the details for a set of hits.
getDetails(Hit) - Method in class org.apache.nutch.searcher.IndexSearcher
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.IndexSearcher
getDetails(Hit) - Method in class org.apache.nutch.searcher.LuceneSearchBean
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.LuceneSearchBean
getDetails(Hit) - Method in class org.apache.nutch.searcher.NutchBean
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.NutchBean
getDetails() - Method in class org.apache.nutch.searcher.response.SearchResults
getDetails(Hit) - Method in class org.apache.nutch.searcher.SolrSearchBean
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.SolrSearchBean
getDocBegin() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getDocumentMeta() - Method in class org.apache.nutch.indexer.NutchDocument
getDom(InputStream) - Static method in class org.apache.nutch.util.DomUtil: Returns the parsed DOM tree, or null if any error occurs.
getDomain() - Method in class org.apache.nutch.util.domain.DomainSuffix
getDomainName(URL) - Static method in class org.apache.nutch.util.URLUtil: Returns the domain name of the url.
getDomainName(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the domain name of the url.
getDomainSuffix(URL) - Static method in class org.apache.nutch.util.URLUtil: Returns the DomainSuffix corresponding to the last public part of the hostname
getDomainSuffix(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the DomainSuffix corresponding to the last public part of the hostname
getEmptyParse(Configuration) - Method in class org.apache.nutch.parse.ParseStatus: A convenience method.
getEmptyParseResult(String, Configuration) - Method in class org.apache.nutch.parse.ParseStatus: A convenience method.
getEnd() - Method in class org.apache.nutch.searcher.response.SearchResults
getErrorStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getExitValue() - Method in class org.apache.nutch.util.CommandRunner
getExpireTime() - Method in class org.apache.nutch.protocol.EmptyRobotRules
getExpireTime() - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Get expire time
getExpireTime() - Method in interface org.apache.nutch.protocol.RobotRules: Get expire time
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.DistributedSearchBean
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.IndexSearcher
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.LuceneSearchBean
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.NutchBean
getExplanation(Query, Hit) - Method in interface org.apache.nutch.searcher.Searcher: Return an HTML-formatted explanation of how a query scored.
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.SolrSearchBean
getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of exported libraries as URLs.
getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension: Return an instance of the extension implementation.
getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository: Returns an extension point identified by an extension point id.
getExtensions(String) - Method in class org.apache.nutch.parse.ParserFactory: Finds the best-suited parse plugin for a given contentType.
getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns an array of extensions that listen to this extension point.
getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of extensions.
getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of extension points.
getFatalStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
getFetchDate(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent: Returns the fetch date of a hit document.
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getFetchInterval() - Method in class org.apache.nutch.crawl.CrawlDatum
getFetchSchedule(Configuration) - Static method in class org.apache.nutch.crawl.FetchScheduleFactory: Return the FetchSchedule implementation.
getFetchTime() - Method in class org.apache.nutch.crawl.CrawlDatum: Returns either the time of the last fetch, or the next fetch time, depending on whether Fetcher or CrawlDbReducer set the time.
getField(String) - Method in class org.apache.nutch.indexer.field.FieldsWritable
getField(int) - Method in class org.apache.nutch.searcher.HitDetails: Returns the name of the i-th field.
getField() - Method in class org.apache.nutch.searcher.Query.Clause
getFieldNames() - Method in class org.apache.nutch.indexer.NutchDocument
getFields(String) - Method in class org.apache.nutch.indexer.field.FieldsWritable
getFields() - Method in class org.apache.nutch.searcher.response.SearchResults
getFieldsList() - Method in class org.apache.nutch.indexer.field.FieldsWritable
getFieldValue(String) - Method in class org.apache.nutch.indexer.NutchDocument
getFieldValues(String) - Method in class org.apache.nutch.indexer.NutchDocument
getFilter(TokenStream, String) - Method in class org.apache.nutch.analysis.CommonGrams: Construct a token filter that inserts n-grams for common terms.
getFragments() - Method in class org.apache.nutch.searcher.Summary: Returns an array of all of this summary's fragments.
getFromUrl() - Method in class org.apache.nutch.crawl.Inlink
getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags: Returns all collected values of the general meta tags.
getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.file.FileResponse: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.ftp.FtpResponse: Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.http.HttpResponse
getHeader(String) - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getHeaders() - Method in interface org.apache.nutch.net.protocols.Response: Returns all the headers.
getHeaders() - Method in class org.apache.nutch.protocol.http.HttpResponse
getHeaders() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getHit(int) - Method in class org.apache.nutch.searcher.Hits: Returns the i-th hit in this list.
getHits() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
getHits() - Method in interface org.apache.nutch.clustering.HitsCluster
getHits(int, int) - Method in class org.apache.nutch.searcher.Hits: Returns a subset of the hit objects.
getHits() - Method in class org.apache.nutch.searcher.response.SearchResults
getHost(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the lowercased hostname for the url or null if the url is not well formed.
getHostSegments(URL) - Static method in class org.apache.nutch.util.URLUtil: Partitions the hostname of the url by "."
getHostSegments(String) - Static method in class org.apache.nutch.util.URLUtil: Partitions the hostname of the url by "."
getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags: Returns all collected values of the "http-equiv" meta tags.
getId() - Method in class org.apache.nutch.clustering.carrot2.NutchDocument
getId() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getId() - Method in class org.apache.nutch.plugin.Extension: Return the unique id of the extension.
getId() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns the unique id of the extension point.
getIndexNo() - Method in class org.apache.nutch.searcher.Hit: Return the index number that this hit came from.
getInfoStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getInlinks(Text) - Method in class org.apache.nutch.crawl.LinkDbReader
getInlinks(HitDetails) - Method in interface org.apache.nutch.searcher.HitInlinks: Return the inlinks of a hit document.
getInlinks(HitDetails) - Method in class org.apache.nutch.searcher.LinkDbInlinks
getInlinks(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getInlinkScore() - Method in class org.apache.nutch.scoring.webgraph.Node
getInstance(Configuration) - Static method in class org.apache.nutch.indexer.solr.SolrMappingReader
getInstance() - Static method in class org.apache.nutch.ontology.jena.OntologyImpl
getInstance() - Static method in class org.apache.nutch.util.domain.DomainSuffixes: Singleton instance, lazy instantiation
getIntegerParameter(HttpServletRequest, String) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getIntegerParameter(HttpServletRequest, String, Integer) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getItems() - Method in class org.apache.nutch.parse.rss.structs.RSSChannel: Get the list of items for this channel.
getKeyMap() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getLang() - Method in class org.apache.nutch.searcher.response.SearchResults
getLastModified() - Method in class org.apache.nutch.protocol.ProtocolStatus
getLegalXml(String) - Static method in class org.apache.nutch.searcher.OpenSearchServlet
getLength() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getLength() - Method in class org.apache.nutch.searcher.HitDetails: Returns the number of fields contained in this.
getLength() - Method in class org.apache.nutch.searcher.Hits: Returns the number of hits included in this current listing.
getLink() - Method in class org.apache.nutch.parse.rss.structs.RSSChannel: Returns a link to the RSS Channel.
getLink() - Method in class org.apache.nutch.parse.rss.structs.RSSItem: Gets the link that this RSS Item points to.
getLinks() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
getLinkType() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
getLocations() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getLookingFor() - Method in class org.apache.nutch.scoring.webgraph.Loops.Route
getLoopSet() - Method in class org.apache.nutch.scoring.webgraph.Loops.LoopSet
getMajorCode() - Method in class org.apache.nutch.parse.ParseStatus
getMaxContent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getMaxDelays() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getMaxHitsPerDup() - Method in class org.apache.nutch.searcher.QueryParams
getMaxThreadsPerHost() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getMessage() - Method in exception org.apache.nutch.analysis.ParseException: This method has the standard behavior when this object has been created using the standard constructors.
getMessage() - Method in class org.apache.nutch.parse.ParseStatus: A convenience method.
getMessage() - Method in class org.apache.nutch.protocol.ProtocolStatus
getMeta(String) - Method in class org.apache.nutch.metadata.MetaWrapper: Get metadata.
getMeta(String) - Method in class org.apache.nutch.parse.ParseData: Get a metadata single value.
getMetaData() - Method in class org.apache.nutch.crawl.CrawlDatum: Returns a MapWritable if it was set or read in readFields(DataInput); returns an empty map if the CrawlDatum was freshly created (lazily instantiated).
getMetadata() - Method in class org.apache.nutch.metadata.MetaWrapper: Get all metadata.
getMetadata() - Method in class org.apache.nutch.protocol.Content: Other protocol-specific data.
getMetadata() - Method in class org.apache.nutch.scoring.webgraph.Node
getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.html.HTMLMetaProcessor: Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getMetaValues(String) - Method in class org.apache.nutch.metadata.MetaWrapper: Get multiple metadata.
getMimeType(String) - Method in class org.apache.nutch.util.MimeUtil: Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
getMimeType(File) - Method in class org.apache.nutch.util.MimeUtil: Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
getMinorCode() - Method in class org.apache.nutch.parse.ParseStatus
getModel() - Static method in class org.apache.nutch.ontology.jena.OntologyImpl
getModifiedTime() - Method in class org.apache.nutch.crawl.CrawlDatum
getName() - Method in class org.apache.nutch.analysis.lang.NGramProfile
getName() - Method in class org.apache.nutch.indexer.field.FieldWritable
getName() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns the name of the extension point.
getName() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the name of the plugin.
getName() - Method in class org.apache.nutch.protocol.ProtocolStatus
getNextToken() - Method in class org.apache.nutch.analysis.NutchAnalysis: Get the next Token.
getNextToken() - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager: Get the next Token.
getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNode() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getNormalizedName(String) - Static method in class org.apache.nutch.metadata.SpellCheckedMetadata: Get the normalized name of metadata attribute name.
getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an array of libraries as URLs that are not exported by the plugin.
getNumDocs() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
getNumHits() - Method in class org.apache.nutch.searcher.QueryParams
getNumInlinks() - Method in class org.apache.nutch.scoring.webgraph.Node
getNumOutlinks() - Method in class org.apache.nutch.scoring.webgraph.Node
getNutchIndexWriters(Configuration) - Static method in class org.apache.nutch.indexer.NutchIndexWriterFactory
getObject(String) - Method in class org.apache.nutch.util.ObjectCache
getOnlineClusterer() - Method in class org.apache.nutch.clustering.OnlineClustererFactory
getOntology() - Method in class org.apache.nutch.ontology.OntologyFactory
getOutlinks(URL, ArrayList, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getOutlinks(String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor: Extracts Outlink from given plain text.
getOutlinks(String, String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor: Extracts Outlink from given plain text and adds anchor to the extracted Outlinks
getOutlinks() - Method in class org.apache.nutch.parse.ParseData: The outlinks of the page.
getOutlinkScore() - Method in class org.apache.nutch.scoring.webgraph.Node
getOutlinkUrl() - Method in class org.apache.nutch.scoring.webgraph.Loops.Route
getPage(String) - Static method in class org.apache.nutch.util.URLUtil: Returns the page for the url.
getParams() - Method in class org.apache.nutch.searcher.Query
getParse() - Method in class org.apache.nutch.fetcher.FetcherOutput
getParse(Content) - Method in class org.apache.nutch.parse.ext.ExtParser
getParse(Content) - Method in class org.apache.nutch.parse.html.HtmlParser
getParse(Content) - Method in class org.apache.nutch.parse.js.JSParseFilter
getParse(MSExtractor, Content) - Method in class org.apache.nutch.parse.ms.MSBaseParser: Parses a Content with a specific Microsoft document extractor.
getParse(Content) - Method in class org.apache.nutch.parse.msexcel.MSExcelParser
getParse(Content) - Method in class org.apache.nutch.parse.mspowerpoint.MSPowerPointParser
getParse(Content) - Method in class org.apache.nutch.parse.msword.MSWordParser
getParse(Content) - Method in class org.apache.nutch.parse.oo.OOParser
getParse(Content) - Method in interface org.apache.nutch.parse.Parser: This method parses the given content and returns a map of <key, parse> pairs.
getParse(Content) - Method in class org.apache.nutch.parse.pdf.PdfParser
getParse(Content) - Method in class org.apache.nutch.parse.rss.RSSParser: Implementation method, parses the RSS content, and then returns a ParseImpl.
getParse(Content) - Method in class org.apache.nutch.parse.swf.SWFParser
getParse(Content) - Method in class org.apache.nutch.parse.text.TextParser: Parses plain text document.
getParse(Content) - Method in class org.apache.nutch.parse.zip.ZipParser
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
getParseData(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent: Returns the ParseData of a hit document.
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getParseMeta() - Method in class org.apache.nutch.parse.ParseData: Other content properties.
getParser() - Static method in class org.apache.nutch.ontology.jena.OntologyImpl
getParserById(String) - Method in class org.apache.nutch.parse.ParserFactory: Function returns a Parser instance with the specified extId, representing its extension ID.
getParsers(String, String) - Method in class org.apache.nutch.parse.ParserFactory: Function returns an array of Parsers for a given content type.
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
getParseText(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent: Returns the ParseText of a hit document.
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
getPartition(FloatWritable, Writable, int) - Method in class org.apache.nutch.crawl.Generator.Selector: Partition by host / domain or IP.
getPartition(Text, Writable, int) - Method in class org.apache.nutch.crawl.URLPartitioner: Hash by domain name.
getPartition(MD5Hash, Writable, int) - Method in class org.apache.nutch.indexer.DeleteDuplicates.HashPartitioner
getPassAllFilter() - Static method in class org.apache.nutch.util.HadoopFSUtil: Returns PathFilter that passes all paths through.
getPassDirectoriesFilter(FileSystem) - Static method in class org.apache.nutch.util.HadoopFSUtil: Returns PathFilter that passes directories through.
getPaths(FileStatus[]) - Static method in class org.apache.nutch.util.HadoopFSUtil: Turns an array of FileStatus into an array of Paths.
getPermalink() - Method in class org.apache.nutch.parse.rss.structs.RSSItem: If this RSS Item points to a permanent link, then this method returns it.
getPhrase() - Method in class org.apache.nutch.searcher.Query.Clause
getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the fully qualified name of the class which implements the abstract Plugin class.
getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository: Returns the descriptor of one plugin identified by a plugin id.
getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository: Returns all registered plugin descriptors.
getPluginFolder(String) - Method in class org.apache.nutch.plugin.PluginManifestParser: Return the named plugin folder.
getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the unique identifier of the plug-in or null.
getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository: Returns an instance of a plugin.
getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns the directory path of the plugin.
getPos() - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
getPos() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns the current position in the file.
getProgress() - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
getProgress() - Method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns the percentage of progress in processing the file.
getProperties() - Method in class org.apache.nutch.parse.ms.MSExtractor: Get the Properties of the Microsoft document.
getProtocol(String) - Method in class org.apache.nutch.protocol.ProtocolFactory: Returns the appropriate Protocol implementation for a url.
getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.file.File
getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.ftp.Ftp
getProtocolOutput(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getProtocolOutput(Text, CrawlDatum) - Method in interface org.apache.nutch.protocol.Protocol: Returns the Content for a fetchlist entry.
getProtocolVersion(String, long) - Method in class org.apache.nutch.searcher.FetchedSegments
getProtocolVersion(String, long) - Method in class org.apache.nutch.searcher.LuceneSearchBean
getProtocolVersion(String, long) - Method in class org.apache.nutch.searcher.NutchBean
getProviderName() - Method in class org.apache.nutch.plugin.PluginDescriptor
getProxyHost() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getProxyPort() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getQuery() - Method in class org.apache.nutch.searcher.response.SearchResults
getRealm() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication: Gets the realm used by the HttpAuthentication object during creation.
getRealm() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Gets the realm attribute of the HttpBasicAuthentication object.
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat: Return each index as a split.
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.segment.ContentAsTextInputFormat
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.segment.SegmentMerger.ObjectInputFormat
getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.tools.arc.ArcInputFormat: Returns the RecordReader for reading the arc file.
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.fetcher.FetcherOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.indexer.DeleteDuplicates: Write nothing.
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.indexer.field.FieldIndexer.OutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.parse.ParseOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.segment.SegmentMerger.SegmentOutputFormat
getRecordWriter(FileSystem, JobConf, String, Progressable) - Method in class org.apache.nutch.segment.SegmentReader.TextOutputFormat
getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags: A convenience method.
getRequiredSuccessorCapabilities() - Method in class org.apache.nutch.clustering.carrot2.NutchInputComponent: Returns the capabilities required from the successor component.
getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor: Returns an I18N'd resource string.
getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.http.Http
getResponse(URL, CrawlDatum, boolean) - Method in class org.apache.nutch.protocol.httpclient.Http: Fetches the url with a configured HTTP client and gets the response.
getResponseType() - Method in class org.apache.nutch.searcher.response.SearchResults
getResponseWriter(String) - Method in class org.apache.nutch.searcher.response.ResponseWriters: Return the correct ResponseWriter object for the response type.
getRetriesSinceFetch() - Method in class org.apache.nutch.crawl.CrawlDatum
getRobotRules(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.file.File
getRobotRules(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.ftp.Ftp
getRobotRules(Text, CrawlDatum) - Method in class org.apache.nutch.protocol.http.api.HttpBase
getRobotRules(Text, CrawlDatum) - Method in interface org.apache.nutch.protocol.Protocol: Retrieve robot rules applicable for this url.
getRobotRulesSet(HttpBase, Text) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
getRootNode() - Method in class org.apache.nutch.parse.html.DOMBuilder: Get the root node of the DOM being created.
getRows() - Method in class org.apache.nutch.searcher.response.SearchResults
getRulesFile(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase: Returns the name of the file of rules to use for a particular implementation.
getRulesFile(Configuration) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
getRulesFile(Configuration) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint: Returns a path to the xml schema of an extension point.
getScore() - Method in class org.apache.nutch.crawl.CrawlDatum
getScore() - Method in class org.apache.nutch.indexer.NutchDocument
getScore() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
getSegmentNames() - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getSegmentNames() - Method in class org.apache.nutch.searcher.FetchedSegments
getSegmentNames() - Method in class org.apache.nutch.searcher.NutchBean
getSegmentNames() - Method in interface org.apache.nutch.searcher.SegmentBean
getServerDelay() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getSignature() - Method in class org.apache.nutch.crawl.CrawlDatum
getSignature(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory: Return the default Signature implementation.
getSimilarity(NGramProfile) - Method in class org.apache.nutch.analysis.lang.NGramProfile: Calculate a score for how well NGramProfiles match each other
getSort() - Method in class org.apache.nutch.searcher.response.SearchResults
getSorted() - Method in class org.apache.nutch.analysis.lang.NGramProfile: Return a sorted list of ngrams (sort done by 1.
getSortField() - Method in class org.apache.nutch.searcher.QueryParams
getSortValue() - Method in class org.apache.nutch.searcher.Hit: Return the value of the field that hits are sorted on.
getSplits(JobConf, int) - Method in class org.apache.nutch.fetcher.Fetcher.InputFormat: Don't split inputs, to keep things polite.
getSplits(JobConf, int) - Method in class org.apache.nutch.fetcher.OldFetcher.InputFormat: Don't split inputs, to keep things polite.
getSplits(JobConf, int) - Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat: Return each index as a split.
getSplits(JobConf, int) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat: Return each index as a split.
getStart() - Method in class org.apache.nutch.searcher.response.SearchResults
getStats(Path, SegmentReader.SegmentReaderStats) - Method in class org.apache.nutch.segment.SegmentReader
getStatus() - Method in class org.apache.nutch.crawl.CrawlDatum
getStatus() - Method in class org.apache.nutch.parse.ParseData: The status of parsing the page.
getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
getStatus() - Method in class org.apache.nutch.util.domain.DomainSuffix
getStatusName(byte) - Static method in class org.apache.nutch.crawl.CrawlDatum
getStringParameter(HttpServletRequest, String) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getStringParameter(HttpServletRequest, String, String) - Static method in class org.apache.nutch.searcher.response.RequestUtils
getSubclusters() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
getSubclusters() - Method in interface org.apache.nutch.clustering.HitsCluster
getSummaries() - Method in class org.apache.nutch.searcher.response.SearchResults
getSummarizer() - Method in class org.apache.nutch.searcher.SummarizerFactory: Get the first available Summarizer extension.
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.DistributedSegmentBean
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.FetchedSegments
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.FetchedSegments
getSummary(HitDetails, Query) - Method in interface org.apache.nutch.searcher.HitSummarizer: Returns a summary for the given hit details.
getSummary(HitDetails[], Query) - Method in interface org.apache.nutch.searcher.HitSummarizer: Returns summaries for a set of details.
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.NutchBean
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.NutchBean
getSummary(String, Query) - Method in interface org.apache.nutch.searcher.Summarizer: Get a summary for a specified text.
getSummary(String, Query) - Method in class org.apache.nutch.summary.basic.BasicSummarizer
getSummary(String, Query) - Method in class org.apache.nutch.summary.lucene.LuceneSummarizer
getSystemName() - Method in class org.apache.nutch.protocol.ftp.Client: Fetches the system type name from the server and returns the string.
getTargetPoint() - Method in class org.apache.nutch.plugin.Extension: Returns the id of the extension point that is implemented by this extension.
getTerm() - Method in class org.apache.nutch.searcher.Query.Clause
getTerms() - Method in class org.apache.nutch.searcher.Query: Flattens a query into the set of text terms that it contains.
getTerms() - Method in class org.apache.nutch.searcher.Query.Phrase
getText(StringBuffer, Node, boolean) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method takes a StringBuffer and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuffer.
getText(StringBuffer, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This is a convenience method, equivalent to getText(sb, node, false).
getText() - Method in class org.apache.nutch.parse.ms.MSExtractor: Get the content text of the Microsoft document.
getText() - Method in interface org.apache.nutch.parse.Parse: The textual content of the page.
getText() - Method in class org.apache.nutch.parse.ParseImpl
getText() - Method in class org.apache.nutch.parse.ParseText
getText() - Method in class org.apache.nutch.searcher.Summary.Fragment: Returns the text of this fragment.
getTextRuns() - Method in class org.apache.nutch.parse.msword.chp.Word6CHPBinTable
getThrownError() - Method in class org.apache.nutch.util.CommandRunner
getTimeout() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getTimeout() - Method in class org.apache.nutch.util.CommandRunner
getTimestamp() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
getTitle(StringBuffer, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils: This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
getTitle() - Method in class org.apache.nutch.parse.ParseData: The title of the page.
getTitle() - Method in class org.apache.nutch.parse.rss.structs.RSSChannel: Returns the channel title
getTitle() - Method in class org.apache.nutch.parse.rss.structs.RSSItem: Get the title for this RSS Item
getToken(int) - Method in class org.apache.nutch.analysis.NutchAnalysis: Get the specific Token.
getTotal() - Method in class org.apache.nutch.searcher.Hits: Returns the total number of hits for this query.
getTotalHits() - Method in class org.apache.nutch.searcher.response.SearchResults
getToUrl() - Method in class org.apache.nutch.parse.Outlink
getTraceStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getTstamp() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
getType() - Method in class org.apache.nutch.indexer.field.FieldWritable
getType() - Method in class org.apache.nutch.util.domain.TopLevelDomain
getTypes() - Method in class org.apache.nutch.crawl.NutchWritable
getUniqueKey() - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
getUniqueKey() - Method in class org.apache.nutch.searcher.Hit: Return the unique identifier of this hit within an index.
getUrl() - Method in interface org.apache.nutch.net.protocols.Response: Returns the URL used to retrieve this response.
getUrl() - Method in exception org.apache.nutch.parse.ParserNotFound
getUrl() - Method in class org.apache.nutch.protocol.Content: The url fetched.
getUrl() - Method in class org.apache.nutch.protocol.http.HttpResponse
getUrl() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
getUrl() - Method in exception org.apache.nutch.protocol.ProtocolNotFound
getUrl() - Method in class org.apache.nutch.scoring.webgraph.LinkDatum
getUrl() - Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
getUseHttp11() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getUserAgent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
getValue() - Method in class org.apache.nutch.indexer.field.FieldWritable
getValue(int) - Method in class org.apache.nutch.searcher.HitDetails: Returns the value of the i-th field.
getValue(String) - Method in class org.apache.nutch.searcher.HitDetails: Returns the value of the first field with the specified name.
getValues(String) - Method in class org.apache.nutch.metadata.Metadata: Get the values associated to a metadata name.
getValues(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
getValues(String) - Method in class org.apache.nutch.searcher.HitDetails: Returns all the values with the specified name.
getVersion() - Method in class org.apache.nutch.parse.ParseData
getVersion() - Method in class org.apache.nutch.parse.ParseStatus
getVersion() - Method in class org.apache.nutch.plugin.PluginDescriptor
getWaitForExit() - Method in class org.apache.nutch.util.CommandRunner
getWarnStream(Log) - Static method in class org.apache.nutch.util.LogUtil
getWeight() - Method in class org.apache.nutch.searcher.Query.Clause
getWriter() - Method in class org.apache.nutch.parse.html.DOMBuilder: Return null since there is no Writer for this class.
GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus: Resource is gone.
guessEncoding(Content, String) - Method in class org.apache.nutch.util.EncodingDetector: Guess the encoding with the previously specified list of clues.
GZIPUtils - Class in org.apache.nutch.util: A collection of utility methods for working on GZIPed data.
GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils

H

HadoopFSUtil - Class in org.apache.nutch.util
HadoopFSUtil() - Constructor for class org.apache.nutch.util.HadoopFSUtil
hasCopy(String) - Method in class org.apache.nutch.indexer.solr.SolrMappingReader
hasDbStatus(CrawlDatum) - Static method in class org.apache.nutch.crawl.CrawlDatum
hasFetchStatus(CrawlDatum) - Static method in class org.apache.nutch.crawl.CrawlDatum
hasField(String) - Method in class org.apache.nutch.indexer.field.FieldsWritable
hashCode() - Method in class org.apache.nutch.crawl.CrawlDatum
hashCode() - Method in class org.apache.nutch.crawl.Inlink
hashCode() - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
hashCode() - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
hashCode() - Method in class org.apache.nutch.searcher.Query.Clause
hashCode() - Method in class org.apache.nutch.searcher.Query
hashCode() - Method in class org.apache.nutch.searcher.Query.Phrase
hashCode() - Method in class org.apache.nutch.searcher.Query.Term
hasNext() - Method in class org.apache.nutch.util.NodeWalker: Returns true if there are more nodes on the current stack.
HighFreqTerms - Class in org.apache.nutch.indexer: Lists the most frequent terms in an index.
HighFreqTerms() - Constructor for class org.apache.nutch.indexer.HighFreqTerms
Hit - Class in org.apache.nutch.searcher: A document which matched a query in an index.
Hit() - Constructor for class org.apache.nutch.searcher.Hit
Hit(int, String) - Constructor for class org.apache.nutch.searcher.Hit
Hit(int, String, WritableComparable, String) - Constructor for class org.apache.nutch.searcher.Hit
Hit(String, WritableComparable, String) - Constructor for class org.apache.nutch.searcher.Hit
HitContent - Interface in org.apache.nutch.searcher: Service that returns the content of a hit.
HitDetailer - Interface in org.apache.nutch.searcher: Service that returns details of a hit within an index.
HitDetails - Class in org.apache.nutch.searcher: Data stored in the index for a hit.
HitDetails() - Constructor for class org.apache.nutch.searcher.HitDetails
HitDetails(String[], String[]) - Constructor for class org.apache.nutch.searcher.HitDetails: Construct from field names and values arrays.
HitDetails(String, String) - Constructor for class org.apache.nutch.searcher.HitDetails: Construct minimal details from a segment name and document number.
HitInlinks - Interface in org.apache.nutch.searcher: Service that returns information about incoming links to a hit.
Hits - Class in org.apache.nutch.searcher: A set of hits matching a query.
Hits() - Constructor for class org.apache.nutch.searcher.Hits
Hits(long, Hit[]) - Constructor for class org.apache.nutch.searcher.Hits
HitsCluster - Interface in org.apache.nutch.clustering: An interface representing a group (cluster) of related hits.
HitsClusterAdapter - Class in org.apache.nutch.clustering.carrot2: An adapter of Carrot2's RawCluster interface to HitsCluster interface.
HitsClusterAdapter(RawCluster, HitDetails[]) - Constructor for class org.apache.nutch.clustering.carrot2.HitsClusterAdapter: Creates a new adapter.
HitSummarizer - Interface in org.apache.nutch.searcher: Service that builds a summary for a hit on a query.
HOST - Static variable in interface org.apache.nutch.indexer.field.Fields
HTMLLanguageParser - Class in org.apache.nutch.analysis.lang: Adds metadata identifying the language of a document, if found. We could also run statistical analysis here, but we'd miss all other formats.
HTMLLanguageParser() - Constructor for class org.apache.nutch.analysis.lang.HTMLLanguageParser
HTMLMetaProcessor - Class in org.apache.nutch.parse.html: Class for parsing META Directives from DOM trees.
HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.html.HTMLMetaProcessor
HTMLMetaTags - Class in org.apache.nutch.parse: This class holds the information about HTML "meta" tags extracted from a page.
HTMLMetaTags() - Constructor for class org.apache.nutch.parse.HTMLMetaTags
HtmlParseFilter - Interface in org.apache.nutch.parse: Extension point for DOM-based HTML parsers.
HTMLPARSEFILTER_ORDER - Static variable in class org.apache.nutch.parse.HtmlParseFilters
HtmlParseFilters - Class in org.apache.nutch.parse: Creates and caches HtmlParseFilter implementing plugins.
HtmlParseFilters(Configuration) - Constructor for class org.apache.nutch.parse.HtmlParseFilters
HtmlParser - Class in org.apache.nutch.parse.html
HtmlParser() - Constructor for class org.apache.nutch.parse.html.HtmlParser
Http - Class in org.apache.nutch.protocol.http
Http() - Constructor for class org.apache.nutch.protocol.http.Http
Http - Class in org.apache.nutch.protocol.httpclient: This class is a protocol plugin that configures an HTTP client for Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
Http() - Constructor for class org.apache.nutch.protocol.httpclient.Http: Constructs this plugin.
HttpAuthentication - Interface in org.apache.nutch.protocol.httpclient: The base level of services required for Http Authentication
HttpAuthenticationException - Exception in org.apache.nutch.protocol.httpclient: Can be used to identify problems during creation of Authentication objects.
HttpAuthenticationException() - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with null as its detail message.
HttpAuthenticationException(String) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified detail message.
HttpAuthenticationException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified message and cause.
HttpAuthenticationException(Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException: Constructs a new exception with the specified cause and a detail message taken from the given cause if it is not null.
HttpAuthenticationFactory - Class in org.apache.nutch.protocol.httpclient: Provides the Http protocol implementation with the ability to authenticate when prompted.
HttpAuthenticationFactory(Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
HttpBase - Class in org.apache.nutch.protocol.http.api
HttpBase() - Constructor for class org.apache.nutch.protocol.http.api.HttpBase: Creates a new instance of HttpBase
HttpBase(Log) - Constructor for class org.apache.nutch.protocol.http.api.HttpBase: Creates a new instance of HttpBase
HttpBasicAuthentication - Class in org.apache.nutch.protocol.httpclient: Implementation of RFC 2617 Basic Authentication.
HttpBasicAuthentication(String, Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication: Construct an HttpBasicAuthentication for the given challenge parameters.
HttpDateFormat - Class in org.apache.nutch.net.protocols: Class to handle HTTP dates.
HttpDateFormat() - Constructor for class org.apache.nutch.net.protocols.HttpDateFormat
HttpException - Exception in org.apache.nutch.protocol.http.api
HttpException() - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(String) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpException(Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
HttpHeaders - Interface in org.apache.nutch.metadata: A collection of HTTP header names.
HttpResponse - Class in org.apache.nutch.protocol.http: An HTTP response.
HttpResponse(HttpBase, URL, CrawlDatum) - Constructor for class org.apache.nutch.protocol.http.HttpResponse
HttpResponse - Class in org.apache.nutch.protocol.httpclient: An HTTP response.

I

ID_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
IDENTIFIER - Static variable in interface org.apache.nutch.metadata.DublinCore: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.
identify(String) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier: Identify language of a content.
identify(StringBuilder) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier: Identify language of a content.
identify(InputStream) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier: Identify language from input stream.
identify(InputStream, String) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier: Identify language from input stream.
ignorableWhitespace(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder: Receive notification of ignorable whitespace in element content.
in - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
incrementToken() - Method in class org.apache.nutch.analysis.NutchDocumentTokenizer: Lucene 3.0 API.
indent(PrintStream, int) - Static method in class org.apache.nutch.ontology.jena.OntologyImpl
index(Path[], Path) - Method in class org.apache.nutch.indexer.field.FieldIndexer
index(Path, Path, Path, List<Path>) - Method in class org.apache.nutch.indexer.Indexer
INDEX_NO - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
INDEX_NO_NORMS - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
INDEX_TOKENIZED - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
INDEX_UNTOKENIZED - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
Indexer - Class in org.apache.nutch.indexer: Create indexes for segments.
Indexer() - Constructor for class org.apache.nutch.indexer.Indexer
Indexer(Configuration) - Constructor for class org.apache.nutch.indexer.Indexer
IndexerMapReduce - Class in org.apache.nutch.indexer
IndexerMapReduce() - Constructor for class org.apache.nutch.indexer.IndexerMapReduce
IndexerOutputFormat - Class in org.apache.nutch.indexer
IndexerOutputFormat() - Constructor for class org.apache.nutch.indexer.IndexerOutputFormat
indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Dampen the boost value by scorePower.
indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in interface org.apache.nutch.scoring.ScoringFilter: This method calculates a Lucene document boost.
indexerScore(Text, NutchDocument, CrawlDatum, CrawlDatum, Parse, Inlinks, float) - Method in class org.apache.nutch.scoring.ScoringFilters
IndexingException - Exception in org.apache.nutch.indexer
IndexingException() - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(String) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(String, Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingException(Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
IndexingFilter - Interface in org.apache.nutch.indexer: Extension point for indexing.
INDEXINGFILTER_ORDER - Static variable in class org.apache.nutch.indexer.IndexingFilters
IndexingFilters - Class in org.apache.nutch.indexer: Creates and caches IndexingFilter implementing plugins.
IndexingFilters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexingFilters
IndexMerger - Class in org.apache.nutch.indexer: IndexMerger creates an index for the output corresponding to a single fetcher run.
IndexMerger() - Constructor for class org.apache.nutch.indexer.IndexMerger
IndexMerger(Configuration) - Constructor for class org.apache.nutch.indexer.IndexMerger
IndexSearcher - Class in org.apache.nutch.searcher: Implements Searcher and HitDetailer for either a single merged index, or a set of indexes.
IndexSearcher(Path[], Configuration) - Constructor for class org.apache.nutch.searcher.IndexSearcher: Construct given a number of indexes.
IndexSearcher(Path, Configuration) - Constructor for class org.apache.nutch.searcher.IndexSearcher: Construct given a single merged index.
indexSolr(String, Path, Path, List<Path>) - Method in class org.apache.nutch.indexer.solr.SolrIndexer
IndexSorter - Class in org.apache.nutch.indexer: Sort a Nutch index by page score.
IndexSorter() - Constructor for class org.apache.nutch.indexer.IndexSorter
IndexSorter(Configuration) - Constructor for class org.apache.nutch.indexer.IndexSorter
infix() - Method in class org.apache.nutch.analysis.NutchAnalysis: Characters which can be used to form compound terms.
inflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array.
inflateBestEffort(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array.
inflateBestEffort(byte[], int) - Static method in class org.apache.nutch.util.DeflateUtils: Returns an inflated copy of the input array, truncated to sizeLimit bytes, if necessary.
init(Path) - Method in class org.apache.nutch.crawl.LinkDbReader
init(ServletConfig) - Method in class org.apache.nutch.searcher.OpenSearchServlet
init(ServletConfig) - Method in class org.apache.nutch.searcher.response.SearchServlet: Initializes servlet configuration default values.
init() - Method in class org.apache.nutch.servlet.Cached
init(Configuration) - Method in class org.apache.nutch.servlet.Cached
initFrom(int, int, String, String, boolean) - Method in class org.apache.nutch.searcher.QueryParams
initializeSchedule(Text, CrawlDatum) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule: Initialize fetch schedule related data.
initializeSchedule(Text, CrawlDatum) - Method in interface org.apache.nutch.crawl.FetchSchedule: Initialize fetch schedule related data.
initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
initialScore(Text, CrawlDatum) - Method in interface org.apache.nutch.scoring.ScoringFilter: Set an initial score for newly discovered pages.
initialScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a new initial score, used when adding newly discovered pages.
initMRJob(Path, Path, Collection<Path>, JobConf) - Static method in class org.apache.nutch.indexer.IndexerMapReduce
inject(Path, Path) - Method in class org.apache.nutch.crawl.Injector
injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter: Set to the value defined in config, 1.0f by default.
injectedScore(Text, CrawlDatum) - Method in interface org.apache.nutch.scoring.ScoringFilter: Set an initial score for newly injected pages.
injectedScore(Text, CrawlDatum) - Method in class org.apache.nutch.scoring.ScoringFilters: Calculate a new initial score, used when injecting new pages.
Injector - Class in org.apache.nutch.crawl: This class takes a flat file of URLs and adds them to the set of pages to be crawled.
Injector() - Constructor for class org.apache.nutch.crawl.Injector
Injector(Configuration) - Constructor for class org.apache.nutch.crawl.Injector
Injector.InjectMapper - Class in org.apache.nutch.crawl: Normalize and filter injected urls.
Injector.InjectMapper() - Constructor for class org.apache.nutch.crawl.Injector.InjectMapper
Injector.InjectReducer - Class in org.apache.nutch.crawl: Combine multiple new entries for a url.
Injector.InjectReducer() - Constructor for class org.apache.nutch.crawl.Injector.InjectReducer
Inlink - Class in org.apache.nutch.crawl
Inlink() - Constructor for class org.apache.nutch.crawl.Inlink
Inlink(String, String) - Constructor for class org.apache.nutch.crawl.Inlink
INLINK - Static variable in class org.apache.nutch.scoring.webgraph.LinkDatum
INLINK_DIR - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
Inlinks - Class in org.apache.nutch.crawl: A list of Inlinks.
Inlinks() - Constructor for class org.apache.nutch.crawl.Inlinks
input_stream - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
install(JobConf, Path) - Static method in class org.apache.nutch.crawl.CrawlDb
install(JobConf, Path) - Static method in class org.apache.nutch.crawl.LinkDb
INTER_ANCHOR_GAP - Static variable in class org.apache.nutch.analysis.NutchDocumentAnalyzer: The number of unused term positions between anchors in the anchor field.
invert(Path, Path, boolean, boolean, boolean) - Method in class org.apache.nutch.crawl.LinkDb
invert(Path, Path[], boolean, boolean, boolean) - Method in class org.apache.nutch.crawl.LinkDb
IRREGULAR_WORD - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
isAllowed(URL) - Method in class org.apache.nutch.protocol.EmptyRobotRules
isAllowed(HttpBase, URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
isAllowed(URL) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
isAllowed(String) - Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet: Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Method in interface org.apache.nutch.protocol.RobotRules: Returns false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
isCanonical() - Method in interface org.apache.nutch.parse.Parse: Indicates if the parse is coming from a url or a sub-url
isCanonical() - Method in class org.apache.nutch.parse.ParseImpl
isClientTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
isDomainSuffix(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes: Returns whether the extension is a registered domain entry.
isEllipsis() - Method in class org.apache.nutch.searcher.Summary.Ellipsis: Returns true.
isEllipsis() - Method in class org.apache.nutch.searcher.Summary.Fragment: Returns true iff this fragment is an ellipsis.
isEmpty() - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
isEmpty() - Method in class org.apache.nutch.parse.ParseResult: Checks whether the result is empty.
isEmpty(String) - Static method in class org.apache.nutch.util.StringUtil: Checks if a string is empty (i.e., is null or empty).
isField(String) - Method in class org.apache.nutch.searcher.QueryFilters
isFound() - Method in class org.apache.nutch.scoring.webgraph.Loops.Route
isHighlight() - Method in class org.apache.nutch.searcher.Summary.Fragment: Returns true iff this fragment is to be highlighted.
isHighlight() - Method in class org.apache.nutch.searcher.Summary.Highlight: Returns true.
isIndexed() - Method in class org.apache.nutch.indexer.field.FieldWritable
isJunkCluster() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
isJunkCluster() - Method in interface org.apache.nutch.clustering.HitsCluster: Returns true if this cluster contains documents that did not fit anywhere else (presentation layer may discard such clusters).
isMagic(byte[]) - Static method in class org.apache.nutch.tools.arc.ArcRecordReader: Returns true if the byte array passed matches the gzip header magic number.
isMultiValued(String) - Method in class org.apache.nutch.metadata.Metadata: Returns true if named value is multivalued.
isParsing(Configuration) - Static method in class org.apache.nutch.fetcher.Fetcher
isParsing(Configuration) - Static method in class org.apache.nutch.fetcher.OldFetcher
isPermanentFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
isPhrase() - Method in class org.apache.nutch.searcher.Query.Clause
isProhibited() - Method in class org.apache.nutch.searcher.Query.Clause
isPrunable(Query, IndexReader, int) - Method in class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
isPrunable(Query, IndexReader, int) - Method in interface org.apache.nutch.tools.PruneIndexTool.PruneChecker: Check whether this document should be pruned.
isPrunable(Query, IndexReader, int) - Method in class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
isRawField(String) - Method in class org.apache.nutch.searcher.QueryFilters
isRemoteVerificationEnabled() - Method in class org.apache.nutch.protocol.ftp.Client: Return whether or not verification of the remote host participating in data connections is enabled.
isRequired() - Method in class org.apache.nutch.searcher.Query.Clause
isReverse() - Method in class org.apache.nutch.searcher.QueryParams
isReverse() - Method in class org.apache.nutch.searcher.response.SearchResults
isSameDomainName(URL, URL) - Static method in class org.apache.nutch.util.URLUtil: Returns whether the given urls have the same domain name.
isSameDomainName(String, String) - Static method in class org.apache.nutch.util.URLUtil: Returns whether the given urls have the same domain name.
isServerTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
isStopWord(String) - Static method in class org.apache.nutch.analysis.NutchAnalysis: True iff word is a stop word.
isStored() - Method in class org.apache.nutch.indexer.field.FieldWritable
isStoringContent(Configuration) - Static method in class org.apache.nutch.fetcher.Fetcher
isStoringContent(Configuration) - Static method in class org.apache.nutch.fetcher.OldFetcher
isSuccess() - Method in class org.apache.nutch.parse.ParseResult: A convenience method which returns true only if all parses are successful.
isSuccess() - Method in class org.apache.nutch.parse.ParseStatus: A convenience method.
isSuccess() - Method in class org.apache.nutch.protocol.ProtocolStatus
isTokenized() - Method in class org.apache.nutch.indexer.field.FieldWritable
isTransientFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
isWhiteSpace(char) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Returns whether the specified ch conforms to the XML 1.0 definition of whitespace.
isWhiteSpace(char[], int, int) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
isWhiteSpace(StringBuffer) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
isWhiteSpace(String) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer: Tell if the string is whitespace.
isWithSummary() - Method in class org.apache.nutch.searcher.response.SearchResults
iterator() - Method in class org.apache.nutch.crawl.Inlinks
iterator() - Method in class org.apache.nutch.indexer.NutchDocument: Iterate over all fields.
iterator() - Method in class org.apache.nutch.parse.ParseResult: Iterate over all entries in the <url, Parse> map.
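
The `inflateBestEffort(byte[], int)` entry above describes inflating input and truncating the result to `sizeLimit` bytes. A minimal sketch of that idea, using only `java.util.zip`, might look like the following. The class name `InflateSketch` and the `deflate` helper are hypothetical, the sketch assumes zlib-wrapped input, and it is not the Nutch `DeflateUtils` source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for illustration; not the Nutch DeflateUtils class. */
class InflateSketch {

    /** Inflates as much of the input as possible, stopping at sizeLimit bytes. */
    static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
        Inflater inflater = new Inflater(); // assumption: zlib-wrapped data
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished() && out.size() < sizeLimit) {
                // Never request more bytes than the remaining budget.
                int want = Math.min(buf.length, sizeLimit - out.size());
                int n = inflater.inflate(buf, 0, want);
                if (n == 0) {
                    break; // no progress: input truncated or dictionary needed
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Best effort: keep whatever was decoded before the corruption.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Demo helper: compresses a byte array with the default zlib wrapper. */
    static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(original);
        // Full round trip when the limit is not reached.
        System.out.println(new String(
            inflateBestEffort(packed, Integer.MAX_VALUE), StandardCharsets.UTF_8));
        // Output is cut off at the limit.
        System.out.println(inflateBestEffort(packed, 5).length); // prints 5
    }
}
```

Swallowing `DataFormatException` is what makes the sketch "best effort": a corrupt or truncated stream yields the bytes decoded so far instead of an error.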

J

jj_nt - Variable in class org.apache.nutch.analysis.NutchAnalysis: Next token.
jjFillToken() - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
jjstrLiteralImages - Static variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager: Token literal values.
JSParseFilter - Class in org.apache.nutch.parse.js: This class is a heuristic link extractor for JavaScript files and code snippets.
JSParseFilter() - Constructor for class org.apache.nutch.parse.js.JSParseFilter

K

KEY - Static variable in class org.apache.nutch.searcher.NutchBean
keySet() - Method in class org.apache.nutch.crawl.MapWritable: Deprecated.
KEYWORDS - Static variable in interface org.apache.nutch.metadata.Office

L

LANG - Static variable in class org.apache.nutch.searcher.response.SearchServlet
LANGUAGE - Static variable in interface org.apache.nutch.metadata.DublinCore: A language of the intellectual content of the resource.
LanguageIdentifier - Class in org.apache.nutch.analysis.lang: Identify the language of a content, based on statistical analysis.
LanguageIdentifier(Configuration) - Constructor for class org.apache.nutch.analysis.lang.LanguageIdentifier: Constructs a new Language Identifier.
LanguageIndexingFilter - Class in org.apache.nutch.analysis.lang: An IndexingFilter that adds a lang (language) field to the document.
LanguageIndexingFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageIndexingFilter: Constructs a new Language Indexing Filter.
LanguageQueryFilter - Class in org.apache.nutch.analysis.lang: Handles "lang:" query clauses, causing them to search the "lang" field indexed by LanguageIdentifier.
LanguageQueryFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageQueryFilter
LAST_AUTHOR - Static variable in interface org.apache.nutch.metadata.Office
LAST_MODIFIED - Static variable in interface org.apache.nutch.metadata.HttpHeaders
LAST_PRINTED - Static variable in interface org.apache.nutch.metadata.Office
LAST_SAVED - Static variable in interface org.apache.nutch.metadata.Office
leftPad(String, int) - Static method in class org.apache.nutch.util.StringUtil: Returns a copy of s padded with leading spaces so that its length is length.
lengthNorm(String, int) - Method in class org.apache.nutch.indexer.NutchSimilarity: Normalize field by length.
LETTER - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants: RegularExpression Id.
lexStateNames - Static variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager: Lexer state names.
LICENSE_LOCATION - Static variable in interface org.apache.nutch.metadata.CreativeCommons
LICENSE_URL - Static variable in interface org.apache.nutch.metadata.CreativeCommons
LinkDatum - Class in org.apache.nutch.scoring.webgraph: A class for holding link information including the url, anchor text, a score, the timestamp of the link and a link type.
LinkDatum() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum: Default constructor, no url, timestamp, score, or link type.
LinkDatum(String) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum: Creates a LinkDatum with a given url.
LinkDatum(String, String) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum: Creates a LinkDatum with a url and an anchor text.
LinkDatum(String, String, long) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDatum
LinkDb - Class in org.apache.nutch.crawl: Maintains an inverted link map, listing incoming links for each url.
LinkDb() - Constructor for class org.apache.nutch.crawl.LinkDb
LinkDb(Configuration) - Constructor for class org.apache.nutch.crawl.LinkDb
LinkDbFilter - Class in org.apache.nutch.crawl: This class provides a way to separate the URL normalization and filtering steps from the rest of LinkDb manipulation code.
LinkDbFilter() - Constructor for class org.apache.nutch.crawl.LinkDbFilter
LinkDbInlinks - Class in org.apache.nutch.searcher
LinkDbInlinks(FileSystem, Path, Configuration) - Constructor for class org.apache.nutch.searcher.LinkDbInlinks
LinkDbMerger - Class in org.apache.nutch.crawl: This tool merges several LinkDb-s into one, optionally filtering URLs through the current URLFilters, to skip prohibited URLs and links.
LinkDbMerger() - Constructor for class org.apache.nutch.crawl.LinkDbMerger
LinkDbMerger(Configuration) - Constructor for class org.apache.nutch.crawl.LinkDbMerger
LinkDbReader - Class in org.apache.nutch.crawl
LinkDbReader() - Constructor for class org.apache.nutch.crawl.LinkDbReader
LinkDbReader(Configuration, Path) - Constructor for class org.apache.nutch.crawl.LinkDbReader
LinkDumper - Class in org.apache.nutch.scoring.webgraph: The LinkDumper tool creates a database of node to inlink information that can be read using the nested Reader class.
LinkDumper() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper
LinkDumper.Inverter - Class in org.apache.nutch.scoring.webgraph: Inverts outlinks from the WebGraph to inlinks and attaches node information.
LinkDumper.Inverter() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
LinkDumper.LinkNode - Class in org.apache.nutch.scoring.webgraph: Bean class which holds url to node information.
LinkDumper.LinkNode() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
LinkDumper.LinkNode(String, Node) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
LinkDumper.LinkNodes - Class in org.apache.nutch.scoring.webgraph: Writable class which holds an array of LinkNode objects.
LinkDumper.LinkNodes() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
LinkDumper.LinkNodes(LinkDumper.LinkNode[]) - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
LinkDumper.Merger - Class in org.apache.nutch.scoring.webgraph: Merges LinkNode objects into a single array value per url.
LinkDumper.Merger() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
LinkDumper.Reader - Class in org.apache.nutch.scoring.webgraph: Reader class which will print out the url and all of its inlinks to system out.
LinkDumper.Reader() - Constructor for class org.apache.nutch.scoring.webgraph.LinkDumper.Reader
LinkRank - Class in org.apache.nutch.scoring.webgraph
LinkRank() - Constructor for class org.apache.nutch.scoring.webgraph.LinkRank: Default constructor.
LinkRank(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.LinkRank: Configurable constructor.
list(List<Path>, Writer) - Method in class org.apache.nutch.segment.SegmentReader
listAll() - Method in class org.apache.nutch.indexer.FsDirectory
load(InputStream) - Method in class org.apache.nutch.analysis.lang.NGramProfile: Loads an ngram profile from an InputStream (assumes UTF-8 encoded content).
load(String[]) - Method in class org.apache.nutch.ontology.jena.OntologyImpl
load(String[]) - Method in interface org.apache.nutch.ontology.Ontology
LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
LOCK_NAME - Static variable in class org.apache.nutch.crawl.CrawlDb
LOCK_NAME - Static variable in class org.apache.nutch.crawl.LinkDb
LOCK_NAME - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
LockUtil - Class in org.apache.nutch.util: Utility methods for handling application-level locking.
LockUtil() - Constructor for class org.apache.nutch.util.LockUtil
LOG - Static variable in class org.apache.nutch.analysis.AnalyzerFactory
LOG - Static variable in class org.apache.nutch.analysis.lang.HTMLLanguageParser
LOG - Static variable in class org.apache.nutch.analysis.lang.NGramProfile
LOG - Static variable in class org.apache.nutch.clustering.OnlineClustererFactory
LOG - Static variable in class org.apache.nutch.crawl.Crawl
LOG - Static variable in class org.apache.nutch.crawl.CrawlDb
LOG - Static variable in class org.apache.nutch.crawl.CrawlDbFilter
LOG - Static variable in class org.apache.nutch.crawl.CrawlDbReader
LOG - Static variable in class org.apache.nutch.crawl.CrawlDbReducer
LOG - Static variable in class org.apache.nutch.crawl.FetchScheduleFactory
LOG - Static variable in class org.apache.nutch.crawl.Generator
LOG - Static variable in class org.apache.nutch.crawl.Injector
LOG - Static variable in class org.apache.nutch.crawl.LinkDb
LOG - Static variable in class org.apache.nutch.crawl.LinkDbFilter
LOG - Static variable in class org.apache.nutch.crawl.LinkDbReader
LOG - Static variable in class org.apache.nutch.crawl.MapWritable: Deprecated.
LOG - Static variable in class org.apache.nutch.fetcher.Fetcher
LOG - Static variable in class org.apache.nutch.fetcher.OldFetcher
LOG - Static variable in class org.apache.nutch.indexer.basic.BasicIndexingFilter
LOG - Static variable in class org.apache.nutch.indexer.field.AnchorFields
LOG - Static variable in class org.apache.nutch.indexer.field.BasicFields
LOG - Static variable in class org.apache.nutch.indexer.field.CustomFields
LOG - Static variable in class org.apache.nutch.indexer.field.FieldFilters
LOG - Static variable in class org.apache.nutch.indexer.field.FieldIndexer
LOG - Static variable in class org.apache.nutch.indexer.Indexer
LOG - Static variable in class org.apache.nutch.indexer.IndexerMapReduce
LOG - Static variable in class org.apache.nutch.indexer.IndexingFilters
LOG - Static variable in class org.apache.nutch.indexer.IndexMerger
LOG - Static variable in class org.apache.nutch.indexer.more.MoreIndexingFilter
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrIndexer
LOG - Static variable in class org.apache.nutch.indexer.solr.SolrMappingReader
LOG - Static variable in class org.apache.nutch.microformats.reltag.RelTagParser
LOG - Static variable in class org.apache.nutch.net.URLNormalizers
LOG - Static variable in class org.apache.nutch.ontology.jena.OntologyImpl
LOG - Static variable in class org.apache.nutch.ontology.OntologyFactory
LOG - Static variable in class org.apache.nutch.parse.ext.ExtParser
LOG - Static variable in class org.apache.nutch.parse.html.HtmlParser
LOG - Static variable in class org.apache.nutch.parse.js.JSParseFilter
LOG - Static variable in class org.apache.nutch.parse.ms.MSBaseParser
LOG - Static variable in class org.apache.nutch.parse.ms.MSExtractor
LOG - Static variable in class org.apache.nutch.parse.oo.OOParser
LOG - Static variable in class org.apache.nutch.parse.ParserChecker
LOG - Static variable in class org.apache.nutch.parse.ParseResult
LOG - Static variable in class org.apache.nutch.parse.ParserFactory
LOG - Static variable in class org.apache.nutch.parse.ParseSegment
LOG - Static variable in class org.apache.nutch.parse.ParseUtil
LOG - Static variable in class org.apache.nutch.parse.pdf.PdfParser
LOG - Static variable in class org.apache.nutch.parse.rss.RSSParser
LOG - Static variable in class org.apache.nutch.parse.swf.SWFParser
LOG - Static variable in class org.apache.nutch.parse.zip.ZipTextExtractor
LOG - Static variable in class org.apache.nutch.plugin.PluginDescriptor
LOG - Static variable in class org.apache.nutch.plugin.PluginManifestParser
LOG - Static variable in class org.apache.nutch.plugin.PluginRepository
LOG - Static variable in class org.apache.nutch.protocol.file.File
LOG - Static variable in class org.apache.nutch.protocol.ftp.Ftp
LOG - Static variable in class org.apache.nutch.protocol.http.api.RobotRulesParser
LOG - Static variable in class org.apache.nutch.protocol.http.Http
LOG - Static variable in class org.apache.nutch.protocol.httpclient.Http
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
LOG - Static variable in class org.apache.nutch.protocol.ProtocolFactory
LOG - Static variable in class org.apache.nutch.scoring.webgraph.LinkDumper
LOG - Static variable in class org.apache.nutch.scoring.webgraph.LinkRank
LOG - Static variable in class org.apache.nutch.scoring.webgraph.Loops
LOG - Static variable in class org.apache.nutch.scoring.webgraph.NodeDumper
LOG - Static variable in class org.apache.nutch.scoring.webgraph.ScoreUpdater
LOG - Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
LOG - Static variable in class org.apache.nutch.searcher.more.DateQueryFilter
LOG - Static variable in class org.apache.nutch.searcher.NutchBean
LOG - Static variable in class org.apache.nutch.searcher.Query
LOG - Static variable in class org.apache.nutch.searcher.response.SearchServlet
LOG - Static variable in interface org.apache.nutch.searcher.SearchBean
LOG - Static variable in class org.apache.nutch.searcher.SolrSearchBean
LOG - Static variable in class org.apache.nutch.searcher.SummarizerFactory: My logger
LOG - Static variable in class org.apache.nutch.segment.SegmentReader
LOG - Static variable in class org.apache.nutch.tools.arc.ArcRecordReader
LOG - Static variable in class org.apache.nutch.tools.arc.ArcSegmentCreator
LOG - Static variable in class org.apache.nutch.tools.compat.ReprUrlFixer
LOG - Static variable in class org.apache.nutch.tools.CrawlDBScanner
LOG - Static variable in class org.apache.nutch.tools.DmozParser
LOG - Static variable in class org.apache.nutch.tools.PruneIndexTool
LOG - Static variable in class org.apache.nutch.tools.ResolveUrls
LOG - Static variable in class org.apache.nutch.tools.SearchLoadTester
LOG - Static variable in class org.apache.nutch.util.EncodingDetector
LOG - Static variable in class org.creativecommons.nutch.CCIndexingFilter
LOG - Static variable in class org.creativecommons.nutch.CCParseFilter
LOG_STEP - Static variable in class org.apache.nutch.tools.PruneIndexTool: Log the progress every LOG_STEP number of processed documents.
logConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
logger - Static variable in class org.apache.nutch.clustering.carrot2.Clusterer
login(String, String) - Method in class org.apache.nutch.protocol.ftp.Client: Login to the FTP server using the provided username and password.
logout() - Method in class org.apache.nutch.protocol.ftp.Client: Logout of the FTP server by sending the QUIT command.
LogUtil - Class in org.apache.nutch.util: Utility class for logging.
LogUtil() - Constructor for class org.apache.nutch.util.LogUtil
longestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher: Returns the longest prefix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher: Returns the longest suffix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher: Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
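The three longestMatch entries above describe a longest-match lookup over a set of stored patterns. A minimal sketch of the prefix case follows, assuming a simple character trie; the class PrefixMatchSketch and its internals are hypothetical illustrations and are not taken from the Nutch sources.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of longest-prefix matching with a trie.
// This is NOT the Nutch PrefixStringMatcher implementation.
class PrefixMatchSketch {
    private static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean terminal; // true if a stored prefix ends at this node
    }

    private final TrieNode root = new TrieNode();

    PrefixMatchSketch(String... prefixes) {
        for (String p : prefixes) {
            TrieNode node = root;
            for (int i = 0; i < p.length(); i++) {
                node = node.children.computeIfAbsent(p.charAt(i), c -> new TrieNode());
            }
            node.terminal = true;
        }
    }

    /** Returns the longest stored prefix that input starts with, or null if none matches. */
    String longestMatch(String input) {
        TrieNode node = root;
        int longest = node.terminal ? 0 : -1; // an empty stored prefix matches everything
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                break; // no stored prefix continues with this character
            }
            if (node.terminal) {
                longest = i + 1; // remember the longest complete prefix seen so far
            }
        }
        return longest < 0 ? null : input.substring(0, longest);
    }

    /** Returns true if any stored prefix matches the start of input. */
    boolean matches(String input) {
        return longestMatch(input) != null;
    }

    public static void main(String[] args) {
        PrefixMatchSketch m = new PrefixMatchSketch("http://", "http://www.", "ftp://");
        System.out.println(m.longestMatch("http://www.example.org/"));
        System.out.println(m.longestMatch("mailto:someone"));
    }
}
```

A suffix matcher could reuse the same structure by inserting and walking the strings in reverse.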
LoopReader - Class in org.apache.nutch.scoring.webgraph: The LoopReader tool prints the loopset information for a single url.
LoopReader() - Constructor for class org.apache.nutch.scoring.webgraph.LoopReader
Loops - Class in org.apache.nutch.scoring.webgraph: The Loops job identifies link cycles (loops) inside the web graph.
Loops() - Constructor for class org.apache.nutch.scoring.webgraph.Loops
Loops.Finalizer - Class in org.apache.nutch.scoring.webgraph: Finishes the Loops job by aggregating and collecting any found routes.
Loops.Finalizer() - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Finalizer: Default constructor.
Loops.Finalizer(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Finalizer: Configurable constructor.
Loops.Initializer - Class in org.apache.nutch.scoring.webgraph: Initializes the Loop routes.
Loops.Initializer() - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Initializer: Default constructor.
Loops.Initializer(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Initializer: Configurable constructor.
Loops.Looper - Class in org.apache.nutch.scoring.webgraph: Follows a route path looking for the start url of the route.
Loops.Looper() - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Looper: Default constructor.
Loops.Looper(Configuration) - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Looper: Configurable constructor.
Loops.LoopSet - Class in org.apache.nutch.scoring.webgraph: A set of loops.
Loops.LoopSet() - Constructor for class org.apache.nutch.scoring.webgraph.Loops.LoopSet
Loops.Route - Class in org.apache.nutch.scoring.webgraph: A link path or route looking to identify a link cycle.
Loops.Route() - Constructor for class org.apache.nutch.scoring.webgraph.Loops.Route
LOOPS_DIR - Static variable in class org.apache.nutch.scoring.webgraph.Loops
LUCENE_PREFIX - Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
LuceneConstants - Interface in org.apache.nutch.indexer.lucene
LuceneSearchBean - Class in org.apache.nutch.searcher
LuceneSearchBean(Configuration, Path, Path) - Constructor for class org.apache.nutch.searcher.LuceneSearchBean: Construct in a named directory.
LuceneSummarizer - Class in org.apache.nutch.summary.lucene: Implements hit summarization.
LuceneSummarizer() - Constructor for class org.apache.nutch.summary.lucene.LuceneSummarizer
LuceneWriter - Class in org.apache.nutch.indexer.lucene
LuceneWriter() - Constructor for class org.apache.nutch.indexer.lucene.LuceneWriter
LuceneWriter.INDEX - Enum in org.apache.nutch.indexer.lucene
LuceneWriter.STORE - Enum in org.apache.nutch.indexer.lucene
LuceneWriter.VECTOR - Enum in org.apache.nutch.indexer.lucene




M

m_currentNode - 
Variable in class org.apache.nutch.parse.html.DOMBuilder
Current node
m_doc - 
Variable in class org.apache.nutch.parse.html.DOMBuilder
Root document
m_docFrag - 
Variable in class org.apache.nutch.parse.html.DOMBuilder
First node of document fragment or null if not a DocumentFragment
m_elemStack - 
Variable in class org.apache.nutch.parse.html.DOMBuilder
Vector of element nodes
m_inCData - 
Variable in class org.apache.nutch.parse.html.DOMBuilder
Flag indicating that we are processing a CData section
main(String[]) - 
Static method in class org.apache.nutch.analysis.CommonGrams
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Main method used for command line process.
main(String[]) - 
Static method in class org.apache.nutch.analysis.lang.NGramProfile
main method used for testing only
main(String[]) - 
Static method in class org.apache.nutch.analysis.NutchAnalysis
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.analysis.NutchDocumentTokenizer
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.Crawl
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.CrawlDb
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.CrawlDbMerger
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.CrawlDbReader
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.Generator
Generate a fetchlist from the crawldb.
main(String[]) - 
Static method in class org.apache.nutch.crawl.Injector
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.LinkDb
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.LinkDbMerger
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.LinkDbReader
 
main(String[]) - 
Static method in class org.apache.nutch.crawl.TextProfileSignature
 
main(String[]) - 
Static method in class org.apache.nutch.fetcher.Fetcher
Run the fetcher.
main(String[]) - 
Static method in class org.apache.nutch.fetcher.OldFetcher
Run the fetcher.
main(String[]) - 
Static method in class org.apache.nutch.indexer.DeleteDuplicates
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.field.AnchorFields
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.field.BasicFields
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.field.CustomFields
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.field.FieldIndexer
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.HighFreqTerms
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.Indexer
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.IndexMerger
Create an index for the input files in the named directory.
main(String[]) - 
Static method in class org.apache.nutch.indexer.IndexSorter
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
main(String[]) - 
Static method in class org.apache.nutch.indexer.solr.SolrIndexer
 
main(String[]) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
main(String[]) - 
Static method in class org.apache.nutch.net.URLFilterChecker
 
main(String[]) - 
Static method in class org.apache.nutch.net.URLNormalizerChecker
 
main(String[]) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
main(String[]) - 
Static method in class org.apache.nutch.parse.html.HtmlParser
 
main(String[]) - 
Static method in class org.apache.nutch.parse.js.JSParseFilter
 
main(String, MSBaseParser, String[]) - 
Static method in class org.apache.nutch.parse.ms.MSBaseParser
Main for testing.
main(String[]) - 
Static method in class org.apache.nutch.parse.msexcel.MSExcelParser
Main for testing.
main(String[]) - 
Static method in class org.apache.nutch.parse.mspowerpoint.MSPowerPointParser
Main for testing.
main(String[]) - 
Static method in class org.apache.nutch.parse.msword.MSWordParser
Main for testing.
main(String[]) - 
Static method in class org.apache.nutch.parse.oo.OOParser
 
main(String[]) - 
Static method in class org.apache.nutch.parse.ParseData
 
main(String[]) - 
Static method in class org.apache.nutch.parse.ParserChecker
 
main(String[]) - 
Static method in class org.apache.nutch.parse.ParseSegment
 
main(String[]) - 
Static method in class org.apache.nutch.parse.ParseText
 
main(String[]) - 
Static method in class org.apache.nutch.parse.rss.RSSParser
 
main(String[]) - 
Static method in class org.apache.nutch.parse.swf.SWFParser
Arguments are: 0.
main(String[]) - 
Static method in class org.apache.nutch.plugin.PluginRepository
Loads all necessary dependencies for a selected plugin, and then runs one
 of the classes' main() method.
main(String[]) - 
Static method in class org.apache.nutch.protocol.Content
 
main(String[]) - 
Static method in class org.apache.nutch.protocol.file.File
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.protocol.ftp.Ftp
For debugging.
main(HttpBase, String[]) - 
Static method in class org.apache.nutch.protocol.http.api.HttpBase
 
main(String[]) - 
Static method in class org.apache.nutch.protocol.http.api.RobotRulesParser
command-line main for testing
main(String[]) - 
Static method in class org.apache.nutch.protocol.http.Http
 
main(String[]) - 
Static method in class org.apache.nutch.protocol.httpclient.Http
Main method.
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.LinkDumper
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.LinkDumper.Reader
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.LinkRank
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.LoopReader
Runs the LoopReader tool.
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.Loops
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.NodeDumper
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.NodeReader
Runs the NodeReader tool.
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
 
main(String[]) - 
Static method in class org.apache.nutch.scoring.webgraph.WebGraph
 
main(String[]) - 
Static method in class org.apache.nutch.searcher.DistributedSearch.IndexServer
Runs a lucene search server.
main(String[]) - 
Static method in class org.apache.nutch.searcher.DistributedSearch.SegmentServer
Runs a summary server.
main(String[]) - 
Static method in class org.apache.nutch.searcher.DistributedSearch.Server
 
main(String[]) - 
Static method in class org.apache.nutch.searcher.NutchBean
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.searcher.Query
For debugging.
main(String[]) - 
Static method in class org.apache.nutch.segment.SegmentMerger
 
main(String[]) - 
Static method in class org.apache.nutch.segment.SegmentReader
 
main(String[]) - 
Static method in class org.apache.nutch.summary.basic.BasicSummarizer
Tests Summary-generation.
main(String[]) - 
Static method in class org.apache.nutch.tools.arc.ArcSegmentCreator
 
main(String[]) - 
Static method in class org.apache.nutch.tools.compat.CrawlDbConverter
 
main(String[]) - 
Static method in class org.apache.nutch.tools.compat.ReprUrlFixer
Runs the ReprUrlFixer.
main(String[]) - 
Static method in class org.apache.nutch.tools.CrawlDBScanner
 
main(String[]) - 
Static method in class org.apache.nutch.tools.DmozParser
Command-line access.
main(String[]) - 
Static method in class org.apache.nutch.tools.FreeGenerator
 
main(String[]) - 
Static method in class org.apache.nutch.tools.PruneIndexTool
 
main(String[]) - 
Static method in class org.apache.nutch.tools.ResolveUrls
Runs the resolve urls tool.
main(String[]) - 
Static method in class org.apache.nutch.tools.SearchLoadTester
 
main(RegexURLFilterBase, String[]) - 
Static method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Filter the standard input using a RegexURLFilterBase.
main(String[]) - 
Static method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
 
main(String[]) - 
Static method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
main(String[]) - 
Static method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
main(String[]) - 
Static method in class org.apache.nutch.util.CommandRunner
 
main(String[]) - 
Static method in class org.apache.nutch.util.domain.DomainStatistics
 
main(String[]) - 
Static method in class org.apache.nutch.util.EncodingDetector
 
main(String[]) - 
Static method in class org.apache.nutch.util.PrefixStringMatcher
 
main(String[]) - 
Static method in class org.apache.nutch.util.StringUtil
 
main(String[]) - 
Static method in class org.apache.nutch.util.SuffixStringMatcher
 
main(String[]) - 
Static method in class org.apache.nutch.util.URLUtil
For testing
main(String[]) - 
Static method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete duplicates in the indexes in the named directory.
majorCodes - 
Static variable in class org.apache.nutch.parse.ParseStatus
 
makeIOException(SolrServerException) - 
Static method in class org.apache.nutch.indexer.solr.SolrWriter
 
makeLock(String) - 
Method in class org.apache.nutch.indexer.FsDirectory
 
map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbFilter
 
map(Text, CrawlDatum, OutputCollector<Text, LongWritable>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatMapper
 
map(Text, CrawlDatum, OutputCollector<FloatWritable, Text>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNMapper
 
map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
 
map(Text, CrawlDatum, OutputCollector<FloatWritable, Generator.SelectorEntry>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.Selector
Select & invert subset due for fetch.
map(FloatWritable, Generator.SelectorEntry, OutputCollector<Text, Generator.SelectorEntry>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.SelectorInverseMapper
 
map(WritableComparable, Text, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.Injector.InjectMapper
 
map(Text, ParseData, OutputCollector<Text, Inlinks>, Reporter) - 
Method in class org.apache.nutch.crawl.LinkDb
 
map(Text, Inlinks, OutputCollector<Text, Inlinks>, Reporter) - 
Method in class org.apache.nutch.crawl.LinkDbFilter
 
map(WritableComparable, Writable, OutputCollector<Text, IntWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates
Map [*,IndexDoc] pairs to [index,doc] pairs.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.AnchorFields.Collector
Wraps values in ObjectWritable
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.AnchorFields.Extractor
Wraps values in ObjectWritable
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.BasicFields.Flipper
Breaks out the collection of fields for url and redirects if necessary.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.BasicFields.Scorer
Wraps values in ObjectWritable.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.CustomFields.Collector
 
map(LongWritable, Text, OutputCollector<Text, FieldWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.CustomFields.Converter
 
map(Text, Writable, OutputCollector<Text, FieldWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.FieldIndexer
 
map(Text, Writable, OutputCollector<Text, NutchWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.IndexerMapReduce
 
map(WritableComparable, Content, OutputCollector<Text, ParseImpl>, Reporter) - 
Method in class org.apache.nutch.parse.ParseSegment
 
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
Wraps all values in ObjectWritables.
map(Text, Loops.Route, OutputCollector<Text, Loops.Route>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Finalizer
Maps out any found routes; those will be the link cycles.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Initializer
Wraps values in ObjectWritable.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Looper
Wrap values in ObjectWritable.
map(Text, Node, OutputCollector<FloatWritable, Text>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
Outputs the url with the appropriate number of inlinks, number of outlinks,
 or score.
map(Text, Writable, OutputCollector<Text, ObjectWritable>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
Changes input into ObjectWritables.
map(Text, Writable, OutputCollector<Text, LinkDatum>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
Passes through existing LinkDatum objects from an existing OutlinkDb and
 maps out new LinkDatum objects from new crawls' ParseData.
map(Text, MetaWrapper, OutputCollector<Text, MetaWrapper>, Reporter) - 
Method in class org.apache.nutch.segment.SegmentMerger
 
map(WritableComparable, Writable, OutputCollector<Text, NutchWritable>, Reporter) - 
Method in class org.apache.nutch.segment.SegmentReader.InputCompatMapper
 
map(Text, BytesWritable, OutputCollector<Text, NutchWritable>, Reporter) - 
Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
Runs the Map job to translate an arc record into output for Nutch 
 segments.
map(WritableComparable, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.tools.compat.CrawlDbConverter
 
map(Text, CrawlDatum, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.tools.CrawlDBScanner
 
map(WritableComparable, Text, OutputCollector<Text, Generator.SelectorEntry>, Reporter) - 
Method in class org.apache.nutch.tools.FreeGenerator.FG
 
map(Text, CrawlDatum, OutputCollector<Text, LongWritable>, Reporter) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 
mapCopyKey(String) - 
Method in class org.apache.nutch.indexer.solr.SolrMappingReader
 
mapKey(String) - 
Method in class org.apache.nutch.indexer.solr.SolrMappingReader
 
MapWritable - Class in org.apache.nutch.crawl
Deprecated. Use org.apache.hadoop.io.MapWritable instead.
MapWritable() - 
Constructor for class org.apache.nutch.crawl.MapWritable
Deprecated.  
MapWritable(MapWritable) - 
Constructor for class org.apache.nutch.crawl.MapWritable
Deprecated. Copy constructor.
match(String) - 
Method in class org.apache.nutch.urlfilter.api.RegexRule
Checks if a url matches this rule.
matchChar(TrieStringMatcher.TrieNode, String, int) - 
Method in class org.apache.nutch.util.TrieStringMatcher
Returns the next TrieStringMatcher.TrieNode visited, given that you are at
 node, and the next character in the input is 
 the idx'th character of s.
matches(String) - 
Method in class org.apache.nutch.util.PrefixStringMatcher
Returns true if the given String is matched by a
 prefix in the trie
matches(String) - 
Method in class org.apache.nutch.util.SuffixStringMatcher
Returns true if the given String is matched by a
 suffix in the trie
matches(String) - 
Method in class org.apache.nutch.util.TrieStringMatcher
Returns true if the given String is matched by a
 pattern in the trie
maxContent - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The length limit for downloaded content, in bytes.
maxCrawlDelay - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
Skip page if Crawl-Delay longer than this value.
maxDelays - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The number of times a thread will delay when trying to fetch a page.
maxInterval - 
Variable in class org.apache.nutch.crawl.AbstractFetchSchedule
 
maxThreadsPerHost - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The maximum number of threads that should be allowed
 to access a host at one time.
MD5Signature - Class in org.apache.nutch.crawl
Default implementation of a page signature.
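As a rough illustration of what an MD5-based page signature amounts to, the sketch below hashes a page's content bytes with the JDK's MessageDigest. The class Md5SignatureSketch and the choice to hash only the raw content are assumptions made for this example; the index does not say what input the Nutch class actually digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not the Nutch MD5Signature source.
class Md5SignatureSketch {
    /** Computes a 16-byte MD5 digest over the given content bytes (assumed input). */
    static byte[] signature(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Renders a digest as lowercase hexadecimal. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] page = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(signature(page)));
    }
}
```

Two pages with byte-identical content produce the same signature, which is what makes such a digest usable for exact-duplicate detection.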
MD5Signature() - 
Constructor for class org.apache.nutch.crawl.MD5Signature
 
merge(Path, Path[], boolean, boolean) - 
Method in class org.apache.nutch.crawl.CrawlDbMerger
 
merge(Path, Path[], boolean, boolean) - 
Method in class org.apache.nutch.crawl.LinkDbMerger
 
merge(Path[], Path, Path) - 
Method in class org.apache.nutch.indexer.IndexMerger
Merge all input indexes to the single output index
merge(Path, Path[], boolean, boolean, long) - 
Method in class org.apache.nutch.segment.SegmentMerger
 
Metadata - Class in org.apache.nutch.metadata
A multi-valued metadata container.
Metadata() - 
Constructor for class org.apache.nutch.metadata.Metadata
Constructs a new, empty metadata.
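The idea of a multi-valued metadata container can be sketched as a map from each name to a list of values. The class MultiValuedSketch below is hypothetical; its add and names methods mirror entries listed in this index, while getValues is an assumed helper name, and none of the code is taken from the Nutch sources.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a multi-valued name/value container.
class MultiValuedSketch {
    // Insertion-ordered so that names() is deterministic.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Adds a value under a name, keeping any values already stored for it. */
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the names that have at least one value. */
    String[] names() {
        return values.keySet().toArray(new String[0]);
    }

    /** Returns all values for a name, or an empty array if the name is unknown. */
    String[] getValues(String name) {
        List<String> v = values.get(name);
        return v == null ? new String[0] : v.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MultiValuedSketch meta = new MultiValuedSketch();
        meta.add("Content-Language", "en");
        meta.add("Content-Language", "fr");
        System.out.println(String.join(",", meta.getValues("Content-Language")));
    }
}
```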
MetaWrapper - Class in org.apache.nutch.metadata
This is a simple decorator that adds metadata to any Writable-s that can be
 serialized by NutchWritable.
MetaWrapper() - 
Constructor for class org.apache.nutch.metadata.MetaWrapper
 
MetaWrapper(Writable, Configuration) - 
Constructor for class org.apache.nutch.metadata.MetaWrapper
 
MetaWrapper(Metadata, Writable, Configuration) - 
Constructor for class org.apache.nutch.metadata.MetaWrapper
 
MIME_TYPE - 
Static variable in class org.apache.nutch.parse.msexcel.MSExcelParser
Associated Mime type for Excel files
 (application/vnd.ms-excel).
MIME_TYPE - 
Static variable in class org.apache.nutch.parse.mspowerpoint.MSPowerPointParser
Associated Mime type for PowerPoint files
 (application/vnd.ms-powerpoint).
MIME_TYPE - 
Static variable in class org.apache.nutch.parse.msword.MSWordParser
Associated Mime type for Word files
 (application/msword).
MimeUtil - Class in org.apache.nutch.util
 
MimeUtil(Configuration) - 
Constructor for class org.apache.nutch.util.MimeUtil
 
MIN_CONFIDENCE_KEY - 
Static variable in class org.apache.nutch.util.EncodingDetector
 
MINUS - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
MissingDependencyException - Exception in org.apache.nutch.plugin
MissingDependencyException will be thrown if a plugin
 dependency cannot be found.
MissingDependencyException(Throwable) - 
Constructor for exception org.apache.nutch.plugin.MissingDependencyException
 
MissingDependencyException(String) - 
Constructor for exception org.apache.nutch.plugin.MissingDependencyException
 
MODIFIED - 
Static variable in interface org.apache.nutch.metadata.DublinCore
Date on which the resource was changed.
moreFromDupExcluded() - 
Method in class org.apache.nutch.searcher.Hit
True if other, lower-scoring, hits with the same dedup value have been
 excluded from the list which contains this hit.
MoreIndexingFilter - Class in org.apache.nutch.indexer.more
Add (or reset) a few metaData properties as respective fields
 (if they are available), so that they can be displayed by more.jsp
 (called by search.jsp).
MoreIndexingFilter() - 
Constructor for class org.apache.nutch.indexer.more.MoreIndexingFilter
 
MOVED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource has moved permanently.
MSBaseParser - Class in org.apache.nutch.parse.ms
A generic Microsoft document parser.
MSBaseParser() - 
Constructor for class org.apache.nutch.parse.ms.MSBaseParser
 
MSExcelParser - Class in org.apache.nutch.parse.msexcel
An Excel document parser.
MSExcelParser() - 
Constructor for class org.apache.nutch.parse.msexcel.MSExcelParser
 
MSExtractor - Class in org.apache.nutch.parse.ms
Defines a Microsoft document content extractor.
MSExtractor() - 
Constructor for class org.apache.nutch.parse.ms.MSExtractor
Constructs a new Microsoft document extractor.
MSPowerPointParser - Class in org.apache.nutch.parse.mspowerpoint
Nutch-Parser for parsing MS PowerPoint slides (mime type:
 application/vnd.ms-powerpoint).
MSPowerPointParser() - 
Constructor for class org.apache.nutch.parse.mspowerpoint.MSPowerPointParser
 
MSWordParser - Class in org.apache.nutch.parse.msword
Parser for mime type application/msword.
MSWordParser() - 
Constructor for class org.apache.nutch.parse.msword.MSWordParser
 



N

names() - 
Method in class org.apache.nutch.metadata.Metadata
Returns an array of the names contained in the metadata.
next(Text, DeleteDuplicates.IndexDoc) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.InputFormat.DDRecordReader
 
next(Text, BytesWritable) - 
Method in class org.apache.nutch.tools.arc.ArcRecordReader
Returns true if the next record in the split is read into the key and 
 value pair.
nextNode() - 
Method in class org.apache.nutch.util.NodeWalker
Returns the next Node on the stack and pushes all of its
 children onto the stack, allowing us to walk the node tree without the
 use of recursion.
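The stack-based traversal described for nextNode() can be sketched with the JDK's own DOM classes. The class StackWalkSketch and its method names are hypothetical and only illustrate the technique of popping a node and pushing its children; they are not the Nutch NodeWalker code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration of walking a DOM tree with an explicit stack.
class StackWalkSketch {
    /** Collects element names in document order without recursion. */
    static List<String> elementNamesInDocumentOrder(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.getNodeType() == Node.ELEMENT_NODE) {
                names.add(current.getNodeName());
            }
            NodeList children = current.getChildNodes();
            // Push children in reverse so the first child is popped first.
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return names;
    }

    /** Parses a small XML string; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b><c/></b><d/></a>");
        System.out.println(elementNamesInDocumentOrder(doc.getDocumentElement()));
    }
}
```

Using an explicit stack keeps the traversal depth independent of the Java call stack, which matters for deeply nested documents.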
NGramProfile - Class in org.apache.nutch.analysis.lang
This class runs an n-gram analysis over submitted text; the results can be used
 for automatic language identification.
NGramProfile(String, int, int) - 
Constructor for class org.apache.nutch.analysis.lang.NGramProfile
Construct a new ngram profile
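A character n-gram analysis of the kind mentioned for NGramProfile can be sketched as follows. The class NGramSketch, the minLen/maxLen parameter names, and the normalization (relative frequency over all counted n-grams) are assumptions made for this example, not details confirmed by the index.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of character n-gram counting; not the Nutch NGramProfile source.
class NGramSketch {
    /** Counts every substring of word whose length lies in [minLen, maxLen]. */
    static Map<String, Integer> count(String word, int minLen, int maxLen) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = minLen; n <= maxLen; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                counts.merge(word.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Converts raw counts to relative frequencies (assumed normalization scheme). */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("banana", 2, 2);
        System.out.println(counts);
        System.out.println(normalize(counts));
    }
}
```

Comparing such frequency profiles between an unknown text and reference profiles for known languages is one common way to guess the language of the text.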
NO_THRESHOLD - 
Static variable in class org.apache.nutch.util.EncodingDetector
 
Node - Class in org.apache.nutch.scoring.webgraph
A class which holds the number of inlinks and outlinks for a given url along
 with an inlink score from a link analysis program and any metadata.
Node() - 
Constructor for class org.apache.nutch.scoring.webgraph.Node
 
NODE_DIR - 
Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
 
nodeChar - 
Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
NodeDumper - Class in org.apache.nutch.scoring.webgraph
A tool that dumps out the top urls by number of inlinks, number of outlinks,
 or by score, to a text file.
NodeDumper() - 
Constructor for class org.apache.nutch.scoring.webgraph.NodeDumper
 
NodeDumper.Sorter - Class in org.apache.nutch.scoring.webgraph
Outputs the top urls sorted in descending order.
NodeDumper.Sorter() - 
Constructor for class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
 
NodeReader - Class in org.apache.nutch.scoring.webgraph
Reads and prints to system out information for a single node from the NodeDb 
 in the WebGraph.
NodeReader() - 
Constructor for class org.apache.nutch.scoring.webgraph.NodeReader
 
NodeWalker - Class in org.apache.nutch.util
A utility class that allows the walking of any DOM tree using a stack 
 instead of recursion.
NodeWalker(Node) - 
Constructor for class org.apache.nutch.util.NodeWalker
Starts the Node tree from the root node.
nonOpInfix() - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse infix characters except plus and minus.
nonOpOrTerm() - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse anything but a term or an operator (plus or minus or quote).
nonTerm() - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse anything but a term or a quote.
nonTermOrEOF() - 
Method in class org.apache.nutch.analysis.NutchAnalysis
 
normalize() - 
Method in class org.apache.nutch.analysis.lang.NGramProfile
Normalize the profile (calculates the ngrams frequencies)
normalize(String, String) - 
Method in interface org.apache.nutch.net.URLNormalizer
 
normalize(String, String) - 
Method in class org.apache.nutch.net.URLNormalizers
Normalize
NOTFETCHING - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Not fetching.
NOTFOUND - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource was not found.
NOTMODIFIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Unchanged since the last fetch.
NOTPARSED - 
Static variable in class org.apache.nutch.parse.ParseStatus
Parsing was not performed.
NUM_DUPES - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
numTerms - 
Static variable in class org.apache.nutch.indexer.HighFreqTerms
 
Nutch - Interface in org.apache.nutch.metadata
A collection of Nutch internal metadata constants.
NUTCH_INPUT_HIT_DETAILS_ARRAY - 
Static variable in class org.apache.nutch.clustering.carrot2.NutchInputComponent
 
NUTCH_INPUT_SUMMARIES_ARRAY - 
Static variable in class org.apache.nutch.clustering.carrot2.NutchInputComponent
 
NutchAnalysis - Class in org.apache.nutch.analysis
The JavaCC-generated Nutch lexical analyzer and query parser.
NutchAnalysis(String, Analyzer) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysis
Constructs a nutch analysis.
NutchAnalysis(CharStream) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysis
Constructor with user supplied CharStream.
NutchAnalysis(NutchAnalysisTokenManager) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysis
Constructor with generated Token Manager.
NutchAnalysisConstants - Interface in org.apache.nutch.analysis
Token literal values and constants.
NutchAnalysisTokenManager - Class in org.apache.nutch.analysis
Token Manager.
NutchAnalysisTokenManager(Reader) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
Constructs a token manager for the provided Reader.
NutchAnalysisTokenManager(CharStream) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
Constructor.
NutchAnalysisTokenManager(CharStream, int) - 
Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
Constructor.
NutchAnalyzer - Class in org.apache.nutch.analysis
Extension point for analysis.
NutchAnalyzer() - 
Constructor for class org.apache.nutch.analysis.NutchAnalyzer
 
NutchBean - Class in org.apache.nutch.searcher
One stop shopping for search-related functionality.
NutchBean(Configuration) - 
Constructor for class org.apache.nutch.searcher.NutchBean
 
NutchBean(Configuration, Path) - 
Constructor for class org.apache.nutch.searcher.NutchBean
Construct in a named directory.
NutchBean.NutchBeanConstructor - Class in org.apache.nutch.searcher
Responsible for constructing a NutchBean singleton instance and
  caching it in the servlet context.
NutchBean.NutchBeanConstructor() - 
Constructor for class org.apache.nutch.searcher.NutchBean.NutchBeanConstructor
 
NutchConfiguration - Class in org.apache.nutch.util
Utility to create Hadoop Configurations that include Nutch-specific
 resources.
NutchDocument - Class in org.apache.nutch.clustering.carrot2
An adapter class that implements RawDocument required for Carrot2.
NutchDocument(int, HitDetails, String, String) - 
Constructor for class org.apache.nutch.clustering.carrot2.NutchDocument
Creates a new document with the given id and summary, wrapping
 the given hit details.
NutchDocument - Class in org.apache.nutch.indexer
A NutchDocument is the unit of indexing.
NutchDocument() - 
Constructor for class org.apache.nutch.indexer.NutchDocument
 
NutchDocumentAnalyzer - Class in org.apache.nutch.analysis
The analyzer used for Nutch documents.
NutchDocumentAnalyzer(Configuration) - 
Constructor for class org.apache.nutch.analysis.NutchDocumentAnalyzer
 
NutchDocumentTokenizer - Class in org.apache.nutch.analysis
The tokenizer used for Nutch document text.
NutchDocumentTokenizer(Reader) - 
Constructor for class org.apache.nutch.analysis.NutchDocumentTokenizer
Construct a tokenizer for the text in a Reader.
nutchFetchIntervalMDName - 
Static variable in class org.apache.nutch.crawl.Injector
metadata key reserved for setting a custom fetchInterval for a specific URL
NutchIndexWriter - Interface in org.apache.nutch.indexer
 
NutchIndexWriterFactory - Class in org.apache.nutch.indexer
 
NutchIndexWriterFactory() - 
Constructor for class org.apache.nutch.indexer.NutchIndexWriterFactory
 
NutchInputComponent - Class in org.apache.nutch.clustering.carrot2
An input component that ignores the query passed from the
 controller and instead looks for data stored in the request context.
NutchInputComponent(String) - 
Constructor for class org.apache.nutch.clustering.carrot2.NutchInputComponent
Creates an input component with the given default language code.
NutchJob - Class in org.apache.nutch.util
A JobConf for Nutch jobs.
NutchJob(Configuration) - 
Constructor for class org.apache.nutch.util.NutchJob
 
nutchScoreMDName - 
Static variable in class org.apache.nutch.crawl.Injector
metadata key reserved for setting a custom score for a specific URL
NutchSimilarity - Class in org.apache.nutch.indexer
Similarity implementation used by Nutch indexing and search.
NutchSimilarity() - 
Constructor for class org.apache.nutch.indexer.NutchSimilarity
 
NutchWritable - Class in org.apache.nutch.crawl
 
NutchWritable() - 
Constructor for class org.apache.nutch.crawl.NutchWritable
 
NutchWritable(Writable) - 
Constructor for class org.apache.nutch.crawl.NutchWritable
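
The NodeWalker entry above describes walking a DOM tree with a stack instead of recursion. The following is a minimal sketch of that general technique using only the JDK's org.w3c.dom API. It is not the Nutch implementation: the class name StackDomWalk and its methods are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the Nutch API.
public class StackDomWalk {

    // Collects node names in document (pre-order) order without recursion.
    static List<String> walk(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            names.add(current.getNodeName());
            NodeList children = current.getChildNodes();
            // Push children in reverse so the first child is popped next.
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return names;
    }

    // Convenience helper: parses an XML string and walks it from the root element.
    static List<String> walkXml(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return walk(root);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(walkXml("<a><b/><c><d/></c></a>"));
    }
}
```

An explicit stack keeps the traversal depth off the call stack, which matters for deeply nested documents where a recursive walk could overflow.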
 



O

ObjectCache - Class in org.apache.nutch.util
 
Office - Interface in org.apache.nutch.metadata
A collection of "Office" documents properties names.
OldFetcher - Class in org.apache.nutch.fetcher
The fetcher.
OldFetcher() - 
Constructor for class org.apache.nutch.fetcher.OldFetcher
 
OldFetcher(Configuration) - 
Constructor for class org.apache.nutch.fetcher.OldFetcher
 
OldFetcher.InputFormat - Class in org.apache.nutch.fetcher
 
OldFetcher.InputFormat() - 
Constructor for class org.apache.nutch.fetcher.OldFetcher.InputFormat
 
onChannel(FeedParserState, String, String, String) - 
Method in class org.apache.nutch.parse.rss.FeedParserListenerImpl

 Callback method when the parser encounters an RSS Channel.
onItem(FeedParserState, String, String, String, String) - 
Method in class org.apache.nutch.parse.rss.FeedParserListenerImpl

 Callback method when the parser encounters an RSS Item.
OnlineClusterer - Interface in org.apache.nutch.clustering
An extension point interface for online search results clustering
 algorithms.
OnlineClustererFactory - Class in org.apache.nutch.clustering
A factory for retrieving OnlineClusterer extensions.
OnlineClustererFactory(Configuration) - 
Constructor for class org.apache.nutch.clustering.OnlineClustererFactory
Create an instance of the clustering factory bound to
 a given configuration.
Ontology - Interface in org.apache.nutch.ontology
 
OntologyFactory - Class in org.apache.nutch.ontology
A factory for retrieving Ontology extensions.
OntologyFactory(Configuration) - 
Constructor for class org.apache.nutch.ontology.OntologyFactory
 
OntologyImpl - Class in org.apache.nutch.ontology.jena
This class wraps a model
 built from a list of ontologies,
 using HP's Jena.
OntologyImpl() - 
Constructor for class org.apache.nutch.ontology.jena.OntologyImpl
 
OOParser - Class in org.apache.nutch.parse.oo
Parser for OpenOffice and OpenDocument formats.
OOParser() - 
Constructor for class org.apache.nutch.parse.oo.OOParser
 
open(JobConf, String) - 
Method in class org.apache.nutch.indexer.lucene.LuceneWriter
 
open(JobConf, String) - 
Method in interface org.apache.nutch.indexer.NutchIndexWriter
 
open(JobConf, String) - 
Method in class org.apache.nutch.indexer.solr.SolrWriter
 
openInput(String) - 
Method in class org.apache.nutch.indexer.FsDirectory
 
OpenSearchServlet - Class in org.apache.nutch.searcher
Present search results using A9's OpenSearch extensions to RSS, plus a few
 Nutch-specific extensions.
OpenSearchServlet() - 
Constructor for class org.apache.nutch.searcher.OpenSearchServlet
 
OPICScoringFilter - Class in org.apache.nutch.scoring.opic
This plugin implements a variant of an Online Page Importance Computation
 (OPIC) score, described in this paper:
 Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003),
 Adaptive On-Line Page Importance Computation.
OPICScoringFilter() - 
Constructor for class org.apache.nutch.scoring.opic.OPICScoringFilter
 
optimizePhrase(Query.Phrase, String) - 
Method in class org.apache.nutch.analysis.CommonGrams
Optimizes phrase queries to use n-grams when possible.
org.apache.nutch.analysis - package org.apache.nutch.analysis
Tokenizer for documents and query parser.
org.apache.nutch.analysis.lang - package org.apache.nutch.analysis.lang
Text document language identifier.
org.apache.nutch.clustering - package org.apache.nutch.clustering
 
org.apache.nutch.clustering.carrot2 - package org.apache.nutch.clustering.carrot2
 
org.apache.nutch.crawl - package org.apache.nutch.crawl
Crawl control code.
org.apache.nutch.fetcher - package org.apache.nutch.fetcher
The Nutch robot.
org.apache.nutch.html - package org.apache.nutch.html
 
org.apache.nutch.indexer - package org.apache.nutch.indexer
Maintain Lucene full-text indexes.
org.apache.nutch.indexer.basic - package org.apache.nutch.indexer.basic
A basic indexing plugin.
org.apache.nutch.indexer.field - package org.apache.nutch.indexer.field
 
org.apache.nutch.indexer.lucene - package org.apache.nutch.indexer.lucene
 
org.apache.nutch.indexer.more - package org.apache.nutch.indexer.more
A more indexing plugin.
org.apache.nutch.indexer.solr - package org.apache.nutch.indexer.solr
 
org.apache.nutch.metadata - package org.apache.nutch.metadata
A Multi-valued Metadata container, and set
of constant fields for Nutch Metadata.
org.apache.nutch.microformats.reltag - package org.apache.nutch.microformats.reltag

A microformats Rel-Tag
Parser/Indexer/Querier plugin.
org.apache.nutch.net - package org.apache.nutch.net
 
org.apache.nutch.net.protocols - package org.apache.nutch.net.protocols
 
org.apache.nutch.ontology - package org.apache.nutch.ontology
 
org.apache.nutch.ontology.jena - package org.apache.nutch.ontology.jena
 
org.apache.nutch.parse - package org.apache.nutch.parse
 
org.apache.nutch.parse.ext - package org.apache.nutch.parse.ext
 
org.apache.nutch.parse.html - package org.apache.nutch.parse.html
An HTML document parsing plugin.
org.apache.nutch.parse.js - package org.apache.nutch.parse.js
 
org.apache.nutch.parse.ms - package org.apache.nutch.parse.ms
Common API for Microsoft © documents parsing.
org.apache.nutch.parse.msexcel - package org.apache.nutch.parse.msexcel
A Microsoft © Excel document parsing plugin.
org.apache.nutch.parse.mspowerpoint - package org.apache.nutch.parse.mspowerpoint
A Microsoft © PowerPoint document parsing plugin.
org.apache.nutch.parse.msword - package org.apache.nutch.parse.msword
A Microsoft © Word document parsing plugin.
org.apache.nutch.parse.msword.chp - package org.apache.nutch.parse.msword.chp
 
org.apache.nutch.parse.oo - package org.apache.nutch.parse.oo
 
org.apache.nutch.parse.pdf - package org.apache.nutch.parse.pdf
A pdf parsing plugin.
org.apache.nutch.parse.rss - package org.apache.nutch.parse.rss
 
org.apache.nutch.parse.rss.structs - package org.apache.nutch.parse.rss.structs
 
org.apache.nutch.parse.swf - package org.apache.nutch.parse.swf
 
org.apache.nutch.parse.text - package org.apache.nutch.parse.text
A plain text parsing plugin.
org.apache.nutch.parse.zip - package org.apache.nutch.parse.zip
 
org.apache.nutch.plugin - package org.apache.nutch.plugin
The Nutch Plugin System.
org.apache.nutch.protocol - package org.apache.nutch.protocol
 
org.apache.nutch.protocol.file - package org.apache.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp - package org.apache.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http - package org.apache.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api - package org.apache.nutch.protocol.http.api
Common API used by HTTP plugins (http,
httpclient)
org.apache.nutch.protocol.httpclient - package org.apache.nutch.protocol.httpclient
Protocol plugin which supports retrieving documents via the HTTP and
HTTPS protocols, optionally with Basic, Digest and NTLM authentication
schemes for web server as well as proxy server.
org.apache.nutch.scoring - package org.apache.nutch.scoring
 
org.apache.nutch.scoring.opic - package org.apache.nutch.scoring.opic
 
org.apache.nutch.scoring.webgraph - package org.apache.nutch.scoring.webgraph
 
org.apache.nutch.searcher - package org.apache.nutch.searcher
Search API
org.apache.nutch.searcher.basic - package org.apache.nutch.searcher.basic
 
org.apache.nutch.searcher.more - package org.apache.nutch.searcher.more
A more query plugin.
org.apache.nutch.searcher.response - package org.apache.nutch.searcher.response
 
org.apache.nutch.searcher.site - package org.apache.nutch.searcher.site
 
org.apache.nutch.searcher.url - package org.apache.nutch.searcher.url
 
org.apache.nutch.segment - package org.apache.nutch.segment
 
org.apache.nutch.servlet - package org.apache.nutch.servlet
 
org.apache.nutch.summary.basic - package org.apache.nutch.summary.basic

A basic summarizer implementation.
org.apache.nutch.summary.lucene - package org.apache.nutch.summary.lucene

A Lucene Highlighter based summarizer implementation.
org.apache.nutch.tools - package org.apache.nutch.tools
 
org.apache.nutch.tools.arc - package org.apache.nutch.tools.arc
 
org.apache.nutch.tools.compat - package org.apache.nutch.tools.compat
 
org.apache.nutch.urlfilter.api - package org.apache.nutch.urlfilter.api
 
org.apache.nutch.urlfilter.automaton - package org.apache.nutch.urlfilter.automaton

A url filter plugin based on
dk.brics.automaton Finite-State
Automata for Java™.
org.apache.nutch.urlfilter.prefix - package org.apache.nutch.urlfilter.prefix
A url filter plugin.
org.apache.nutch.urlfilter.regex - package org.apache.nutch.urlfilter.regex
A url filter plugin.
org.apache.nutch.util - package org.apache.nutch.util
 
org.apache.nutch.util.domain - package org.apache.nutch.util.domain
 org.apache.nutch.util.domain
org.creativecommons.nutch - package org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.
ORIG_URL - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
ORIGINAL_CHAR_ENCODING - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
Outlink - Class in org.apache.nutch.parse
 
Outlink() - 
Constructor for class org.apache.nutch.parse.Outlink
 
Outlink(String, String) - 
Constructor for class org.apache.nutch.parse.Outlink
 
OUTLINK - 
Static variable in class org.apache.nutch.scoring.webgraph.LinkDatum
 
OUTLINK_DIR - 
Static variable in class org.apache.nutch.scoring.webgraph.WebGraph
 
OutlinkExtractor - Class in org.apache.nutch.parse
Extractor to extract Outlinks 
 / URLs from plain text using Regular Expressions.
OutlinkExtractor() - 
Constructor for class org.apache.nutch.parse.OutlinkExtractor
 
OwlParser - Class in org.apache.nutch.ontology.jena
Implementation of a parser for W3C's OWL files.
OwlParser() - 
Constructor for class org.apache.nutch.ontology.jena.OwlParser
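
The OutlinkExtractor entry above mentions extracting URLs from plain text using regular expressions. As a rough sketch of that idea, the example below scans text with java.util.regex. The class name PlainTextLinks and the pattern are assumptions made for illustration. The pattern is deliberately simple and is not the expression Nutch uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the Nutch API.
public class PlainTextLinks {

    // Simplified pattern (an assumption): a scheme, "://", then any run of
    // characters that are not whitespace, angle brackets, or quotes.
    private static final Pattern URL_PATTERN =
            Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"']+");

    // Returns every substring of the text that matches the URL pattern, in order.
    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extract("see http://example.org/a and ftp://host/file.txt now"));
    }
}
```

A pattern this loose also captures trailing punctuation such as a closing period, so a real extractor would need additional cleanup or a stricter expression.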
 



P

PAGE_COUNT - 
Static variable in interface org.apache.nutch.metadata.Office
 
parameterExists(HttpServletRequest, String) - 
Static method in class org.apache.nutch.searcher.response.RequestUtils
 
parse(Configuration) - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse a query.
parse(OntModel) - 
Method in class org.apache.nutch.ontology.jena.OwlParser
parse owl ontology files using jena
parse(OntModel) - 
Method in interface org.apache.nutch.ontology.jena.Parser
 
Parse - Interface in org.apache.nutch.parse
The result of parsing a page's raw content.
parse(Path) - 
Method in class org.apache.nutch.parse.ParseSegment
 
parse(Content) - 
Method in class org.apache.nutch.parse.ParseUtil
Performs a parse by iterating through a List of preferred Parsers
 until a successful parse is performed and a Parse object is
 returned.
parse(String, String, Configuration) - 
Static method in class org.apache.nutch.searcher.Query
Parse a query from a string using a language specific analyzer.
parse(String, Configuration) - 
Static method in class org.apache.nutch.searcher.Query
Parse a query from a string.
parse(String) - 
Static method in class org.apache.nutch.segment.SegmentPart
Create SegmentPart from a String in format "segmentName/partName".
PARSE_DIR_NAME - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
 
parseByExtensionId(String, Content) - 
Method in class org.apache.nutch.parse.ParseUtil
Method parses a Content object using the Parser specified
 by the parameter extId, i.e., the Parser's extension ID.
parseCharacterEncoding(String) - 
Static method in class org.apache.nutch.util.EncodingDetector
Parse the character encoding from the specified content type header.
parseClass(OntClass, List, int) - 
Method in class org.apache.nutch.ontology.jena.OwlParser
 
parsed - 
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
 
ParseData - Class in org.apache.nutch.parse
Data extracted from a page's content.
ParseData() - 
Constructor for class org.apache.nutch.parse.ParseData
 
ParseData(ParseStatus, String, Outlink[], Metadata) - 
Constructor for class org.apache.nutch.parse.ParseData
 
ParseData(ParseStatus, String, Outlink[], Metadata, Metadata) - 
Constructor for class org.apache.nutch.parse.ParseData
 
parseDmozFile(File, int, boolean, int, Pattern) - 
Method in class org.apache.nutch.tools.DmozParser
Iterate through all the items in this structured DMOZ file.
parseErrors - 
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
 
ParseException - Exception in org.apache.nutch.analysis
This exception is thrown when parse errors are encountered.
ParseException(Token, int[][], String[]) - 
Constructor for exception org.apache.nutch.analysis.ParseException
This constructor is used by the method "generateParseException"
 in the generated parser.
ParseException() - 
Constructor for exception org.apache.nutch.analysis.ParseException
The following constructors are for use by you for whatever
 purpose you can think of.
ParseException(String) - 
Constructor for exception org.apache.nutch.analysis.ParseException
Constructor with message.
ParseException - Exception in org.apache.nutch.parse
 
ParseException() - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(String) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(String, Throwable) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseException(Throwable) - 
Constructor for exception org.apache.nutch.parse.ParseException
 
ParseImpl - Class in org.apache.nutch.parse
The result of parsing a page's raw content.
ParseImpl() - 
Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseImpl(Parse) - 
Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseImpl(String, ParseData) - 
Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseImpl(ParseText, ParseData) - 
Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseImpl(ParseText, ParseData, boolean) - 
Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseOutputFormat - Class in org.apache.nutch.parse
 
ParseOutputFormat() - 
Constructor for class org.apache.nutch.parse.ParseOutputFormat
 
parsePluginFolder(String[]) - 
Method in class org.apache.nutch.plugin.PluginManifestParser
Returns a list of all found plugin descriptors.
parseQueries(InputStream) - 
Static method in class org.apache.nutch.tools.PruneIndexTool
Read a list of Lucene queries from the stream (UTF-8 encoding is assumed).
parseQuery(String, Configuration) - 
Static method in class org.apache.nutch.analysis.NutchAnalysis
Construct a query parser for the text in a reader.
parseQuery(String, Analyzer, Configuration) - 
Static method in class org.apache.nutch.analysis.NutchAnalysis
Construct a query parser for the text in a reader.
Parser - Interface in org.apache.nutch.ontology.jena
interface for the parser
Parser - Interface in org.apache.nutch.parse
A parser for content generated by a Protocol
 implementation.
ParserChecker - Class in org.apache.nutch.parse
Parser checker, useful for testing parser.
ParserChecker() - 
Constructor for class org.apache.nutch.parse.ParserChecker
 
ParseResult - Class in org.apache.nutch.parse
A utility class that stores result of a parse.
ParseResult(String) - 
Constructor for class org.apache.nutch.parse.ParseResult
Create a container for parse results.
ParserFactory - Class in org.apache.nutch.parse
Creates and caches Parser plugins.
ParserFactory(Configuration) - 
Constructor for class org.apache.nutch.parse.ParserFactory
 
ParserNotFound - Exception in org.apache.nutch.parse
 
ParserNotFound(String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParserNotFound(String, String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParserNotFound(String, String, String) - 
Constructor for exception org.apache.nutch.parse.ParserNotFound
 
ParseSegment - Class in org.apache.nutch.parse
 
ParseSegment() - 
Constructor for class org.apache.nutch.parse.ParseSegment
 
ParseSegment(Configuration) - 
Constructor for class org.apache.nutch.parse.ParseSegment
 
ParseStatus - Class in org.apache.nutch.parse
 
ParseStatus() - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int, String[]) - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int) - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, String[]) - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int) - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int, String) - 
Constructor for class org.apache.nutch.parse.ParseStatus
Simplified constructor for passing just a text message.
ParseStatus(int, String) - 
Constructor for class org.apache.nutch.parse.ParseStatus
Simplified constructor for passing just a text message.
ParseStatus(Throwable) - 
Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseText - Class in org.apache.nutch.parse
 
ParseText() - 
Constructor for class org.apache.nutch.parse.ParseText
 
ParseText(String) - 
Constructor for class org.apache.nutch.parse.ParseText
 
ParseUtil - Class in org.apache.nutch.parse
A Utility class containing methods to simply perform parsing utilities such
 as iterating through a preferred list of Parsers to obtain
 Parse objects.
ParseUtil(Configuration) - 
Constructor for class org.apache.nutch.parse.ParseUtil
 
PARTITION_MODE_DOMAIN - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_HOST - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_IP - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
PARTITION_MODE_KEY - 
Static variable in class org.apache.nutch.crawl.URLPartitioner
 
partName - 
Variable in class org.apache.nutch.segment.SegmentPart
Name of the segment part (ie.
passScoreAfterParsing(Text, Content, Parse) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
passScoreAfterParsing(Text, Content, Parse) - 
Method in interface org.apache.nutch.scoring.ScoringFilter
Currently a part of score distribution is performed using only data coming
 from the parsing process.
passScoreAfterParsing(Text, Content, Parse) - 
Method in class org.apache.nutch.scoring.ScoringFilters
 
passScoreBeforeParsing(Text, CrawlDatum, Content) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY.
passScoreBeforeParsing(Text, CrawlDatum, Content) - 
Method in interface org.apache.nutch.scoring.ScoringFilter
This method takes all relevant score information from the current datum
 (coming from a generated fetchlist) and stores it into
 Content metadata.
passScoreBeforeParsing(Text, CrawlDatum, Content) - 
Method in class org.apache.nutch.scoring.ScoringFilters
 
PasswordProtectedException - Exception in org.apache.nutch.parse.msword
 
PasswordProtectedException(String) - 
Constructor for exception org.apache.nutch.parse.msword.PasswordProtectedException
 
PdfParser - Class in org.apache.nutch.parse.pdf
parser for mime type application/pdf.
PdfParser() - 
Constructor for class org.apache.nutch.parse.pdf.PdfParser
 
PERM_REFRESH_TIME - 
Static variable in class org.apache.nutch.fetcher.Fetcher
 
PERM_REFRESH_TIME - 
Static variable in class org.apache.nutch.fetcher.OldFetcher
 
phrase(String) - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse an explicitly quoted phrase query.
ping() - 
Method in class org.apache.nutch.searcher.DistributedSearchBean
 
ping() - 
Method in class org.apache.nutch.searcher.LuceneSearchBean
 
ping() - 
Method in class org.apache.nutch.searcher.NutchBean
 
ping() - 
Method in interface org.apache.nutch.searcher.SearchBean
 
ping() - 
Method in class org.apache.nutch.searcher.SolrSearchBean
 
Pluggable - Interface in org.apache.nutch.plugin
Defines the capability of a class to be plugged into Nutch.
Plugin - Class in org.apache.nutch.plugin
A nutch-plugin is a container for a set of custom logic that provides
 extensions to the nutch core functionality or to another plugin that provides an
 API for extending.
Plugin(PluginDescriptor, Configuration) - 
Constructor for class org.apache.nutch.plugin.Plugin
Constructor
PluginClassLoader - Class in org.apache.nutch.plugin
The PluginClassLoader contains only classes of the runtime
 libraries set up in the plugin manifest file and exported libraries of
 required plugins.
PluginClassLoader(URL[], ClassLoader) - 
Constructor for class org.apache.nutch.plugin.PluginClassLoader
Constructor
PluginDescriptor - Class in org.apache.nutch.plugin
The PluginDescriptor provides access to all meta information of
 a nutch-plugin, as well as to the internationalizable resources and the plugin's
 own classloader.
PluginDescriptor(String, String, String, String, String, String, Configuration) - 
Constructor for class org.apache.nutch.plugin.PluginDescriptor
Constructor
PluginManifestParser - Class in org.apache.nutch.plugin
The PluginManifestParser just parses the manifest files
 in all plugin directories.
PluginManifestParser(Configuration, PluginRepository) - 
Constructor for class org.apache.nutch.plugin.PluginManifestParser
 
PluginRepository - Class in org.apache.nutch.plugin
The plugin repository is a registry of all plugins.
PluginRepository(Configuration) - 
Constructor for class org.apache.nutch.plugin.PluginRepository
 
PluginRuntimeException - Exception in org.apache.nutch.plugin
PluginRuntimeException will be thrown when an exception in the
 plugin management occurs.
PluginRuntimeException(Throwable) - 
Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
 
PluginRuntimeException(String) - 
Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
 
PLUS - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
pos - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
PrefixStringMatcher - Class in org.apache.nutch.util
A class for efficiently matching Strings against a set
 of prefixes.
PrefixStringMatcher(String[]) - 
Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match
 Strings with any prefix in the supplied array.
PrefixStringMatcher(Collection) - 
Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match
 Strings with any prefix in the supplied
 Collection.
PrefixURLFilter - Class in org.apache.nutch.urlfilter.prefix
Filters URLs based on a file of URL prefixes.
PrefixURLFilter() - 
Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
PrefixURLFilter(String) - 
Constructor for class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
PrintCommandListener - Class in org.apache.nutch.protocol.ftp
This is a support class for logging all ftp command/reply traffic.
PrintCommandListener(Log) - 
Constructor for class org.apache.nutch.protocol.ftp.PrintCommandListener
 
processDeflateEncoded(byte[], URL) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
processDumpJob(String, String, Configuration, int) - 
Method in class org.apache.nutch.crawl.CrawlDbReader
 
processDumpJob(String, String) - 
Method in class org.apache.nutch.crawl.LinkDbReader
 
processGzipEncoded(byte[], URL) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
processingInstruction(String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a processing instruction.
processStatJob(String, Configuration, boolean) - 
Method in class org.apache.nutch.crawl.CrawlDbReader
 
processTopNJob(String, long, float, String, Configuration) - 
Method in class org.apache.nutch.crawl.CrawlDbReader
 
PROTO_NOT_FOUND - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
This protocol was not found.
PROTO_STATUS_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
Protocol - Interface in org.apache.nutch.protocol
A retriever of url content.
PROTOCOL_REDIR - 
Static variable in class org.apache.nutch.fetcher.Fetcher
 
PROTOCOL_REDIR - 
Static variable in class org.apache.nutch.fetcher.OldFetcher
 
protocolCommandSent(ProtocolCommandEvent) - 
Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
ProtocolException - Exception in org.apache.nutch.net.protocols
Deprecated. Use org.apache.nutch.protocol.ProtocolException instead.
ProtocolException() - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(String) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(String, Throwable) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException(Throwable) - 
Constructor for exception org.apache.nutch.net.protocols.ProtocolException
Deprecated.  
ProtocolException - Exception in org.apache.nutch.protocol
 
ProtocolException() - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String, Throwable) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolException(Throwable) - 
Constructor for exception org.apache.nutch.protocol.ProtocolException
 
ProtocolFactory - Class in org.apache.nutch.protocol
Creates and caches Protocol plugins.
ProtocolFactory(Configuration) - 
Constructor for class org.apache.nutch.protocol.ProtocolFactory
 
ProtocolNotFound - Exception in org.apache.nutch.protocol
 
ProtocolNotFound(String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolNotFound(String, String) - 
Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolOutput - Class in org.apache.nutch.protocol
Simple aggregate to pass from protocol plugins both content and
 protocol status.
ProtocolOutput(Content, ProtocolStatus) - 
Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
ProtocolOutput(Content) - 
Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
protocolReplyReceived(ProtocolCommandEvent) - 
Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
ProtocolStatus - Class in org.apache.nutch.protocol
 
ProtocolStatus() - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, String[]) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, String[], long) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, long) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, Object) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, Object, long) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(Throwable) - 
Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
proxyHost - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The proxy hostname.
proxyPort - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The proxy port.
PruneIndexTool - Class in org.apache.nutch.tools
This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool(File[], Query[], PruneIndexTool.PruneChecker[], boolean, boolean) - 
Constructor for class org.apache.nutch.tools.PruneIndexTool
Create an instance of the tool, and open all input indexes.
PruneIndexTool.PrintFieldsChecker - Class in org.apache.nutch.tools
This checker's main function is just to print out
 selected field values from each document, just before
 they are deleted.
PruneIndexTool.PrintFieldsChecker(PrintStream, String[]) - 
Constructor for class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
PruneIndexTool.PruneChecker - Interface in org.apache.nutch.tools
This interface can be used to implement additional checking on matching
 documents.
PruneIndexTool.StoreUrlsChecker - Class in org.apache.nutch.tools
This checker's main function is just to store
 the URLs of each document to be deleted in a text file.
PruneIndexTool.StoreUrlsChecker(File, boolean) - 
Constructor for class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
Store the list in a file
PUBLISHER - 
Static variable in interface org.apache.nutch.metadata.DublinCore
An entity responsible for making the resource available.
put(Writable, Writable) - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
put(Text, ParseText, ParseData) - 
Method in class org.apache.nutch.parse.ParseResult
Store a result of parsing.
put(String, ParseText, ParseData) - 
Method in class org.apache.nutch.parse.ParseResult
Store a result of parsing.
put(String, String) - 
Method in class org.apache.nutch.searcher.QueryParams
 
putAll(MapWritable) - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
putAllMetaData(CrawlDatum) - 
Method in class org.apache.nutch.crawl.CrawlDatum
Add all metadata from other CrawlDatum to this CrawlDatum.



Q

Query - Class in org.apache.nutch.searcher
A Nutch query.
Query() - 
Constructor for class org.apache.nutch.searcher.Query
 
Query(Configuration) - 
Constructor for class org.apache.nutch.searcher.Query
 
QUERY - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
Query.Clause - Class in org.apache.nutch.searcher
A query clause.
Query.Clause(Query.Term, String, boolean, boolean, Configuration) - 
Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Term, boolean, boolean, Configuration) - 
Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, String, boolean, boolean, Configuration) - 
Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, boolean, boolean, Configuration) - 
Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Phrase - Class in org.apache.nutch.searcher
A phrase query clause.
Query.Phrase(Query.Term[]) - 
Constructor for class org.apache.nutch.searcher.Query.Phrase
 
Query.Phrase(String[]) - 
Constructor for class org.apache.nutch.searcher.Query.Phrase
 
Query.Term - Class in org.apache.nutch.searcher
A single-term query clause.
Query.Term(String) - 
Constructor for class org.apache.nutch.searcher.Query.Term
 
QueryException - Exception in org.apache.nutch.searcher
 
QueryException(String) - 
Constructor for exception org.apache.nutch.searcher.QueryException
 
QueryFilter - Interface in org.apache.nutch.searcher
Extension point for query translation.
QueryFilters - Class in org.apache.nutch.searcher
Creates and caches QueryFilter implementing plugins.
QueryFilters(Configuration) - 
Constructor for class org.apache.nutch.searcher.QueryFilters
 
QueryParams - Class in org.apache.nutch.searcher
Query context object that describes the context of the query.
QueryParams() - 
Constructor for class org.apache.nutch.searcher.QueryParams
 
QueryParams(int, int, String, String, boolean) - 
Constructor for class org.apache.nutch.searcher.QueryParams
 
QUOTE - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.



R

RawFieldQueryFilter - Class in org.apache.nutch.searcher
Translate raw query fields to search the same-named field, as indexed by an
 IndexingFilter.
RawFieldQueryFilter(String) - 
Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, float) - 
Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, boolean) - 
Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
RawFieldQueryFilter(String, boolean, float) - 
Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
rdfidToLabel(String) - 
Method in class org.apache.nutch.ontology.jena.OwlParser
 
read(DataInput) - 
Static method in class org.apache.nutch.crawl.CrawlDatum
 
read(DataInput) - 
Static method in class org.apache.nutch.crawl.Inlink
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.Outlink
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.ParseData
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.ParseImpl
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.ParseStatus
 
read(DataInput) - 
Static method in class org.apache.nutch.parse.ParseText
 
read(DataInput) - 
Static method in class org.apache.nutch.protocol.Content
 
read(DataInput) - 
Static method in class org.apache.nutch.protocol.ProtocolStatus
 
read(DataInput) - 
Static method in class org.apache.nutch.searcher.HitDetails
Constructs, reads and returns an instance.
read(DataInput, Configuration) - 
Static method in class org.apache.nutch.searcher.Query.Clause
 
read(DataInput) - 
Static method in class org.apache.nutch.searcher.Query.Phrase
 
read(DataInput, Configuration) - 
Static method in class org.apache.nutch.searcher.Query
 
read(DataInput) - 
Static method in class org.apache.nutch.searcher.Query.Term
 
read(DataInput) - 
Static method in class org.apache.nutch.searcher.Summary
 
readAddresses(Path, Configuration) - 
Static method in class org.apache.nutch.searcher.NutchBean
 
readConfig(Path, Configuration) - 
Static method in class org.apache.nutch.searcher.NutchBean
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.Generator.SelectorEntry
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.Inlink
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.Inlinks
 
readFields(DataInput) - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
readFields(DataInput) - 
Method in class org.apache.nutch.fetcher.FetcherOutput
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.field.FieldIndexer.LuceneDocumentWrapper
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.field.FieldsWritable
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
readFields(DataInput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
readFields(DataInput) - 
Method in class org.apache.nutch.metadata.Metadata
 
readFields(DataInput) - 
Method in class org.apache.nutch.metadata.MetaWrapper
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.Outlink
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.ParseData
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.ParseImpl
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.ParseStatus
 
readFields(DataInput) - 
Method in class org.apache.nutch.parse.ParseText
 
readFields(DataInput) - 
Method in class org.apache.nutch.protocol.Content
 
readFields(DataInput) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.LoopSet
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Route
 
readFields(DataInput) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.Hit
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.HitDetails
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.Hits
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.Query
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.QueryParams
 
readFields(DataInput) - 
Method in class org.apache.nutch.searcher.Summary
 
readFields(DataInput) - 
Method in class org.apache.nutch.util.GenericWritableConfigurable
 
readSolrDocument(SolrDocument) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
readUrl(String, String, Configuration) - 
Method in class org.apache.nutch.crawl.CrawlDbReader
 
REDIR_EXCEEDED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Too many redirects.
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
 
reduce(Text, Iterator<LongWritable>, OutputCollector<Text, LongWritable>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
 
reduce(Text, Iterator<LongWritable>, OutputCollector<Text, LongWritable>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
 
reduce(FloatWritable, Iterator<Text>, OutputCollector<FloatWritable, Text>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
 
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.CrawlDbReducer
 
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.CrawlDbUpdater
 
reduce(Text, Iterator<Generator.SelectorEntry>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.PartitionReducer
 
reduce(FloatWritable, Iterator<Generator.SelectorEntry>, OutputCollector<FloatWritable, Generator.SelectorEntry>, Reporter) - 
Method in class org.apache.nutch.crawl.Generator.Selector
Collect until limit is reached.
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.crawl.Injector.InjectReducer
 
reduce(Text, Iterator<Inlinks>, OutputCollector<Text, Inlinks>, Reporter) - 
Method in class org.apache.nutch.crawl.LinkDbMerger
 
reduce(MD5Hash, Iterator<DeleteDuplicates.IndexDoc>, OutputCollector<Text, DeleteDuplicates.IndexDoc>, Reporter) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
 
reduce(Text, Iterator<IntWritable>, OutputCollector<WritableComparable, Writable>, Reporter) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates
Delete docs named in values from index named in key.
reduce(Text, Iterator<DeleteDuplicates.IndexDoc>, OutputCollector<MD5Hash, DeleteDuplicates.IndexDoc>, Reporter) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.UrlsReducer
 
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, FieldWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.AnchorFields.Collector
Aggregates and sorts inlinks.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, LinkDatum>, Reporter) - 
Method in class org.apache.nutch.indexer.field.AnchorFields.Extractor
Extracts and inverts outlinks, ignores empty anchors.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, LinkDatum>, Reporter) - 
Method in class org.apache.nutch.indexer.field.BasicFields.Flipper
Collects redirect and original links for a given url key.
reduce(Text, Iterator<FieldsWritable>, OutputCollector<Text, FieldsWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.BasicFields.Merger
Collects the most recent set of fields for any url.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, FieldsWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.BasicFields.Scorer
Sets a document boost field from the NodeDb and determines the best 
 scoring url for pages that have redirects.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, FieldWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.CustomFields.Collector
 
reduce(Text, Iterator<FieldWritable>, OutputCollector<Text, FieldWritable>, Reporter) - 
Method in class org.apache.nutch.indexer.field.CustomFields.Converter
 
reduce(Text, Iterator<FieldWritable>, OutputCollector<Text, FieldIndexer.LuceneDocumentWrapper>, Reporter) - 
Method in class org.apache.nutch.indexer.field.FieldIndexer
 
reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, NutchDocument>, Reporter) - 
Method in class org.apache.nutch.indexer.IndexerMapReduce
 
reduce(Text, Iterator<SolrDeleteDuplicates.SolrRecord>, OutputCollector<Text, SolrDeleteDuplicates.SolrRecord>, Reporter) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
reduce(Text, Iterator<Writable>, OutputCollector<Text, Writable>, Reporter) - 
Method in class org.apache.nutch.parse.ParseSegment
 
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, LinkDumper.LinkNode>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Inverter
Inverts outlinks to inlinks while attaching node information to the 
 outlink.
reduce(Text, Iterator<LinkDumper.LinkNode>, OutputCollector<Text, LinkDumper.LinkNodes>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.Merger
Aggregate all LinkNode objects for a given url.
reduce(Text, Iterator<Loops.Route>, OutputCollector<Text, Loops.LoopSet>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Finalizer
Aggregates all found routes for a given start url into a loopset and 
 collects the loopset.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, Loops.Route>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Initializer
Takes any node that has inlinks and sets up a route for all of its
 outlinks.
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, Loops.Route>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Looper
Performs a single loop pass looking for loop cycles within routes.
reduce(FloatWritable, Iterator<Text>, OutputCollector<Text, FloatWritable>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.NodeDumper.Sorter
Flips and collects the url and numeric sort value.
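As a rough illustration of the flip described above, the following sketch swaps a numeric key and its urls using plain Java types. The class name FlipSketch and the use of String and Float in place of Text and FloatWritable are assumptions made for the example; it is not the NodeDumper.Sorter source.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reducer that receives (score, urls)
// and emits one (url, score) pair per url.
final class FlipSketch {

    static List<Map.Entry<String, Float>> flip(float score, Iterator<String> urls) {
        List<Map.Entry<String, Float>> out = new ArrayList<Map.Entry<String, Float>>();
        while (urls.hasNext()) {
            // The key and value trade places: the url becomes the key.
            out.add(new AbstractMap.SimpleEntry<String, Float>(urls.next(), score));
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> urls = Arrays.asList("http://a.example/", "http://b.example/").iterator();
        for (Map.Entry<String, Float> e : flip(0.75f, urls)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```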
reduce(Text, Iterator<ObjectWritable>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
Creates new CrawlDatum objects with the updated score from the NodeDb or
 with a cleared score.
reduce(Text, Iterator<LinkDatum>, OutputCollector<Text, LinkDatum>, Reporter) - 
Method in class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
 
reduce(Text, Iterator<MetaWrapper>, OutputCollector<Text, MetaWrapper>, Reporter) - 
Method in class org.apache.nutch.segment.SegmentMerger
NOTE: in selecting the latest version we rely exclusively on the segment
 name (not all segment data contain time information).
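A minimal sketch of choosing the latest version by segment name alone, as the note above describes. It assumes segment names are timestamp-like strings whose lexicographic order matches their chronological order; the class and method names are hypothetical and this is not the SegmentMerger code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: picks the value that came from the segment
// with the greatest name, using plain String comparison.
final class LatestBySegmentNameSketch {

    static <V> V latest(Map<String, V> valuesBySegmentName) {
        String latestName = null;
        V latestValue = null;
        for (Map.Entry<String, V> e : valuesBySegmentName.entrySet()) {
            if (latestName == null || e.getKey().compareTo(latestName) > 0) {
                latestName = e.getKey();
                latestValue = e.getValue();
            }
        }
        return latestValue; // null when the map is empty
    }

    public static void main(String[] args) {
        Map<String, String> versions = new HashMap<String, String>();
        versions.put("20100301000000", "older copy");
        versions.put("20100601123000", "newer copy");
        System.out.println(latest(versions));
    }
}
```

The comparison only works if every segment name follows the same fixed-width naming scheme, which is why the note stresses that the name is the sole source of ordering.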
reduce(Text, Iterator<NutchWritable>, OutputCollector<Text, Text>, Reporter) - 
Method in class org.apache.nutch.segment.SegmentReader
 
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.tools.compat.ReprUrlFixer
Runs the new ReprUrl logic on all crawldatums.
reduce(Text, Iterator<CrawlDatum>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.tools.CrawlDBScanner
 
reduce(Text, Iterator<Generator.SelectorEntry>, OutputCollector<Text, CrawlDatum>, Reporter) - 
Method in class org.apache.nutch.tools.FreeGenerator.FG
 
reduce(Text, Iterator<LongWritable>, OutputCollector<Text, LongWritable>, Reporter) - 
Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsCombiner
 
reduce(Text, Iterator<LongWritable>, OutputCollector<LongWritable, Text>, Reporter) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 
RegexRule - Class in org.apache.nutch.urlfilter.api
A generic regular expression rule.
RegexRule(boolean, String) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexRule
Constructs a new regular expression rule.
RegexURLFilter - Class in org.apache.nutch.urlfilter.regex
Filters URLs based on a file of regular expressions using the
 Java Regex implementation.
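The sketch below shows one way regex-based URL filtering of this kind could work with java.util.regex. It assumes a rule format in which each line starts with + (accept) or - (reject) followed by a regular expression, that the first matching rule decides, and that a url matching no rule is rejected. The class RegexFilterSketch is hypothetical and does not use the Nutch classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, self-contained filter; not the Nutch RegexURLFilter.
final class RegexFilterSketch {

    /** One rule: a sign (accept or reject) plus a compiled pattern. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();

    RegexFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', trimmed.substring(1)));
        }
    }

    /** Returns the url if it is accepted, or null if it is filtered out. */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null; // first matching rule wins
            }
        }
        return null; // assumption: no matching rule means reject
    }

    public static void main(String[] args) {
        RegexFilterSketch filter = new RegexFilterSketch(Arrays.asList(
                "# skip images",
                "-\\.(gif|jpg)$",
                "+^http://example\\.com/"));
        System.out.println(filter.filter("http://example.com/page.html"));
        System.out.println(filter.filter("http://example.com/logo.gif"));
    }
}
```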
RegexURLFilter() - 
Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
RegexURLFilter(String) - 
Constructor for class org.apache.nutch.urlfilter.regex.RegexURLFilter
 
RegexURLFilterBase - Class in org.apache.nutch.urlfilter.api
Generic URL filter based on
 regular expressions.
RegexURLFilterBase() - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new empty RegexURLFilterBase
RegexURLFilterBase(String) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new RegexURLFilter and initializes it with a file of rules.
RegexURLFilterBase(Reader) - 
Constructor for class org.apache.nutch.urlfilter.api.RegexURLFilterBase
Constructs a new RegexURLFilter and initializes it with a Reader of rules.
ReInit(CharStream) - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Reinitialise.
ReInit(NutchAnalysisTokenManager) - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Reinitialise.
ReInit(CharStream) - 
Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
Reinitialise parser.
ReInit(CharStream, int) - 
Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
Reinitialise parser.
REL_TAG - 
Static variable in class org.apache.nutch.microformats.reltag.RelTagParser
 
RELATION - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A reference to a related resource.
RelTagIndexingFilter - Class in org.apache.nutch.microformats.reltag
An IndexingFilter that
 adds tag field(s) to the document.
RelTagIndexingFilter() - 
Constructor for class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
 
RelTagParser - Class in org.apache.nutch.microformats.reltag
Adds microformat rel-tags of document if found.
RelTagParser() - 
Constructor for class org.apache.nutch.microformats.reltag.RelTagParser
 
RelTagQueryFilter - Class in org.apache.nutch.microformats.reltag
Handles "tag:" query clauses.
RelTagQueryFilter() - 
Constructor for class org.apache.nutch.microformats.reltag.RelTagQueryFilter
 
remove(Writable) - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
remove(String) - 
Method in class org.apache.nutch.metadata.Metadata
Remove a metadata and all its associated values.
remove(String) - 
Method in class org.apache.nutch.metadata.SpellCheckedMetadata
 
removeField(String) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
removeLockFile(FileSystem, Path) - 
Static method in class org.apache.nutch.util.LockUtil
Remove lock file.
renameFile(String, String) - 
Method in class org.apache.nutch.indexer.FsDirectory
 
renderAnonymous(PrintStream, Resource, String) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
renderClassDescription(PrintStream, OntClass, int) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
renderHierarchy(PrintStream, OntClass, List, int) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
renderRestriction(PrintStream, Restriction) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
renderURI(PrintStream, PrefixMapping, String) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
replace(FileSystem, Path, Path, boolean) - 
Static method in class org.apache.nutch.util.FSUtils
Replaces the current path with the new path and if set removes the old
 path.
REPR_URL_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
ReprUrlFixer - Class in org.apache.nutch.tools.compat
Significant changes were made to representative url logic used for redirects.
ReprUrlFixer() - 
Constructor for class org.apache.nutch.tools.compat.ReprUrlFixer
 
RequestUtils - Class in org.apache.nutch.searcher.response
A set of utility methods for getting request parameters.
RequestUtils() - 
Constructor for class org.apache.nutch.searcher.response.RequestUtils
 
reset() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets all boolean values to false.
resolveEncodingAlias(String) - 
Static method in class org.apache.nutch.util.EncodingDetector
 
ResolveUrls - Class in org.apache.nutch.tools
A simple tool that will spin up multiple threads to resolve urls to ip
 addresses.
ResolveUrls(String) - 
Constructor for class org.apache.nutch.tools.ResolveUrls
Create a new ResolveUrls with a file from the local file system.
ResolveUrls(String, int) - 
Constructor for class org.apache.nutch.tools.ResolveUrls
Create a new ResolveUrls with a urls file and a number of threads for the
 Thread pool.
resolveUrls() - 
Method in class org.apache.nutch.tools.ResolveUrls
Creates a thread pool for resolving urls.
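The following is a minimal sketch of resolving urls to IP addresses with a fixed-size thread pool, in the spirit of the tool described above. The class ResolveUrlsSketch, its method names, and the choice to report failures as null are assumptions for illustration; the actual ResolveUrls implementation may differ.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the Nutch ResolveUrls class.
final class ResolveUrlsSketch {

    /** Resolves one url's host to an IP address string, or null on failure. */
    static String resolve(String url) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) {
                return null; // no host component to look up
            }
            return InetAddress.getByName(host).getHostAddress();
        } catch (IllegalArgumentException | UnknownHostException e) {
            return null; // malformed url or unknown host
        }
    }

    /** Resolves all urls concurrently; the result keeps the input order. */
    static Map<String, String> resolveAll(List<String> urls, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        Map<String, Future<String>> pending = new LinkedHashMap<String, Future<String>>();
        for (final String url : urls) {
            Callable<String> task = () -> resolve(url);
            pending.put(url, pool.submit(task));
        }
        Map<String, String> results = new LinkedHashMap<String, String>();
        for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                results.put(e.getKey(), null);
            } catch (ExecutionException ex) {
                results.put(e.getKey(), null);
            }
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> resolved =
                resolveAll(Arrays.asList("http://127.0.0.1/index.html"), 2);
        for (Map.Entry<String, String> e : resolved.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```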
Response - Interface in org.apache.nutch.net.protocols
A response interface.
RESPONSE_TYPE - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
ResponseWriter - Interface in org.apache.nutch.searcher.response
Nutch extension point which allows writing search results in many different
 output formats.
ResponseWriters - Class in org.apache.nutch.searcher.response
Utility class for getting all ResponseWriter implementations and for
 returning the correct ResponseWriter for a given request type.
ResponseWriters(Configuration) - 
Constructor for class org.apache.nutch.searcher.response.ResponseWriters
Constructor that configures the cache of ResponseWriter objects.
retrieve(String) - 
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
 
retrieveFile(String, OutputStream, int) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
retrieveList(String, List, int, FTPFileEntryParser) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
RETRY - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Temporary failure.
REVERSE - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
REVISION_NUMBER - 
Static variable in interface org.apache.nutch.metadata.Office
 
rightPad(String, int) - 
Static method in class org.apache.nutch.util.StringUtil
Returns a copy of s padded with trailing spaces so
 that its length is length.
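The padding behavior described above can be sketched in a few lines. This is an illustrative re-implementation, not the StringUtil source; in particular, returning the input unchanged when it is already at least as long as the requested length is an assumption.

```java
// Hypothetical sketch of right-padding a string with spaces.
final class RightPadSketch {

    static String rightPad(String s, int length) {
        StringBuilder sb = new StringBuilder(s);
        // Append spaces until the requested length is reached;
        // the loop does nothing if s is already long enough.
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + rightPad("abc", 6) + "]"); // prints "[abc   ]"
    }
}
```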
RIGHTS - 
Static variable in interface org.apache.nutch.metadata.DublinCore
Information about rights held in and over the resource.
RobotRules - Interface in org.apache.nutch.protocol
This class holds the rules which were parsed from a robots.txt file, and can
 test paths against those rules.
RobotRulesParser - Class in org.apache.nutch.protocol.http.api
This class handles the parsing of robots.txt files.
RobotRulesParser(Configuration) - 
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser
 
RobotRulesParser.RobotRuleSet - Class in org.apache.nutch.protocol.http.api
This class holds the rules which were parsed from a robots.txt
 file, and can test paths against those rules.
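To make "test paths against those rules" concrete, here is a small sketch of a rule set that stores path prefixes with an allow/deny flag. It assumes that the first matching prefix decides and that a path matching no prefix is allowed; the class RobotRuleSetSketch and its methods are hypothetical and are not the RobotRulesParser.RobotRuleSet API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of prefix-based robots rules.
final class RobotRuleSetSketch {

    private static final class Entry {
        final String prefix;
        final boolean allowed;

        Entry(String prefix, boolean allowed) {
            this.prefix = prefix;
            this.allowed = allowed;
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    /** Adds a rule; rules are consulted in the order they were added. */
    void addPrefix(String prefix, boolean allowed) {
        entries.add(new Entry(prefix, allowed));
    }

    /** Returns whether the path may be fetched under the stored rules. */
    boolean isAllowed(String path) {
        for (Entry e : entries) {
            if (path.startsWith(e.prefix)) {
                return e.allowed; // first matching prefix wins
            }
        }
        return true; // assumption: paths with no matching rule are allowed
    }

    public static void main(String[] args) {
        RobotRuleSetSketch rules = new RobotRuleSetSketch();
        rules.addPrefix("/private/public", true);
        rules.addPrefix("/private", false);
        System.out.println(rules.isAllowed("/private/secret.html"));
        System.out.println(rules.isAllowed("/index.html"));
    }
}
```

Because the first match decides in this sketch, more specific prefixes need to be added before broader ones.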
RobotRulesParser.RobotRuleSet() - 
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
 
ROBOTS_DENIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Access denied by robots.txt rules.
root - 
Variable in class org.apache.nutch.util.TrieStringMatcher
 
rootClasses(OntModel) - 
Method in class org.apache.nutch.ontology.jena.OwlParser
 
rootClasses(OntModel) - 
Method in interface org.apache.nutch.ontology.jena.Parser
 
ROUTES_DIR - 
Static variable in class org.apache.nutch.scoring.webgraph.Loops
 
ROWS - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
RPCSearchBean - Interface in org.apache.nutch.searcher
 
RPCSegmentBean - Interface in org.apache.nutch.searcher
 
RSSChannel - Class in org.apache.nutch.parse.rss.structs
Data class for holding RSS Channels to send to Nutch's indexer
RSSChannel(String, String, String, List) - 
Constructor for class org.apache.nutch.parse.rss.structs.RSSChannel
Default Constructor
RSSChannel(String, String, String) - 
Constructor for class org.apache.nutch.parse.rss.structs.RSSChannel
Constructor if you don't have the list of RSS Items ready yet.
RSSItem - Class in org.apache.nutch.parse.rss.structs
Data class for holding RSS Items to send to Nutch's indexer
RSSItem(String, String, String, String) - 
Constructor for class org.apache.nutch.parse.rss.structs.RSSItem
 
RSSParser - Class in org.apache.nutch.parse.rss
 
RSSParser() - 
Constructor for class org.apache.nutch.parse.rss.RSSParser
 
RULES - 
Static variable in class org.apache.nutch.protocol.EmptyRobotRules
 
run(String[]) - 
Method in class org.apache.nutch.crawl.CrawlDb
 
run(String[]) - 
Method in class org.apache.nutch.crawl.CrawlDbMerger
 
run(String[]) - 
Method in class org.apache.nutch.crawl.Generator
 
run(String[]) - 
Method in class org.apache.nutch.crawl.Injector
 
run(String[]) - 
Method in class org.apache.nutch.crawl.LinkDb
 
run(String[]) - 
Method in class org.apache.nutch.crawl.LinkDbMerger
 
run(String[]) - 
Method in class org.apache.nutch.crawl.LinkDbReader
 
run(RecordReader<Text, CrawlDatum>, OutputCollector<Text, NutchWritable>, Reporter) - 
Method in class org.apache.nutch.fetcher.Fetcher
 
run(String[]) - 
Method in class org.apache.nutch.fetcher.Fetcher
 
run(RecordReader<WritableComparable, Writable>, OutputCollector<Text, NutchWritable>, Reporter) - 
Method in class org.apache.nutch.fetcher.OldFetcher
 
run(String[]) - 
Method in class org.apache.nutch.fetcher.OldFetcher
 
run(String[]) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates
 
run(String[]) - 
Method in class org.apache.nutch.indexer.field.AnchorFields
Runs the AnchorFields job.
run(String[]) - 
Method in class org.apache.nutch.indexer.field.BasicFields
Runs the BasicFields tool.
run(String[]) - 
Method in class org.apache.nutch.indexer.field.CustomFields
Runs the CustomFields job.
run(String[]) - 
Method in class org.apache.nutch.indexer.field.FieldIndexer
 
run(String[]) - 
Method in class org.apache.nutch.indexer.Indexer
 
run(String[]) - 
Method in class org.apache.nutch.indexer.IndexMerger
 
run(String[]) - 
Method in class org.apache.nutch.indexer.IndexSorter
 
run(String[]) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
run(String[]) - 
Method in class org.apache.nutch.indexer.solr.SolrIndexer
 
run(String[]) - 
Method in class org.apache.nutch.parse.ParseSegment
 
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper
Runs the LinkDumper tool.
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.LinkRank
Runs the LinkRank tool.
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.Loops
Runs the Loops tool.
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.NodeDumper
Runs the node dumper tool.
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
Runs the ScoreUpdater tool.
run(String[]) - 
Method in class org.apache.nutch.scoring.webgraph.WebGraph
Parses command line arguments and runs the WebGraph jobs.
run(String[]) - 
Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
 
run(String[]) - 
Method in class org.apache.nutch.tools.compat.CrawlDbConverter
 
run(String[]) - 
Method in class org.apache.nutch.tools.compat.ReprUrlFixer
Parse command line options and execute the main update logic.
run(String[]) - 
Method in class org.apache.nutch.tools.CrawlDBScanner
 
run(String[]) - 
Method in class org.apache.nutch.tools.FreeGenerator
 
run() - 
Method in class org.apache.nutch.tools.PruneIndexTool
For each query, find all matching documents and delete them from all input
 indexes.
run(String[]) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 



S

save(OutputStream) - 
Method in class org.apache.nutch.analysis.lang.NGramProfile
Writes NGramProfile content into OutputStream; content is output with
 UTF-8 encoding.
saveDom(OutputStream, Element) - 
Static method in class org.apache.nutch.util.DomUtil
Save DOM into OutputStream.
SCOPE_CRAWLDB - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when updating the CrawlDb with new URLs.
SCOPE_DEFAULT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Default scope.
SCOPE_FETCHER - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by Fetcher when processing
 redirect URLs.
SCOPE_GENERATE_HOST_COUNT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by Generator.
SCOPE_INJECT - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by Injector.
SCOPE_LINKDB - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when updating the LinkDb with new URLs.
SCOPE_OUTLINK - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used when constructing new Outlink instances.
SCOPE_PARTITION - 
Static variable in class org.apache.nutch.net.URLNormalizers
Scope used by URLPartitioner.
SCORE_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
ScoreUpdater - Class in org.apache.nutch.scoring.webgraph
Updates the score from the WebGraph node database into the crawl database.
ScoreUpdater() - 
Constructor for class org.apache.nutch.scoring.webgraph.ScoreUpdater
 
ScoringFilter - Interface in org.apache.nutch.scoring
A contract defining behavior of scoring plugins.
ScoringFilterException - Exception in org.apache.nutch.scoring
Specialized exception for errors during scoring.
ScoringFilterException() - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(String) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(String, Throwable) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilterException(Throwable) - 
Constructor for exception org.apache.nutch.scoring.ScoringFilterException
 
ScoringFilters - Class in org.apache.nutch.scoring
Creates and caches ScoringFilter implementing plugins.
ScoringFilters(Configuration) - 
Constructor for class org.apache.nutch.scoring.ScoringFilters
 
search(Query) - 
Method in class org.apache.nutch.searcher.DistributedSearchBean
 
search(Query, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.DistributedSearchBean
Deprecated. 
search(Query, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.IndexSearcher
Deprecated. 
search(Query) - 
Method in class org.apache.nutch.searcher.IndexSearcher
 
search(Query, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.LuceneSearchBean
Deprecated. 
search(Query) - 
Method in class org.apache.nutch.searcher.LuceneSearchBean
 
search(Query, int) - 
Method in class org.apache.nutch.searcher.NutchBean
Deprecated. since 1.1, use NutchBean.search(Query) instead
search(Query, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.NutchBean
Deprecated. since 1.1, use NutchBean.search(Query) instead
search(Query) - 
Method in class org.apache.nutch.searcher.NutchBean
 
search(Query, int, int) - 
Method in class org.apache.nutch.searcher.NutchBean
Deprecated. since 1.1, use NutchBean.search(Query) instead
search(Query, int, int, String) - 
Method in class org.apache.nutch.searcher.NutchBean
Deprecated. since 1.1, use NutchBean.search(Query) instead
search(Query, int, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.NutchBean
Deprecated. since 1.1, use NutchBean.search(Query) instead
search(Query, int, String, String, boolean) - 
Method in interface org.apache.nutch.searcher.Searcher
Deprecated. since 1.1, use Searcher.search(Query) instead.
search(Query) - 
Method in interface org.apache.nutch.searcher.Searcher
Return the top-scoring hits for a query.
search(Query) - 
Method in class org.apache.nutch.searcher.SolrSearchBean
 
search(Query, int, String, String, boolean) - 
Method in class org.apache.nutch.searcher.SolrSearchBean
Deprecated. 
SearchBean - Interface in org.apache.nutch.searcher
 
Searcher - Interface in org.apache.nutch.searcher
Service that searches.
SearchLoadTester - Class in org.apache.nutch.tools
A simple tool to perform load testing on configured search servers.
SearchLoadTester(String) - 
Constructor for class org.apache.nutch.tools.SearchLoadTester
 
SearchLoadTester(String, int, boolean) - 
Constructor for class org.apache.nutch.tools.SearchLoadTester
 
SearchResults - Class in org.apache.nutch.searcher.response
 
SearchResults() - 
Constructor for class org.apache.nutch.searcher.response.SearchResults
 
SearchServlet - Class in org.apache.nutch.searcher.response
Servlet that allows returning search results in multiple different formats
 through a ResponseWriter Nutch extension point.
SearchServlet() - 
Constructor for class org.apache.nutch.searcher.response.SearchServlet
 
SECONDS_PER_DAY - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
 
SEG_URL - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
SEGMENT - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
SEGMENT_NAME_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
SegmentBean - Interface in org.apache.nutch.searcher
 
SegmentMerger - Class in org.apache.nutch.segment
This tool takes several segments and merges their data together.
SegmentMerger() - 
Constructor for class org.apache.nutch.segment.SegmentMerger
 
SegmentMerger(Configuration) - 
Constructor for class org.apache.nutch.segment.SegmentMerger
 
SegmentMerger.ObjectInputFormat - Class in org.apache.nutch.segment
Wraps inputs in a MetaWrapper, to permit merging different
 types in reduce and use additional metadata.
SegmentMerger.ObjectInputFormat() - 
Constructor for class org.apache.nutch.segment.SegmentMerger.ObjectInputFormat
 
SegmentMerger.SegmentOutputFormat - Class in org.apache.nutch.segment
 
SegmentMerger.SegmentOutputFormat() - 
Constructor for class org.apache.nutch.segment.SegmentMerger.SegmentOutputFormat
 
segmentName - 
Variable in class org.apache.nutch.segment.SegmentPart
Name of the segment (just the last path component).
SegmentPart - Class in org.apache.nutch.segment
Utility class for handling information about segment parts.
SegmentPart() - 
Constructor for class org.apache.nutch.segment.SegmentPart
 
SegmentPart(String, String) - 
Constructor for class org.apache.nutch.segment.SegmentPart
 
SegmentReader - Class in org.apache.nutch.segment
Dump the content of a segment.
SegmentReader() - 
Constructor for class org.apache.nutch.segment.SegmentReader
 
SegmentReader(Configuration, boolean, boolean, boolean, boolean, boolean, boolean) - 
Constructor for class org.apache.nutch.segment.SegmentReader
 
SegmentReader.InputCompatMapper - Class in org.apache.nutch.segment
 
SegmentReader.InputCompatMapper() - 
Constructor for class org.apache.nutch.segment.SegmentReader.InputCompatMapper
 
SegmentReader.SegmentReaderStats - Class in org.apache.nutch.segment
 
SegmentReader.SegmentReaderStats() - 
Constructor for class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
 
SegmentReader.TextOutputFormat - Class in org.apache.nutch.segment
Implements a text output format.
SegmentReader.TextOutputFormat() - 
Constructor for class org.apache.nutch.segment.SegmentReader.TextOutputFormat
 
segnum - 
Variable in class org.apache.nutch.crawl.Generator.SelectorEntry
 
sendNoOp() - 
Method in class org.apache.nutch.protocol.ftp.Client
Sends a NOOP command to the FTP server.
SERVER_URL - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
serverDelay - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The number of seconds the fetcher will delay between
 successive requests to the same server.
set(CrawlDatum) - 
Method in class org.apache.nutch.crawl.CrawlDatum
Copy the contents of another instance into this instance.
set(String, String) - 
Method in class org.apache.nutch.metadata.Metadata
Set metadata name/value.
set(String, String) - 
Method in class org.apache.nutch.metadata.SpellCheckedMetadata
 
setAll(Properties) - 
Method in class org.apache.nutch.metadata.Metadata
Copy all key-value pairs from properties.
setAnchor(String) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
setAnchorBoost(float) - 
Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
Set the boost factor for title/anchor matches, relative to url and
 content matches.
setArgs(String[]) - 
Method in class org.apache.nutch.parse.ParseStatus
 
setArgs(String[]) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
setBaseHref(URL) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the baseHref.
setBoost(float) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setBoost(float) - 
Method in class org.apache.nutch.searcher.RawFieldQueryFilter
 
setClazz(String) - 
Method in class org.apache.nutch.plugin.Extension
Sets the class that implements the concrete extension; only used until
 model creation at system start up.
setCode(int) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
setCommand(String) - 
Method in class org.apache.nutch.util.CommandRunner
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.lang.LanguageQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.analysis.NutchAnalyzer
 
setConf(Configuration) - 
Method in class org.apache.nutch.clustering.carrot2.Clusterer
Implementation of Configurable
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
 
setConf(Configuration) - 
Method in class org.apache.nutch.crawl.Signature
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
setConf(Configuration) - 
Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.microformats.reltag.RelTagParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.microformats.reltag.RelTagQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ext.ExtParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.html.DOMContentUtils
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.html.HtmlParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.js.JSParseFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.ms.MSBaseParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.oo.OOParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.pdf.PdfParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.rss.RSSParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.swf.SWFParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.text.TextParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.parse.zip.ZipParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.file.File
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.http.Http
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.Http
Reads the configuration from the Nutch configuration files and sets
 the configuration.
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
 
setConf(Configuration) - 
Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
 
setConf(Configuration) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.FieldQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.more.DateQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.more.TypeQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.Query
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.site.SiteQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.searcher.url.URLQueryFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.segment.SegmentMerger
 
setConf(Configuration) - 
Method in class org.apache.nutch.summary.basic.BasicSummarizer
 
setConf(Configuration) - 
Method in class org.apache.nutch.summary.lucene.LuceneSummarizer
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
 
setConf(Configuration) - 
Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
 
setConf(Configuration) - 
Method in class org.apache.nutch.util.domain.DomainStatistics
 
setConf(Configuration) - 
Method in class org.apache.nutch.util.GenericWritableConfigurable
 
setConf(Configuration) - 
Method in class org.creativecommons.nutch.CCIndexingFilter
 
setConf(Configuration) - 
Method in class org.creativecommons.nutch.CCParseFilter
 
setConf(Configuration) - 
Method in class org.creativecommons.nutch.CCQueryFilter
 
setContent(byte[]) - 
Method in class org.apache.nutch.protocol.Content
 
setContent(Content) - 
Method in class org.apache.nutch.protocol.ProtocolOutput
 
setContentType(String) - 
Method in class org.apache.nutch.protocol.Content
 
setContentType(String) - 
Method in interface org.apache.nutch.searcher.response.ResponseWriter
Sets the returned content MIME type.
setCrawlDelay(long) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
Set Crawl-Delay, in milliseconds
setDataTimeout(int) - 
Method in class org.apache.nutch.protocol.ftp.Client
Sets the timeout in milliseconds to use for data connection.
setDebugStream(PrintStream) - 
Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
Set debug output.
setDedupField(String) - 
Method in class org.apache.nutch.searcher.QueryParams
 
setDescription(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSChannel
Sets the description of this RSSChannel.
setDescription(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSItem
Sets the description of this RSS Item.
setDescriptor(PluginDescriptor) - 
Method in class org.apache.nutch.plugin.Extension
Sets the plugin descriptor and is only used until model creation at system
 start up.
setDetails(HitDetails[]) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setDocumentLocator(Locator) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive an object for locating the origin of SAX document events.
setEnd(int) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setExpireTime(long) - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setFetchInterval(int) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setFetchInterval(float) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
Sets the fetchInterval and fetchTime on a
 successfully fetched page.
setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
 
setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - 
Method in class org.apache.nutch.crawl.DefaultFetchSchedule
 
setFetchSchedule(Text, CrawlDatum, long, long, long, long, int) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
Sets the fetchInterval and fetchTime on a
 successfully fetched page.
setFetchTime(long) - 
Method in class org.apache.nutch.crawl.CrawlDatum
Sets either the time of the last fetch or the next fetch time,
 depending on whether Fetcher or CrawlDbReducer set the time.
setFields(String[]) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setFieldsList(List<FieldWritable>) - 
Method in class org.apache.nutch.indexer.field.FieldsWritable
 
setFileType(int) - 
Method in class org.apache.nutch.protocol.ftp.Client
Sets the file type to be transferred.
setFollowTalk(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set followTalk
setFound(boolean) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Route
 
setHits(Hit[]) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setId(String) - 
Method in class org.apache.nutch.plugin.Extension
Sets the unique extension Id and is only used until model creation at
 system start up.
setIDAttribute(String, Element) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Set an ID string to node association in the ID table.
setIndexed(boolean) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setIndexNo(int) - 
Method in class org.apache.nutch.searcher.Hit
 
setInlinkScore(float) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
setInputStream(InputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setItems(List) - 
Method in class org.apache.nutch.parse.rss.structs.RSSChannel
Sets the list of RSS items for this channel.
setKeepConnection(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set keepConnection
setLang(String) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setLastModified(long) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
setLink(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSChannel
Sets the link to this RSSChannel.
setLink(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSItem
Sets the link that this RSS Item points to.
setLinks(LinkDumper.LinkNode[]) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
 
setLinkType(byte) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
setLookingFor(String) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Route
 
setLoopSet(Set<String>) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.LoopSet
 
setMajorCode(byte) - 
Method in class org.apache.nutch.parse.ParseStatus
 
setMaxContentLength(int) - 
Method in class org.apache.nutch.protocol.file.File
Set the point at which content is truncated.
setMaxContentLength(int) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set the point at which content is truncated.
setMaxHitsPerDup(int) - 
Method in class org.apache.nutch.searcher.QueryParams
 
setMessage(String) - 
Method in class org.apache.nutch.parse.ParseStatus
 
setMessage(String) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
setMeta(String, String) - 
Method in class org.apache.nutch.metadata.MetaWrapper
Set metadata.
setMetaData(MapWritable) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setMetadata(Metadata) - 
Method in class org.apache.nutch.protocol.Content
Other protocol-specific data.
setMetadata(Metadata) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
setMinorCode(short) - 
Method in class org.apache.nutch.parse.ParseStatus
 
setModifiedTime(long) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setMoreFromDupExcluded(boolean) - 
Method in class org.apache.nutch.searcher.Hit
True if other, lower-scoring, hits with the same dedup value have been
 excluded from the list which contains this hit.
setName(String) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setNoCache() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noCache to true.
setNode(Node) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
 
setNoFollow() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noFollow to true.
setNoIndex() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noIndex to true.
setNumHits(int) - 
Method in class org.apache.nutch.searcher.QueryParams
 
setNumInlinks(int) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
setNumOutlinks(int) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
setObject(String, Object) - 
Method in class org.apache.nutch.util.ObjectCache
 
setOutlinkUrl(String) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Route
 
setPageGoneSchedule(Text, CrawlDatum, long, long, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method specifies how to schedule refetching of pages
 marked as GONE.
setPageGoneSchedule(Text, CrawlDatum, long, long, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method specifies how to schedule refetching of pages
 marked as GONE.
setPageRetrySchedule(Text, CrawlDatum, long, long, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method adjusts the fetch schedule if fetching needs to be
 re-tried due to transient errors.
setPageRetrySchedule(Text, CrawlDatum, long, long, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method adjusts the fetch schedule if fetching needs to be
 re-tried due to transient errors.
setParams(QueryParams) - 
Method in class org.apache.nutch.searcher.Query
 
setParseMeta(Metadata) - 
Method in class org.apache.nutch.parse.ParseData
 
setPermalink(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSItem
Sets the permanent link that this RSS Item points to.
setPhraseBoost(float) - 
Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
Set the boost factor for sloppy phrase matches relative to unordered term
 matches.
setQuery(String) - 
Method in class org.apache.nutch.clustering.carrot2.NutchInputComponent
 
setQuery(String) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setRefresh(boolean) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets refresh to the supplied value.
setRefreshHref(URL) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshHref.
setRefreshTime(int) - 
Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshTime.
setRemoteVerificationEnabled(boolean) - 
Method in class org.apache.nutch.protocol.ftp.Client
Enable or disable verification that the remote host taking part
 in a data connection is the same as the host to which the control
 connection is attached.
setResponseType(String) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setRetriesSinceFetch(int) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setReverse(boolean) - 
Method in class org.apache.nutch.searcher.QueryParams
 
setReverse(boolean) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setRows(int) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setScore(float) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setScore(float) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
setScore(float) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
setSignature(byte[]) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setSlop(int) - 
Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
Set the maximum number of terms permitted between matching terms in a
 sloppy phrase match.
setSort(String) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setSortField(String) - 
Method in class org.apache.nutch.searcher.QueryParams
 
setStart(int) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setStatus(int) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
setStatus(ProtocolStatus) - 
Method in class org.apache.nutch.protocol.ProtocolOutput
 
setStdErrorStream(OutputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setStdOutputStream(OutputStream) - 
Method in class org.apache.nutch.util.CommandRunner
 
setStored(boolean) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setSummaries(Summary[]) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setTimeout(int) - 
Method in class org.apache.nutch.protocol.ftp.Ftp
Set the timeout.
setTimeout(int) - 
Method in class org.apache.nutch.util.CommandRunner
 
setTimestamp(long) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
setTitle(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSChannel
Sets the title for this RSS Channel.
setTitle(String) - 
Method in class org.apache.nutch.parse.rss.structs.RSSItem
Sets the title for this RSS Item.
setTokenized(boolean) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setTotalHits(long) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
setTotalIsExact(boolean) - 
Method in class org.apache.nutch.searcher.Hits
Set Hits.totalIsExact().
setType(FieldType) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setUrl(String) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
setUrl(String) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
 
setUrlBoost(float) - 
Method in class org.apache.nutch.searcher.basic.BasicQueryFilter
Set the boost factor for url matches, relative to content and anchor
 matches
setValue(String) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
setWaitForExit(boolean) - 
Method in class org.apache.nutch.util.CommandRunner
 
setWeight(float) - 
Method in class org.apache.nutch.searcher.Query.Clause
 
setWithSummary(boolean) - 
Method in class org.apache.nutch.searcher.response.SearchResults
 
shortestMatch(String) - 
Method in class org.apache.nutch.util.PrefixStringMatcher
Returns the shortest prefix of input that is matched,
 or null if no match exists.
shortestMatch(String) - 
Method in class org.apache.nutch.util.SuffixStringMatcher
Returns the shortest suffix of input that is matched,
 or null if no match exists.
shortestMatch(String) - 
Method in class org.apache.nutch.util.TrieStringMatcher
Returns the shortest substring of input that is
 matched by a pattern in the trie, or null if no match
 exists.
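The three shortestMatch entries above share one contract: return the shortest registered pattern that matches the input, or null if none does. The sketch below illustrates that contract for the prefix case only. It is a hypothetical stand-alone class that uses a linear scan, not the trie-based Nutch implementation, and the class name ShortestPrefixSketch and its two-argument method signature are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT org.apache.nutch.util.PrefixStringMatcher.
class ShortestPrefixSketch {

    /**
     * Returns the shortest entry of prefixes that input starts with,
     * or null if no entry matches.
     */
    static String shortestMatch(List<String> prefixes, String input) {
        String best = null;
        for (String p : prefixes) {
            // Keep the candidate only if it matches and is shorter than the current best.
            if (input.startsWith(p) && (best == null || p.length() < best.length())) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("http://www.", "http://", "ftp://");
        // Both "http://www." and "http://" match; the shorter one is returned.
        System.out.println(shortestMatch(prefixes, "http://www.example.org/"));
        // No registered prefix matches, so the result is null.
        System.out.println(shortestMatch(prefixes, "mailto:someone@example.org"));
    }
}
```

A linear scan is enough to show the semantics; a trie, as the TrieStringMatcher entry suggests, avoids checking every pattern on each lookup.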
shouldFetch(Text, CrawlDatum, long) - 
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
This method indicates whether the page is suitable for
 selection in the current fetchlist.
shouldFetch(Text, CrawlDatum, long) - 
Method in interface org.apache.nutch.crawl.FetchSchedule
This method indicates whether the page is suitable for
 selection in the current fetchlist.
shutDown() - 
Method in class org.apache.nutch.plugin.Plugin
Shutdown the plugin.
Signature - Class in org.apache.nutch.crawl
 
Signature() - 
Constructor for class org.apache.nutch.crawl.Signature
 
SIGNATURE_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
SignatureComparator - Class in org.apache.nutch.crawl
 
SignatureComparator() - 
Constructor for class org.apache.nutch.crawl.SignatureComparator
 
SignatureFactory - Class in org.apache.nutch.crawl
Factory class, which instantiates a Signature implementation according to the
 current Configuration configuration.
SIGRAM - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
SITE - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
SiteQueryFilter - Class in org.apache.nutch.searcher.site
Handles "site:" query clauses, causing them to search the field indexed by
 SiteIndexingFilter.
SiteQueryFilter() - 
Constructor for class org.apache.nutch.searcher.site.SiteQueryFilter
 
size() - 
Method in class org.apache.nutch.crawl.Inlinks
 
size() - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
size() - 
Method in class org.apache.nutch.metadata.Metadata
Returns the number of metadata names in this metadata.
size() - 
Method in class org.apache.nutch.parse.ParseResult
Return the number of parse outputs (both successful and failed)
skip(DataInput) - 
Static method in class org.apache.nutch.crawl.Inlink
Skips over one Inlink in the input.
skip(DataInput) - 
Static method in class org.apache.nutch.parse.Outlink
Skips over one Outlink in the input.
skipChildren() - 
Method in class org.apache.nutch.util.NodeWalker
Skips over and removes from the node stack the children of the last
 node.
skippedEntity(String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a skipped entity.
SLASH - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
SOLR_PREFIX - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
SolrConstants - Interface in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates - Class in org.apache.nutch.indexer.solr
Utility class for deleting duplicate documents from a Solr index.
SolrDeleteDuplicates() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
 
SolrDeleteDuplicates.SolrInputFormat - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrInputFormat() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
 
SolrDeleteDuplicates.SolrInputSplit - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrInputSplit() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
SolrDeleteDuplicates.SolrInputSplit(int, int) - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
SolrDeleteDuplicates.SolrRecord - Class in org.apache.nutch.indexer.solr
 
SolrDeleteDuplicates.SolrRecord() - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
SolrDeleteDuplicates.SolrRecord(String, float, long) - 
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
SolrIndexer - Class in org.apache.nutch.indexer.solr
 
SolrIndexer() - 
Constructor for class org.apache.nutch.indexer.solr.SolrIndexer
 
SolrIndexer(Configuration) - 
Constructor for class org.apache.nutch.indexer.solr.SolrIndexer
 
SolrMappingReader - Class in org.apache.nutch.indexer.solr
 
SolrMappingReader(Configuration) - 
Constructor for class org.apache.nutch.indexer.solr.SolrMappingReader
 
SolrSearchBean - Class in org.apache.nutch.searcher
 
SolrSearchBean(Configuration, String) - 
Constructor for class org.apache.nutch.searcher.SolrSearchBean
 
SolrWriter - Class in org.apache.nutch.indexer.solr
 
SolrWriter() - 
Constructor for class org.apache.nutch.indexer.solr.SolrWriter
 
sort(File) - 
Method in class org.apache.nutch.indexer.IndexSorter
 
SORT - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
SOURCE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A reference to a resource from which the present resource is derived.
specialConstructor - 
Variable in exception org.apache.nutch.analysis.ParseException
This variable determines which constructor was used to create
 this object and thereby affects the semantics of the
 "getMessage" method (see below).
SpellCheckedMetadata - Class in org.apache.nutch.metadata
A decorator to Metadata that adds spellchecking capabilities to property
 names.
SpellCheckedMetadata() - 
Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
 
splitEnd - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
splitLen - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
splitStart - 
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
 
START - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
start - 
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
 
startCDATA() - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of a CDATA section.
startDocument() - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of a document.
startDTD(String, String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of DTD declarations, if any.
startElement(String, String, String, Attributes) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of an element.
startEntity(String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Report the beginning of an entity.
startPrefixMapping(String, String) - 
Method in class org.apache.nutch.parse.html.DOMBuilder
Begin the scope of a prefix-URI Namespace mapping.
startProcessing(RequestContext) - 
Method in class org.apache.nutch.clustering.carrot2.NutchInputComponent
A callback hook that starts the processing.
startUp() - 
Method in class org.apache.nutch.plugin.Plugin
Invoked at plugin start up.
statNames - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
 
STATUS_BLOCKED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_DB_FETCHED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page was successfully fetched.
STATUS_DB_GONE - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page no longer exists.
STATUS_DB_MAX - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Maximum value of DB-related status.
STATUS_DB_NOTMODIFIED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page was successfully fetched and found not modified.
STATUS_DB_REDIR_PERM - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page permanently redirects to other page.
STATUS_DB_REDIR_TEMP - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page temporarily redirects to other page.
STATUS_DB_UNFETCHED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page was not fetched yet.
STATUS_FAILED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_FAILURE - 
Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_FETCH_GONE - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching unsuccessful - page is gone.
STATUS_FETCH_MAX - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Maximum value of fetch-related status.
STATUS_FETCH_NOTMODIFIED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching successful - page is not modified.
STATUS_FETCH_REDIR_PERM - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching permanently redirected to other page.
STATUS_FETCH_REDIR_TEMP - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching temporarily redirected to other page.
STATUS_FETCH_RETRY - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching unsuccessful, needs to be retried (transient errors).
STATUS_FETCH_SUCCESS - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Fetching was successful.
STATUS_GONE - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_INJECTED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page was newly injected.
STATUS_LINKED - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page discovered through a link.
STATUS_MODIFIED - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
Page is known to have been modified since our last visit.
STATUS_NOTFETCHING - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTFOUND - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTMODIFIED - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
Page is known to remain unmodified since our last visit.
STATUS_NOTMODIFIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTPARSED - 
Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_PARSE_META - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page got metadata from a parser
STATUS_REDIR_EXCEEDED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_RETRY - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_ROBOTS_DENIED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_SIGNATURE - 
Static variable in class org.apache.nutch.crawl.CrawlDatum
Page signature.
STATUS_SUCCESS - 
Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_SUCCESS - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_UNKNOWN - 
Static variable in interface org.apache.nutch.crawl.FetchSchedule
It is unknown whether page was changed since our last visit.
STATUS_WOULDBLOCK - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STD_FORMAT - 
Static variable in class org.apache.nutch.crawl.CrawlDbReader
 
STORE_COMPRESS - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
STORE_NO - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
STORE_YES - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
StringUtil - Class in org.apache.nutch.util
A collection of String processing utility methods.
StringUtil() - 
Constructor for class org.apache.nutch.util.StringUtil
 
subclasses(String) - 
Method in class org.apache.nutch.ontology.jena.OntologyImpl
Retrieves all subclasses of the entity or entities hashed to searchTerm.
subclasses(String) - 
Method in interface org.apache.nutch.ontology.Ontology
 
SUBJECT - 
Static variable in interface org.apache.nutch.metadata.DublinCore
The topic of the content of the resource.
SUCCESS - 
Static variable in class org.apache.nutch.parse.ParseStatus
Parsing succeeded.
SUCCESS - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Content was retrieved without errors.
SUCCESS_REDIRECT - 
Static variable in class org.apache.nutch.parse.ParseStatus
Parsed content contains a directive to redirect to another URL.
SuffixStringMatcher - Class in org.apache.nutch.util
A class for efficiently matching Strings against a set
 of suffixes.
SuffixStringMatcher(String[]) - 
Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match
 Strings with any suffix in the supplied array.
SuffixStringMatcher(Collection) - 
Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match
 Strings with any suffix in the supplied
 Collection.
Summarizer - Interface in org.apache.nutch.searcher
Extension point for summarizer.
SummarizerFactory - Class in org.apache.nutch.searcher
A factory for retrieving Summarizer extensions.
SummarizerFactory(Configuration) - 
Constructor for class org.apache.nutch.searcher.SummarizerFactory
 
SUMMARY - 
Static variable in class org.apache.nutch.searcher.response.SearchServlet
 
Summary - Class in org.apache.nutch.searcher
A document summary dynamically generated to match a query.
Summary() - 
Constructor for class org.apache.nutch.searcher.Summary
Constructs an empty Summary.
Summary.Ellipsis - Class in org.apache.nutch.searcher
An ellipsis fragment within a summary.
Summary.Ellipsis() - 
Constructor for class org.apache.nutch.searcher.Summary.Ellipsis
Constructs an ellipsis fragment.
Summary.Fragment - Class in org.apache.nutch.searcher
A fragment of text within a summary.
Summary.Fragment(String) - 
Constructor for class org.apache.nutch.searcher.Summary.Fragment
Constructs a fragment for the given text.
Summary.Highlight - Class in org.apache.nutch.searcher
A highlighted fragment of text within a summary.
Summary.Highlight(String) - 
Constructor for class org.apache.nutch.searcher.Summary.Highlight
Constructs a highlighted fragment for the given text.
SWFParser - Class in org.apache.nutch.parse.swf
Parser for Flash SWF files.
SWFParser() - 
Constructor for class org.apache.nutch.parse.swf.SWFParser
 
SwitchTo(int) - 
Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
Switch to specified lex state.
synonyms(String) - 
Method in class org.apache.nutch.ontology.jena.OntologyImpl
Retrieves synonyms from WordNet via sweet's web interface.
synonyms(String) - 
Method in interface org.apache.nutch.ontology.Ontology
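
The SuffixStringMatcher entries above describe matching Strings against a set of suffixes. The following is a minimal standalone sketch of that idea, assuming a character trie built from the reversed suffixes. The class name SuffixMatcherSketch and its matches method are hypothetical names chosen for illustration; this is not the Nutch implementation and does not reproduce its API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the Nutch SuffixStringMatcher. */
final class SuffixMatcherSketch {
    /** One trie node; 'terminal' marks the end of a stored suffix. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    SuffixMatcherSketch(String[] suffixes) {
        for (String suffix : suffixes) {
            Node node = root;
            // Insert each suffix back to front so that lookups can walk
            // the input starting from its last character.
            for (int i = suffix.length() - 1; i >= 0; i--) {
                node = node.children.computeIfAbsent(suffix.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Returns true if the input ends with any stored suffix. */
    boolean matches(String input) {
        Node node = root;
        if (node.terminal) {
            return true; // an empty suffix matches every input
        }
        for (int i = input.length() - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false; // no stored suffix continues with this character
            }
            if (node.terminal) {
                return true;  // a complete suffix has been consumed
            }
        }
        return false;
    }
}
```

A matcher built from {".gif", ".jpg"} would accept "logo.gif" and reject "index.html". Walking from the end of the input keeps the cost proportional to the longest stored suffix, not to the number of suffixes.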
 



T

TEMP_MOVED - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource has moved temporarily.
TEMPLATE - 
Static variable in interface org.apache.nutch.metadata.Office
 
term() - 
Method in class org.apache.nutch.analysis.NutchAnalysis
Parse a single term.
terminal - 
Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
testSearch() - 
Method in class org.apache.nutch.tools.SearchLoadTester
 
TextParser - Class in org.apache.nutch.parse.text
 
TextParser() - 
Constructor for class org.apache.nutch.parse.text.TextParser
 
TextProfileSignature - Class in org.apache.nutch.crawl
An implementation of a page signature.
TextProfileSignature() - 
Constructor for class org.apache.nutch.crawl.TextProfileSignature
 
timeout - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The network timeout in milliseconds.
TIMESTAMP_FIELD - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
TITLE - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
TITLE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
A name given to the resource.
toContent() - 
Method in class org.apache.nutch.protocol.file.FileResponse
 
toContent() - 
Method in class org.apache.nutch.protocol.ftp.FtpResponse
 
toDate(String) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toHexString(byte[]) - 
Static method in class org.apache.nutch.util.StringUtil
Convenience call for StringUtil.toHexString(byte[], String, int), where
 sep = null; lineLen = Integer.MAX_VALUE.
toHexString(byte[], String, int) - 
Static method in class org.apache.nutch.util.StringUtil
Get a text representation of a byte[] as hexadecimal String, where each
 pair of hexadecimal digits corresponds to consecutive bytes in the array.
toHtml() - 
Method in class org.apache.nutch.searcher.HitDetails
Display as HTML.
toHtml(boolean) - 
Method in class org.apache.nutch.searcher.Summary
Returns an HTML representation of this Summary.
token - 
Variable in class org.apache.nutch.analysis.NutchAnalysis
Current token.
token_source - 
Variable in class org.apache.nutch.analysis.NutchAnalysis
Generated Token Manager.
tokenImage - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
Literal token values.
tokenImage - 
Variable in exception org.apache.nutch.analysis.ParseException
This is a reference to the "tokenImage" array of the generated
 parser within which the parse error occurred.
tokenStream(String, Reader) - 
Method in class org.apache.nutch.analysis.NutchAnalyzer
Creates a TokenStream which tokenizes all the text in the provided Reader.
tokenStream(String, Reader) - 
Method in class org.apache.nutch.analysis.NutchDocumentAnalyzer
Returns a new token stream for text from the named field.
toLong(String) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
TopLevelDomain - Class in org.apache.nutch.util.domain
(From Wikipedia) A top-level domain (TLD) is the last part of an 
 Internet domain name; that is, the letters which follow the final 
 dot of any domain name.
TopLevelDomain(String, TopLevelDomain.Type, DomainSuffix.Status, float) - 
Constructor for class org.apache.nutch.util.domain.TopLevelDomain
 
TopLevelDomain(String, DomainSuffix.Status, float, String) - 
Constructor for class org.apache.nutch.util.domain.TopLevelDomain
 
TopLevelDomain.Type - Enum in org.apache.nutch.util.domain
 
toString() - 
Method in class org.apache.nutch.analysis.lang.NGramProfile
 
toString() - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
toString() - 
Method in class org.apache.nutch.crawl.Generator.SelectorEntry
 
toString() - 
Method in class org.apache.nutch.crawl.Inlink
 
toString() - 
Method in class org.apache.nutch.crawl.Inlinks
 
toString() - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
toString() - 
Method in class org.apache.nutch.fetcher.FetcherOutput
 
toString() - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
 
toString() - 
Method in class org.apache.nutch.indexer.FsDirectory
 
toString() - 
Method in class org.apache.nutch.metadata.Metadata
 
toString(Date) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
Get the HTTP format of the specified date.
toString(Calendar) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString(long) - 
Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString() - 
Method in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
toString() - 
Method in class org.apache.nutch.parse.HTMLMetaTags
 
toString() - 
Method in class org.apache.nutch.parse.msword.WordTextBuffer
 
toString() - 
Method in class org.apache.nutch.parse.Outlink
 
toString() - 
Method in class org.apache.nutch.parse.ParseData
 
toString() - 
Method in class org.apache.nutch.parse.ParseStatus
 
toString() - 
Method in class org.apache.nutch.parse.ParseText
 
toString() - 
Method in class org.apache.nutch.protocol.Content
 
toString() - 
Method in class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
 
toString() - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
toString() - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
toString() - 
Method in class org.apache.nutch.scoring.webgraph.Loops.LoopSet
 
toString() - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
toString() - 
Method in class org.apache.nutch.searcher.Hit
Display as a string.
toString() - 
Method in class org.apache.nutch.searcher.HitDetails
Display as a string.
toString() - 
Method in class org.apache.nutch.searcher.Query.Clause
 
toString() - 
Method in class org.apache.nutch.searcher.Query.Phrase
 
toString() - 
Method in class org.apache.nutch.searcher.Query.Term
 
toString() - 
Method in class org.apache.nutch.searcher.Query
 
toString() - 
Method in class org.apache.nutch.searcher.Summary.Fragment
Returns a textual representation of this fragment.
toString() - 
Method in class org.apache.nutch.searcher.Summary
Returns a String representation of this Summary.
toString() - 
Method in class org.apache.nutch.segment.SegmentPart
Return a String representation of this class, in the form
 "segmentName/partName".
toString() - 
Method in class org.apache.nutch.util.domain.DomainSuffix
 
toStrings(Summary[]) - 
Static method in class org.apache.nutch.searcher.Summary
Helper method that returns a String representation for each
 specified Summary.
totalIsExact() - 
Method in class org.apache.nutch.searcher.Hits
True if Hits.getTotal() gives the exact number of hits, or false if
 it is only an estimate of the total number of hits.
touchFile(String) - 
Method in class org.apache.nutch.indexer.FsDirectory
 
TrieStringMatcher - Class in org.apache.nutch.util
TrieStringMatcher is a base class for simple tree-based string
 matching.
TrieStringMatcher() - 
Constructor for class org.apache.nutch.util.TrieStringMatcher
 
TrieStringMatcher.TrieNode - Class in org.apache.nutch.util
Node class for the character tree.
TSTAMP - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
TYPE - 
Static variable in interface org.apache.nutch.metadata.DublinCore
The nature or genre of the content of the resource.
TypeQueryFilter - Class in org.apache.nutch.searcher.more
Handles "type:" query clauses, causing them to search the field
 indexed by MoreIndexingFilter.
TypeQueryFilter() - 
Constructor for class org.apache.nutch.searcher.more.TypeQueryFilter
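
The toHexString entries above describe rendering a byte[] as hexadecimal text, with a convenience overload that passes sep = null and lineLen = Integer.MAX_VALUE. The sketch below illustrates one way such a pair of methods could behave. The class name HexSketch is hypothetical, and the exact treatment of sep (placed between bytes) and lineLen (bytes per line before a newline) is an assumption made for this example, not a statement of what StringUtil does.

```java
/** Hypothetical stand-in for illustration; not the Nutch StringUtil code. */
final class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /**
     * Renders each byte as two hex digits. ASSUMPTION: 'sep' (may be null)
     * is placed between bytes, and a newline replaces it after every
     * 'lineLen' bytes.
     */
    static String toHexString(byte[] buf, String sep, int lineLen) {
        if (buf == null) {
            return null;
        }
        if (lineLen <= 0) {
            lineLen = Integer.MAX_VALUE; // treat non-positive values as "no wrapping"
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < buf.length; i++) {
            if (i > 0) {
                if (i % lineLen == 0) {
                    out.append('\n');
                } else if (sep != null) {
                    out.append(sep);
                }
            }
            int b = buf[i] & 0xff; // read the byte as an unsigned value
            out.append(HEX[b >>> 4]).append(HEX[b & 0x0f]);
        }
        return out.toString();
    }

    /** Convenience overload: no separator and no line wrapping. */
    static String toHexString(byte[] buf) {
        return toHexString(buf, null, Integer.MAX_VALUE);
    }
}
```

Masking with 0xff matters because Java bytes are signed; without it, values above 0x7f would index outside the digit table.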
 



U

unzip(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[], int) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array, truncated to
 sizeLimit bytes, if necessary.
update(Path, Path[], boolean, boolean) - 
Method in class org.apache.nutch.crawl.CrawlDb
 
update(Path, Path[], boolean, boolean, boolean, boolean) - 
Method in class org.apache.nutch.crawl.CrawlDb
 
update(Path, Path) - 
Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
Updates the crawl database with the inlink score from the web graph 
 node database.
update(Path, Path[]) - 
Method in class org.apache.nutch.tools.compat.ReprUrlFixer
Run the fixer on any crawl database and segments specified.
updateDbScore(Text, CrawlDatum, CrawlDatum, List) - 
Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
Increase the score by a sum of inlinked scores.
updateDbScore(Text, CrawlDatum, CrawlDatum, List<CrawlDatum>) - 
Method in interface org.apache.nutch.scoring.ScoringFilter
This method calculates a new score of CrawlDatum during CrawlDb update, based on the
 initial value of the original CrawlDatum, and also score values contributed by
 inlinked pages.
updateDbScore(Text, CrawlDatum, CrawlDatum, List<CrawlDatum>) - 
Method in class org.apache.nutch.scoring.ScoringFilters
Calculate updated page score during CrawlDb.update().
url - 
Variable in class org.apache.nutch.crawl.Generator.SelectorEntry
 
URL - 
Static variable in interface org.apache.nutch.indexer.field.Fields
 
URL_FIELD - 
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
 
URL_FILTERING - 
Static variable in class org.apache.nutch.crawl.CrawlDbFilter
 
URL_FILTERING - 
Static variable in class org.apache.nutch.crawl.LinkDbFilter
 
URL_NORMALIZING - 
Static variable in class org.apache.nutch.crawl.CrawlDbFilter
 
URL_NORMALIZING - 
Static variable in class org.apache.nutch.crawl.LinkDbFilter
 
URL_NORMALIZING_SCOPE - 
Static variable in class org.apache.nutch.crawl.CrawlDbFilter
 
URL_NORMALIZING_SCOPE - 
Static variable in class org.apache.nutch.crawl.LinkDbFilter
 
URL_VERSION - 
Static variable in class org.apache.nutch.tools.arc.ArcSegmentCreator
 
URLFilter - Interface in org.apache.nutch.net
Interface used to limit which URLs enter Nutch.
URLFILTER_ORDER - 
Static variable in class org.apache.nutch.net.URLFilters
 
URLFilterChecker - Class in org.apache.nutch.net
Checks one given filter or all filters.
URLFilterChecker(Configuration) - 
Constructor for class org.apache.nutch.net.URLFilterChecker
 
URLFilterException - Exception in org.apache.nutch.net
 
URLFilterException() - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(String) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(String, Throwable) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilterException(Throwable) - 
Constructor for exception org.apache.nutch.net.URLFilterException
 
URLFilters - Class in org.apache.nutch.net
Creates and caches URLFilter implementing plugins.
URLFilters(Configuration) - 
Constructor for class org.apache.nutch.net.URLFilters
 
URLNormalizer - Interface in org.apache.nutch.net
Interface used to convert URLs to normal form and optionally perform substitutions.
URLNormalizerChecker - Class in org.apache.nutch.net
Checks one given normalizer or all normalizers.
URLNormalizerChecker(Configuration) - 
Constructor for class org.apache.nutch.net.URLNormalizerChecker
 
URLNormalizers - Class in org.apache.nutch.net
This class uses a "chained filter" pattern to run defined normalizers.
URLNormalizers(Configuration, String) - 
Constructor for class org.apache.nutch.net.URLNormalizers
 
URLPartitioner - Class in org.apache.nutch.crawl
Partition urls by host, domain name or IP depending on the value of the
 parameter 'partition.url.mode' which can be 'byHost', 'byDomain' or 'byIP'
URLPartitioner() - 
Constructor for class org.apache.nutch.crawl.URLPartitioner
 
URLQueryFilter - Class in org.apache.nutch.searcher.url
Handles "url:" query clauses, causing them to search the field indexed by
 BasicIndexingFilter.
URLQueryFilter() - 
Constructor for class org.apache.nutch.searcher.url.URLQueryFilter
 
URLUtil - Class in org.apache.nutch.util
Utility class for URL analysis
URLUtil() - 
Constructor for class org.apache.nutch.util.URLUtil
 
useHttp11 - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
Do we use HTTP/1.1?
useProxy - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
Indicates if a proxy is used
useProxy() - 
Method in class org.apache.nutch.protocol.http.api.HttpBase
 
userAgent - 
Variable in class org.apache.nutch.protocol.http.api.HttpBase
The Nutch 'User-Agent' request header
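
The unzipBestEffort(byte[], int) entry above describes returning a gunzipped copy of the input, truncated to sizeLimit bytes if necessary. The sketch below shows one way to get that behaviour with java.util.zip alone. The class name GunzipSketch is hypothetical, and the choice to keep whatever was decoded before an I/O error is an assumption about what "best effort" means here, not a description of the GZIPUtils source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for illustration; not the Nutch GZIPUtils code. */
final class GunzipSketch {
    /** Gunzips as much of 'in' as possible, keeping at most 'sizeLimit' bytes. */
    static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            byte[] buf = new byte[1024];
            int written = 0;
            while (written < sizeLimit) {
                // Never request more than the remaining allowance.
                int n = gz.read(buf, 0, Math.min(buf.length, sizeLimit - written));
                if (n < 0) {
                    break; // end of the compressed stream
                }
                out.write(buf, 0, n);
                written += n;
            }
        } catch (IOException e) {
            // ASSUMPTION: on truncated or corrupt input, return what was decoded so far.
        }
        return out.toByteArray();
    }

    /** Gzips the input array; included so the sketch can be exercised end to end. */
    static byte[] zip(byte[] in) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Capping each read at the remaining allowance means the output can never exceed sizeLimit, so no trimming step is needed afterwards.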



V

valueOf(String) - 
Static method in enum org.apache.nutch.indexer.field.FieldType
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.INDEX
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.STORE
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.VECTOR
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
Returns the enum constant of this type with the specified name.
valueOf(String) - 
Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
Returns the enum constant of this type with the specified name.
values() - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
values() - 
Static method in enum org.apache.nutch.indexer.field.FieldType
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.INDEX
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.STORE
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.indexer.lucene.LuceneWriter.VECTOR
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
Returns an array containing the constants of this enum type, in
the order they are declared.
values() - 
Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
Returns an array containing the constants of this enum type, in
the order they are declared.
VECTOR_NO - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
VECTOR_OFFSET - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
VECTOR_POS - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
VECTOR_POS_OFFSET - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
VECTOR_YES - 
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
 
VERSION - 
Static variable in class org.apache.nutch.indexer.NutchDocument
 
VERSION - 
Static variable in class org.apache.nutch.searcher.FetchedSegments
 
VERSION - 
Static variable in class org.apache.nutch.searcher.LuceneSearchBean
 



W

walk(Node, URL, Metadata, Configuration) - 
Static method in class org.creativecommons.nutch.CCParseFilter.Walker
Scan the document adding attributes to metadata.
WebGraph - Class in org.apache.nutch.scoring.webgraph
Creates three databases, one for inlinks, one for outlinks, and a node
 database that holds the number of in and outlinks to a url and the current
 score for the url.
WebGraph() - 
Constructor for class org.apache.nutch.scoring.webgraph.WebGraph
 
WebGraph.OutlinkDb - Class in org.apache.nutch.scoring.webgraph
The OutlinkDb creates a database of all outlinks.
WebGraph.OutlinkDb() - 
Constructor for class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
Default constructor.
WebGraph.OutlinkDb(Configuration) - 
Constructor for class org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
Configurable constructor.
WHITE - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
WORD - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
Word6CHPBinTable - Class in org.apache.nutch.parse.msword.chp
This class holds all of the character formatting properties from a Word
 6.0/95 document.
Word6CHPBinTable(byte[], int, int, int, TextPieceTable) - 
Constructor for class org.apache.nutch.parse.msword.chp.Word6CHPBinTable
Constructor used to read a binTable in from a Word document.
WORD_COUNT - 
Static variable in interface org.apache.nutch.metadata.Office
 
WORD_PUNCT - 
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
RegularExpression Id.
WordTextBuffer - Class in org.apache.nutch.parse.msword
This class acts as a StringBuffer for text from a word document.
WordTextBuffer() - 
Constructor for class org.apache.nutch.parse.msword.WordTextBuffer
 
WORK_TYPE - 
Static variable in interface org.apache.nutch.metadata.CreativeCommons
 
WOULDBLOCK - 
Static variable in class org.apache.nutch.protocol.ProtocolStatus
Request was refused by protocol plugins, because it would block.
WRITABLE_GENERATE_TIME_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
WRITABLE_PROTO_STATUS_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
WRITABLE_REPR_URL_KEY - 
Static variable in interface org.apache.nutch.metadata.Nutch
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.CrawlDatum
 
write(Text, CrawlDatum) - 
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.Generator.SelectorEntry
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.Inlink
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.Inlinks
 
write(DataOutput) - 
Method in class org.apache.nutch.crawl.MapWritable
Deprecated.  
write(DataOutput) - 
Method in class org.apache.nutch.fetcher.FetcherOutput
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.field.FieldIndexer.LuceneDocumentWrapper
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.field.FieldsWritable
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.field.FieldWritable
 
write(NutchDocument) - 
Method in class org.apache.nutch.indexer.lucene.LuceneWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.NutchDocument
 
write(NutchDocument) - 
Method in interface org.apache.nutch.indexer.NutchIndexWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
 
write(DataOutput) - 
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
 
write(NutchDocument) - 
Method in class org.apache.nutch.indexer.solr.SolrWriter
 
write(DataOutput) - 
Method in class org.apache.nutch.metadata.Metadata
 
write(DataOutput) - 
Method in class org.apache.nutch.metadata.MetaWrapper
 
write(int) - 
Method in class org.apache.nutch.parse.mspowerpoint.FilteredStringWriter
Chars which are not useful for Nutch indexing are filtered (ignored) on
 writing to the writer.
write(DataOutput) - 
Method in class org.apache.nutch.parse.Outlink
 
write(DataOutput) - 
Method in class org.apache.nutch.parse.ParseData
 
write(DataOutput) - 
Method in class org.apache.nutch.parse.ParseImpl
 
write(DataOutput) - 
Method in class org.apache.nutch.parse.ParseStatus
 
write(DataOutput) - 
Method in class org.apache.nutch.parse.ParseText
 
write(DataOutput) - 
Method in class org.apache.nutch.protocol.Content
 
write(DataOutput) - 
Method in class org.apache.nutch.protocol.ProtocolStatus
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDatum
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNode
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.LinkDumper.LinkNodes
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.LoopSet
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.Loops.Route
 
write(DataOutput) - 
Method in class org.apache.nutch.scoring.webgraph.Node
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Hit
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.HitDetails
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Hits
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Query.Clause
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Query.Phrase
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Query.Term
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Query
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.QueryParams
 
write(DataOutput) - 
Method in class org.apache.nutch.searcher.Summary
 
writeResponse(SearchResults, HttpServletRequest, HttpServletResponse) - 
Method in interface org.apache.nutch.searcher.response.ResponseWriter
Writes out the search results response to the HttpServletResponse.
WWW_AUTHENTICATE - 
Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
The HTTP Authentication (WWW-Authenticate) header which is returned 
 by a webserver requiring authentication.



X

X_POINT_ID - 
Static variable in interface org.apache.nutch.clustering.OnlineClusterer
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.indexer.field.FieldFilter
 
X_POINT_ID - 
Static variable in interface org.apache.nutch.indexer.IndexingFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.net.URLFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.net.URLNormalizer
 
X_POINT_ID - 
Static variable in interface org.apache.nutch.ontology.Ontology
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.parse.HtmlParseFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.parse.Parser
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.protocol.Protocol
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.scoring.ScoringFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.searcher.QueryFilter
The name of the extension point.
X_POINT_ID - 
Static variable in interface org.apache.nutch.searcher.response.ResponseWriter
 
X_POINT_ID - 
Static variable in interface org.apache.nutch.searcher.Summarizer
The name of the extension point.
XMLCharacterRecognizer - Class in org.apache.nutch.parse.html
Class used to verify whether the specified ch 
 conforms to the XML 1.0 definition of whitespace.
XMLCharacterRecognizer() - 
Constructor for class org.apache.nutch.parse.html.XMLCharacterRecognizer
 



Z

zip(byte[]) - 
Static method in class org.apache.nutch.util.GZIPUtils
Returns a gzipped copy of the input array.
ZipParser - Class in org.apache.nutch.parse.zip
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter.
ZipParser() - 
Constructor for class org.apache.nutch.parse.zip.ZipParser
Creates a new instance of ZipParser
ZipTextExtractor - Class in org.apache.nutch.parse.zip
 
ZipTextExtractor(Configuration) - 
Constructor for class org.apache.nutch.parse.zip.ZipTextExtractor
Creates a new instance of ZipTextExtractor



_

__openPassiveDataConnection(int, String) - 
Method in class org.apache.nutch.protocol.ftp.Client
 
_compare(Object, Object) - 
Static method in class org.apache.nutch.crawl.SignatureComparator
 
_compare(byte[], int, int, byte[], int, int) - 
Static method in class org.apache.nutch.crawl.SignatureComparator
 


Copyright © 2006 The Apache Software Foundation