FetchSchedule
.NutchDocument.add(String, String)
instead and
set index-level metadata for field information.
String
can be decoded in reverse and the
first character is represented by a terminal node.
String
can be decoded and the last character is
represented by a terminal node.
NutchAnalyzer
plugins.
ArcRecordReader
class provides a record reader which
reads records from arc files.
ArcSegmentCreator
is a replacement for fetcher that will
take arc files as input and produce a nutch segment as output.
CircularDependencyException
will be thrown if a circular
dependency is detected.
MimeType
name by removing out the actual MimeType
,
from a string of the form:
OnlineClusterer
extension using clustering components of the Carrot2 project
(http://www.carrot2.org).
OnlineClusterer
for documentation.
HitDetails
objects) and
their previously extracted summaries (String
s).
Configuration
for Nutch.
Text
object for the key.
ParseResult
from a single
Parse
output.
RegexRule
.
BytesWritable
object for the key
DomainSuffix
objects
Note: this class is a singleton.
Extension
is a kind of listener descriptor that will be
installed on a concrete ExtensionPoint
that acts as a kind of
Publisher.
ExtensionPoint
provides meta information of an extension
point.
HitSummarizer
and HitContent
for a set of
fetched segments.
FetchSchedule
implementation.
ParseStatus.isSuccess()
).
MimeTypes.forName(String)
method.
CrawlDatum.getScore()
.
Configurable
s
analyzer
implementation
given a language code.
DomainSuffix
object for the extension; if the
extension is a top level domain, the returned object will be an
instance of TopLevelDomain
Configuration
for Nutch front-end.
List
of RSSChannel
s that the listener parsed from
the RSS document.
Configurable
String
description of the RSS Channel.
DomainSuffix
corresponding to the
last public part of the hostname
DomainSuffix
corresponding to the
last public part of the hostname
i
th field.
i
th hit in this list.
robotsMeta
to appropriate
values, based on any META tags found under the given
node
.
MimeTypes.getMimeType(String)
method.
MimeTypes.getMimeType(File)
method.
node
, and creates appropriate Outlink
records for each (relative to the supplied base
URL), and adds them to the outlinks
ArrayList
.
Outlink
from given plain text.
Outlink
from given plain text and adds anchor
to the extracted Outlink
s
Microsoft document
extractor
.
ParseImpl
.
Parser
instance with the specified
extId
, representing its extension ID.
Parser
s for a given content type.
Plugin
class.
null
.
Properties
of the Microsoft document.
Protocol
implementation for a url.
Content
for a fetchlist entry.
RecordReader
for reading the arc file.
url
with a configured HTTP client and
gets the response.
Summarizer
extension.
StringBuffer
and a DOM Node
,
and will append all the content text found beneath the DOM node to
the StringBuffer
.
getText(sb, node, false)
.
StringBuffer
and a DOM Node
,
and will append the content text found beneath the first
title
node to the StringBuffer
.
i
th field.
RawCluster
interface to
HitsCluster
interface.
HtmlParseFilter
implementing plugins.
IndexingFilter
implementing plugins.
Searcher
and HitDetailer
for either a single
merged index, or a set of indexes.
sizeLimit
bytes, if necessary.
Inlink
s.
false
if the robots.txt
file
prohibits us from accessing the given url
, or
true
otherwise.
false
if the robots.txt
file
prohibits us from accessing the given path
, or
true
otherwise.
false
if the robots.txt
file
prohibits us from accessing the given url
, or
true
otherwise.
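The robots.txt entries above return false when the rules prohibit a url or path, and true otherwise. As a rough illustration of that contract, here is a minimal sketch that stores Allow/Disallow values as path prefixes and lets the first matching rule decide. The class `SimpleRobotRules` and its methods are hypothetical, and first-match-wins is an assumption based on the original robots.txt convention, not a statement about how Nutch's `RobotRuleSet` orders rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Nutch's RobotRuleSet API.
// Assumption: rules are path prefixes and the first matching rule wins.
class SimpleRobotRules {
    // One Allow/Disallow line from a robots.txt record.
    private static final class Rule {
        final String prefix;
        final boolean allowed;

        Rule(String prefix, boolean allowed) {
            this.prefix = prefix;
            this.allowed = allowed;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Rules are consulted in the order they were added.
    void addRule(String prefix, boolean allowed) {
        rules.add(new Rule(prefix, allowed));
    }

    // Returns false if a matching Disallow rule prohibits the path, true otherwise.
    boolean isAllowed(String path) {
        for (Rule r : rules) {
            // An empty value matches nothing ("Disallow:" alone allows everything).
            if (r.prefix.isEmpty()) {
                continue;
            }
            if (path.startsWith(r.prefix)) {
                return r.allowed;
            }
        }
        return true; // no rule matched, so access is permitted
    }
}
```

Listing the more specific `Allow` prefix before the broader `Disallow` prefix is what lets a sub-path stay reachable under a first-match policy.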
true
if this cluster contains documents
that did not fit anywhere else (presentation layer may
discard such clusters).
IndexingFilter
that
adds a lang
(language) field to the document.
s
padded with leading spaces so
that its length is length
.
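The padding described here for leftPad, and for StringUtil.rightPad(String, int) elsewhere in this index, can be sketched as follows. The class name `PadSketch` is hypothetical, and the index does not say what happens when `s` is already longer than `length`; this sketch assumes the input is returned unchanged in that case.

```java
// Hypothetical sketch of space padding; not Nutch's StringUtil.
// Assumption: if s is already at least `length` characters, it is returned as-is.
class PadSketch {
    // Returns s preceded by spaces so that the result has the given length.
    static String leftPad(String s, int length) {
        StringBuilder sb = new StringBuilder(Math.max(length, s.length()));
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }

    // Returns s followed by spaces so that the result has the given length.
    static String rightPad(String s, int length) {
        StringBuilder sb = new StringBuilder(Math.max(length, s.length()));
        sb.append(s);
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```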
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the longest suffix of
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the longest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
- LoopReader - Class in org.apache.nutch.scoring.webgraph
- The LoopReader tool prints the loopset information for a single url.
- LoopReader() -
Constructor for class org.apache.nutch.scoring.webgraph.LoopReader
-
- Loops - Class in org.apache.nutch.scoring.webgraph
- The Loops job identifies cycles of loops inside of the web graph.
- Loops() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops
-
- Loops.Finalizer - Class in org.apache.nutch.scoring.webgraph
- Finishes the Loops job by aggregating and collecting any found routes.
- Loops.Finalizer() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Finalizer
- Default constructor.
- Loops.Finalizer(Configuration) -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Finalizer
- Configurable constructor.
- Loops.Initializer - Class in org.apache.nutch.scoring.webgraph
- Initializes the Loop routes.
- Loops.Initializer() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Initializer
- Default constructor.
- Loops.Initializer(Configuration) -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Initializer
- Configurable constructor.
- Loops.Looper - Class in org.apache.nutch.scoring.webgraph
- Follows a route path looking for the start url of the route.
- Loops.Looper() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Looper
- Default constructor.
- Loops.Looper(Configuration) -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Looper
- Configurable constructor.
- Loops.LoopSet - Class in org.apache.nutch.scoring.webgraph
- A set of loops.
- Loops.LoopSet() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.LoopSet
-
- Loops.Route - Class in org.apache.nutch.scoring.webgraph
- A link path or route looking to identify a link cycle.
- Loops.Route() -
Constructor for class org.apache.nutch.scoring.webgraph.Loops.Route
-
- LOOPS_DIR -
Static variable in class org.apache.nutch.scoring.webgraph.Loops
-
- LUCENE_PREFIX -
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
-
- LuceneConstants - Interface in org.apache.nutch.indexer.lucene
-
- LuceneSearchBean - Class in org.apache.nutch.searcher
-
- LuceneSearchBean(Configuration, Path, Path) -
Constructor for class org.apache.nutch.searcher.LuceneSearchBean
- Construct in a named directory.
- LuceneSummarizer - Class in org.apache.nutch.summary.lucene
- Implements hit summarization.
- LuceneSummarizer() -
Constructor for class org.apache.nutch.summary.lucene.LuceneSummarizer
-
- LuceneWriter - Class in org.apache.nutch.indexer.lucene
-
- LuceneWriter() -
Constructor for class org.apache.nutch.indexer.lucene.LuceneWriter
-
- LuceneWriter.INDEX - Enum in org.apache.nutch.indexer.lucene
-
- LuceneWriter.STORE - Enum in org.apache.nutch.indexer.lucene
-
- LuceneWriter.VECTOR - Enum in org.apache.nutch.indexer.lucene
-
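The Loops entries above describe following route paths through the web graph until a route returns to its start url, and collecting the urls involved into a loop set. The following single-machine sketch shows the idea with a bounded breadth-first expansion of routes over an in-memory outlink map. It is not Nutch's MapReduce implementation. `LoopSketch`, `findLoopSet`, and the `maxDepth` limit on links followed are hypothetical names and assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of loop detection; not Nutch's Loops job.
class LoopSketch {
    // Returns every url lying on a cycle that starts and ends at `start`
    // and uses at most maxDepth links.
    static Set<String> findLoopSet(Map<String, List<String>> outlinks, String start, int maxDepth) {
        Set<String> loopSet = new HashSet<>();
        Deque<List<String>> routes = new ArrayDeque<>();
        routes.add(List.of(start)); // each route is the path of urls followed so far

        while (!routes.isEmpty()) {
            List<String> route = routes.poll();
            String last = route.get(route.size() - 1);
            for (String next : outlinks.getOrDefault(last, List.of())) {
                if (next.equals(start)) {
                    // The route closed back on its start url: record its members.
                    loopSet.addAll(route);
                } else if (!route.contains(next) && route.size() < maxDepth) {
                    // Extend the route without revisiting a url already on it.
                    List<String> extended = new ArrayList<>(route);
                    extended.add(next);
                    routes.add(extended);
                }
            }
        }
        return loopSet;
    }
}
```

The depth bound matters because the number of routes can grow exponentially with path length in a densely linked graph.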
TrieStringMatcher.TrieNode
visited, given that you are at
node
, and the next character in the input is
the idx
'th character of s
.
String
is matched by a
prefix in the trie
String
is matched by a
suffix in the trie
String
is matched by a
pattern in the trie
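The fragments above describe walking a trie one character at a time, prefix matching against the trie, and suffix matching (the earlier fragments note that suffix patterns are decoded in reverse). The sketch below illustrates the idea. Patterns are inserted into a character trie, and in suffix mode both patterns and input are reversed so the same walk finds suffixes. `TrieMatcherSketch` and its constructor flag are hypothetical and do not reproduce Nutch's `TrieStringMatcher`, `PrefixStringMatcher`, or `SuffixStringMatcher` internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Nutch's TrieStringMatcher.
class TrieMatcherSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if some pattern ends at this node
    }

    private final Node root = new Node();
    private final boolean suffixMode;

    // In suffix mode, patterns are stored reversed so suffixes become prefixes.
    TrieMatcherSketch(String[] patterns, boolean suffixMode) {
        this.suffixMode = suffixMode;
        for (String p : patterns) {
            String key = suffixMode ? reverse(p) : p;
            Node n = root;
            for (char c : key.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Length of the shortest (stopAtFirst) or longest matched pattern, or -1.
    private int matchLength(String input, boolean stopAtFirst) {
        String key = suffixMode ? reverse(input) : input;
        Node n = root;
        int best = -1;
        for (int i = 0; i < key.length(); i++) {
            n = n.children.get(key.charAt(i));
            if (n == null) {
                break; // no pattern continues with this character
            }
            if (n.terminal) {
                best = i + 1;
                if (stopAtFirst) {
                    break;
                }
            }
        }
        return best;
    }

    String longestMatch(String input) { return slice(input, matchLength(input, false)); }

    String shortestMatch(String input) { return slice(input, matchLength(input, true)); }

    boolean matches(String input) { return matchLength(input, true) >= 0; }

    // Returns the matched prefix or suffix of input, or null if nothing matched.
    private String slice(String input, int len) {
        if (len < 0) {
            return null;
        }
        return suffixMode ? input.substring(input.length() - len) : input.substring(0, len);
    }
}
```

Each lookup costs time proportional to the matched length, regardless of how many patterns are stored.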
application/vnd.ms-excel
).
application/vnd.ms-powerpoint
).
application/msword
).
MissingDependencyException
will be thrown if a plugin
dependency cannot be found.
Node
on the stack and pushes all of its
children onto the stack, allowing us to walk the node tree without the
use of recursion.
Node
tree from the root node.
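The NodeWalker fragments above (and the skipChildren() entry later in this index) describe popping a node from a stack and pushing its children, so that a DOM tree is walked without recursion. The following is a minimal sketch of that technique using the standard `org.w3c.dom` API. `NodeWalkerSketch` and `elementNames` are hypothetical, and this sketch assumes `skipChildren()` is called immediately after the `nextNode()` whose children should be skipped.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not Nutch's NodeWalker.
class NodeWalkerSketch {
    private final Deque<Node> stack = new ArrayDeque<>();
    private int lastChildCount = 0; // children pushed by the most recent nextNode()

    NodeWalkerSketch(Node root) {
        stack.push(root);
    }

    boolean hasNext() {
        return !stack.isEmpty();
    }

    // Pops the next node and pushes its children in reverse order,
    // so the first child is visited next (document order).
    Node nextNode() {
        Node current = stack.pop();
        NodeList children = current.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            stack.push(children.item(i));
        }
        lastChildCount = children.getLength();
        return current;
    }

    // Removes the children of the last returned node; they sit on top of the stack.
    void skipChildren() {
        for (int i = 0; i < lastChildCount; i++) {
            stack.pop();
        }
        lastChildCount = 0;
    }

    // Demo: element names in document order, skipping beneath elements named skipName.
    static String elementNames(String xml, String skipName) {
        Node doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeWalkerSketch walker = new NodeWalkerSketch(doc);
        StringBuilder out = new StringBuilder();
        while (walker.hasNext()) {
            Node n = walker.nextNode();
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.append(n.getNodeName()).append(' ');
                if (n.getNodeName().equals(skipName)) {
                    walker.skipChildren();
                }
            }
        }
        return out.toString().trim();
    }
}
```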
Configuration
s that include Nutch-specific
resources.
RawDocument
required for Carrot2.
summary
and wrapping
a details
hit details.
NutchDocument
is the unit of indexing.
JobConf
for Nutch jobs.
OnlineClusterer
extensions.
Ontology
extensions.
Plugin
System.
http
,
httpclient
)
Outlink
s
/ URLs from plain text using Regular Expressions.
Parser
s
until a successful parse is performed and a Parse
object is
returned.
Content
object using the Parser
specified
by the parameter extId
, i.e., the Parser's extension ID.
Protocol
implementation.
Parser
plugins.
Parser
s to obtain
Parse
objects.
Content
metadata.
PluginClassLoader
contains only classes of the runtime
libraries set up in the plugin manifest file and exported libraries of
required plugins.
PluginDescriptor
provides access to all meta information of
a nutch-plugin, as well as to the internationalizable resources and the plugin's
own classloader.
PluginManifestParser
parser just parses the manifest file
in all plugin directories.
PluginRuntimeException
will be thrown when an exception in the
plugin management occurs.
String
s against a set
of prefixes.PrefixStringMatcher
which will match
String
s with any prefix in the supplied array.
PrefixStringMatcher
which will match
String
s with any prefix in the supplied
Collection
.
ProtocolException
instead.
Protocol
plugins.
QueryFilter
implementing plugins.
Java Regex implementation
.
URL filter
based on
regular expressions.
IndexingFilter
that
adds tag
field(s) to the document.
"tag:" query clauses.
- RelTagQueryFilter() -
Constructor for class org.apache.nutch.microformats.reltag.RelTagQueryFilter
-
- remove(Writable) -
Method in class org.apache.nutch.crawl.MapWritable
- Deprecated.
- remove(String) -
Method in class org.apache.nutch.metadata.Metadata
- Remove a metadata and all its associated values.
- remove(String) -
Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- removeField(String) -
Method in class org.apache.nutch.indexer.NutchDocument
-
- removeLockFile(FileSystem, Path) -
Static method in class org.apache.nutch.util.LockUtil
- Remove lock file.
- renameFile(String, String) -
Method in class org.apache.nutch.indexer.FsDirectory
-
- renderAnonymous(PrintStream, Resource, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderClassDescription(PrintStream, OntClass, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderHierarchy(PrintStream, OntClass, List, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderRestriction(PrintStream, Restriction) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderURI(PrintStream, PrefixMapping, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- replace(FileSystem, Path, Path, boolean) -
Static method in class org.apache.nutch.util.FSUtils
- Replaces the current path with the new path and if set removes the old
path.
- REPR_URL_KEY -
Static variable in interface org.apache.nutch.metadata.Nutch
-
- ReprUrlFixer - Class in org.apache.nutch.tools.compat
-
Significant changes were made to representative url logic used for redirects.
- ReprUrlFixer() -
Constructor for class org.apache.nutch.tools.compat.ReprUrlFixer
-
- RequestUtils - Class in org.apache.nutch.searcher.response
- A set of utility methods for getting request parameters.
- RequestUtils() -
Constructor for class org.apache.nutch.searcher.response.RequestUtils
-
- reset() -
Method in class org.apache.nutch.parse.HTMLMetaTags
- Sets all boolean values to
false
.
- resolveEncodingAlias(String) -
Static method in class org.apache.nutch.util.EncodingDetector
-
- ResolveUrls - Class in org.apache.nutch.tools
- A simple tool that will spin up multiple threads to resolve urls to ip
addresses.
- ResolveUrls(String) -
Constructor for class org.apache.nutch.tools.ResolveUrls
- Create a new ResolveUrls with a file from the local file system.
- ResolveUrls(String, int) -
Constructor for class org.apache.nutch.tools.ResolveUrls
- Create a new ResolveUrls with a urls file and a number of threads for the
Thread pool.
- resolveUrls() -
Method in class org.apache.nutch.tools.ResolveUrls
- Creates a thread pool for resolving urls.
- Response - Interface in org.apache.nutch.net.protocols
- A response interface.
- RESPONSE_TYPE -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- ResponseWriter - Interface in org.apache.nutch.searcher.response
- Nutch extension point which allows writing search results in many different
output formats.
- ResponseWriters - Class in org.apache.nutch.searcher.response
- Utility class for getting all ResponseWriter implementations and for
returning the correct ResponseWriter for a given request type.
- ResponseWriters(Configuration) -
Constructor for class org.apache.nutch.searcher.response.ResponseWriters
- Constructor that configures the cache of ResponseWriter objects.
- retrieve(String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- retrieveFile(String, OutputStream, int) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- retrieveList(String, List, int, FTPFileEntryParser) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- RETRY -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
- Temporary failure.
- REVERSE -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- REVISION_NUMBER -
Static variable in interface org.apache.nutch.metadata.Office
-
- rightPad(String, int) -
Static method in class org.apache.nutch.util.StringUtil
- Returns a copy of
s
padded with trailing spaces so
that its length is length
.
- RIGHTS -
Static variable in interface org.apache.nutch.metadata.DublinCore
- Information about rights held in and over the resource.
- RobotRules - Interface in org.apache.nutch.protocol
- This class holds the rules which were parsed from a robots.txt file, and can
test paths against those rules.
- RobotRulesParser - Class in org.apache.nutch.protocol.http.api
- This class handles the parsing of
robots.txt
files.
- RobotRulesParser(Configuration) -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser
-
- RobotRulesParser.RobotRuleSet - Class in org.apache.nutch.protocol.http.api
- This class holds the rules which were parsed from a robots.txt
file, and can test paths against those rules.
- RobotRulesParser.RobotRuleSet() -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
-
- ROBOTS_DENIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
- Access denied by robots.txt rules.
- root -
Variable in class org.apache.nutch.util.TrieStringMatcher
-
- rootClasses(OntModel) -
Method in class org.apache.nutch.ontology.jena.OwlParser
-
- rootClasses(OntModel) -
Method in interface org.apache.nutch.ontology.jena.Parser
-
- ROUTES_DIR -
Static variable in class org.apache.nutch.scoring.webgraph.Loops
-
- ROWS -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- RPCSearchBean - Interface in org.apache.nutch.searcher
-
- RPCSegmentBean - Interface in org.apache.nutch.searcher
-
- RSSChannel - Class in org.apache.nutch.parse.rss.structs
-
Data class for holding RSS Channels to send to Nutch's indexer
- RSSChannel(String, String, String, List) -
Constructor for class org.apache.nutch.parse.rss.structs.RSSChannel
-
Default Constructor
- RSSChannel(String, String, String) -
Constructor for class org.apache.nutch.parse.rss.structs.RSSChannel
-
Constructor if you don't have the list of RSS Items ready yet.
- RSSItem - Class in org.apache.nutch.parse.rss.structs
-
Data class for holding RSS Items to send to Nutch's indexer
- RSSItem(String, String, String, String) -
Constructor for class org.apache.nutch.parse.rss.structs.RSSItem
-
- RSSParser - Class in org.apache.nutch.parse.rss
-
- RSSParser() -
Constructor for class org.apache.nutch.parse.rss.RSSParser
-
- RULES -
Static variable in class org.apache.nutch.protocol.EmptyRobotRules
-
- run(String[]) -
Method in class org.apache.nutch.crawl.CrawlDb
-
- run(String[]) -
Method in class org.apache.nutch.crawl.CrawlDbMerger
-
- run(String[]) -
Method in class org.apache.nutch.crawl.Generator
-
- run(String[]) -
Method in class org.apache.nutch.crawl.Injector
-
- run(String[]) -
Method in class org.apache.nutch.crawl.LinkDb
-
- run(String[]) -
Method in class org.apache.nutch.crawl.LinkDbMerger
-
- run(String[]) -
Method in class org.apache.nutch.crawl.LinkDbReader
-
- run(RecordReader<Text, CrawlDatum>, OutputCollector<Text, NutchWritable>, Reporter) -
Method in class org.apache.nutch.fetcher.Fetcher
-
- run(String[]) -
Method in class org.apache.nutch.fetcher.Fetcher
-
- run(RecordReader<WritableComparable, Writable>, OutputCollector<Text, NutchWritable>, Reporter) -
Method in class org.apache.nutch.fetcher.OldFetcher
-
- run(String[]) -
Method in class org.apache.nutch.fetcher.OldFetcher
-
- run(String[]) -
Method in class org.apache.nutch.indexer.DeleteDuplicates
-
- run(String[]) -
Method in class org.apache.nutch.indexer.field.AnchorFields
- Runs the AnchorFields job.
- run(String[]) -
Method in class org.apache.nutch.indexer.field.BasicFields
- Runs the BasicFields tool.
- run(String[]) -
Method in class org.apache.nutch.indexer.field.CustomFields
- Runs the CustomFields job.
- run(String[]) -
Method in class org.apache.nutch.indexer.field.FieldIndexer
-
- run(String[]) -
Method in class org.apache.nutch.indexer.Indexer
-
- run(String[]) -
Method in class org.apache.nutch.indexer.IndexMerger
-
- run(String[]) -
Method in class org.apache.nutch.indexer.IndexSorter
-
- run(String[]) -
Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- run(String[]) -
Method in class org.apache.nutch.indexer.solr.SolrIndexer
-
- run(String[]) -
Method in class org.apache.nutch.parse.ParseSegment
-
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.LinkDumper
- Runs the LinkDumper tool.
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.LinkRank
- Runs the LinkRank tool.
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.Loops
- Runs the Loops tool.
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.NodeDumper
- Runs the node dumper tool.
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.ScoreUpdater
- Runs the ScoreUpdater tool.
- run(String[]) -
Method in class org.apache.nutch.scoring.webgraph.WebGraph
- Parses command line arguments and runs the WebGraph jobs.
- run(String[]) -
Method in class org.apache.nutch.tools.arc.ArcSegmentCreator
-
- run(String[]) -
Method in class org.apache.nutch.tools.compat.CrawlDbConverter
-
- run(String[]) -
Method in class org.apache.nutch.tools.compat.ReprUrlFixer
- Parse command line options and execute the main update logic.
- run(String[]) -
Method in class org.apache.nutch.tools.CrawlDBScanner
-
- run(String[]) -
Method in class org.apache.nutch.tools.FreeGenerator
-
- run() -
Method in class org.apache.nutch.tools.PruneIndexTool
- For each query, find all matching documents and delete them from all input
indexes.
- run(String[]) -
Method in class org.apache.nutch.util.domain.DomainStatistics
-
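The ResolveUrls entries above describe a tool that spins up a thread pool of a given size to resolve urls to IP addresses. Here is a minimal sketch of that pattern with `java.util.concurrent` and `java.net.InetAddress`. `ResolveSketch` and `resolveAll` are hypothetical. The sketch takes host names rather than reading a urls file, assumes a failed lookup should be reported as `null`, and requires `numThreads` to be at least 1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not Nutch's ResolveUrls.
class ResolveSketch {
    // Resolves each host on a fixed-size pool; unresolvable hosts map to null.
    static Map<String, String> resolveAll(List<String> hosts, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit every lookup first so they run concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String host : hosts) {
                Callable<String> task = () -> lookup(host);
                pending.put(host, pool.submit(task));
            }
            // Then collect results in input order.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    results.put(e.getKey(), null);
                } catch (ExecutionException ex) {
                    results.put(e.getKey(), null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    private static String lookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null; // resolution failed
        }
    }
}
```

Because DNS lookups spend most of their time waiting on the network, a pool larger than the number of CPU cores is usually reasonable here.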
Fetcher
when processing
redirect URLs.
Generator
.
Injector
.
Outlink
instances.
URLPartitioner
.
ScoringFilter
implementing plugins.
NutchBean.search(Query)
instead
NutchBean.search(Query)
instead
NutchBean.search(Query)
instead
NutchBean.search(Query)
instead
NutchBean.search(Query)
instead
Searcher.search(Query)
instead.
MetaWrapper
, to permit merging different
types in reduce and use additional metadata.
baseHref
.
Configurable
fetchInterval
and fetchTime
on a
successfully fetched page.
fetchInterval
and fetchTime
on a
successfully fetched page.
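The fragments above describe setting fetchInterval and fetchTime on a successfully fetched page, and the shouldFetch entries later in this index describe deciding whether a page is suitable for the current fetchlist. The sketch below shows a fixed-interval version of both steps. `FixedScheduleSketch` and `Page` are hypothetical. The units (fetch time in epoch milliseconds, interval in seconds) and the simple "due once the scheduled time has arrived" rule are assumptions. The sketch does not describe what Nutch's `AbstractFetchSchedule` or `DefaultFetchSchedule` do in every case.

```java
// Hypothetical fixed-interval fetch schedule; field names and units are assumptions.
class FixedScheduleSketch {
    static final class Page {
        long fetchTime;       // next scheduled fetch, epoch milliseconds (assumed)
        int fetchIntervalSec; // re-fetch interval in seconds (assumed)

        Page(long fetchTime, int fetchIntervalSec) {
            this.fetchTime = fetchTime;
            this.fetchIntervalSec = fetchIntervalSec;
        }
    }

    // After a successful fetch at fetchTime, schedule the next fetch one interval later.
    static void setFetchSchedule(Page page, long fetchTime) {
        page.fetchTime = fetchTime + page.fetchIntervalSec * 1000L;
    }

    // A page is eligible for the current fetchlist once its scheduled time has arrived.
    static boolean shouldFetch(Page page, long curTime) {
        return page.fetchTime <= curTime;
    }
}
```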
noCache
to true
.
noFollow
to true
.
noIndex
to true
.
refresh
to the supplied value.
refreshHref
.
refreshTime
.
Hits.totalIsExact()
.
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the shortest suffix of
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the shortest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
- shouldFetch(Text, CrawlDatum, long) -
Method in class org.apache.nutch.crawl.AbstractFetchSchedule
- This method provides information whether the page is suitable for
selection in the current fetchlist.
- shouldFetch(Text, CrawlDatum, long) -
Method in interface org.apache.nutch.crawl.FetchSchedule
- This method provides information whether the page is suitable for
selection in the current fetchlist.
- shutDown() -
Method in class org.apache.nutch.plugin.Plugin
- Shutdown the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() -
Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY -
Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() -
Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
- Factory class, which instantiates a Signature implementation according to the
current Configuration.
- SIGRAM -
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
- RegularExpression Id.
- SITE -
Static variable in interface org.apache.nutch.indexer.field.Fields
-
- SiteQueryFilter - Class in org.apache.nutch.searcher.site
- Handles "site:" query clauses, causing them to search the field indexed by
SiteIndexingFilter.
- SiteQueryFilter() -
Constructor for class org.apache.nutch.searcher.site.SiteQueryFilter
-
- size() -
Method in class org.apache.nutch.crawl.Inlinks
-
- size() -
Method in class org.apache.nutch.crawl.MapWritable
- Deprecated.
- size() -
Method in class org.apache.nutch.metadata.Metadata
- Returns the number of metadata names in this metadata.
- size() -
Method in class org.apache.nutch.parse.ParseResult
- Return the number of parse outputs (both successful and failed)
- skip(DataInput) -
Static method in class org.apache.nutch.crawl.Inlink
- Skips over one Inlink in the input.
- skip(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
- Skips over one Outlink in the input.
- skipChildren() -
Method in class org.apache.nutch.util.NodeWalker
- Skips over and removes from the node stack the children of the last
node.
- skippedEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of a skipped entity.
- SLASH -
Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
- RegularExpression Id.
- SOLR_PREFIX -
Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- SolrConstants - Interface in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates - Class in org.apache.nutch.indexer.solr
- Utility class for deleting duplicate documents from a solr index.
- SolrDeleteDuplicates() -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- SolrDeleteDuplicates.SolrInputFormat - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrInputFormat() -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
-
- SolrDeleteDuplicates.SolrInputSplit - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrInputSplit() -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- SolrDeleteDuplicates.SolrInputSplit(int, int) -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- SolrDeleteDuplicates.SolrRecord - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrRecord() -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- SolrDeleteDuplicates.SolrRecord(String, float, long) -
Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- SolrIndexer - Class in org.apache.nutch.indexer.solr
-
- SolrIndexer() -
Constructor for class org.apache.nutch.indexer.solr.SolrIndexer
-
- SolrIndexer(Configuration) -
Constructor for class org.apache.nutch.indexer.solr.SolrIndexer
-
- SolrMappingReader - Class in org.apache.nutch.indexer.solr
-
- SolrMappingReader(Configuration) -
Constructor for class org.apache.nutch.indexer.solr.SolrMappingReader
-
- SolrSearchBean - Class in org.apache.nutch.searcher
-
- SolrSearchBean(Configuration, String) -
Constructor for class org.apache.nutch.searcher.SolrSearchBean
-
- SolrWriter - Class in org.apache.nutch.indexer.solr
-
- SolrWriter() -
Constructor for class org.apache.nutch.indexer.solr.SolrWriter
-
- sort(File) -
Method in class org.apache.nutch.indexer.IndexSorter
-
- SORT -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- SOURCE -
Static variable in interface org.apache.nutch.metadata.DublinCore
- A reference to a resource from which the present resource is derived.
- specialConstructor -
Variable in exception org.apache.nutch.analysis.ParseException
- This variable determines which constructor was used to create
this object and thereby affects the semantics of the
"getMessage" method (see below).
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
- A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() -
Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- splitEnd -
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitLen -
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitStart -
Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- START -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- start -
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- startCDATA() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of a CDATA section.
- startDocument() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of a document.
- startDTD(String, String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of DTD declarations, if any.
- startElement(String, String, String, Attributes) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of an element.
- startEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the beginning of an entity.
- startPrefixMapping(String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Begin the scope of a prefix-URI Namespace mapping.
- startProcessing(RequestContext) -
Method in class org.apache.nutch.clustering.carrot2.NutchInputComponent
- A callback hook that starts the processing.
- startUp() -
Method in class org.apache.nutch.plugin.Plugin
- Will be invoked when the plugin starts up.
- statNames -
Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- STATUS_BLOCKED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_DB_FETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was successfully fetched.
- STATUS_DB_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page no longer exists.
- STATUS_DB_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of DB-related status.
- STATUS_DB_NOTMODIFIED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was successfully fetched and found not modified.
- STATUS_DB_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page permanently redirects to other page.
- STATUS_DB_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page temporarily redirects to other page.
- STATUS_DB_UNFETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was not fetched yet.
- STATUS_FAILED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_FAILURE -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_FETCH_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful - page is gone.
- STATUS_FETCH_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of fetch-related status.
- STATUS_FETCH_NOTMODIFIED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching successful - page is not modified.
- STATUS_FETCH_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching permanently redirected to other page.
- STATUS_FETCH_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching temporarily redirected to other page.
- STATUS_FETCH_RETRY -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_FETCH_SUCCESS -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching was successful.
- STATUS_GONE -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_INJECTED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was newly injected.
- STATUS_LINKED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page discovered through a link.
- STATUS_MODIFIED -
Static variable in interface org.apache.nutch.crawl.FetchSchedule
- Page is known to have been modified since our last visit.
- STATUS_NOTFETCHING -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTFOUND -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTMODIFIED -
Static variable in interface org.apache.nutch.crawl.FetchSchedule
- Page is known to remain unmodified since our last visit.
- STATUS_NOTMODIFIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTPARSED -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_PARSE_META -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page got metadata from a parser
- STATUS_REDIR_EXCEEDED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_RETRY -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_ROBOTS_DENIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_SIGNATURE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page signature.
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_UNKNOWN -
Static variable in interface org.apache.nutch.crawl.FetchSchedule
- It is unknown whether page was changed since our last visit.
- STATUS_WOULDBLOCK -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STD_FORMAT -
Static variable in class org.apache.nutch.crawl.CrawlDbReader
-
- STORE_COMPRESS -
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
-
- STORE_NO -
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
-
- STORE_YES -
Static variable in interface org.apache.nutch.indexer.lucene.LuceneConstants
-
- StringUtil - Class in org.apache.nutch.util
- A collection of String processing utility methods.
- StringUtil() -
Constructor for class org.apache.nutch.util.StringUtil
-
- subclasses(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- retrieve all subclasses of entity(ies) hashed to searchTerm
- subclasses(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
- SUBJECT -
Static variable in interface org.apache.nutch.metadata.DublinCore
- The topic of the content of the resource.
- SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsing succeeded.
- SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
- Content was retrieved without errors.
- SUCCESS_REDIRECT -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsed content contains a directive to redirect to another URL.
- SuffixStringMatcher - Class in org.apache.nutch.util
- A class for efficiently matching
String
s against a set
of suffixes.
- SuffixStringMatcher(String[]) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new
SuffixStringMatcher
which will match
String
s with any suffix in the supplied array.
- SuffixStringMatcher(Collection) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new
SuffixStringMatcher
which will match
String
s with any suffix in the supplied
Collection
.
- Summarizer - Interface in org.apache.nutch.searcher
- Extension point for summarizer.
- SummarizerFactory - Class in org.apache.nutch.searcher
- A factory for retrieving
Summarizer
extensions.
- SummarizerFactory(Configuration) -
Constructor for class org.apache.nutch.searcher.SummarizerFactory
-
- SUMMARY -
Static variable in class org.apache.nutch.searcher.response.SearchServlet
-
- Summary - Class in org.apache.nutch.searcher
- A document summary dynamically generated to match a query.
- Summary() -
Constructor for class org.apache.nutch.searcher.Summary
- Constructs an empty Summary.
- Summary.Ellipsis - Class in org.apache.nutch.searcher
- An ellipsis fragment within a summary.
- Summary.Ellipsis() -
Constructor for class org.apache.nutch.searcher.Summary.Ellipsis
- Constructs an ellipsis fragment for the given text.
- Summary.Fragment - Class in org.apache.nutch.searcher
- A fragment of text within a summary.
- Summary.Fragment(String) -
Constructor for class org.apache.nutch.searcher.Summary.Fragment
- Constructs a fragment for the given text.
- Summary.Highlight - Class in org.apache.nutch.searcher
- A highlighted fragment of text within a summary.
- Summary.Highlight(String) -
Constructor for class org.apache.nutch.searcher.Summary.Highlight
- Constructs a highlighted fragment for the given text.
- SWFParser - Class in org.apache.nutch.parse.swf
- Parser for Flash SWF files.
- SWFParser() -
Constructor for class org.apache.nutch.parse.swf.SWFParser
-
- SwitchTo(int) -
Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
- Switch to specified lex state.
- synonyms(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- retrieves synonyms from wordnet via sweet's web interface
- synonyms(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
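The Summary entries above describe a query-dependent summary built from plain fragments, highlighted fragments, and ellipsis fragments. The sketch below models that composite structure and renders it as HTML, wrapping highlights in `<b>` and escaping fragment text. `SummarySketch` and its nested classes and methods are hypothetical. The HTML rendering is an illustrative assumption, not the output format of Nutch's `Summary` class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fragment-based summary; not Nutch's Summary API.
class SummarySketch {
    // A plain piece of summary text.
    static class Fragment {
        final String text;

        Fragment(String text) {
            this.text = text;
        }

        String toHtml() {
            return escape(text);
        }
    }

    // Text that matched a query term.
    static class Highlight extends Fragment {
        Highlight(String text) {
            super(text);
        }

        @Override
        String toHtml() {
            return "<b>" + escape(text) + "</b>";
        }
    }

    // Marks omitted text between fragments.
    static class Ellipsis extends Fragment {
        Ellipsis() {
            super(" ... ");
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();

    void add(Fragment f) {
        fragments.add(f);
    }

    // Concatenates the rendered fragments in order.
    String toHtml() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.toHtml());
        }
        return sb.toString();
    }

    // Minimal escaping so page text cannot inject markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```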
StringUtil.toHexString(byte[], String, int)
, where
sep = null; lineLen = Integer.MAX_VALUE
.
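The fragment above describes a convenience form of StringUtil.toHexString(byte[], String, int) with no separator and no line wrapping. The following sketch implements that three-argument shape. `HexSketch` is hypothetical. The interpretation of `sep` (inserted between bytes) and `lineLen` (bytes per output line), as well as the lowercase digits, are assumptions rather than a description of Nutch's exact output. `lineLen` must be positive.

```java
// Hypothetical sketch of hex encoding; not Nutch's StringUtil.
// Assumptions: sep goes between bytes, a newline follows every lineLen bytes,
// and digits are lowercase.
class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String toHexString(byte[] buf, String sep, int lineLen) {
        StringBuilder sb = new StringBuilder(buf.length * 2);
        for (int i = 0; i < buf.length; i++) {
            if (i > 0) {
                if (i % lineLen == 0) {
                    sb.append('\n');     // start a new line every lineLen bytes
                } else if (sep != null) {
                    sb.append(sep);      // separator between bytes on one line
                }
            }
            // High nibble, then low nibble; masking avoids sign-extension issues.
            sb.append(HEX[(buf[i] >> 4) & 0x0f]).append(HEX[buf[i] & 0x0f]);
        }
        return sb.toString();
    }

    // Convenience form: sep = null, lineLen = Integer.MAX_VALUE.
    static String toHexString(byte[] buf) {
        return toHexString(buf, null, Integer.MAX_VALUE);
    }
}
```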
Hits.getTotal()
gives the exact number of hits, or false if
it is only an estimate of the total number of hits.
sizeLimit
bytes, if necessary.
URLFilter
implementing plugins.