Class IndexPDFFiles
- java.lang.Object
  - org.apache.pdfbox.examples.lucene.IndexPDFFiles
public final class IndexPDFFiles extends java.lang.Object
Index all PDF files under a directory. This is a command-line application demonstrating simple Lucene indexing. Run it with no command-line arguments for usage information.
It's based on a demo provided by the Lucene project.
Important: The pom.xml uses an outdated Lucene version. Replace that version with the latest version to avoid security risks such as CVE-2024-45772.
Constructor Summary
Constructors
Modifier    Constructor        Description
private     IndexPDFFiles()
Method Summary
- (package private) static void indexDocs(org.apache.lucene.index.IndexWriter writer, java.io.File file): Indexes the given file using the given writer, or if a directory is given, recurses over files and directories found under the given directory.
- static void main(java.lang.String[] args): Index all text files under a directory.
Method Detail
main
public static void main(java.lang.String[] args)
Index all text files under a directory.
Parameters:
args - command line arguments
indexDocs
static void indexDocs(org.apache.lucene.index.IndexWriter writer, java.io.File file) throws java.io.IOException
Indexes the given file using the given writer, or if a directory is given, recurses over files and directories found under the given directory.
NOTE: This method indexes one document per input file. This is slow. For good throughput, put multiple documents into your input file(s). An example of this is in the benchmark module, which can create "line doc" files, one document per line, using the WriteLineDocTask.
Parameters:
writer - Writer to the index where the given file/dir info will be stored
file - The file to index, or the directory to recurse into to find files to index
Throws:
java.io.IOException - If there is a low-level I/O error
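The recursion described for indexDocs can be illustrated with a minimal, standard-library-only sketch. It is not the PDFBox implementation. The class RecursiveWalkSketch, the FileHandler callback (a stand-in for "index this file with the writer"), and the ".pdf" extension filter are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: mirrors the "file or directory" recursion only.
class RecursiveWalkSketch {

    /** Hypothetical callback standing in for the real indexing step. */
    interface FileHandler {
        void handle(File file) throws IOException;
    }

    /**
     * Visits the given file, or recurses into it if it is a directory.
     * Unreadable entries are skipped. The ".pdf" filter is an assumption.
     */
    static void walk(File file, FileHandler handler) throws IOException {
        if (!file.canRead()) {
            return;
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            // listFiles() returns null if an I/O error occurs
            if (children != null) {
                Arrays.sort(children); // deterministic visiting order
                for (File child : children) {
                    walk(child, handler);
                }
            }
        } else if (file.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            handler.handle(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Walk the directory given as the first argument, or the current one.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> found = new ArrayList<>();
        walk(root, f -> found.add(f.getPath()));
        found.forEach(System.out::println);
    }
}
```

In a real indexer the callback would build one Lucene document per file and add it to the IndexWriter, which is the one-document-per-file pattern the NOTE above describes as slow.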