Class IndexPDFFiles

java.lang.Object
org.apache.pdfbox.examples.lucene.IndexPDFFiles

public final class IndexPDFFiles extends Object
Indexes all PDF files under a directory.

This is a command-line application demonstrating simple Lucene indexing. Run it with no command-line arguments for usage information.

It is based on a demo provided by the Lucene project.

Important: The pom.xml uses an outdated Lucene version. Replace it with the latest version to avoid security risks such as CVE-2024-45772.

  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    IndexPDFFiles()
  • Method Summary

    Modifier and Type
    Method
    Description
    (package private) static void
    indexDocs(org.apache.lucene.index.IndexWriter writer, File file)
    Indexes the given file using the given writer, or if a directory is given, recurses over files and directories found under the given directory.
    static void
    main(String[] args)
    Indexes all PDF files under a directory.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • IndexPDFFiles

      private IndexPDFFiles()
  • Method Details

    • main

      public static void main(String[] args)
      Indexes all PDF files under a directory.
      Parameters:
      args - command line arguments
    • indexDocs

      static void indexDocs(org.apache.lucene.index.IndexWriter writer, File file) throws IOException
      Indexes the given file using the given writer, or if a directory is given, recurses over files and directories found under the given directory.

      NOTE: This method indexes one document per input file, which is slow. For good throughput, put multiple documents into your input file(s). An example of this is in the Lucene benchmark module, which can create "line doc" files, one document per line, using the WriteLineDocTask.
      Parameters:
      writer - Writer to the index where the given file/dir info will be stored
      file - The file to index, or the directory to recurse into to find files to index
      Throws:
      IOException - If there is a low-level I/O error
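
The recursion described for indexDocs can be illustrated with a minimal sketch that uses only the JDK. It is not the actual implementation: the class name RecursiveWalkSketch, the method walk, and the Consumer<File> handler (a hypothetical stand-in for adding a document through the IndexWriter) are all assumptions made for illustration, as is the case-insensitive ".pdf" extension check.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

/** Illustrative only: mirrors the "file or directory" recursion, without Lucene. */
public final class RecursiveWalkSketch {

    private RecursiveWalkSketch() {
        // utility class, not meant to be instantiated
    }

    /**
     * Passes the given file to the handler, or recurses if it is a directory.
     * The handler is a hypothetical placeholder for the real indexing step.
     */
    static void walk(File file, Consumer<File> handler) throws IOException {
        if (!file.canRead()) {
            return; // skip entries that cannot be read
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // listing failed (for example an I/O error); nothing to recurse into
            }
            for (File child : children) {
                walk(child, handler);
            }
        } else if (file.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            // Assumption: only files with a .pdf extension are handed to the indexer.
            handler.accept(file);
        }
    }

    public static void main(String[] args) throws IOException {
        List<File> found = new ArrayList<>();
        File root = new File(args.length > 0 ? args[0] : ".");
        walk(root, found::add);
        for (File f : found) {
            System.out.println("would index " + f);
        }
    }
}
```

Because each matching file is handed to the handler individually, the sketch has the same one-document-per-file shape that the note above describes as slow.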