Package it.unimi.dsi.big.webgraph
Class BuildHostMap
- java.lang.Object
  - it.unimi.dsi.big.webgraph.BuildHostMap
public class BuildHostMap
extends java.lang.Object

A class computing host-related data given a list of URLs (usually, the URLs of the nodes of a web graph). All processing is performed by the static utility method run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger).

Warning: this class provides a main method that saves the host list to standard output, but it also does some logging, so be careful not to log to standard output.
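The basic step behind this host-related data is turning each URL into a host name. The following is a minimal JDK-only sketch, assuming hosts are taken from java.net.URI.getHost(). BuildHostMap may parse or normalize URLs differently, and the top-private-domain mapping (which relies on Guava's InternetDomainName) is not shown. The class HostExtraction and its method host() are hypothetical and not part of WebGraph; the sketch also shows one way the URISyntaxException declared by run can arise.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HostExtraction {
    /**
     * Hypothetical helper (not part of WebGraph): returns the host component of an
     * absolute URL. The URI constructor throws URISyntaxException on malformed input.
     */
    static String host(final String url) throws URISyntaxException {
        final String host = new URI(url).getHost();
        // Opaque URIs such as "mailto:..." and relative references have no host,
        // so we report them as a syntax problem rather than returning null.
        if (host == null) throw new URISyntaxException(url, "No host component");
        return host;
    }

    public static void main(final String[] args) throws URISyntaxException {
        System.out.println(host("http://example.org/index.html"));      // example.org
        System.out.println(host("https://www.example.org:8080/a?b=c")); // www.example.org
    }
}
```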
- Author:
  - Sebastiano Vigna
Field Summary
Modifier and Type | Field | Description
static java.util.regex.Pattern | DOTTED_ADDRESS |
Constructor Summary
Constructor | Description
BuildHostMap() |
Method Summary
Modifier and Type | Method | Description
static void | main(java.lang.String[] arg) |
static void | run(java.io.BufferedReader br, java.io.PrintStream hosts, java.io.DataOutputStream mapDos, java.io.DataOutputStream countDos, boolean topPrivateDomain, it.unimi.dsi.logging.ProgressLogger pl) | This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.
Method Detail
run
public static void run(java.io.BufferedReader br, java.io.PrintStream hosts, java.io.DataOutputStream mapDos, java.io.DataOutputStream countDos, boolean topPrivateDomain, it.unimi.dsi.logging.ProgressLogger pl)
                throws java.io.IOException, java.net.URISyntaxException

This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and the host counts.

Warning: presently, this method uses an Object2IntOpenHashMap to store the map from host names to host indices. Thus, it cannot handle more than ≈700 million hosts.

- Parameters:
  - br - the buffered reader returning the list of URLs.
  - hosts - the print stream where hosts will be printed.
  - mapDos - the data output stream where the map from URLs to hosts will be written (one long per URL).
  - countDos - the data output stream where the host counts will be written (one long per host).
  - topPrivateDomain - if true, we use InternetDomainName.topPrivateDomain() to map URLs to top private domains, rather than to hosts.
  - pl - a progress logger, or null.
- Throws:
  - java.io.IOException
  - java.net.URISyntaxException
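The contract of run can be illustrated with a JDK-only sketch. It is a hypothetical re-implementation, not WebGraph's code: it uses java.util.HashMap instead of fastutil's Object2IntOpenHashMap, ignores topPrivateDomain and the progress logger, and assumes that host indices are assigned in order of first appearance and that each host count is the number of URLs belonging to that host. The names HostMapSketch and readLongs are hypothetical. Since DataOutputStream.writeLong writes each value as 8 big-endian bytes, both binary outputs can be read back with DataInputStream.readLong.

```java
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.*;

final class HostMapSketch {
    // Hypothetical sketch of run's contract (assumption: indices in order of first appearance).
    static void run(final BufferedReader br, final PrintStream hosts,
            final DataOutputStream mapDos, final DataOutputStream countDos)
            throws IOException, URISyntaxException {
        final Map<String, Integer> hostToIndex = new HashMap<>();
        final List<Long> counts = new ArrayList<>();
        for (String url; (url = br.readLine()) != null;) {
            final String host = new URI(url).getHost();
            if (host == null) throw new URISyntaxException(url, "No host component");
            Integer index = hostToIndex.get(host);
            if (index == null) {            // first time we see this host
                index = hostToIndex.size();
                hostToIndex.put(host, index);
                hosts.println(host);        // one line per distinct host
                counts.add(0L);
            }
            mapDos.writeLong(index);        // one long per URL: the index of its host
            counts.set(index, counts.get(index) + 1);
        }
        for (final long c : counts) countDos.writeLong(c); // one long per host
        hosts.flush();
        mapDos.flush();
        countDos.flush();
    }

    // Reads back a sequence of longs written by DataOutputStream.writeLong.
    static long[] readLongs(final byte[] data) throws IOException {
        final DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        final long[] a = new long[data.length / Long.BYTES];
        for (int i = 0; i < a.length; i++) a[i] = dis.readLong();
        return a;
    }

    public static void main(final String[] args) throws Exception {
        final String urls = "http://a.org/1\nhttp://b.org/\nhttp://a.org/2\n";
        final ByteArrayOutputStream hostBytes = new ByteArrayOutputStream();
        final ByteArrayOutputStream map = new ByteArrayOutputStream();
        final ByteArrayOutputStream count = new ByteArrayOutputStream();
        run(new BufferedReader(new StringReader(urls)), new PrintStream(hostBytes, true, "UTF-8"),
                new DataOutputStream(map), new DataOutputStream(count));
        System.out.print(hostBytes.toString("UTF-8"));                       // a.org, b.org (one per line)
        System.out.println(Arrays.toString(readLongs(map.toByteArray())));   // [0, 1, 0]
        System.out.println(Arrays.toString(readLongs(count.toByteArray()))); // [2, 1]
    }
}
```

Because the map stream holds one 8-byte long per URL, its length divided by 8 equals the number of URLs read, which gives a cheap sanity check on the output.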
main
public static void main(java.lang.String[] arg)
                throws java.io.IOException, com.martiansoftware.jsap.JSAPException, java.net.URISyntaxException
- Throws:
  - java.io.IOException
  - com.martiansoftware.jsap.JSAPException
  - java.net.URISyntaxException