Module z_html_parse

Loosely tokenizes and generates parse trees for (X)HTML and XML.

Copyright © 2007 Mochi Media, Inc.; copyright 2018-2021 Maas-Maarten Zeeman

Authors: Bob Ippolito (bob@mochimedia.com).

Description

Loosely tokenizes and generates parse trees for (X)HTML and XML. Adapted by Maas-Maarten Zeeman. Extended for basic XML parsing by Marc Worrell.

Data Types

end_tag()

end_tag() = {end_tag, Name::binary()}

html_attr()

html_attr() = {html_attr_name(), html_attr_value()}

html_attr_name()

html_attr_name() = binary() | string() | atom()

html_attr_value()

html_attr_value() = binary() | string() | atom() | number()

html_comment()

html_comment() = {comment, Comment::binary()}

html_data()

html_data() = {data, binary(), Whitespace::boolean()}

html_doctype()

html_doctype() = {doctype, [Doctype::any()]}

html_element()

html_element() = html_node() | html_comment() | html_nop() | pi_tag() | inline_html() | {html_tag()} | {html_tag(), [html_element()]} | binary()

html_node()

html_node() = {html_tag(), [html_attr()], [html_element()]}
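As an illustration of how the html_element() and html_node() shapes nest, the following is a minimal sketch that walks such a tree and collects its text. The module name html_text_sketch and the function text/1 are hypothetical and not part of z_html_parse; the clauses only rely on the tuple shapes listed in the type definitions above, and skipping comments, processing instructions and inline HTML is a choice made for this example.

```erlang
-module(html_text_sketch).
-export([text/1]).

%% Collect the text content of an html_element() as a single binary.
%% Clause order matters: the special tuples (comment, pi, '=') are
%% matched before the generic tag tuples.
-spec text(term()) -> binary().
text(Bin) when is_binary(Bin) -> Bin;            % plain text node
text({comment, _Comment}) -> <<>>;               % html_comment()
text({pi, _Body}) -> <<>>;                       % pi_tag(), short form
text({pi, _Tag, _Attrs}) -> <<>>;                % pi_tag(), long form
text({'=', _Html}) -> <<>>;                      % inline_html(), skipped here
text({_Tag, _Attrs, Children}) when is_list(Children) ->
    text_list(Children);                         % html_node()
text({_Tag, Children}) when is_list(Children) ->
    text_list(Children);                         % {html_tag(), [...]} and html_nop()
text({_Tag}) -> <<>>.                            % {html_tag()} without children

%% Concatenate the text of a list of elements.
text_list(Elements) ->
    iolist_to_binary([text(E) || E <- Elements]).
```

For a hand-built tree such as {<<"p">>, [{<<"class">>, <<"intro">>}], [<<"Hello ">>, {<<"b">>, [], [<<"world">>]}, {comment, <<" note ">>}]}, this sketch yields <<"Hello world">>.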

html_nop()

html_nop() = {nop, [html_element()]}

Special node used by the sanitizer for unwanted elements.

html_tag()

html_tag() = binary() | string() | atom()

html_token()

html_token() = html_data() | start_tag() | end_tag() | pi_tag() | inline_html() | html_comment() | html_doctype()
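To make the token shapes concrete, here is a small sketch that filters a list of html_token() terms with list comprehensions. The module token_sketch and both functions are hypothetical helpers. The second function assumes that the Whitespace flag of html_data() is true for whitespace-only data, which is what the field name suggests but is not spelled out here.

```erlang
-module(token_sketch).
-export([start_tag_names/1, visible_text/1]).

%% Names of all start tags, in document order.
start_tag_names(Tokens) ->
    [Name || {start_tag, Name, _Attrs, _Singleton} <- Tokens].

%% Data tokens whose Whitespace flag is false.
%% Assumption: the flag marks whitespace-only data.
visible_text(Tokens) ->
    [Text || {data, Text, false} <- Tokens].
```

Tokens that do not match the generator pattern are silently skipped by the comprehension, so both functions accept a mixed token list.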

html_tree()

html_tree() = html_doctype() | html_node() | html_comment() | inline_html() | {html_tag()} | {html_tag(), [html_element()]} | pi_tag()

inline_html()

inline_html() = {'=', binary()}

options()

options() = #{mode => xml | html, escape => boolean(), lowercase => boolean()}

pi_tag()

pi_tag() = {pi, binary()} | {pi, Tag::binary(), [html_attr()]}

start_tag()

start_tag() = {start_tag, Name::binary(), [html_attr()], Singleton::boolean()}

Function Index

escape/1          Escape a string so that it is safe for HTML text (&, < and > become &amp;, &lt; and &gt;).
escape_attr/1     Escape a string so that it is safe for HTML attribute values (&, <, > and " become &amp;, &lt;, &gt; and &quot;).
parse/1           Tokenize and then transform the token stream into an HTML tree.
parse/2
parse_to_map/1    Parse an HTML/XML document to a JSON-compatible map.
parse_to_map/2    Parse an HTML/XML document to a JSON-compatible map.
parse_tokens/1    Transform the output of tokens(Doc) into an HTML tree.
to_html/1         Convert a list of html_token() or an html_tree() to an HTML document.
to_html/2
to_tokens/1       Convert an html_node() tree to a list of tokens.
to_tokens/2
tokens/1          Transform the input UTF-8 HTML into a token stream.
tokens/2

Function Details

escape/1

escape(B::string() | atom() | binary()) -> binary()

Escape a string so that it is safe for HTML text (&, < and > become &amp;, &lt; and &gt;).

escape_attr/1

escape_attr(B::string() | binary() | atom() | integer() | float()) -> binary()

Escape a string so that it is safe for HTML attribute values (&, <, > and " become &amp;, &lt;, &gt; and &quot;).
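The difference between the two escaping functions can be sketched with a binary comprehension. This is not the module's implementation: the module escape_sketch and its functions are hypothetical, they accept binaries only, and they assume that the entity lists in the descriptions above refer to &amp;, &lt;, &gt; and (for attributes) &quot;.

```erlang
-module(escape_sketch).
-export([escape_text/1, escape_attr_value/1]).

%% Escape &, < and > for use in HTML text.
escape_text(Bin) when is_binary(Bin) ->
    << <<(text_char(C))/binary>> || <<C>> <= Bin >>.

%% Escape &, <, > and the double quote for use in attribute values.
escape_attr_value(Bin) when is_binary(Bin) ->
    << <<(attr_char(C))/binary>> || <<C>> <= Bin >>.

%% Only ASCII bytes are replaced, so multi-byte UTF-8 sequences
%% pass through unchanged.
text_char($&) -> <<"&amp;">>;
text_char($<) -> <<"&lt;">>;
text_char($>) -> <<"&gt;">>;
text_char(C)  -> <<C>>.

attr_char($") -> <<"&quot;">>;
attr_char(C)  -> text_char(C).
```

In application code the module's own escape/1 and escape_attr/1 are the functions to call; they also accept strings, atoms and (for escape_attr/1) numbers, as their signatures show.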

parse/1

parse(Input::iodata()) -> {ok, html_node()} | {error, nohtml}

Tokenize and then transform the token stream into an HTML tree.

parse/2

parse(Input::iodata(), Options::options()) -> {ok, html_node()} | {error, nohtml}
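A caller has to handle both return shapes of parse/1 and parse/2. The sketch below shows one way to do that; parse_example, root_tag/1, root_tag_xml/1 and describe/1 are hypothetical names. The options map uses only the mode key from the options() type. What each option changes in the output is not described on this page, so the example makes no claim about it.

```erlang
-module(parse_example).
-export([root_tag/1, root_tag_xml/1, describe/1]).

%% Parse with default options and report the tag of the root node.
root_tag(Input) ->
    describe(z_html_parse:parse(Input)).

%% Same, but pass an options() map selecting XML mode.
root_tag_xml(Input) ->
    describe(z_html_parse:parse(Input, #{mode => xml})).

%% Map the documented return values to a simple result.
describe({ok, {Tag, _Attrs, _Children}}) -> {root, Tag};
describe({error, nohtml}) -> no_html.
```

Keeping describe/1 separate from the calls into z_html_parse means the result handling can be exercised with hand-built terms.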

parse_to_map/1

parse_to_map(Input::iodata() | {binary, list(), list()}) -> {ok, map()} | {error, term()}

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.

parse_to_map/2

parse_to_map(Input::iodata() | {binary, list(), list()}, Options::options()) -> {ok, map()} | {error, term()}

Parse an HTML/XML document to a JSON-compatible map. Attributes are added as keys under an @attributes key. Elements are mapped to keys whose values are lists. All keys are lowercased.
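Since the exact layout of the resulting map is only outlined above, a cautious first step is to inspect its top-level keys. The following sketch does that; map_example, top_level_keys/1 and keys_of/1 are hypothetical names, and the code assumes nothing about the map beyond the documented {ok, map()} | {error, term()} return type.

```erlang
-module(map_example).
-export([top_level_keys/1, keys_of/1]).

%% Parse a document and list the top-level keys of the result.
top_level_keys(Input) ->
    keys_of(z_html_parse:parse_to_map(Input)).

%% Handle both documented return shapes.
keys_of({ok, Map}) when is_map(Map) ->
    {ok, lists:sort(maps:keys(Map))};
keys_of({error, Reason}) ->
    {error, Reason}.
```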

parse_tokens/1

parse_tokens(Tokens::[html_token()]) -> {ok, html_node()} | {error, nohtml}

Transform the output of tokens(Doc) into an HTML tree.
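Because parse_tokens/1 takes the output of tokens/1, the token stream can be adjusted between the two steps. The sketch below removes comment tokens before the tree is built; pipeline_example, parse_without_comments/1 and drop_comments/1 are hypothetical names, and only the documented tokens/1 and parse_tokens/1 calls are used.

```erlang
-module(pipeline_example).
-export([parse_without_comments/1, drop_comments/1]).

%% Tokenize, filter the token list, then build the tree.
parse_without_comments(Input) ->
    Tokens = z_html_parse:tokens(Input),
    z_html_parse:parse_tokens(drop_comments(Tokens)).

%% Remove html_comment() tokens, keep everything else in order.
drop_comments(Tokens) ->
    [T || T <- Tokens, not is_comment(T)].

is_comment({comment, _}) -> true;
is_comment(_) -> false.
```

The result of parse_without_comments/1 has the same {ok, html_node()} | {error, nohtml} shape as parse_tokens/1.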

to_html/1

to_html(Node::[html_token()] | html_tree()) -> iodata()

Convert a list of html_token() or an html_tree() to an HTML document.

to_html/2

to_html(Node::[html_token()] | html_tree(), Options::options()) -> iodata()

to_tokens/1

to_tokens(HtmlNode::html_tree()) -> [html_token()]

Convert an html_node() tree to a list of tokens.
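Trees do not have to come from the parser; a term of the html_node() shape can be built by hand and passed to to_html/1 or to_tokens/1. A minimal sketch, in which render_example and its three functions are hypothetical names:

```erlang
-module(render_example).
-export([paragraph/2, render/1, tokenize/1]).

%% Build an html_node() for a paragraph with a class attribute.
paragraph(Class, Text) when is_binary(Class), is_binary(Text) ->
    {<<"p">>, [{<<"class">>, Class}], [Text]}.

%% to_html/1 returns iodata(); flatten it to a binary for display.
render(Tree) ->
    iolist_to_binary(z_html_parse:to_html(Tree)).

%% Turn the same tree into a list of html_token().
tokenize(Tree) ->
    z_html_parse:to_tokens(Tree).
```

Calling render_example:render(render_example:paragraph(<<"note">>, <<"Hello">>)) would serialize the paragraph; the exact output formatting is left to the library.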

to_tokens/2

to_tokens(T::html_tree(), Options::options()) -> [html_token()]

tokens/1

tokens(Input::iodata()) -> [html_token()]

Transform the input UTF-8 HTML into a token stream.

tokens/2

tokens(Input::iodata(), Options::options()) -> [html_token()]

