(PHP 4, PHP 5)
htmlentities — Convert all applicable characters to HTML entities
This function is identical to htmlspecialchars() in all ways, except that htmlentities() translates all characters which have HTML character entity equivalents into these entities.
To decode instead (the reverse operation), use html_entity_decode().
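As a minimal sketch of the difference, the snippet below passes the same UTF-8 string through both functions and then decodes the result. The sample string and variable names are illustrative only, and the flags and charset are passed explicitly so the result does not depend on the defaults.

```php
<?php
// "café <b>" written with explicit UTF-8 bytes, so the file encoding does not matter.
$str = "caf\xC3\xA9 <b>";

// htmlspecialchars() only converts the HTML-special characters.
$special = htmlspecialchars($str, ENT_QUOTES, "UTF-8");

// htmlentities() also translates é, which has a named entity.
$entities = htmlentities($str, ENT_QUOTES, "UTF-8");

// html_entity_decode() reverses the conversion.
$decoded = html_entity_decode($entities, ENT_QUOTES, "UTF-8");

echo $special, "\n";   // café &lt;b&gt;
echo $entities, "\n";  // caf&eacute; &lt;b&gt;
echo $decoded === $str ? "round trip ok" : "round trip failed", "\n";
```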
The input string.
A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the document type used. The default is ENT_COMPAT | ENT_HTML401.
| Constant Name | Description |
|---|---|
| ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
| ENT_QUOTES | Will convert both double and single quotes. |
| ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
| ENT_IGNORE | Silently discard invalid code unit sequences instead of returning an empty string. This is provided for backwards compatibility; avoid using it as it may have security implications. |
| ENT_SUBSTITUTE | Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string. |
| ENT_DISALLOWED | Replace code unit sequences that are invalid in the specified document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise). |
| ENT_HTML401 | Handle code as HTML 4.01. |
| ENT_XML1 | Handle code as XML 1. |
| ENT_XHTML | Handle code as XHTML. |
| ENT_HTML5 | Handle code as HTML 5. |
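The quote flags and ENT_SUBSTITUTE can be compared side by side. The following is an illustrative sketch rather than an exhaustive tour of the flags. The input strings are made up for the example, and ENT_SUBSTITUTE assumes PHP 5.4.0 or later.

```php
<?php
$str = "\"double\" and 'single'";

// ENT_COMPAT: only double quotes are converted.
$compat = htmlentities($str, ENT_COMPAT, "UTF-8");
echo $compat, "\n";    // &quot;double&quot; and 'single'

// ENT_QUOTES: both double and single quotes are converted.
$quotes = htmlentities($str, ENT_QUOTES, "UTF-8");
echo $quotes, "\n";    // &quot;double&quot; and &#039;single&#039;

// ENT_NOQUOTES: neither kind of quote is converted.
$noquotes = htmlentities($str, ENT_NOQUOTES, "UTF-8");
echo $noquotes, "\n";  // "double" and 'single'

// ENT_SUBSTITUTE: an invalid UTF-8 byte becomes U+FFFD
// instead of making the whole result an empty string.
$invalid = "\x8F!!!";
$substituted = htmlentities($invalid, ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");
echo $substituted === "\xEF\xBF\xBD!!!" ? "replaced with U+FFFD" : "unexpected", "\n";
```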
Like htmlspecialchars(), this function takes an optional third argument, charset, which defines the character set used in conversion. Currently, ISO-8859-1 is the default. However, this default is very likely to change in future versions of PHP; you are strongly encouraged to specify a value explicitly.
The following character sets are supported in PHP 4.3.0 and later.
| Charset | Aliases | Description |
|---|---|---|
| ISO-8859-1 | ISO8859-1 | Western European, Latin-1 |
| ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1). |
| UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
| cp866 | ibm866, 866 | DOS-specific Cyrillic charset. This charset is supported in 4.3.2. |
| cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. This charset is supported in 4.3.2. |
| cp1252 | Windows-1252, 1252 | Windows specific charset for Western European. |
| KOI8-R | koi8-ru, koi8r | Russian. This charset is supported in 4.3.2. |
| BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
| GB2312 | 936 | Simplified Chinese, national standard character set. |
| BIG5-HKSCS | | Big5 with Hong Kong extensions, Traditional Chinese. |
| Shift_JIS | SJIS, 932 | Japanese |
| EUC-JP | EUCJP | Japanese |
| '' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended. |
Note: No other character sets are recognized. If an unrecognized character set is given, the default encoding will be used instead and a warning will be emitted.
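To illustrate why the charset should match the actual encoding of the input, the sketch below encodes the letter é from both its ISO-8859-1 and UTF-8 byte forms, and then shows what happens when the two are mismatched. The variable names are illustrative.

```php
<?php
// The letter é, encoded two ways.
$latin1 = "\xE9";      // é in ISO-8859-1 (one byte)
$utf8   = "\xC3\xA9";  // é in UTF-8 (two bytes)

// Matching charset: both inputs yield the same entity.
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, "ISO-8859-1");
$fromUtf8   = htmlentities($utf8, ENT_QUOTES, "UTF-8");
echo $fromLatin1, "\n";  // &eacute;
echo $fromUtf8, "\n";    // &eacute;

// Mismatched charset: the two UTF-8 bytes are read as two
// separate Latin-1 characters (Ã and ©).
$mismatched = htmlentities($utf8, ENT_QUOTES, "ISO-8859-1");
echo $mismatched, "\n";  // &Atilde;&copy;
```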
When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.
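A short sketch of the effect of double_encode, using a made-up input string that already contains one entity. This assumes PHP 5.2.3 or later, where the parameter is available.

```php
<?php
// The input already contains the entity &amp; plus a raw "<".
$str = "Tom &amp; Jerry <3";

// Default (double_encode = true): the existing entity is encoded again.
$encodedTwice = htmlentities($str, ENT_QUOTES, "UTF-8");
echo $encodedTwice, "\n";  // Tom &amp;amp; Jerry &lt;3

// double_encode = false: the existing entity is left alone,
// while the raw "<" is still converted.
$encodedOnce = htmlentities($str, ENT_QUOTES, "UTF-8", false);
echo $encodedOnce, "\n";   // Tom &amp; Jerry &lt;3
```

Turning double_encode off can be useful when the input may already be partially encoded, but it also means a literal "&amp;" that the author intended to display as text will not be preserved.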
Returns the encoded string.
| Version | Description |
|---|---|
| 5.4.0 | The constants ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added. |
| 5.3.0 | The constant ENT_IGNORE was added. |
| 5.2.3 | The double_encode parameter was added. |
| 4.1.0 | The charset parameter was added. |
| 4.0.3 | The flags parameter was added. |
Example #1 An htmlentities() example
<?php
$str = "A 'quote' is <b>bold</b>";
// Outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str);
// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);
?>
Example #2 Usage of ENT_IGNORE
<?php
$str = "\x8F!!!";
// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");
// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>