What are HTML Entities?
HTML entities are special strings that begin with an ampersand (&) and end with a semicolon (;). They represent characters that are either reserved in HTML (like < > & and quotes) or not easily typed on a keyboard (like em dashes, copyright symbols, or accented letters). For example, &lt; represents the less-than sign (<) and &amp; represents the ampersand itself. The HTML specification defines both named entities (like &copy;) and numeric entities (like &#169; or &#xA9;) for hundreds of Unicode characters.
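As a small illustration of the point above, the following sketch uses Python's standard html module to decode a few entities back into the characters they stand for. The sample strings are chosen only for demonstration.

```python
from html import unescape

# Named, decimal, and hexadecimal references for the same character.
forms = ["&copy;", "&#169;", "&#xA9;"]
decoded = [unescape(form) for form in forms]
print(decoded)  # ['©', '©', '©']

# Reserved characters written as entities decode to the literal characters.
print(unescape("1 &lt; 2 &amp;&amp; 3 &gt; 2"))  # 1 < 2 && 3 > 2
```

All three spellings of the copyright sign decode to the same character, which is why a browser renders them identically.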
Why HTML Entity Encoding Matters
Encoding special characters as HTML entities is essential for two reasons. First, it prevents the browser from misinterpreting content as markup: a bare < in text can be read as the start of a tag. Second, and critically, it defends against Cross-Site Scripting (XSS) attacks, in which malicious scripts are injected through user input. Encoding characters like < > and quotes before inserting user data into HTML keeps that data from being parsed as markup. Most web frameworks and template engines include HTML escaping for this reason.
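A minimal sketch of this defense, assuming Python and its standard html.escape function. The render_comment helper and the sample payload are hypothetical and exist only for illustration; a real application would normally rely on its template engine's escaping instead.

```python
from html import escape

def render_comment(user_text: str) -> str:
    """Hypothetical helper: wrap untrusted text in a paragraph element."""
    # escape() encodes & < > and, by default, both quote characters.
    return f"<p>{escape(user_text)}</p>"

payload = '<script>alert("xss")</script>'
print(render_comment(payload))
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

Because the angle brackets arrive at the browser as entities, the payload is displayed as text rather than executed as a script.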
Named vs Numeric Entities
Named entities use human-readable codes like &amp; &lt; &gt; &quot; and &apos;. Numeric entities use decimal (&#60;) or hexadecimal (&#x3C;) Unicode code points. Named entities are easier to read in source code, but only a subset of Unicode characters have named entities. Numeric entities can represent any Unicode character, making them more versatile for special symbols, mathematical notation, and international characters.
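The relationship between the three forms can be sketched as follows. The entity_forms function is a hypothetical helper, and it looks names up in Python's html.entities.codepoint2name table, which (as an assumption worth noting) covers only the older HTML 4.01 name set, so some characters with HTML5 names will report no name here.

```python
from html.entities import codepoint2name

def entity_forms(ch: str) -> dict:
    """Hypothetical helper: return the entity spellings of one character."""
    cp = ord(ch)
    name = codepoint2name.get(cp)  # None when the table has no name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",
    }

print(entity_forms("<"))
# {'named': '&lt;', 'decimal': '&#60;', 'hex': '&#x3C;'}
print(entity_forms("\U0001F600"))
# {'named': None, 'decimal': '&#128512;', 'hex': '&#x1F600;'}
```

The second call shows the versatility of numeric entities: the emoji has no named entity in this table, yet its decimal and hexadecimal forms are always available.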
Best Practices
Always encode the five critical characters in user-generated content: < > & single quote and double quote. Use your framework's or library's built-in escaping functions rather than manual replacement to avoid missing edge cases. In modern HTML5, only these five characters generally need to be written as entities (&amp; &lt; &gt; &quot; and &apos;). Other Unicode characters can be included directly if the document uses UTF-8 encoding.
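One edge case that manual replacement tends to miss is ordering. The sketch below contrasts a deliberately buggy, hypothetical hand-rolled function (naive_escape) with Python's built-in html.escape, which handles the ampersand first.

```python
from html import escape

def naive_escape(text: str) -> str:
    """Hypothetical hand-rolled escaper with an ordering bug."""
    # "&" is replaced last, so the ampersands introduced by the earlier
    # replacements are themselves encoded a second time.
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")

sample = "a < b"
print(naive_escape(sample))  # a &amp;lt; b  (double-encoded)
print(escape(sample))        # a &lt; b
```

The double-encoded result would show up on the page as the literal text "&lt;" instead of a less-than sign.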





