bluemonday – github.com/microcosm-cc/bluemonday

package bluemonday

import "github.com/microcosm-cc/bluemonday"

Package bluemonday provides a way of describing an allowlist of HTML elements and attributes as a policy, and for that policy to be applied to untrusted strings from users that may contain markup. All elements and attributes not on the allowlist will be stripped.

The default bluemonday.UGCPolicy().Sanitize() turns this:

Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World

Into the more harmless:

Hello World

And it turns this:

<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>

Into this:

XSS

Whilst still allowing this:

<a href="http://www.google.com/">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

To pass through mostly unaltered (it gained a rel="nofollow"):

<a href="http://www.google.com/" rel="nofollow">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

The primary purpose of bluemonday is to take potentially unsafe user generated content (from things like Markdown, HTML WYSIWYG tools, etc) and make it safe for you to put on your website.

It protects sites against XSS (http://en.wikipedia.org/wiki/Cross-site_scripting) and other malicious content that a user interface may deliver. There are many vectors for an XSS attack (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet) and the safest thing to do is to sanitize user input against a known safe list of HTML elements and attributes.

Note: You should always run bluemonday after any other processing.

If you use blackfriday (https://github.com/russross/blackfriday) or Pandoc (http://johnmacfarlane.net/pandoc/) then bluemonday should be run after these steps. This ensures that no insecure HTML is introduced later in your process.

bluemonday is heavily inspired by both the OWASP Java HTML Sanitizer (https://code.google.com/p/owasp-java-html-sanitizer/) and the HTML Purifier (http://htmlpurifier.org/).

We ship two default policies. One is bluemonday.StrictPolicy(), which can be thought of as equivalent to stripping all HTML elements and their attributes because it has nothing on its allowlist.

The other is bluemonday.UGCPolicy() and allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does not allow iframes, object, embed, styles, script, etc.

The essence of building a policy is to determine which HTML elements and attributes are considered safe for your scenario. OWASP provide an XSS prevention cheat sheet ( https://www.google.com/search?q=xss+prevention+cheat+sheet ) to help explain the risks, but essentially:

  1. Avoid allowing anything other than plain HTML elements
  2. Avoid allowing `script`, `style`, `iframe`, `object`, `embed`, `base` elements
  3. Avoid allowing anything other than plain HTML attributes with simple values that you can match to a regexp

Example

Code:

package main

import (
	"fmt"
	"regexp"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Create a new policy
	p := bluemonday.NewPolicy()

	// Add elements to a policy without attributes
	p.AllowElements("b", "strong")

	// Add elements by virtue of adding an attribute
	p.AllowAttrs("nowrap").OnElements("td", "th")

	// Attributes can either be added to all elements
	p.AllowAttrs("dir").Globally()

	// Or attributes can be added to specific elements
	p.AllowAttrs("value").OnElements("li")

	// It is ALWAYS recommended that an attribute be made to match a pattern
	// XSS in HTML attributes is a very easy attack vector

	// \p{L} matches unicode letters, \p{N} matches unicode numbers
	p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally()

	// You can stop at any time and call .Sanitize()

	// Assumes that string htmlIn was passed in from a HTTP POST and contains
	// untrusted user generated content
	htmlIn := `untrusted user generated content <body onload="alert('XSS')">`
	fmt.Println(p.Sanitize(htmlIn))

	// And you can take any existing policy and extend it
	p = bluemonday.UGCPolicy()
	p.AllowElements("fieldset", "select", "option")

	// Links are complex beasts and one of the biggest attack vectors for
	// malicious content so we have included features specifically to help here.

	// This is not recommended:
	p = bluemonday.NewPolicy()
	p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")

	// The regexp is insufficient in this case to have prevented a malformed
	// value doing something unexpected.

	// This will ensure that URLs are not considered invalid by Go's net/url
	// package.
	p.RequireParseableURLs(true)

	// If you have enabled parseable URLs then the following option will allow
	// relative URLs. By default this is disabled and will prevent all local and
	// schema relative URLs (i.e. `href="//www.google.com"` is schema relative).
	p.AllowRelativeURLs(true)

	// If you have enabled parseable URLs then you can allow the schemas
	// that are permitted. Bear in mind that allowing relative URLs in the above
	// option allows for blank schemas.
	p.AllowURLSchemes("mailto", "http", "https")

	// Regardless of whether you have enabled parseable URLs, you can force all
	// URLs to have a rel="nofollow" attribute. This will be added if it does
	// not exist.

	// This applies to "a" "area" "link" elements that have a "href" attribute
	p.RequireNoFollowOnLinks(true)

	// We provide a convenience function that applies all of the above, but you
	// will still need to allow the linkable elements:
	p = bluemonday.NewPolicy()
	p.AllowStandardURLs()
	p.AllowAttrs("cite").OnElements("blockquote")
	p.AllowAttrs("href").OnElements("a", "area")
	p.AllowAttrs("src").OnElements("img")

	// Policy Building Helpers

	// If you've got this far and you're bored already, we also bundle some
	// other convenience functions
	p = bluemonday.NewPolicy()
	p.AllowStandardAttributes()
	p.AllowImages()
	p.AllowLists()
	p.AllowTables()
}

Variables

var (
	// CellAlign handles the `align` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-align
	CellAlign = regexp.MustCompile(`(?i)^(center|justify|left|right|char)$`)

	// CellVerticalAlign handles the `valign` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-valign
	CellVerticalAlign = regexp.MustCompile(`(?i)^(baseline|bottom|middle|top)$`)

	// Direction handles the `dir` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo#attr-dir
	Direction = regexp.MustCompile(`(?i)^(rtl|ltr)$`)

	// ImageAlign handles the `align` attribute on the `img` tag
	// http://www.w3.org/MarkUp/Test/Img/imgtest.html
	ImageAlign = regexp.MustCompile(
		`(?i)^(left|right|top|texttop|middle|absmiddle|baseline|bottom|absbottom)$`,
	)

	// Integer describes whole positive integers (including 0) used in places
	// like td.colspan
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-colspan
	Integer = regexp.MustCompile(`^[0-9]+$`)

	// ISO8601 according to the W3 group is only a subset of the ISO8601
	// standard: http://www.w3.org/TR/NOTE-datetime
	//
	// Used in places like time.datetime
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/time#attr-datetime
	//
	// Matches patterns:
	//  Year:
	//     YYYY (eg 1997)
	//  Year and month:
	//     YYYY-MM (eg 1997-07)
	//  Complete date:
	//     YYYY-MM-DD (eg 1997-07-16)
	//  Complete date plus hours and minutes:
	//     YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
	//  Complete date plus hours, minutes and seconds:
	//     YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
	//  Complete date plus hours, minutes, seconds and a decimal fraction of a
	//  second
	//      YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
	ISO8601 = regexp.MustCompile(
		`^[0-9]{4}(-[0-9]{2}(-[0-9]{2}([ T][0-9]{2}(:[0-9]{2}){1,2}(.[0-9]{1,6})` +
			`?Z?([\+-][0-9]{2}:[0-9]{2})?)?)?)?$`,
	)

	// ListType encapsulates the common value as well as the latest spec
	// values for lists
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol#attr-type
	ListType = regexp.MustCompile(`(?i)^(circle|disc|square|a|A|i|I|1)$`)

	// SpaceSeparatedTokens is used in places like `a.rel` and the common attribute
	// `class` which both contain space delimited lists of data tokens
	// http://www.w3.org/TR/html-markup/datatypes.html#common.data.tokens-def
	// Regexp: \p{L} matches unicode letters, \p{N} matches unicode numbers
	SpaceSeparatedTokens = regexp.MustCompile(`^([\s\p{L}\p{N}_-]+)$`)

	// Number is a double value used on HTML5 meter and progress elements
	// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-meter-element
	Number = regexp.MustCompile(`^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$`)

	// NumberOrPercent is used predominantly as units of measurement in width
	// and height attributes
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-height
	NumberOrPercent = regexp.MustCompile(`^[0-9]+[%]?$`)

	// Paragraph of text in an attribute such as *.'title', img.alt, etc
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes#attr-title
	// Note that we are not allowing chars that could close tags like '>'
	Paragraph = regexp.MustCompile(`^[\p{L}\p{N}\s\-_',\[\]!\./\\\(\)]*$`)
)

A selection of regular expressions that can be used as .Matching() rules on HTML attributes.
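To get a feel for what these patterns accept, they can be exercised directly with the standard library. The following is a minimal sketch: the three patterns are copied from the variable block above, the helper name `accepts` is hypothetical, and the use of `MatchString` is an assumption about how a `.Matching()` rule is applied to an attribute value.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the variable block above so that this sketch runs
// with only the standard library.
var (
	integer         = regexp.MustCompile(`^[0-9]+$`)
	numberOrPercent = regexp.MustCompile(`^[0-9]+[%]?$`)
	direction       = regexp.MustCompile(`(?i)^(rtl|ltr)$`)
)

// accepts is a hypothetical helper that reports whether an attribute value
// satisfies a pattern. It assumes the value is tested with MatchString.
func accepts(re *regexp.Regexp, value string) bool {
	return re.MatchString(value)
}

func main() {
	fmt.Println(accepts(integer, "3"))                     // true
	fmt.Println(accepts(integer, "-3"))                    // false
	fmt.Println(accepts(numberOrPercent, "50%"))           // true
	fmt.Println(accepts(direction, "RTL"))                 // true
	fmt.Println(accepts(direction, "javascript:alert(1)")) // false
}
```

Because every pattern is anchored with `^` and `$`, a value is accepted only if the whole string matches, which is what makes these patterns suitable for untrusted attribute values.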

Types

type Policy

type Policy struct {
	// contains filtered or unexported fields
}

Policy encapsulates the allowlist of HTML elements and attributes that will be applied to the sanitised HTML.

You should use bluemonday.NewPolicy() to create a blank policy as the unexported fields contain maps that need to be initialized.

func NewPolicy

func NewPolicy() *Policy

NewPolicy returns a blank policy with nothing allowed or permitted. This is the recommended way to start building a policy and you should now use AllowAttrs() and/or AllowElements() to construct the allowlist of HTML elements and attributes.

Example

Code:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// NewPolicy is a blank policy and we need to explicitly allow anything
	// that we wish to allow through
	p := bluemonday.NewPolicy()

	// We ensure any URLs are parseable and have rel="nofollow" where applicable
	p.AllowStandardURLs()

	// AllowStandardURLs already ensures that the href will be valid, and so we
	// can skip the .Matching()
	p.AllowAttrs("href").OnElements("a")

	// We allow paragraphs too
	p.AllowElements("p")

	html := p.Sanitize(
		`<p><a onblur="alert(secret)" href="http://www.google.com">Google</a></p>`,
	)

	fmt.Println(html)

}

Output:

<p><a href="http://www.google.com" rel="nofollow">Google</a></p>

func StrictPolicy

func StrictPolicy() *Policy

StrictPolicy returns an empty policy, which will effectively strip all HTML elements and their attributes from a document.

Example

Code:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// StrictPolicy is equivalent to NewPolicy and as nothing else is declared
	// we are stripping all elements (and their attributes)
	p := bluemonday.StrictPolicy()

	html := p.Sanitize(
		`Goodbye <a onblur="alert(secret)" href="http://en.wikipedia.org/wiki/Goodbye_Cruel_World_(Pink_Floyd_song)">Cruel</a> World`,
	)

	fmt.Println(html)

}

Output:

Goodbye Cruel World

func StripTagsPolicy

func StripTagsPolicy() *Policy

StripTagsPolicy is DEPRECATED. Use StrictPolicy instead.

func UGCPolicy

func UGCPolicy() *Policy

UGCPolicy returns a policy aimed at user generated content that is a result of HTML WYSIWYG tools and Markdown conversions.

This is expected to be a fairly rich document where as much markup as possible should be retained. Markdown permits raw HTML so we are basically providing a policy to sanitise HTML5 documents safely but with the least intrusion on the formatting expectations of the user.

Example

Code:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	fmt.Println(html)

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) AddSpaceWhenStrippingTag

func (p *Policy) AddSpaceWhenStrippingTag(allow bool) *Policy

AddSpaceWhenStrippingTag states whether to add a single space " " when removing tags that are not allowed by the policy.

This is useful if you expect to strip tags in dense markup and may lose the value of whitespace.

For example: "<p>Hello</p><p>World</p>" would be sanitized to "HelloWorld" with the default value of false, but you may wish to sanitize this to " Hello World " by setting AddSpaceWhenStrippingTag to true as this would retain the intent of the text.

func (*Policy) AddTargetBlankToFullyQualifiedLinks

func (p *Policy) AddTargetBlankToFullyQualifiedLinks(require bool) *Policy

AddTargetBlankToFullyQualifiedLinks will result in all a, area and link tags that point to a non-local destination (i.e. starts with a protocol and has a host) having a target="_blank" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) AllowAttrs

func (p *Policy) AllowAttrs(attrNames ...string) *attrPolicyBuilder

AllowAttrs takes a range of HTML attribute names and returns an attribute policy builder that allows you to specify the pattern and scope of the allowed attribute.

The attribute policy is only added to the core policy when either Globally() or OnElements(...) are called.

Example

Code:

package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow the 'title' attribute on every HTML element that has been
	// allowed
	p.AllowAttrs("title").Matching(bluemonday.Paragraph).Globally()

	// Allow the 'abbr' attribute on only the 'td' and 'th' elements.
	p.AllowAttrs("abbr").Matching(bluemonday.Paragraph).OnElements("td", "th")

	// Allow the 'colspan' and 'rowspan' attributes, matching a positive integer
	// pattern, on only the 'td' and 'th' elements.
	p.AllowAttrs("colspan", "rowspan").Matching(
		bluemonday.Integer,
	).OnElements("td", "th")
}

func (*Policy) AllowComments

func (p *Policy) AllowComments()

AllowComments allows comments.

Please note that only one type of comment is allowed by this: the standard HTML comment <!-- -->, which includes the use of that to permit conditionals as per https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/compatibility/ms537512(v=vs.85)?redirectedfrom=MSDN

What is not permitted are CDATA XML comments, as the x/net/html package we depend on does not handle this fully and we are not choosing to take on that work: https://pkg.go.dev/golang.org/x/net/html#Tokenizer.AllowCDATA . If the x/net/html package changes this then these will be considered, otherwise if you AllowComments but provide a CDATA comment, then as per the documentation in x/net/html this will be treated as a plain HTML comment.

func (*Policy) AllowDataAttributes

func (p *Policy) AllowDataAttributes()

AllowDataAttributes permits all data attributes. We can't specify the name of each attribute exactly as they are customized.

NOTE: These values are not sanitized and applications that evaluate or process them without checking and verification of the input may be at risk if this option is enabled. This is a 'caveat emptor' option and the person enabling this option needs to fully understand the potential impact with regards to whatever application will be consuming the sanitized HTML afterwards, i.e. if you know you put a link in a data attribute and use that to automatically load some new window then you're giving the author of a HTML fragment the means to open a malicious destination automatically. Use with care!

func (*Policy) AllowDataURIImages

func (p *Policy) AllowDataURIImages()

AllowDataURIImages permits the use of inline images defined in RFC2397 http://tools.ietf.org/html/rfc2397 http://en.wikipedia.org/wiki/Data_URI_scheme

Images must have a mimetype matching:

image/gif
image/jpeg
image/png
image/webp

NOTE: There is a potential security risk to allowing data URIs and you should only permit them on content you already trust. http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/ https://capec.mitre.org/data/definitions/244.html

func (*Policy) AllowElements

func (p *Policy) AllowElements(names ...string) *Policy

AllowElements will append HTML elements to the allowlist without applying an attribute policy to those elements (the elements are permitted sans-attributes)

Example

Code:

package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow styling elements without attributes
	p.AllowElements("br", "div", "hr", "p", "span")
}

func (*Policy) AllowElementsContent

func (p *Policy) AllowElementsContent(names ...string) *Policy

AllowElementsContent marks the HTML elements whose content should be retained after removing the tag.

func (*Policy) AllowElementsMatching

func (p *Policy) AllowElementsMatching(regex *regexp.Regexp) *Policy

AllowElementsMatching will append HTML elements to the allowlist if they match a regexp.

func (*Policy) AllowIFrames

func (p *Policy) AllowIFrames(vals ...SandboxValue)

func (*Policy) AllowImages

func (p *Policy) AllowImages()

AllowImages enables the img element and some popular attributes. It will also ensure that URL values are parseable. This helper does not enable data URI images, for that you should also use the AllowDataURIImages() helper.

func (*Policy) AllowLists

func (p *Policy) AllowLists()

AllowLists will enable ordered and unordered lists, as well as definition lists.

func (*Policy) AllowNoAttrs

func (p *Policy) AllowNoAttrs() *attrPolicyBuilder

AllowNoAttrs says that attributes on an element are optional.

The attribute policy is only added to the core policy when OnElements(...) is called.

func (*Policy) AllowRelativeURLs

func (p *Policy) AllowRelativeURLs(require bool) *Policy

AllowRelativeURLs enables RequireParseableURLs and then permits URLs that are parseable, have no schema information, and for which url.IsAbs() returns false. This permits local URLs.

func (*Policy) AllowStandardAttributes

func (p *Policy) AllowStandardAttributes()

AllowStandardAttributes will enable "id", "title" and the language specific attributes "dir" and "lang" on all elements that are allowed

func (*Policy) AllowStandardURLs

func (p *Policy) AllowStandardURLs()

AllowStandardURLs is a convenience function that will enable rel="nofollow" on "a", "area" and "link" (if you have allowed those elements) and will ensure that the URL values are parseable and either relative or belong to the "mailto", "http", or "https" schemes

func (*Policy) AllowStyles

func (p *Policy) AllowStyles(propertyNames ...string) *stylePolicyBuilder

AllowStyles takes a range of CSS property names and returns a style policy builder that allows you to specify the pattern and scope of the allowed property.

The style policy is only added to the core policy when either Globally() or OnElements(...) are called.

Example

Code:

package main

import (
	"fmt"
	"regexp"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow only 'span', 'p' and 'strong' elements
	p.AllowElements("span", "p", "strong")

	// Only allow 'style' attributes on 'span' and 'p' elements
	p.AllowAttrs("style").OnElements("span", "p")

	// Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none'
	// on 'span' elements only
	p.AllowStyles("text-decoration").MatchingEnum("underline", "line-through", "none").OnElements("span")

	// Allow the 'color' property with valid RGB(A) hex values only
	// on every HTML element that has been allowed
	p.AllowStyles("color").Matching(regexp.MustCompile("(?i)^#([0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$")).Globally()

	// Default handler
	p.AllowStyles("background-origin").Globally()

	// The span has an invalid 'color' which will be stripped along with other disallowed properties
	html := p.Sanitize(
		`<p style="color:#f00;">
	<span style="text-decoration: underline; background-image: url(javascript:alert('XSS')); color: #f00ba; background-origin: invalidValue">
		Red underlined <strong style="text-decoration:none;">text</strong>
	</span>
</p>`,
	)

	fmt.Println(html)

}

Output:

<p style="color: #f00">
	<span style="text-decoration: underline">
		Red underlined <strong>text</strong>
	</span>
</p>

func (*Policy) AllowStyling

func (p *Policy) AllowStyling()

AllowStyling presently enables the class attribute globally.

Note: When bluemonday ships a CSS parser and we can safely sanitise that, this will also allow sanitized styling of elements via the style attribute.

func (*Policy) AllowTables

func (p *Policy) AllowTables()

AllowTables will enable a rich set of elements and attributes to describe HTML tables

func (*Policy) AllowURLSchemeWithCustomPolicy

func (p *Policy) AllowURLSchemeWithCustomPolicy(
	scheme string,
	urlPolicy func(url *url.URL) (allowUrl bool),
) *Policy

AllowURLSchemeWithCustomPolicy will append URL schemes with a custom URL policy to the allowlist. Only the URLs with matching schema and urlPolicy(url) returning true will be allowed.

func (*Policy) AllowURLSchemes

func (p *Policy) AllowURLSchemes(schemes ...string) *Policy

AllowURLSchemes will append URL schemes to the allowlist. Example: p.AllowURLSchemes("mailto", "http", "https")

func (*Policy) AllowURLSchemesMatching

func (p *Policy) AllowURLSchemesMatching(r *regexp.Regexp) *Policy

AllowURLSchemesMatching will append URL schemes to the allowlist if they match a regexp.

func (*Policy) AllowUnsafe

func (p *Policy) AllowUnsafe(allowUnsafe bool) *Policy

AllowUnsafe permits fundamentally unsafe elements.

If false (default) then elements such as `style` and `script` will not be permitted even if declared in a policy. These elements when combined with untrusted input cannot be safely handled by bluemonday at this point in time.

If true then `style` and `script` would be permitted by bluemonday if a policy declares them. However this is not recommended under any circumstance and can lead to XSS being rendered thus defeating the purpose of using a HTML sanitizer.

func (*Policy) RequireCrossOriginAnonymous

func (p *Policy) RequireCrossOriginAnonymous(require bool) *Policy

RequireCrossOriginAnonymous will result in all audio, img, link, script, and video tags having a crossorigin="anonymous" added to them if one does not already exist

func (*Policy) RequireNoFollowOnFullyQualifiedLinks

func (p *Policy) RequireNoFollowOnFullyQualifiedLinks(require bool) *Policy

RequireNoFollowOnFullyQualifiedLinks will result in all a, area, and link tags that point to a non-local destination (i.e. starts with a protocol and has a host) having a rel="nofollow" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireNoFollowOnLinks

func (p *Policy) RequireNoFollowOnLinks(require bool) *Policy

RequireNoFollowOnLinks will result in all a, area, and link tags having a rel="nofollow" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireNoReferrerOnFullyQualifiedLinks

func (p *Policy) RequireNoReferrerOnFullyQualifiedLinks(require bool) *Policy

RequireNoReferrerOnFullyQualifiedLinks will result in all a, area, and link tags that point to a non-local destination (i.e. starts with a protocol and has a host) having a rel="noreferrer" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireNoReferrerOnLinks

func (p *Policy) RequireNoReferrerOnLinks(require bool) *Policy

RequireNoReferrerOnLinks will result in all a, area, and link tags having a rel="noreferrer" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireParseableURLs

func (p *Policy) RequireParseableURLs(require bool) *Policy

RequireParseableURLs will result in all URLs requiring that they be parseable by "net/url" url.Parse(). This applies to:

  - a.href
  - area.href
  - blockquote.cite
  - img.src
  - link.href
  - script.src
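To see what "parseable by net/url" means in practice, the following standard-library sketch runs a few values through url.Parse. The helper name `parseable` and the sample values are illustrative assumptions; the sketch does not call bluemonday and only shows the parser's own verdict.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseable is a hypothetical helper that reports whether net/url accepts
// the raw value without an error.
func parseable(raw string) bool {
	_, err := url.Parse(raw)
	return err == nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/page?q=1", // parses
		"/relative/path",               // parses (no scheme)
		"javascript:alert(1)",          // parses: a valid URL is not a safe URL
		"http://a b.com/",              // rejected: space in the host name
		"%zz",                          // rejected: invalid percent-escape
	} {
		fmt.Printf("%-32q parseable=%v\n", raw, parseable(raw))
	}
}
```

Note that a javascript: URL parses successfully, so parseability alone does not filter dangerous schemes; that is the job of the URL scheme allowlist.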

func (*Policy) RequireSandboxOnIFrame

func (p *Policy) RequireSandboxOnIFrame(vals ...SandboxValue)

RequireSandboxOnIFrame will result in all iframe tags having a sandbox="" attribute. Any sandbox values not specified here will be filtered from the generated HTML.

func (*Policy) RewriteSrc

func (p *Policy) RewriteSrc(fn urlRewriter) *Policy

RewriteSrc will rewrite the src attribute of a resource downloading tag (e.g. <img>, <script>, <iframe>) using the provided function.

Typically the use case here is that if the content that we're sanitizing is untrusted then the content that is inlined is also untrusted. To prevent serving this content on the same domain as the content appears on it is good practise to proxy the content through an additional domain name as this will force the web client to consider the inline content as third party to the main content, thus providing browser isolation around the inline content.

An example of this is a web mail provider like fastmail.com: when an email (user generated content) is displayed, the email text is shown on fastmail.com but the inline attachments and content are rendered from fastmailusercontent.com. This proxying of the external content on a domain that is different to the content domain forces the browser domain security model to kick in. Note that this only applies to differences below the suffix (as per the public suffix list).

This is a good practise to adopt as it prevents the content from being able to set cookies on the main domain and thus prevents the content on the main domain from being able to read those cookies.

func (*Policy) Sanitize

func (p *Policy) Sanitize(s string) string

Sanitize takes a string that contains a HTML fragment or document and applies the given policy allowlist.

It returns a HTML string that has been sanitized by the policy or an empty string if an error has occurred (most likely as a consequence of extremely malformed input)

Example

Code:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// string in, string out
	html := p.Sanitize(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)

	fmt.Println(html)

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) SanitizeBytes

func (p *Policy) SanitizeBytes(b []byte) []byte

SanitizeBytes takes a []byte that contains a HTML fragment or document and applies the given policy allowlist.

It returns a []byte containing the HTML that has been sanitized by the policy or an empty []byte if an error has occurred (most likely as a consequence of extremely malformed input)

Example

Code:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// []byte in, []byte out
	b := []byte(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	b = p.SanitizeBytes(b)

	fmt.Println(string(b))

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) SanitizeReader

func (p *Policy) SanitizeReader(r io.Reader) *bytes.Buffer

SanitizeReader takes an io.Reader that contains a HTML fragment or document and applies the given policy allowlist.

It returns a bytes.Buffer containing the HTML that has been sanitized by the policy. Errors during sanitization will merely return an empty result.

Example

Code:

package main

import (
	"fmt"
	"strings"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// io.Reader in, bytes.Buffer out
	r := strings.NewReader(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	buf := p.SanitizeReader(r)

	fmt.Println(buf.String())

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) SanitizeReaderToWriter

func (p *Policy) SanitizeReaderToWriter(r io.Reader, w io.Writer) error

SanitizeReaderToWriter takes an io.Reader that contains a HTML fragment or document and applies the given policy allowlist and writes to the provided writer returning an error if there is one.

func (*Policy) SkipElementsContent

func (p *Policy) SkipElementsContent(names ...string) *Policy

SkipElementsContent marks the HTML elements whose tags should be removed along with their content.

type Query

type Query struct {
	Key      string
	Value    string
	HasValue bool
}

Query represents a single part of the query string, a query param

type SandboxValue

type SandboxValue int64

const (
	SandboxAllowDownloads SandboxValue = iota
	SandboxAllowDownloadsWithoutUserActivation
	SandboxAllowForms
	SandboxAllowModals
	SandboxAllowOrientationLock
	SandboxAllowPointerLock
	SandboxAllowPopups
	SandboxAllowPopupsToEscapeSandbox
	SandboxAllowPresentation
	SandboxAllowSameOrigin
	SandboxAllowScripts
	SandboxAllowStorageAccessByUserActivation
	SandboxAllowTopNavigation
	SandboxAllowTopNavigationByUserActivation
)

Source Files

doc.go helpers.go policies.go policy.go sanitize.go

Directories

Path                      Synopsis
cmd
cmd/sanitise_html_email   Package main demonstrates a HTML email cleaner.
cmd/sanitise_ugc          Package main demonstrates a simple user generated content sanitizer.
css

Version: v1.0.27 (latest)
Published: Jul 4, 2024