package md

import "src.elv.sh/pkg/md"

Package md implements a Markdown parser.

To use this package, call Render with one of the Codec implementations:

Why another Markdown implementation?

The Elvish project uses Markdown in the documentation ("elvdoc") for the functions and variables defined in builtin modules. These docs are then converted to HTML as part of the website; for example, you can read the docs for builtin functions and variables at https://elv.sh/ref/builtin.html.

We used to use Pandoc to convert the docs from their Markdown sources to HTML. However, we would also like to expand the elvdoc system in two ways:

With these requirements, Elvish itself needs to know how to parse Markdown sources and render them in the terminal, so we need a Go implementation instead. There is a good Go implementation, github.com/yuin/goldmark, but it is quite large: linking it into Elvish will increase the binary size by more than 1MB. (There is another popular Markdown implementation, github.com/russross/blackfriday/v2, but it doesn't support CommonMark.)

By having a narrower focus, this package is much smaller than goldmark and can be easily optimized for Elvish's use cases. In contrast to goldmark's 1MB, including Render and HTMLCodec in Elvish only increases the binary size by 150KB. That said, the functionality provided by this package still tries to be as general as possible, and can potentially be used by other people interested in a small Markdown implementation.

Besides elvdocs, Pandoc was also used to convert all the other content on the Elvish website (https://elv.sh) to HTML. Additionally, Prettier was previously used to format all the Markdown files in the repo. Now that Elvish has its own Markdown implementation, we can use it not just for rendering elvdocs in the terminal, but also to replace Pandoc and Prettier. These external tools are decent, but using them still came with some friction:

Replacing external tools with this package removes this friction.

Additionally, this package is very easy to extend and optimize to suit Elvish's needs:

Which Markdown variant does this package implement?

This package implements a large subset of the CommonMark spec, with the following omissions:

The package also supports the following extensions:

These omitted features are never used in Elvish's Markdown sources.

All implemented features pass their relevant CommonMark spec tests, currently targeting CommonMark 0.31.2. See testutils_test.go for a complete list of which spec tests are skipped.

Is this package useful outside Elvish?

Yes! Well, hopefully. Assuming you don't use the features this package omits, it can be useful in at least the following ways:

Variables

var UnescapeHTML = unescapeHTML

UnescapeHTML is used by the parser to unescape HTML entities and numeric character references.

The default implementation supports numeric character references, plus a minimal set of entities that are necessary for writing valid HTML or can appear in the output of FmtCodec. It can be set to html.UnescapeString for better CommonMark compliance.
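Because the hook is a package-level function variable, swapping it is a plain assignment. The sketch below uses a local variable, `unescapeHTML`, as a stand-in for `md.UnescapeHTML` so that it compiles without the module; the pass-through default and the helper name `useStdlibUnescaper` are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"html"
)

// unescapeHTML is a local stand-in for md.UnescapeHTML. The pass-through
// default here is hypothetical; it is not the package's real default.
var unescapeHTML = func(s string) string { return s }

// useStdlibUnescaper swaps in the standard library's unescaper, which knows
// the full set of named HTML entities.
func useStdlibUnescaper() {
	unescapeHTML = html.UnescapeString
}

func main() {
	useStdlibUnescaper()
	// Named entities and numeric character references are both decoded.
	fmt.Println(unescapeHTML("&copy; &amp; &#35;"))
}
```

With the real package, the equivalent assignment would presumably be `md.UnescapeHTML = html.UnescapeString`, done once before any call to Render.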

Functions

func Render

func Render(text string, codec Codec)

Render parses Markdown and renders it with a Codec.

func RenderInlineContentToHTML

func RenderInlineContentToHTML(sb *strings.Builder, ops []InlineOp)

RenderInlineContentToHTML renders inline content to HTML, writing to a strings.Builder. This is useful for implementing an alternative HTML-outputting Codec.

func RenderString

func RenderString(text string, codec StringerCodec) string

RenderString calls Render(text, codec) and returns codec.String(). This can be a bit more convenient to use than Render.

Types

type Codec

type Codec interface {
	Do(Op)
}

Codec is used to render output.
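Since the interface has a single method, a custom Codec can be very small. The following is a minimal sketch, not taken from the package: it declares local stand-ins for a subset of `Op`, `OpType`, and `Codec` so that it compiles on its own, and the names `countCodec` and `feed` are hypothetical. `feed` plays the role of Render by handing hand-written Ops to the codec in order.

```go
package main

import "fmt"

// Local stand-ins mirroring a subset of the md declarations (assumption:
// only the fields used below are reproduced).
type OpType uint

const (
	OpThematicBreak OpType = iota
	OpHeading
	OpCodeBlock
	OpHTMLBlock
	OpParagraph
)

type Op struct {
	Type   OpType
	LineNo int
}

type Codec interface {
	Do(Op)
}

// countCodec is a hypothetical Codec that counts how often each Op type
// is seen.
type countCodec struct {
	counts map[OpType]int
}

func (c *countCodec) Do(op Op) {
	if c.counts == nil {
		c.counts = make(map[OpType]int)
	}
	c.counts[op.Type]++
}

// feed stands in for Render: it passes each Op to the codec in order.
func feed(codec Codec, ops []Op) {
	for _, op := range ops {
		codec.Do(op)
	}
}

func main() {
	c := &countCodec{}
	feed(c, []Op{
		{Type: OpHeading, LineNo: 1},
		{Type: OpParagraph, LineNo: 3},
		{Type: OpParagraph, LineNo: 5},
	})
	fmt.Println("headings:", c.counts[OpHeading], "paragraphs:", c.counts[OpParagraph])
}
```

With the real package, a codec written this way would be passed to Render as `md.Render(text, c)` instead of being fed by hand.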

type FmtCodec

type FmtCodec struct {
	Width int
	// contains filtered or unexported fields
}

FmtCodec is a codec that formats Markdown in a specific style.

The only supported configuration option is the text width.

The formatted text uses the following style:

func (*FmtCodec) Do

func (c *FmtCodec) Do(op Op)

func (*FmtCodec) String

func (c *FmtCodec) String() string

func (*FmtCodec) Unsupported

func (c *FmtCodec) Unsupported() *FmtUnsupported

Unsupported returns information about use of unsupported features that may make the output incorrect. It returns nil if there is no use of unsupported features.

type FmtUnsupported

type FmtUnsupported struct {
	// Input contains emphasis or strong emphasis nested in another emphasis or
	// strong emphasis (not necessarily of the same type).
	NestedEmphasisOrStrongEmphasis bool
	// Input contains emphasis or strong emphasis that follows immediately after
	// another emphasis or strong emphasis (not necessarily of the same type).
	ConsecutiveEmphasisOrStrongEmphasis bool
}

FmtUnsupported contains information about use of unsupported features.

type HTMLCodec

type HTMLCodec struct {
	strings.Builder
	// If non-nil, will be called for each code block. The return value is
	// inserted into the HTML output and should be properly escaped.
	ConvertCodeBlock func(info, code string) string
}

HTMLCodec converts Markdown to HTML.
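A ConvertCodeBlock callback has to return text that is already safe to insert into HTML. The sketch below shows one possible callback with the documented `func(info, code string) string` signature. It is an illustration under assumptions: the function name, the `elvish` info-string check, and the `comment` CSS class are all hypothetical, and because this page does not say whether the codec emits its own wrapper element around the result, the callback returns only escaped inner markup.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// convertCodeBlock escapes code for HTML output. For blocks whose info
// string starts with "elvish" (a hypothetical choice), lines beginning with
// '#' are additionally wrapped in a span so they can be styled as comments.
func convertCodeBlock(info, code string) string {
	fields := strings.Fields(info)
	if len(fields) == 0 || fields[0] != "elvish" {
		return html.EscapeString(code)
	}
	var sb strings.Builder
	lines := strings.Split(strings.TrimSuffix(code, "\n"), "\n")
	for _, line := range lines {
		// Escape first, then add markup, so user text is never trusted.
		escaped := html.EscapeString(line)
		if strings.HasPrefix(strings.TrimSpace(line), "#") {
			escaped = `<span class="comment">` + escaped + `</span>`
		}
		sb.WriteString(escaped)
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	fmt.Print(convertCodeBlock("elvish", "# greet\necho <hello>\n"))
}
```

With the real package, the function would be assigned to the field, as in `md.HTMLCodec{ConvertCodeBlock: convertCodeBlock}`.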

func (*HTMLCodec) Do

func (c *HTMLCodec) Do(op Op)

type InlineOp

type InlineOp struct {
	Type InlineOpType
	// OpText, OpCodeSpan, OpRawHTML, OpAutolink: Text content
	// OpLinkStart, OpLinkEnd, OpImage: title text
	Text string
	// OpLinkStart, OpLinkEnd, OpImage, OpAutolink
	Dest string
	// For OpImage
	Alt string
}

InlineOp represents an inline operation.

func (InlineOp) String

func (op InlineOp) String() string

String returns the text content of the InlineOp.

type InlineOpType

type InlineOpType uint

InlineOpType enumerates possible types of an InlineOp.

const (
	// Text elements. Embedded newlines in OpText are turned into OpNewLine, but
	// OpRawHTML can contain embedded newlines. OpCodeSpan never contains
	// embedded newlines.
	OpText InlineOpType = iota
	OpCodeSpan
	OpRawHTML
	OpNewLine

	// Inline markup elements.
	OpEmphasisStart
	OpEmphasisEnd
	OpStrongEmphasisStart
	OpStrongEmphasisEnd
	OpLinkStart
	OpLinkEnd
	OpImage
	OpAutolink
	OpHardLineBreak
)
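To illustrate how a codec might consume these inline ops, here is a minimal sketch that flattens a slice of inline ops into plain text. The types are local stand-ins that mirror the declarations above so the code compiles on its own, and `plainText` is a hypothetical helper, not part of the package. Treating OpNewLine as a space and dropping raw HTML are choices made for this example only.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins mirroring the InlineOpType constants and InlineOp struct.
type InlineOpType uint

const (
	OpText InlineOpType = iota
	OpCodeSpan
	OpRawHTML
	OpNewLine
	OpEmphasisStart
	OpEmphasisEnd
	OpStrongEmphasisStart
	OpStrongEmphasisEnd
	OpLinkStart
	OpLinkEnd
	OpImage
	OpAutolink
	OpHardLineBreak
)

type InlineOp struct {
	Type InlineOpType
	Text string
	Dest string
	Alt  string
}

// plainText is a hypothetical helper that keeps only readable text.
func plainText(ops []InlineOp) string {
	var sb strings.Builder
	for _, op := range ops {
		switch op.Type {
		case OpText, OpCodeSpan, OpAutolink:
			sb.WriteString(op.Text)
		case OpImage:
			sb.WriteString(op.Alt)
		case OpNewLine:
			sb.WriteByte(' ') // soft line break: join with a space
		case OpHardLineBreak:
			sb.WriteByte('\n')
		}
		// Markup start/end ops and raw HTML contribute no text here.
	}
	return sb.String()
}

func main() {
	ops := []InlineOp{
		{Type: OpText, Text: "Hello "},
		{Type: OpEmphasisStart},
		{Type: OpText, Text: "big"},
		{Type: OpEmphasisEnd},
		{Type: OpNewLine},
		{Type: OpCodeSpan, Text: "world"},
	}
	fmt.Println(plainText(ops))
}
```

Note that for OpLinkStart and OpLinkEnd the Text field holds the link title rather than visible text, which is why those ops are skipped.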

func (InlineOpType) String

func (i InlineOpType) String() string

type Op

type Op struct {
	Type OpType
	// 1-based line number. If the Op spans multiple lines, this identifies the
	// first line. For the *End types, this identifies the first line that
	// causes the block to be terminated, which can be the first line of another
	// block.
	LineNo int
	// For OpOrderedListStart (the start number) or OpHeading (as the heading
	// level)
	Number int
	// For OpHeading (attributes inside { }) and OpCodeBlock (text after opening
	// fence)
	Info string
	// For OpCodeBlock and OpHTMLBlock
	Lines []string
	// For OpParagraph and OpHeading
	Content []InlineOp
}

Op represents an operation for the Codec.

type OpType

type OpType uint

OpType enumerates possible types of an Op.

const (
	// Leaf blocks.
	OpThematicBreak OpType = iota
	OpHeading
	OpCodeBlock
	OpHTMLBlock
	OpParagraph

	// Container blocks.
	OpBlockquoteStart
	OpBlockquoteEnd
	OpListItemStart
	OpListItemEnd
	OpBulletListStart
	OpBulletListEnd
	OpOrderedListStart
	OpOrderedListEnd
)

Possible output operations.
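The container Start and End operations suggest that a codec tracks nesting as it goes. The sketch below shows one way to do that, assuming the Start and End ops arrive as balanced pairs (this page does not state that explicitly). The types are local stand-ins mirroring the declarations above, and `depthCodec` is a hypothetical name; the Op sequence in `main` is hand-written rather than produced by the parser.

```go
package main

import "fmt"

// Local stand-ins mirroring the OpType constants and part of Op.
type OpType uint

const (
	OpThematicBreak OpType = iota
	OpHeading
	OpCodeBlock
	OpHTMLBlock
	OpParagraph
	OpBlockquoteStart
	OpBlockquoteEnd
	OpListItemStart
	OpListItemEnd
	OpBulletListStart
	OpBulletListEnd
	OpOrderedListStart
	OpOrderedListEnd
)

type Op struct {
	Type OpType
}

// depthCodec is a hypothetical Codec that tracks container nesting.
type depthCodec struct {
	depth    int // current nesting level
	maxDepth int // deepest level seen so far
}

func (c *depthCodec) Do(op Op) {
	switch op.Type {
	case OpBlockquoteStart, OpListItemStart, OpBulletListStart, OpOrderedListStart:
		c.depth++
		if c.depth > c.maxDepth {
			c.maxDepth = c.depth
		}
	case OpBlockquoteEnd, OpListItemEnd, OpBulletListEnd, OpOrderedListEnd:
		c.depth--
	}
	// Leaf blocks do not change the nesting level.
}

func main() {
	c := &depthCodec{}
	for _, t := range []OpType{
		OpBulletListStart, OpListItemStart, OpParagraph,
		OpListItemEnd, OpBulletListEnd,
	} {
		c.Do(Op{Type: t})
	}
	fmt.Println("max depth:", c.maxDepth, "final depth:", c.depth)
}
```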

func (OpType) String

func (i OpType) String() string

type SmartPunctsCodec

type SmartPunctsCodec struct{ Inner Codec }

SmartPunctsCodec wraps another codec, converting certain ASCII punctuation to nicer Unicode counterparts:

The start of a line is treated as whitespace.
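The same wrapping pattern, a struct with an Inner Codec and a Do method that forwards to it, can be used for other filters. The following sketch is not from the package: it uses local stand-in types so it compiles on its own, and `dropHTMLCodec` and `recorder` are hypothetical names. The wrapper drops HTML blocks and forwards every other Op unchanged.

```go
package main

import "fmt"

// Local stand-ins mirroring a subset of the md declarations.
type OpType uint

const (
	OpThematicBreak OpType = iota
	OpHeading
	OpCodeBlock
	OpHTMLBlock
	OpParagraph
)

type Op struct {
	Type OpType
}

type Codec interface {
	Do(Op)
}

// recorder is a hypothetical inner codec that remembers what it received.
type recorder struct {
	types []OpType
}

func (r *recorder) Do(op Op) {
	r.types = append(r.types, op.Type)
}

// dropHTMLCodec wraps another codec and filters out HTML blocks. Like
// SmartPunctsCodec, it has a value receiver and an exported Inner field.
type dropHTMLCodec struct {
	Inner Codec
}

func (c dropHTMLCodec) Do(op Op) {
	if op.Type == OpHTMLBlock {
		return // swallow the op instead of forwarding it
	}
	c.Inner.Do(op)
}

func main() {
	rec := &recorder{}
	c := dropHTMLCodec{Inner: rec}
	c.Do(Op{Type: OpParagraph})
	c.Do(Op{Type: OpHTMLBlock})
	c.Do(Op{Type: OpHeading})
	fmt.Println("forwarded ops:", len(rec.types))
}
```

Because the wrapper itself satisfies Codec, wrappers of this kind can be stacked around any inner codec.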

func (SmartPunctsCodec) Do

func (c SmartPunctsCodec) Do(op Op)

type StringerCodec

type StringerCodec interface {
	Codec
	String() string
}

StringerCodec is a Codec that also implements the String method.

type TTYCodec

type TTYCodec struct {
	Width int
	// If non-nil, will be called to highlight the content of code blocks.
	HighlightCodeBlock func(info, code string) ui.Text
	// If non-nil, will be called for each relative link destination.
	ConvertRelativeLink func(dest string) string
	// contains filtered or unexported fields
}

TTYCodec renders Markdown in a terminal.

The rendered text uses the following style:

The structure of the implementation closely mirrors FmtCodec in a lot of places, without the complexity of handling all edge cases correctly, but with the slight complexity of handling styles.
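The ConvertRelativeLink hook takes a link destination and returns a replacement. As a sketch of what such a callback could look like, the code below resolves relative destinations against a base URL using only the standard library. The helper name `makeLinkResolver` and the base URL are assumptions for illustration; what TTYCodec does with the returned string is not described on this page.

```go
package main

import (
	"fmt"
	"net/url"
)

// makeLinkResolver returns a function with the signature expected by
// TTYCodec.ConvertRelativeLink. The returned function resolves dest against
// base and falls back to dest unchanged if it cannot be parsed.
func makeLinkResolver(base string) (func(dest string) string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	return func(dest string) string {
		ref, err := url.Parse(dest)
		if err != nil {
			return dest
		}
		return baseURL.ResolveReference(ref).String()
	}, nil
}

func main() {
	// The base URL is a hypothetical choice for this example.
	resolve, err := makeLinkResolver("https://elv.sh/ref/")
	if err != nil {
		fmt.Println("bad base URL:", err)
		return
	}
	fmt.Println(resolve("builtin.html#put"))
	fmt.Println(resolve("../learn/tour.html"))
}
```

With the real package, the resolver would be assigned to the field, as in `md.TTYCodec{ConvertRelativeLink: resolve}`.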

func (*TTYCodec) Do

func (c *TTYCodec) Do(op Op)

Do processes an Op.

func (*TTYCodec) String

func (c *TTYCodec) String() string

String returns the rendering result as a string with ANSI escape sequences.

func (*TTYCodec) Text

func (c *TTYCodec) Text() ui.Text

Text returns the rendering result as a ui.Text.

type TextBlock

type TextBlock struct {
	Text string
	Code bool
}

TextBlock is a text block dumped by TextCodec.

type TextCodec

type TextCodec struct {
	// contains filtered or unexported fields
}

TextCodec is a codec that dumps the pure text content of Markdown.

func (*TextCodec) Blocks

func (c *TextCodec) Blocks() []TextBlock

func (*TextCodec) Do

func (c *TextCodec) Do(op Op)

type TraceCodec

type TraceCodec struct {
	strings.Builder
	// contains filtered or unexported fields
}

TraceCodec is a Codec that records all the Ops passed to its Do method.

func (*TraceCodec) Do

func (c *TraceCodec) Do(op Op)

func (*TraceCodec) Ops

func (c *TraceCodec) Ops() []Op

Source Files

fmt.go html.go inline.go md.go smart_puncts.go stack.go text.go trace.go tty.go zstring.go

Directories

Path          Synopsis
pkg/md/mdrun  Command mdrun can be used to test the md package.

Version: v0.21.0 (latest)
Published: Aug 13, 2024
Platform: linux/amd64
Imports: 9 packages