utf8

Constants & Variables

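MaxRune const

One of the numbers fundamental to the encoding: the maximum valid Unicode code point.

const MaxRune = '\U0010FFFF'

RuneError const

One of the numbers fundamental to the encoding: the "error" rune, U+FFFD REPLACEMENT CHARACTER.

const RuneError = '\uFFFD'

RuneSelf const

One of the numbers fundamental to the encoding: runes below RuneSelf are represented as themselves in a single byte.

const RuneSelf = 0x80

UTFMax const

One of the numbers fundamental to the encoding: the maximum number of bytes in the UTF-8 encoding of a single rune.

const UTFMax = 4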

acceptRanges var

acceptRanges has size 16 to avoid bounds checks in the code that uses it.

var acceptRanges = [16]acceptRange{...}

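as const

Table value for an ASCII first byte: a special one-byte case (see xx for how the two nibbles are used).

const as = 0xF0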

first var

first is information about the first byte in a UTF-8 sequence: each of the 256 possible first bytes maps to one of the table values xx, as, or s1 to s7.

var first = [256]uint8{...}

hicb const

The default highest continuation byte (see locb).

const hicb = 0b10111111

locb const

The default lowest and highest continuation bytes: locb and hicb bound the continuation-byte range 0x80 to 0xBF.

const locb = 0b10000000

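mask2 const

Selects the five payload bits of the first byte of a two-byte sequence (110xxxxx).

const mask2 = 0b00011111

mask3 const

Selects the four payload bits of the first byte of a three-byte sequence (1110xxxx).

const mask3 = 0b00001111

mask4 const

Selects the three payload bits of the first byte of a four-byte sequence (11110xxx).

const mask4 = 0b00000111

maskx const

Selects the six payload bits of a continuation byte (10xxxxxx).

const maskx = 0b00111111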
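The masks strip the marker bits so the remaining payload bits can be shifted back together. A minimal sketch, assuming local copies of the unexported masks (they cannot be imported from unicode/utf8) and input that has already been validated; decodeValid is a hypothetical helper.

```go
package main

import "fmt"

// Local copies of the package's unexported masks, assumed here so the
// example compiles outside unicode/utf8.
const (
	maskx = 0b00111111 // payload bits of a continuation byte (10xxxxxx)
	mask2 = 0b00011111 // payload bits of a 2-byte first byte (110xxxxx)
	mask3 = 0b00001111 // payload bits of a 3-byte first byte (1110xxxx)
	mask4 = 0b00000111 // payload bits of a 4-byte first byte (11110xxx)
)

// decodeValid is a hypothetical helper that reassembles a rune from an
// already validated 2-, 3- or 4-byte sequence. It performs no validation.
func decodeValid(p []byte) rune {
	switch len(p) {
	case 2:
		return rune(p[0]&mask2)<<6 | rune(p[1]&maskx)
	case 3:
		return rune(p[0]&mask3)<<12 | rune(p[1]&maskx)<<6 | rune(p[2]&maskx)
	case 4:
		return rune(p[0]&mask4)<<18 | rune(p[1]&maskx)<<12 | rune(p[2]&maskx)<<6 | rune(p[3]&maskx)
	}
	return -1 // hypothetical sentinel for unsupported lengths
}

func main() {
	fmt.Printf("%U\n", decodeValid([]byte("é")))          // U+00E9
	fmt.Printf("%U\n", decodeValid([]byte("€")))          // U+20AC
	fmt.Printf("%U\n", decodeValid([]byte("\U0001F600"))) // U+1F600
}
```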

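rune1Max const

The largest rune that encodes in one byte (U+007F).

const rune1Max = 1<<7 - 1

rune2Max const

The largest rune that encodes in two bytes (U+07FF).

const rune2Max = 1<<11 - 1

rune3Max const

The largest rune that encodes in three bytes (U+FFFF).

const rune3Max = 1<<16 - 1

runeErrorByte0 const

First byte of the UTF-8 encoding of RuneError (0xEF).

const runeErrorByte0 = t3 | (RuneError >> 12)

runeErrorByte1 const

Second byte of the UTF-8 encoding of RuneError (0xBF).

const runeErrorByte1 = tx | (RuneError>>6)&maskx

runeErrorByte2 const

Third byte of the UTF-8 encoding of RuneError (0xBD).

const runeErrorByte2 = tx | RuneError&maskx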

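s1 const

First-byte table value: acceptRanges index 0, sequence length 2.

const s1 = 0x02

s2 const

First-byte table value: acceptRanges index 1, sequence length 3.

const s2 = 0x13

s3 const

First-byte table value: acceptRanges index 0, sequence length 3.

const s3 = 0x03

s4 const

First-byte table value: acceptRanges index 2, sequence length 3.

const s4 = 0x23

s5 const

First-byte table value: acceptRanges index 3, sequence length 4.

const s5 = 0x34

s6 const

First-byte table value: acceptRanges index 0, sequence length 4.

const s6 = 0x04

s7 const

First-byte table value: acceptRanges index 4, sequence length 4.

const s7 = 0x44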

surrogateMax const

Code points in the surrogate range are not valid for UTF-8.

const surrogateMax = 0xDFFF

surrogateMin const

Code points in the surrogate range are not valid for UTF-8.

const surrogateMin = 0xD800

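t1 const

Marker bits of a one-byte (ASCII) sequence: 0xxxxxxx.

const t1 = 0b00000000

t2 const

Marker bits of the first byte of a two-byte sequence: 110xxxxx.

const t2 = 0b11000000

t3 const

Marker bits of the first byte of a three-byte sequence: 1110xxxx.

const t3 = 0b11100000

t4 const

Marker bits of the first byte of a four-byte sequence: 11110xxx.

const t4 = 0b11110000

t5 const

Bytes at or above t5 (11111xxx) never occur in valid UTF-8.

const t5 = 0b11111000

tx const

Marker bits of a continuation byte: 10xxxxxx.

const tx = 0b10000000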
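Because the marker prefixes are ordered, the sequence length implied by a first byte can be read off with plain comparisons. This is a sketch under stated assumptions: the constants are local copies, and the hypothetical seqLen looks only at prefix bits, so unlike the package's first table it does not reject 0xC0, 0xC1 or 0xF5 to 0xF7, which can never start valid UTF-8.

```go
package main

import "fmt"

// Local copies of the unexported marker constants, assumed for illustration.
const (
	tx = 0b10000000 // continuation byte prefix 10xxxxxx
	t2 = 0b11000000 // 2-byte first-byte prefix 110xxxxx
	t3 = 0b11100000 // 3-byte first-byte prefix 1110xxxx
	t4 = 0b11110000 // 4-byte first-byte prefix 11110xxx
	t5 = 0b11111000 // no valid UTF-8 byte is >= t5
)

// seqLen is a hypothetical helper that guesses the sequence length from the
// first byte's prefix bits alone. It returns 0 for continuation bytes and
// for bytes >= t5.
func seqLen(b byte) int {
	switch {
	case b < tx:
		return 1 // ASCII, 0xxxxxxx
	case b < t2:
		return 0 // continuation byte, 10xxxxxx
	case b < t3:
		return 2
	case b < t4:
		return 3
	case b < t5:
		return 4
	}
	return 0 // 11111xxx: never valid
}

func main() {
	for _, b := range []byte("a€") {
		fmt.Printf("%#x -> %d\n", b, seqLen(b))
	}
}
```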

xx const

The names of these constants (xx, as, and s1 to s7) are chosen to give nice alignment in the table that initializes first. The first nibble is an index into acceptRanges, or F for the special one-byte cases. The second nibble is the rune length, or the status for the special one-byte case. xx itself marks an invalid first byte, which decodes as a one-byte error.

const xx = 0xF1
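The nibble layout described above can be unpacked with a shift and a mask. A minimal sketch, assuming local copies of a few of the table values; unpack is a hypothetical helper, and the package's own decoding code may extract the fields differently.

```go
package main

import "fmt"

// Local copies of some of the table values documented in this section.
const (
	xx = 0xF1 // invalid first byte: special case, one byte
	as = 0xF0 // ASCII first byte: special case
	s1 = 0x02 // acceptRanges index 0, length 2
	s5 = 0x34 // acceptRanges index 3, length 4
)

// unpack is a hypothetical helper that splits a table value into its high
// nibble (acceptRanges index, or F for the special one-byte cases) and its
// low nibble (rune length, or status for the special cases).
func unpack(x uint8) (index, low uint8) {
	return x >> 4, x & 0x0F
}

func main() {
	for _, x := range []uint8{xx, as, s1, s5} {
		i, n := unpack(x)
		fmt.Printf("%#02x -> high nibble %X, low nibble %d\n", x, i, n)
	}
}
```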

Structs

acceptRange struct

acceptRange gives the range of valid values for the second byte in a UTF-8 sequence.

type acceptRange struct {
	lo uint8 // lowest value for second byte.
	hi uint8 // highest value for second byte.
}

Functions

AppendRune function

AppendRune appends the UTF-8 encoding of r to the end of p and returns the extended buffer. If the rune is out of range, it appends the encoding of [RuneError].

func AppendRune(p []byte, r rune) []byte
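A short usage sketch, assuming Go 1.18 or later (where AppendRune is available); the function build is hypothetical. It shows a normal append and the RuneError substitution for an out-of-range rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// build is a hypothetical example that appends runes to an existing buffer.
func build() []byte {
	buf := []byte("x=")
	buf = utf8.AppendRune(buf, 'é') // two bytes: C3 A9
	buf = utf8.AppendRune(buf, -1)  // out of range: RuneError's bytes EF BF BD
	return buf
}

func main() {
	fmt.Printf("% X\n", build()) // 78 3D C3 A9 EF BF BD
}
```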

DecodeLastRune function

DecodeLastRune unpacks the last UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns ([RuneError], 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8. An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.

func DecodeLastRune(p []byte) (r rune, size int)

DecodeLastRuneInString function

DecodeLastRuneInString is like [DecodeLastRune] but its input is a string. If s is empty it returns ([RuneError], 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8. An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.

func DecodeLastRuneInString(s string) (r rune, size int)
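Because the size result is 0 for an empty string and 1 for an invalid trailing byte, slicing by it is always safe. A sketch with a hypothetical helper dropLastRune:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// dropLastRune is a hypothetical helper that removes the final rune of s
// (or the final byte, if that byte is not part of a valid encoding).
func dropLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s)
	return s[:len(s)-size] // size is 0 for "", so this never panics
}

func main() {
	fmt.Println(dropLastRune("naïve€")) // naïve
}
```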

DecodeRune function

DecodeRune unpacks the first UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns ([RuneError], 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8. An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.

func DecodeRune(p []byte) (r rune, size int)

DecodeRuneInString function

DecodeRuneInString is like [DecodeRune] but its input is a string. If s is empty it returns ([RuneError], 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Both are impossible results for correct, non-empty UTF-8. An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.

func DecodeRuneInString(s string) (r rune, size int)
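Checking r alone cannot tell an error from a genuine U+FFFD in the input; the size result distinguishes them, because a real U+FFFD decodes with size 3. A sketch with a hypothetical classify helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// classify is a hypothetical helper that distinguishes the three outcomes
// of decoding the first rune of s.
func classify(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	switch {
	case size == 0:
		return "empty" // (RuneError, 0)
	case r == utf8.RuneError && size == 1:
		return "invalid" // (RuneError, 1)
	default:
		return "ok" // includes a literal U+FFFD, which has size 3
	}
}

func main() {
	for _, s := range []string{"", "\xff", "\uFFFD", "日本"} {
		fmt.Printf("%q: %s\n", s, classify(s))
	}
}
```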

EncodeRune function

EncodeRune writes into p (which must be large enough) the UTF-8 encoding of the rune. If the rune is out of range, it writes the encoding of [RuneError]. It returns the number of bytes written.

func EncodeRune(p []byte, r rune) int
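"Large enough" can always be satisfied with a UTFMax-byte buffer. A sketch (encode is a hypothetical helper) that writes into a fixed array and keeps only the bytes EncodeRune reports writing:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode is a hypothetical helper: a UTFMax-byte array fits any rune,
// including the 3-byte RuneError written for out-of-range input.
func encode(r rune) []byte {
	var buf [utf8.UTFMax]byte
	n := utf8.EncodeRune(buf[:], r)
	return buf[:n]
}

func main() {
	fmt.Printf("% X\n", encode('€')) // E2 82 AC
	fmt.Printf("% X\n", encode(-1))  // EF BF BD
}
```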

FullRune function

FullRune reports whether the bytes in p begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune since it will convert as a width-1 error rune.

func FullRune(p []byte) bool
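FullRune is useful when input arrives in chunks that may end in the middle of an encoding. In this sketch the hypothetical helper splitComplete keeps the incomplete tail for the next read; invalid bytes count as full, so they are consumed instead of stalling the loop.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitComplete is a hypothetical helper: it returns the longest prefix of p
// made of full encodings (valid or invalid) and the incomplete tail that
// should be kept until more bytes arrive.
func splitComplete(p []byte) (complete, tail []byte) {
	i := 0
	for i < len(p) && utf8.FullRune(p[i:]) {
		_, size := utf8.DecodeRune(p[i:])
		i += size
	}
	return p[:i], p[i:]
}

func main() {
	chunk := []byte("a€")[:3] // the read stopped inside the 3-byte '€'
	done, rest := splitComplete(chunk)
	fmt.Printf("%q % X\n", done, rest) // "a" E2 82
}
```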

FullRuneInString function

FullRuneInString is like [FullRune] but its input is a string.

func FullRuneInString(s string) bool
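The same check on strings, wrapped in a hypothetical needMore helper such as a string-based tokenizer might use. Note that the empty string is not a full rune, while a lone invalid byte is.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needMore is a hypothetical helper: it reports whether s is too short to
// decode its first rune yet.
func needMore(s string) bool {
	return !utf8.FullRuneInString(s)
}

func main() {
	for _, s := range []string{"", "\xe2\x82", "€", "\xff"} {
		fmt.Printf("%q needMore=%v\n", s, needMore(s))
	}
}
```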

RuneCount function

RuneCount returns the number of runes in p. Erroneous and short encodings are treated as single runes of width 1 byte.

func RuneCount(p []byte) int
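The difference between byte length and rune count, including the width-1 treatment of a short encoding, in a small sketch (counts is a hypothetical helper):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the byte length and rune count of p.
func counts(p []byte) (nBytes, nRunes int) {
	return len(p), utf8.RuneCount(p)
}

func main() {
	fmt.Println(counts([]byte("Hello, 世界")))  // 13 9
	fmt.Println(counts([]byte{0xE4, 0xB8})) // 2 2: a truncated '世' counts as two 1-byte runes
}
```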

RuneCountInString function

RuneCountInString is like [RuneCount] but its input is a string.

func RuneCountInString(s string) (n int)
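Padding by rune count rather than byte length is a common use. In this sketch padRight is hypothetical; note that rune count is not the same as display width (combining marks and wide CJK characters differ), so this is only an approximation for terminal layout.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// padRight is a hypothetical helper that pads s with spaces to width runes.
func padRight(s string, width int) string {
	n := utf8.RuneCountInString(s)
	if n >= width {
		return s
	}
	return s + strings.Repeat(" ", width-n)
}

func main() {
	fmt.Printf("[%s]\n", padRight("né", 4)) // [né  ]
}
```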

RuneLen function

RuneLen returns the number of bytes in the UTF-8 encoding of the rune. It returns -1 if the rune is not a valid value to encode in UTF-8.

func RuneLen(r rune) int
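RuneLen lets a caller size an output buffer before encoding and detect unencodable runes up front. A sketch with a hypothetical encodedSize helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedSize is a hypothetical helper that sums RuneLen over rs and
// returns -1 if any rune cannot be encoded (negative, surrogate, > MaxRune).
func encodedSize(rs []rune) int {
	total := 0
	for _, r := range rs {
		n := utf8.RuneLen(r)
		if n < 0 {
			return -1
		}
		total += n
	}
	return total
}

func main() {
	fmt.Println(encodedSize([]rune("a€😀"))) // 8 (1 + 3 + 4)
}
```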

RuneStart function

RuneStart reports whether the byte could be the first byte of an encoded, possibly invalid rune. Second and subsequent bytes always have the top two bits set to 10.

func RuneStart(b byte) bool
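Since continuation bytes are the only bytes for which RuneStart is false, it can be used to cut a buffer at a byte limit without splitting an encoding. A sketch (truncateUTF8 is hypothetical), assuming the input is valid UTF-8; on invalid input it still returns a prefix but makes no boundary guarantees.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 is a hypothetical helper that shortens p to at most limit
// bytes, backing up so that no multi-byte encoding is cut in half.
func truncateUTF8(p []byte, limit int) []byte {
	if len(p) <= limit {
		return p
	}
	i := limit // p[i] is the first byte that would be dropped
	for i > 0 && !utf8.RuneStart(p[i]) {
		i-- // p[i] is a continuation byte: move the cut earlier
	}
	return p[:i]
}

func main() {
	fmt.Printf("%q\n", truncateUTF8([]byte("a€b"), 3)) // "a"
}
```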

Valid function

Valid reports whether p consists entirely of valid UTF-8-encoded runes.

func Valid(p []byte) bool
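A common pattern is to validate first and repair only when needed. This sketch (sanitize is hypothetical) pairs Valid with bytes.ToValidUTF8 from the standard library (Go 1.13 or later), which replaces each run of invalid bytes with the given replacement.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// sanitize is a hypothetical helper: valid input is returned unchanged,
// otherwise invalid byte runs are replaced with U+FFFD.
func sanitize(p []byte) []byte {
	if utf8.Valid(p) {
		return p
	}
	return bytes.ToValidUTF8(p, []byte("\uFFFD"))
}

func main() {
	fmt.Printf("%q\n", sanitize([]byte("a\xffb")))
}
```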

ValidRune function

ValidRune reports whether r can be legally encoded as UTF-8. Code points that are out of range or a surrogate half are illegal.

func ValidRune(r rune) bool

ValidString function

ValidString reports whether s consists entirely of valid UTF-8-encoded runes.

func ValidString(s string) bool
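ValidString fits naturally at an input boundary, for example when rejecting a field that was decoded as Latin-1 by mistake. The checkUTF8 helper and its error format are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// checkUTF8 is a hypothetical validator that rejects strings which are not
// entirely valid UTF-8.
func checkUTF8(field, s string) error {
	if !utf8.ValidString(s) {
		return fmt.Errorf("%s: invalid UTF-8 %q", field, s)
	}
	return nil
}

func main() {
	fmt.Println(checkUTF8("name", "José"))    // <nil>
	fmt.Println(checkUTF8("name", "Jos\xe9")) // Latin-1 'é' is not valid UTF-8
}
```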

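appendRuneNonASCII function

appendRuneNonASCII appends the multi-byte UTF-8 encoding of r to p, or the encoding of RuneError if r is out of range. AppendRune calls it for runes that do not fit in a single byte.

func appendRuneNonASCII(p []byte, r rune) []byte

encodeRuneNonASCII function

encodeRuneNonASCII writes the multi-byte UTF-8 encoding of r into p, or the encoding of RuneError if r is out of range, and returns the number of bytes written. EncodeRune calls it for runes that do not fit in a single byte.

func encodeRuneNonASCII(p []byte, r rune) int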
