diff

Imports

Imports #

"bytes"
"fmt"
"sort"
"strings"

Structs

pair struct #

A pair is a pair of values tracked for both the x and y side of a diff. It is typically a pair of line indexes.

type pair struct {
x int
y int
}

Functions

Diff function #

Diff returns an anchored diff of the two texts old and new in the “unified diff” format. If old and new are identical, Diff returns a nil slice (no output). Unix diff implementations typically look for a diff with the smallest number of lines inserted and removed, which can in the worst case take time quadratic in the number of lines in the texts. As a result, many implementations either can be made to run for a long time or cut off the search after a predetermined amount of work. In contrast, this implementation looks for a diff with the smallest number of “unique” lines inserted and removed, where unique means a line that appears just once in both old and new. We call this an “anchored diff” because the unique lines anchor the chosen matching regions. An anchored diff is usually clearer than a standard diff, because the algorithm does not try to reuse unrelated blank lines or closing braces. The algorithm also guarantees to run in O(n log n) time instead of the standard O(n²) time. Some systems call this approach a “patience diff,” named for the “patience sorting” algorithm, itself named for a solitaire card game. We avoid that name for two reasons. First, the name has been used for a few different variants of the algorithm, so it is imprecise. Second, the name is frequently interpreted as meaning that you have to wait longer (to be patient) for the diff, meaning that it is a slower algorithm, when in fact the algorithm is faster than the standard one.

func Diff(oldName string, old []byte, newName string, new []byte) []byte

lines function #

lines returns the lines in the file x, including newlines. If the file does not end in a newline, one is supplied along with a warning about the missing newline.

func lines(x []byte) []string

tgs function #

tgs returns the pairs of indexes of the longest common subsequence of unique lines in x and y, where a unique line is one that appears once in x and once in y. The longest common subsequence algorithm is as described in Thomas G. Szymanski, “A Special Case of the Maximal Common Subsequence Problem,” Princeton TR #170 (January 1975), available at https://research.swtch.com/tgs170.pdf.

func tgs(x []string, y []string) []pair

Generated with Arrow