package matchr
import "github.com/antzucaro/matchr"
Index
- Constants
- func DamerauLevenshtein(s1 string, s2 string) (distance int)
- func DoubleMetaphone(s1 string) (string, string)
- func Hamming(s1 string, s2 string) (distance int, err error)
- func Jaro(r1 string, r2 string) (distance float64)
- func JaroWinkler(r1 string, r2 string, longTolerance bool) (distance float64)
- func Levenshtein(s1 string, s2 string) (distance int)
- func LongestCommonSubsequence(s1, s2 string) int
- func NYSIIS(s1 string) string
- func OSA(s1 string, s2 string) (distance int)
- func Phonex(s1 string) string
- func SmithWaterman(s1 string, s2 string) float64
- func Soundex(s1 string) string
- type String
Constants
const GAP_COST = float64(0.5)
Functions
func DamerauLevenshtein
DamerauLevenshtein computes the Damerau-Levenshtein distance between two strings. The returned value - distance - is the number of insertions, deletions, substitutions, and transpositions it takes to transform one string (s1) into another (s2). Each step in the transformation "costs" one distance point. It is similar to the Optimal String Alignment algorithm, but is more complex because it allows multiple edits on substrings.
This implementation is based on the one found on Wikipedia at http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Distance_with_adjacent_transpositions as well as Kevin Stern's Java implementation found at https://github.com/KevinStern/software-and-algorithms.
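The following is a minimal, self-contained sketch of the unrestricted Damerau-Levenshtein recurrence as described in the Wikipedia article linked above. It is not the matchr source; the names damerauLevenshtein, minOf, lastRow, and lastMatchCol are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// minOf returns the smallest of its arguments.
func minOf(first int, rest ...int) int {
	m := first
	for _, v := range rest {
		if v < m {
			m = v
		}
	}
	return m
}

// damerauLevenshtein is an illustrative sketch, not the matchr implementation.
func damerauLevenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	la, lb := len(a), len(b)
	maxDist := la + lb

	// d is shifted by one row and one column; row 0 and column 0 hold a
	// sentinel value that is never the minimum.
	d := make([][]int, la+2)
	for i := range d {
		d[i] = make([]int, lb+2)
	}
	d[0][0] = maxDist
	for i := 0; i <= la; i++ {
		d[i+1][0] = maxDist
		d[i+1][1] = i
	}
	for j := 0; j <= lb; j++ {
		d[0][j+1] = maxDist
		d[1][j+1] = j
	}

	lastRow := map[rune]int{} // last 1-based row in which each rune of a was seen
	for i := 1; i <= la; i++ {
		lastMatchCol := 0 // last 1-based column in this row where a[i-1] matched
		for j := 1; j <= lb; j++ {
			k := lastRow[b[j-1]]
			l := lastMatchCol
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
				lastMatchCol = j
			}
			d[i+1][j+1] = minOf(
				d[i][j]+cost,              // substitution or match
				d[i+1][j]+1,               // insertion
				d[i][j+1]+1,               // deletion
				d[k][l]+(i-k-1)+1+(j-l-1), // transposition across a gap
			)
		}
		lastRow[a[i-1]] = i
	}
	return d[la+1][lb+1]
}

func main() {
	fmt.Println(damerauLevenshtein("CA", "ABC")) // 2
}
```

The pair "CA" and "ABC" is a common way to see the difference from Optimal String Alignment: here the transposition CA to AC may be followed by inserting B between the swapped letters, giving a distance of 2.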
func DoubleMetaphone
DoubleMetaphone computes the Double-Metaphone value of the input string. This value is a phonetic representation of how the string sounds, with affordances for many different language dialects. It was originally developed by Lawrence Phillips in the 1990s.
More information about this algorithm can be found on Wikipedia at http://en.wikipedia.org/wiki/Metaphone.
func Hamming
Hamming computes the Hamming distance between two equal-length strings. This is the number of positions at which the corresponding characters of the two strings differ. This implementation is based on the algorithm description found at http://en.wikipedia.org/wiki/Hamming_distance.
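As an illustration of the definition above, here is a minimal sketch that counts differing positions and reports an error for unequal lengths. It is not the matchr source: the name hamming is hypothetical, and comparing by rune rather than by byte is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// hamming is an illustrative sketch, not the matchr implementation.
// It compares the strings rune by rune (an assumption of this sketch).
func hamming(s1, s2 string) (int, error) {
	a, b := []rune(s1), []rune(s2)
	if len(a) != len(b) {
		return 0, errors.New("hamming: strings must have equal length")
	}
	distance := 0
	for i := range a {
		if a[i] != b[i] {
			distance++
		}
	}
	return distance, nil
}

func main() {
	d, err := hamming("karolin", "kathrin")
	fmt.Println(d, err) // 3 <nil>
}
```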
func Jaro
Jaro computes the Jaro measure between two strings. Although it is often called an edit distance, the returned value behaves as a similarity score: it is a float64 between 0 and 1 inclusive, with 0 indicating the two strings are not at all similar and 1 indicating the two strings are exact matches.
See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance for a full description.
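A minimal sketch of the textbook Jaro computation follows, to make the 0-to-1 scale concrete. It is not the matchr source; the name jaro and the handling of empty inputs are assumptions of this sketch.

```go
package main

import "fmt"

// jaro is an illustrative sketch of the textbook Jaro similarity,
// not the matchr implementation.
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1 // convention chosen for this sketch
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	longer := len(a)
	if len(b) > longer {
		longer = len(b)
	}
	window := longer/2 - 1
	if window < 0 {
		window = 0
	}

	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}

	// Walk the matched runes of both strings in order and count the
	// positions where they disagree; two such positions form one transposition.
	k, outOfOrder := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}

	m := float64(matches)
	t := float64(outOfOrder / 2)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

func main() {
	fmt.Printf("%.4f\n", jaro("MARTHA", "MARHTA")) // 0.9444
}
```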
func JaroWinkler
JaroWinkler computes the Jaro-Winkler edit distance between two strings. This is a modification of the Jaro algorithm that gives additional weight to prefix matches.
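The prefix weighting can be sketched as a small adjustment applied to an already computed Jaro score. This is the commonly cited textbook form, with a scaling factor of 0.1 and a prefix capped at four characters; both constants and the name winklerBoost are assumptions of this sketch, and it does not model the longTolerance parameter or any minimum-score threshold that some variants apply.

```go
package main

import "fmt"

// winklerBoost is an illustrative sketch, not the matchr implementation.
// It raises a Jaro score according to the length of the common prefix.
func winklerBoost(jaroScore float64, s1, s2 string) float64 {
	const (
		scaling   = 0.1 // assumed textbook scaling factor
		maxPrefix = 4   // assumed textbook cap on the prefix length
	)
	a, b := []rune(s1), []rune(s2)
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < maxPrefix && a[prefix] == b[prefix] {
		prefix++
	}
	return jaroScore + float64(prefix)*scaling*(1-jaroScore)
}

func main() {
	// 17.0/18.0 is the textbook Jaro score for MARTHA and MARHTA.
	fmt.Printf("%.4f\n", winklerBoost(17.0/18.0, "MARTHA", "MARHTA")) // 0.9611
}
```

Because the adjustment is proportional to (1 - score), it can never push the result above 1.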
func Levenshtein
Levenshtein computes the Levenshtein distance between two strings. The returned value - distance - is the number of insertions, deletions, and substitutions it takes to transform one string (s1) into another (s2). Each step in the transformation "costs" one distance point.
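The unit-cost transformation described above is usually computed with dynamic programming. The following two-row sketch is for illustration only and is not the matchr source; the names levenshtein and min3 are hypothetical.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein is an illustrative sketch, not the matchr implementation.
func levenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // turning "" into b[:j] takes j insertions
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i // turning a[:i] into "" takes i deletions
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
}
```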
func LongestCommonSubsequence
LongestCommonSubsequence computes the longest common subsequence of two strings. The returned value is the length of that subsequence: the longest sequence of letters that appears in both strings in the same order, though not necessarily contiguously.
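A minimal dynamic-programming sketch of the length computation is shown below. It is not the matchr source, and the name lcsLength is hypothetical.

```go
package main

import "fmt"

// lcsLength is an illustrative sketch, not the matchr implementation.
// It returns the length of the longest common subsequence of s1 and s2.
func lcsLength(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				curr[j] = prev[j-1] + 1 // extend the subsequence
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j] // skip a[i-1]
			default:
				curr[j] = curr[j-1] // skip b[j-1]
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(lcsLength("AGGTAB", "GXTXAYB")) // 4 (for example "GTAB")
}
```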
func NYSIIS
NYSIIS computes the NYSIIS phonetic encoding of the input string. It is a modification of the traditional Soundex algorithm.
func OSA
OSA computes the Optimal String Alignment distance between two strings. The returned value - distance - is the number of insertions, deletions, substitutions, and transpositions it takes to transform one string (s1) into another (s2). Each step in the transformation "costs" one distance point. It is similar to Damerau-Levenshtein, but is simpler because it does not allow multiple edits on any substring.
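The restriction described above can be sketched as the Levenshtein recurrence plus one extra case for adjacent transpositions. This is an illustration only, not the matchr source; the name osaDistance is hypothetical.

```go
package main

import "fmt"

// osaDistance is an illustrative sketch, not the matchr implementation.
func osaDistance(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	d := make([][]int, len(a)+1)
	for i := range d {
		d[i] = make([]int, len(b)+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := d[i-1][j-1] + cost // substitution or match
			if v := d[i-1][j] + 1; v < best {
				best = v // deletion
			}
			if v := d[i][j-1] + 1; v < best {
				best = v // insertion
			}
			// Adjacent transposition: the swapped pair is never edited again.
			if i > 1 && j > 1 && a[i-1] == b[j-2] && a[i-2] == b[j-1] {
				if v := d[i-2][j-2] + 1; v < best {
					best = v
				}
			}
			d[i][j] = best
		}
	}
	return d[len(a)][len(b)]
}

func main() {
	fmt.Println(osaDistance("ab", "ba"))  // 1
	fmt.Println(osaDistance("CA", "ABC")) // 3
}
```

The second pair costs 3 here because, once C and A have been swapped, no further edit may be placed between them.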
func Phonex
Phonex computes the Phonex phonetic encoding of the input string. Phonex is a modification of the venerable Soundex algorithm. It accounts for a few more letter combinations to improve accuracy on some data sets.
This implementation is based on the original C implementation by the algorithm's creator, A. J. Lait, as found in his research paper entitled "An Assessment of Name Matching Algorithms."
func SmithWaterman
SmithWaterman computes the Smith-Waterman local sequence alignment for the two input strings. This was originally designed to find similar regions in strings representing DNA or protein sequences.
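To show what a local alignment score looks like, here is a minimal sketch of the standard Smith-Waterman recurrence with a linear gap penalty. It is not the matchr source. The scoring values (match 1.0, mismatch -1.0, gap 0.5) are assumptions of this sketch; the gap value only mirrors the package's GAP_COST constant by analogy, and the package's actual scoring scheme may differ.

```go
package main

import (
	"fmt"
	"math"
)

// smithWaterman is an illustrative sketch, not the matchr implementation.
// It returns the best local alignment score under assumed scoring values.
func smithWaterman(s1, s2 string) float64 {
	const (
		matchScore    = 1.0  // assumed
		mismatchScore = -1.0 // assumed
		gapPenalty    = 0.5  // assumed, chosen to echo GAP_COST
	)
	a, b := []rune(s1), []rune(s2)
	prev := make([]float64, len(b)+1)
	curr := make([]float64, len(b)+1)
	best := 0.0
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			score := mismatchScore
			if a[i-1] == b[j-1] {
				score = matchScore
			}
			// A cell never drops below zero, so an alignment may start anywhere.
			v := math.Max(0, prev[j-1]+score)
			v = math.Max(v, prev[j]-gapPenalty)
			v = math.Max(v, curr[j-1]-gapPenalty)
			curr[j] = v
			if v > best {
				best = v
			}
		}
		prev, curr = curr, prev
	}
	return best
}

func main() {
	fmt.Println(smithWaterman("xxabcxx", "yyabcyy")) // 3
}
```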
func Soundex
Soundex computes the Soundex phonetic representation of the input string. It attempts to encode homophones with the same characters. More information can be found at http://en.wikipedia.org/wiki/Soundex.
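A minimal sketch of the American Soundex rules, as described in the Wikipedia article linked above, shows how similar-sounding names collapse to the same code. It is not the matchr source; the names soundexSketch and soundexDigit are hypothetical, and ignoring everything other than ASCII letters is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// soundexDigit maps an upper-case ASCII letter to its Soundex digit,
// or 0 for vowels, Y, H, and W.
func soundexDigit(r rune) byte {
	switch r {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// soundexSketch is an illustrative sketch, not the matchr implementation.
func soundexSketch(s string) string {
	out := make([]byte, 0, 4)
	var last byte // digit of the previous coded letter, 0 after a vowel
	for _, r := range strings.ToUpper(s) {
		if r < 'A' || r > 'Z' {
			continue // assumption: skip anything that is not an ASCII letter
		}
		digit := soundexDigit(r)
		if len(out) == 0 {
			out = append(out, byte(r)) // keep the first letter as-is
			last = digit
			continue
		}
		if r == 'H' || r == 'W' {
			continue // H and W do not separate letters with the same digit
		}
		if digit == 0 {
			last = 0 // a vowel separates repeated digits
			continue
		}
		if digit != last {
			out = append(out, digit)
			if len(out) == 4 {
				break
			}
		}
		last = digit
	}
	if len(out) == 0 {
		return ""
	}
	for len(out) < 4 {
		out = append(out, '0') // pad short codes with zeros
	}
	return string(out)
}

func main() {
	fmt.Println(soundexSketch("Robert"), soundexSketch("Rupert")) // R163 R163
}
```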
Types
type String
type String struct {
// contains filtered or unexported fields
}
String wraps a regular string with a small structure that provides more efficient indexing by code point index, as opposed to byte index. Scanning incrementally forwards or backwards is O(1) per index operation (although not as fast as a range clause going forwards). Random access is O(N) in the length of the string, but the overhead is less than always scanning from the beginning. If the string is ASCII, random access is O(1). Unlike the built-in string type, String has internal mutable state and is not thread-safe.
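The distinction between byte index and code point index can be seen with the standard library alone. The sketch below is not part of matchr; runeAt is a hypothetical helper that scans from the beginning on every call, which is the O(N) cost a wrapper such as String aims to reduce.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt is a hypothetical helper for illustration. It returns the i-th
// code point of s by scanning from the start of the string.
func runeAt(s string, i int) (rune, bool) {
	if i < 0 {
		return 0, false
	}
	for _, r := range s {
		if i == 0 {
			return r, true
		}
		i--
	}
	return 0, false
}

func main() {
	s := "h\u00e9llo" // the second character takes two bytes in UTF-8

	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5

	r, _ := runeAt(s, 1)
	fmt.Println(r == '\u00e9') // true

	// Indexing the string directly yields a single byte, not a code point.
	fmt.Println(s[1] == 0xc3) // true
}
```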
func NewString
NewString returns a new UTF-8 string with the provided contents.
func (*String) At
At returns the rune with index i in the String. The sequence of runes is the same as iterating over the contents with a "for range" clause.
func (*String) Init
Init initializes an existing String to hold the provided contents. It returns a pointer to the initialized String.
func (*String) IsASCII
IsASCII returns a boolean indicating whether the String contains only ASCII bytes.
func (*String) RuneCount
RuneCount returns the number of runes (Unicode code points) in the String.
func (*String) Slice
Slice returns the string sliced at rune positions [i:j].
func (*String) String
String returns the contents of the String. This method also means the String is directly printable by fmt.Print.
Source Files
damerau_levenshtein.go hamming.go jarowinkler.go levenshtein.go longestcommonsubsequence.go metaphone.go nysiis.go osa.go phonex.go runestring.go smithwaterman.go soundex.go utf8.go util.go
Version: v0.0.0-20221106193745-7bed6ef61ef9 (latest)
Published: Nov 6, 2022