Skip to content
/ edits Public

Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment

License

Notifications You must be signed in to change notification settings

tcrouch/edits

Folders and files

NameName
Last commit message
Last commit date

Latest commit

88fa4f6 · Apr 7, 2024

History

63 Commits
Apr 7, 2024
Sep 22, 2017
Oct 9, 2022
Feb 22, 2020
Oct 9, 2022
Sep 22, 2017
Feb 22, 2020
Oct 9, 2022
Sep 22, 2017
Sep 22, 2017
Sep 22, 2017
Jan 5, 2023
Sep 22, 2017
Oct 9, 2022

Repository files navigation

Edits

Gem GitHub Workflow Status (with branch) Inline docs Yard Docs

A collection of edit distance algorithms in Ruby.

Includes Levenshtein, Restricted Edit (Optimal Alignment) and Damerau-Levenshtein distances, and Jaro and Jaro-Winkler similarity.

Installation

Add this line to your application's Gemfile:

gem 'edits'

And then execute:

$ bundle

Or install it yourself as:

$ gem install edits

Usage

Levenshtein variants

Calculate the edit distance between two sequences with variants of the Levenshtein distance algorithm.

Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3
  • Levenshtein edit distance, counting insertion, deletion and substitution.
  • Restricted Damerau-Levenshtein edit distance (aka Optimal Alignment), counting insertion, deletion, substitution and transposition (adjacent symbols swapped). Restricted by the condition that no substring is edited more than once.
  • Damerau-Levenshtein edit distance, counting insertion, deletion, substitution and transposition (adjacent symbols swapped).
Levenshtein Restricted Damerau-Levenshtein Damerau-Levenshtein
"raked" vs. "bakers" 3 3 3
"iota" vs. "atom" 4 3 3
"acer" vs. "earn" 4 4 3

Levenshtein and Restricted Edit distances also have a bounded version.

# Max distance
Edits::Levenshtein.distance_with_max "fghijk", "abcde", 3
# => 3

The convenience method most_similar searches for the best match to a given sequence from a collection. It is similar to using min_by, but leverages a maximum bound.

Edits::RestrictedEdit.most_similar "atom", ["iota", "tome", "mown", "tame"]
# => "tome"

Jaro & Jaro-Winkler

Calculate the Jaro and Jaro-Winkler similarity/distance of two sequences.

Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643

Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863

Hamming

Calculate the hamming distance between two sequences.

Edits::Hamming.distance("explorer", "exploded")
# => 2

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/tcrouch/edits.

License

The gem is available as open source under the terms of the MIT License.