Skip to content

Alternate string metrics #131

Description

@ParadaCarleton

I think a great feature would be to let the have an optional argument that changes the string metric. For example, keyboard-based distances can improve performance and accuracy. A standard QWERTY keyboard has layout:

` 1 2 3 4 5 6 7 8 9 0 - = 
  q w e r t y u i o p [ ] \
   a s d f g h j k l ; ' 
    z x c v b n m , . /

A substitution adds a distance of 1 for every step we have to take on a keyboard. Keys on a keyboard are typically spaced along a nearly-hexagonal grid with 6 roughly equidistant neighbors, e.g. "g" has:

 t y
f g h
 v b

All of these keys can be considered neighbors, and substituting g with one of these keys increases the edit distance by 1. For an edit distance of 2, we can consider keys that are neighbors of neighbors, and so on.

This both cuts back on the number of neighbors within a given edit distance, speeding up performance, and should improve accuracy when working with real-world data typed on a keyboard.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions