Add U+FE00 to the model and Phake training data #119

@sffc

Description

The following string is encoded in standard Unicode, but the detector classifies it as Zawgyi. It contains many occurrences of U+FE00 (VARIATION SELECTOR-1). If we add that code point to the model, this text might be detected correctly as Unicode, even without much training data.

ꩬ︀ံꩭုဝ︀်ꩬ︀ိပ︀်တ︀ိꩫ︀်ၸ︀ႝꩫ︀ိုဝ︀်ꩫ︀ိꩫ︀်မ︀ေ︀ပꩫ︀ႃ ။ ၸ︀ၞ်ꩭူၺꩫ︀်တ︀ႝꩡ︀ွ်မ︀ႃꩭေ︀ႃကꩭၞ်ꩫ︀ႝမ︀ွက︀်လ︀ွ်ꩡ︀ွ်
တ︀ႃ ။ ꩬ︀ိပ︀်တ︀ိꩫ︀်ꩬ︀ံꩭုဝ︀်ၸ︀ႝꩫ︀ိꩫ︀ၵ︀ံမ︀ေ︀ပꩫ︀ႃ ။ ၸ︀ၞ်ꩭူမ︀ႃꩭေ︀ႃၺꩫ︀်ၸ︀ြႃကꩭၞ်ꩫ︀ႝမ︀ွက︀်လ︀ွ် ꩡ︀ွ်တ︀ႃ ။ ꩬ︀ုတ︀်ယ︀ွ် ။
ဝ︀ွႃꩭင︀်ထ︀ႝꩫ︀ႃ ။
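To make the proposal concrete, here is a minimal Python sketch of what "adding U+FE00 to the model" could mean for a character-level Markov model. It assumes the model gives one state to each code point in a few Myanmar blocks and folds every other code point into a catch-all state 0. The block table and the names `state_index`, `to_states`, and `vs1_ratio` are hypothetical and do not reflect the detector's actual code.

```python
# Hypothetical sketch, NOT the detector's real implementation: it shows how a
# character-level Markov model could give U+FE00 its own state instead of
# folding it into a catch-all "other" state. The block table is an assumption.

VS1 = 0xFE00  # U+FE00 VARIATION SELECTOR-1

# Illustrative code point blocks that get one state per code point.
BLOCKS = [
    (0x1000, 0x109F),  # Myanmar
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
]


def state_index(cp: int, include_vs1: bool) -> int:
    """Map a code point to a model state; 0 is the catch-all 'other' state."""
    offset = 1
    for lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return offset + (cp - lo)
        offset += hi - lo + 1
    # With include_vs1, U+FE00 gets the first state after the blocks.
    if include_vs1 and cp == VS1:
        return offset
    return 0


def to_states(text: str, include_vs1: bool) -> list[int]:
    """Convert a string to the sequence of states the model would score."""
    return [state_index(ord(ch), include_vs1) for ch in text]


def vs1_ratio(text: str) -> float:
    """Fraction of code points in text that are U+FE00."""
    return text.count(chr(VS1)) / len(text) if text else 0.0


if __name__ == "__main__":
    sample = "\u1010\ufe00\u102d"  # a three-code-point fragment of the text above
    print(to_states(sample, include_vs1=False))  # [17, 0, 46]
    print(to_states(sample, include_vs1=True))   # [17, 193, 46]
    print(vs1_ratio(sample))
```

Without the extra state, each U+FE00 in the Phake text becomes a transition into and out of the generic "other" state. With a dedicated state, the transitions around U+FE00 could be learned from Phake training data, as the issue proposes.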

CC @sven-oly

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests