Add support for Chinese sentence boundary character (。) by jjroelofs · Pull Request #48 · Tessmore/sbd

jjroelofs · 2024-08-31T11:38:16Z

I understand there is some conflict between this PR and the Hindi support PR. Let me know if you have a better idea for making the plugin support multiple scripts in a scalable way.

- Update Match.js and sbd.js to recognize the Chinese full stop (。) as a sentence boundary
- Modify the isConcatenated function to handle Chinese characters
- Add test cases for Chinese sentences
- Update README to mention Chinese sentence boundary support

This change improves the library's ability to correctly tokenize Chinese text and mixed Chinese-English text into sentences.
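
For readers unfamiliar with the problem, here is a minimal sketch of the idea, not the code in this PR. The function name `splitSentences` is hypothetical and is not part of sbd's API. It assumes a Latin terminator (`.`, `!`, `?`) only ends a sentence when followed by whitespace or end of input, while the Chinese full stop (U+3002) ends a sentence unconditionally, since Chinese text normally has no space after it.

```javascript
// Hypothetical helper for illustration only; not sbd's actual implementation.
function splitSentences(text) {
  const sentences = [];
  const chars = Array.from(text); // iterate by code point
  let current = "";

  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i];
    const next = chars[i + 1];
    current += ch;

    // Chinese full stop: a boundary even with no trailing whitespace.
    const isChineseStop = ch === "\u3002";
    // Latin terminator: a boundary only before whitespace or end of input,
    // so "3.14" is not split.
    const isLatinStop =
      /[.!?]/.test(ch) && (next === undefined || /\s/.test(next));

    if (isChineseStop || isLatinStop) {
      const trimmed = current.trim();
      if (trimmed) sentences.push(trimmed);
      current = "";
    }
  }

  // Keep any trailing text that has no terminator.
  const rest = current.trim();
  if (rest) sentences.push(rest);
  return sentences;
}

console.log(splitSentences("你好。Hello world. 再见。"));
// → [ '你好。', 'Hello world.', '再见。' ]
```

This sketch ignores abbreviations, quotes, and other full-width terminators (such as ！ and ？), which a real tokenizer would need to handle.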

jjroelofs wants to merge 1 commit into Tessmore:master from dxpr:jur/master/support-chinese-sentence-boundary-char
