feat: add new page text extraction methods and improve content parsing#17
jankzl wants to merge 1 commit into ctxinf:dev from
jankzl commented Apr 12, 2026
- Introduced a new extraction method, DOM heuristic, as an alternative to @mozilla/readability.
- Updated localization files for English, Simplified Chinese, and Traditional Chinese to include new extraction method messages.
- Enhanced Summary component to allow users to select the extraction method.
- Implemented logic in useSummary composable to handle extraction method changes and update webpage content accordingly.
- Refactored page-read utility functions to support new extraction methods and improve content parsing.
- Added new properties to WebpageContent type to track extraction method and input text length.
- Updated default configuration to set the default extraction method to readability.
The default content extract method
<span class="font-bold"> general</span> extract method.
Two <span class="font-bold">general</span> extract methods are available now.
<br />
<span class="font-bold">@mozilla/readability</span> remains the default for classic article pages.
<template #trigger> @mozilla/readability </template>
</Select>
<div class="w-[28rem] max-w-full">
<RadioGroup v-model="pageTextExtractMethod" class="gap-3">
import { minimatch } from 'minimatch';
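I wrote the code for this core flow by hand, before agent-based programming had matured, and it has been through a great deal of manual testing and debugging. It may be a pile of crap, but please keep the changes minimal and do not refactor the core flow.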
})
return
}
For this feature, my understanding is that this is the only place that needs to change: check the option to decide which function to call, then take the result it returns.
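A minimal sketch of the single-point change this comment describes, assuming the extractors can be treated as plain functions. Every name here (`ExtractMethod`, `extractByReadability`, `extractByDomHeuristic`, `extractPageText`) is hypothetical, and the two extractor bodies are stand-ins that operate on raw text rather than the project's real DOM-based logic:

```typescript
// Hypothetical names throughout; the real functions live in the page-read utilities.
type ExtractMethod = "readability" | "dom-heuristic";

interface ExtractResult {
  text: string;
  method: ExtractMethod;
}

// Stand-in for the existing @mozilla/readability path (assumption: returns a string).
function extractByReadability(raw: string): string {
  return raw.trim();
}

// Stand-in for the new DOM heuristic path (assumption: returns a string).
function extractByDomHeuristic(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// The only spot that changes: pick the extractor from the configured option
// and return its result, leaving the surrounding flow untouched.
function extractPageText(
  raw: string,
  method: ExtractMethod = "readability",
): ExtractResult {
  const text =
    method === "dom-heuristic"
      ? extractByDomHeuristic(raw)
      : extractByReadability(raw);
  return { text, method };
}
```

Defaulting `method` to `"readability"` keeps existing callers on the current behavior, which fits the request for a minimal change.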
<SummaryDialog class="mt-[-1px] min-h-16 overflow-y-auto max-h-[--webpage-summary-panel-dialog-max-height]"
style="overflow-anchor: auto" ref="summaryDialog">
<template #top-right-buttons>
<div class="flex items-center gap-1 rounded-md border bg-background/80 p-1" :title="t('Extract_method')">
The UI has no space reserved for this toggle button group, so it currently looks very odd. I suggest dropping the toggle button group and letting users configure this only on the settings page.
}
export type SummaryInput = WebpageContent & { summaryLanguage: string, currentSelection?: string }
export type SummaryInput = WebpageContent & { summaryLanguage: string, currentSelection?: string, currentModel?: string }
or download from [Github Releases](https://github.com/slow-groovin/webpage-summary/releases) and manually install
### Load Modified Extension In Firefox
This project currently uses bun as its package manager. Please confirm whether this section was verified by hand, and whether any further notes need to be added.
ctxinf
left a comment
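Thank you for the submission. Please revise it according to the comments. Also, are your other two submissions follow-ups related to this feature? If so, please push them as commits on the same branch so that they form a single PR.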
This file could be refactored a little: for example, add a new folder utils/extract, split each piece of functionality into its own file, and have each file export a function entry point of the same type.
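One possible shape for that layout, shown in a single file for brevity. This is only a sketch: the file names in the comments, the `PageTextExtractor` type, the `ExtractedPage` fields, and the extractor bodies are all assumptions for illustration, not the project's actual code:

```typescript
// Hypothetical layout (each section would be its own file):
//   utils/extract/types.ts          -> MethodId, ExtractedPage, PageTextExtractor
//   utils/extract/readability.ts    -> readabilityExtractor
//   utils/extract/dom-heuristic.ts  -> domHeuristicExtractor
//   utils/extract/index.ts          -> extractors, getExtractor

type MethodId = "readability" | "dom-heuristic";

interface ExtractedPage {
  text: string;
  inputTextLength: number;
}

// The shared entry-point type that every extractor file exports.
type PageTextExtractor = (rawText: string) => ExtractedPage;

// Stand-in body: trims the input (the real one would call @mozilla/readability).
const readabilityExtractor: PageTextExtractor = (rawText) => {
  const text = rawText.trim();
  return { text, inputTextLength: text.length };
};

// Stand-in body: trims each line and drops empty lines.
const domHeuristicExtractor: PageTextExtractor = (rawText) => {
  const text = rawText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
  return { text, inputTextLength: text.length };
};

// Record<MethodId, ...> makes the compiler flag any method without an extractor.
const extractors: Record<MethodId, PageTextExtractor> = {
  readability: readabilityExtractor,
  "dom-heuristic": domHeuristicExtractor,
};

function getExtractor(method: MethodId): PageTextExtractor {
  return extractors[method];
}
```

Because every entry shares one signature, adding a further method would mean adding one file and one registry entry, with no change at the call site.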