AI Summarization and Classification for Chats (#2641) #2646
|
As I mentioned in #2641, this will be a great addition to the project.
|
|
@mbichara, not sure if we can do anything about 12. I can try to deal with the others, if you agree with them. |
|
Hi @wladimirleite! Thank you for the valuable comments and suggestions, and for helping with this. I agree with all of them. About 12, I believe it is fairly doable to generate the "header" of the answer in a standard way. Regarding 4, I also agree that keeping "ai:chunk_ids" visible in the metadata tab is not ideal. Just to describe it briefly, it's a hierarchical RAG architecture that lets the user ask general questions about one or multiple chats at analysis time, in a ChatGPT-like panel in the interface, and get a textual response. I would like to show you a POC of this next feature later, as I think you could also provide some good insights. Thank you again!
|
Thanks @mbichara! |
|
|
I also suggest renaming "chunk_ids" to "chunkIds". |
|
@mbichara, there are likely some details to fine-tune, but the current version should be functional. |
|
@wladimirleite, very nice, I will process a case here and let you know |
|
Hi @wladimirleite, I processed a couple of cases, it seems fine, good work! |
|
@mbichara, is this still a draft? If not, could you convert it to ready for review? Could anyone finish reviewing and testing and, if OK, approve it? I think this will be very helpful in many cases, such as CSAM analysis, bank fraud, drug and weapons dealing, extortion, currency counterfeiting, etc.
|
I made some changes and improvements, including on the server side, and I now believe this is ready for review. @wladimirleite, thanks for collaborating on this. I’ll send you a config that points to the server so you can run it when you have some time. If any other colleague wants to test it, let me know. Just a heads-up: I’m also adding/testing document summarization to the task, but due to infrastructure constraints, maybe we should start with chats first. What do you think, @lfcnassif?
I agree. |
Sorry for the slow response! @mbichara, a last suggestion (sorry for not mentioning this before):
By the way, I added two chat analysis questions, which probably increased the processing time. I am running it again with summarization only. A minor detail: there were a lot of warnings in the log:
I also agree that starting with chats only seems a better idea. |
With summarization only, time spent on this task decreased from 14608 s to 4811 s. |
|
After taking a closer look at the processed case and discussing with @felipecampanini, who also processed a large case he is working on with this PR, some final comments and suggestions for future improvements:
|
|
Hi @mbichara! I know you are working on a very sensitive case, but could you push the latest fixes and commits here? I think this is a very important feature to include in version 4.4.0, hopefully before the middle of this year.
|
@mbichara, do you plan to push more enhancements here? |
|
Hi @lfcnassif. Yes, I was working on another urgent demand over the past weeks.
Pull request overview
Adds AI-driven chat summarization (and optional per-chunk “analysis score” attributes) to IPED, plus a dedicated “Summary” viewer tab that renders stored summaries and can navigate to the relevant message in the chat preview. It also extends the AI Filters panel to group chats by analysis score fields discovered in the index.
Changes:
- Introduces `AISummarizationTask.py` + configuration to call a remote AI middleware service, store `ai:summary`, `ai:chunkIds`, and `ai:analysis:*` extra attributes, and register the task in `TaskInstaller.xml`.
- Adds a new `SummaryViewer` (tab) that displays `ai:summary` chunks, renders basic markup, shows per-chunk analysis labels, and provides “go to first message” navigation via a new `MessageNavigator` API.
- Extends AI filters to support wildcard expansion for `ai:analysis:*` fields and adds “Analyzed Chats” / “Summarized Chats” filter entries and localization keys.
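The wildcard expansion mentioned in the last bullet can be pictured with a small sketch. This is not the PR's `AIFiltersLoader.java` code (which is Java); it is an illustrative Python version of the idea, and the function name `expand_wildcard_fields` and the sample field names are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_wildcard_fields(pattern, indexed_fields):
    """Return the indexed field names matching a property definition.

    Hypothetical helper: a definition such as "ai:analysis:*" is expanded
    into one concrete entry per matching field found in the index.
    """
    if "*" not in pattern:
        # Plain definitions are kept only if the field actually exists.
        return [pattern] if pattern in indexed_fields else []
    return sorted(f for f in indexed_fields if fnmatchcase(f, pattern))


# Sample field names are made up for illustration.
fields = [
    "ai:summary",
    "ai:chunkIds",
    "ai:analysis:drugsScore",
    "ai:analysis:fraudScore",
    "name",
]
expanded = expand_wildcard_fields("ai:analysis:*", fields)
print(expanded)
```

Each expanded name would then back one concrete filter node, which is presumably why the filter tree needs a per-node suffix for display.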
Reviewed changes
Copilot reviewed 25 out of 32 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| iped-viewers/iped-viewers-impl/src/main/java/iped/viewers/SummaryViewer.java | New viewer to render stored AI summaries, show analysis labels, and link into chat preview. |
| iped-viewers/iped-viewers-api/src/main/java/iped/viewers/api/MessageNavigator.java | New API hook for viewers to request navigation to a message id. |
| iped-utils/src/main/java/iped/utils/UiUtil.java | Enhances empty HTML helper to support optional message text and theme colors. |
| iped-engine/src/main/java/iped/engine/data/SimpleFilterNode.java | Adds suffix to support display labeling for wildcard-expanded AI filter nodes. |
| iped-app/src/main/java/iped/app/ui/ViewerController.java | Registers SummaryViewer, wires navigation callback, loads analysis thresholds, and hides Summary when index lacks summary field. |
| iped-app/src/main/java/iped/app/ui/ai/AIFiltersTreeCellRenderer.java | Displays node suffix in the AI filters tree. |
| iped-app/src/main/java/iped/app/ui/ai/AIFiltersLoader.java | Expands wildcard property definitions (e.g., ai:analysis:*) into concrete filter nodes based on indexed fields. |
| iped-app/resources/scripts/tasks/AISummarizationTask.py | New processing task that calls remote service, parses chat HTML, and stores summaries + analysis attributes. |
| iped-app/resources/localization/iped-viewer-messages.properties | Adds SummaryViewer strings (EN). |
| iped-app/resources/localization/iped-viewer-messages_pt_BR.properties | Adds SummaryViewer strings (pt_BR). |
| iped-app/resources/localization/iped-viewer-messages_it_IT.properties | Adds SummaryViewer strings (it_IT) placeholders. |
| iped-app/resources/localization/iped-viewer-messages_fr_FR.properties | Adds SummaryViewer strings (fr_FR) placeholders. |
| iped-app/resources/localization/iped-viewer-messages_es_AR.properties | Adds SummaryViewer strings (es_AR) placeholders. |
| iped-app/resources/localization/iped-viewer-messages_de_DE.properties | Adds SummaryViewer strings (de_DE) placeholders. |
| iped-app/resources/localization/iped-ai-filters.properties | Adds “Analyzed Chats” / “Summarized Chats” filter labels (EN). |
| iped-app/resources/localization/iped-ai-filters_pt_BR.properties | Adds “Analyzed Chats” / “Summarized Chats” filter labels (pt_BR). |
| iped-app/resources/localization/iped-ai-filters_it_IT.properties | Adds filter labels (it_IT) placeholders. |
| iped-app/resources/localization/iped-ai-filters_fr_FR.properties | Adds filter labels (fr_FR) placeholders. |
| iped-app/resources/localization/iped-ai-filters_es_AR.properties | Adds filter labels (es_AR) placeholders. |
| iped-app/resources/localization/iped-ai-filters_de_DE.properties | Adds filter labels (de_DE) placeholders. |
| iped-app/resources/config/IPEDConfig.txt | Adds enableAISummarization config flag documentation. |
| iped-app/resources/config/conf/TaskInstaller.xml | Registers AISummarizationTask.py in the processing pipeline. |
| iped-app/resources/config/conf/AISummarizationConfig.txt | New task configuration file (remote address, timeouts, parser selection, analysis questions). |
| iped-app/resources/config/conf/AIFiltersConfig.json | Adds “Analyzed Chats” wildcard filter and “Summarized Chats” filter entries. |
| iped-api/src/main/java/iped/properties/ExtraProperties.java | Adds SUMMARY and CHUNK_IDS constants for AI summarization attributes. |
    String html = SimpleHTMLEncoder.htmlEncode(text).replace("\n", "<br>");
    Matcher matcher = BOLD_PATTERN.matcher(html);
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        matcher.appendReplacement(sb, "<strong>" + matcher.group(1) + "</strong>");
    }

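The excerpt above escapes the text, converts newlines to `<br>`, and wraps each bold match in `<strong>`. In Java, `appendReplacement` treats `$` and `\` in the replacement string specially, so captured text is usually passed through `Matcher.quoteReplacement`, and the loop is followed by `appendTail` (not visible in the excerpt). The same flow can be sketched in Python, for illustration only. The `**bold**` syntax is an assumption, since the definition of `BOLD_PATTERN` is not shown, and `render_basic_markup` is a hypothetical name.

```python
import html
import re

# Assumption: bold is marked as **text**; the PR's BOLD_PATTERN is not shown.
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")


def render_basic_markup(text):
    """Escape text for HTML, turn newlines into <br>, and render bold spans."""
    escaped = html.escape(text).replace("\n", "<br>")
    # A callable replacement inserts the captured text literally, so
    # backslashes or group references inside it are not reinterpreted.
    return BOLD_PATTERN.sub(
        lambda m: "<strong>" + m.group(1) + "</strong>", escaped
    )


rendered = render_basic_markup("**Total:** $100 <b>\nnext \\1 line")
print(rendered)
```

Escaping before applying the bold pattern means any markup in the summary text itself is neutralized, and only the `<strong>` and `<br>` tags added by the renderer survive.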
    sb.append("color:");
    sb.append(getHexRGB(c));
    sb.append(";");
    sb.append("\">").append(msg).append("</p>");

    # --- summaries ---
    summary = entry.get("summary")
    if isinstance(summary, str) and summary.strip():
        chunk_summaries.append(summary)

    # --- chunk ids ---
    chunk_id = entry.get("chunk_id")
    if isinstance(chunk_id, str) and chunk_id.strip():
        chunk_ids.append(chunk_id)

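The Python excerpt above collects summaries and chunk ids with two independent checks. A self-contained sketch of that step follows, assuming the service response is a list of dicts with `"summary"` and `"chunk_id"` keys (the key names come from the excerpt; the overall response shape and the name `collect_chunks` are assumptions). This variant appends both values or neither, so the two lists stay index-aligned; whether the PR relies on that alignment is not stated in the thread.

```python
def collect_chunks(entries):
    """Collect (summaries, chunk_ids) from response entries, index-aligned.

    Assumption: each entry is a dict that may carry "summary" and "chunk_id"
    string values; anything else is skipped.
    """
    chunk_summaries = []
    chunk_ids = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        summary = entry.get("summary")
        chunk_id = entry.get("chunk_id")
        has_summary = isinstance(summary, str) and bool(summary.strip())
        has_chunk_id = isinstance(chunk_id, str) and bool(chunk_id.strip())
        # Append both or neither, so summaries[i] belongs to chunk_ids[i].
        if has_summary and has_chunk_id:
            chunk_summaries.append(summary)
            chunk_ids.append(chunk_id)
    return chunk_summaries, chunk_ids


sample = [
    {"summary": "Greeting exchange.", "chunk_id": "c1"},
    {"summary": "   ", "chunk_id": "c2"},  # blank summary: skipped
    {"summary": "Meeting arranged."},  # missing chunk id: skipped
    {"summary": "Payment discussed.", "chunk_id": "c4"},
]
summaries, ids = collect_chunks(sample)
print(list(zip(ids, summaries)))
```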
    String suffix = node.getSuffix();
    if (suffix != null) {
        text += " - " + suffix;
    }

    DefaultSingleCDockable dock = dockPerViewer.get(viewer);
    dockPerViewer.remove(viewer);
    viewers.remove(i);
    CControl cControl = dock.getControl();
    if (cControl != null) {
        cControl.removeDockable(dock);
    }

    String s = field.substring(prop.length()).trim();
    if (s.toLowerCase().endsWith("score")) {
        s = Character.toUpperCase(s.charAt(0)) + s.substring(1, s.length() - 5);

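The last excerpt derives a display label from a field name: it strips the property prefix, drops a trailing "score", and capitalizes the first letter. A Python rendering of the same idea is sketched below, for illustration only. The excerpt is truncated, so any guard it may have for very short names is not visible; the sketch includes one. The name `label_from_field` and the sample field names are hypothetical.

```python
def label_from_field(field, prefix):
    """Derive a display label from an analysis field name.

    Hypothetical Python rendering of the excerpt: strip the property prefix,
    drop a trailing "score" (case-insensitive) and capitalize the first letter.
    """
    name = field[len(prefix):].strip()
    if name.lower().endswith("score"):
        stem = name[: len(name) - len("score")]
        # Guard: a field named exactly "score" leaves an empty stem, so the
        # original name is kept instead of indexing into an empty string.
        if stem:
            name = stem[0].upper() + stem[1:]
    return name


# Sample field names are made up for illustration.
print(label_from_field("ai:analysis:drugsScore", "ai:analysis:"))
print(label_from_field("ai:analysis:score", "ai:analysis:"))
```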
    SummaryViewer.NoSummary=No summary available[TBT]
    SummaryViewer.Title=AI-generated summary. Check all information. [TBT]
    SummaryViewer.TabName=Summary[TBT]


This is a work in progress.
The AISummarizationTask sends WhatsApp chat contents to a remote service (AI middleware service) and stores the returned textual summaries in each item’s extra attributes.
In the analysis interface, when the user clicks on an item that has "summaries" attributes, they are rendered in a "Summary" tab next to the Preview tab at the bottom right. The Summary tab should be hidden otherwise. However, I noticed this is still buggy when scrolling through chats and needs some work.
The idea is also to support summarization of other content types later. I have started looking into adding support for UFED chats, and will check the recent changes made by @aberenguel to the UFED chats parser.