AI Summarization and Classification for Chats (#2641) by mbichara · Pull Request #2646 · sepinf-inc/IPED

mbichara · 2025-10-07T14:43:59Z

This is a work in progress.

The AISummarizationTask sends WhatsApp chat contents to a remote service (an AI middleware service) and stores the returned textual summaries in each item’s extra attributes.

In the analysis interface, when the user clicks on an item that has "summaries" attributes, they are rendered in a "Summary" tab next to the Preview tab at the bottom right. The Summary tab should be hidden otherwise. I noticed this is still buggy when scrolling through chats and needs some work.

The idea is to later support summarization of other content types as well. I started looking into adding support for UFED chats, and will check the recent changes made by @aberenguel to the UFED chats parser.


wladimirleite · 2026-01-20T15:36:08Z

As I mentioned in #2641, this will be a great addition to the project.
I have a few comments/suggestions:

1. Change the property name from "ai:summaries" to "ai:summary". Although there may be several summary values (for longer chats, I guess), most multivalued property names are singular.
2. Localize the "AI-generated summaries. Check all information" message.
3. Localize the "Summary" tab title.
4. It seems that "ai:chunk_ids" is used for internal control only. If that is the case, make it a temporary attribute so it won't be added to the case and visible in the advanced properties metadata tab.
5. Allow hit navigation (left/right arrow icons) in the Summary viewer.
6. Allow "search in viewer" in the Summary viewer.
7. Currently, the "Summary" tab is only visible if the selected item has a summary. While this makes sense, it can be inconvenient in practice. Even if the tab is pinned, selecting a chat without a summary hides the tab. When subsequently selecting a chat with a summary, the tab reappears but loses its pinned state (in the default layout). I propose adding a check when the case is opened: if there is any item with a summary, the tab should be visible. That would be more consistent with other viewers (e.g., the preview tab is always present, even for items with no preview available).
8. When processing, skip chats if "Communication:isEmpty: true".
9. Show "Summarized Chats" in the AI panel.
10. Show chats in the AI panel grouped by question score (very high, high, medium, low, very low?). This is not trivial and will require some changes in the AI panel code, as the questions can be customized.
11. Support "application/x-telegram-chat" and "application/x-threema-chat" content types. Maybe the parameter "enableWhatsAppSummarization" could be changed to "enableInternalChatSummarization".
12. Normalize HTML summaries "header". Not sure if this is feasible, but I observed that each part of the summary has a "header" with its period and participants involved. However, it sometimes uses "**" (I guess to highlight it), sometimes it shows "Participantes" (in Portuguese) and in others "Interlocutores", sometimes a "|" is used instead of placing "period" and "participants" in separated lines, and sometimes there are no labels, just a textual description. The following image shows some samples (all summaries belong to the same chat).
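
As an illustration of the content-type and empty-chat suggestions above, here is a minimal Python sketch of an eligibility check. The function name, the metadata dictionary, and the WhatsApp media type string are assumptions made for the example; only the Telegram and Threema type names and the "Communication:isEmpty" attribute come from the comment.

```python
# Hypothetical helper, not taken from AISummarizationTask.py.
SUPPORTED_CHAT_TYPES = {
    "application/x-whatsapp-chat",   # assumed name for WhatsApp chats
    "application/x-telegram-chat",   # quoted in the suggestion above
    "application/x-threema-chat",    # quoted in the suggestion above
}


def should_summarize(media_type, metadata):
    """Return True when a chat item is worth sending to the summarizer."""
    if media_type not in SUPPORTED_CHAT_TYPES:
        return False
    # Skip chats flagged as empty; the value may arrive as a bool or a string.
    is_empty = str(metadata.get("Communication:isEmpty", "false")).strip().lower()
    return is_empty != "true"
```

Keeping the check in one place would make it easy to add further chat types later without touching the summarization logic itself.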

wladimirleite · 2026-01-20T15:41:02Z

@mbichara, not sure if we can do anything about 12.
Please take a look at this item, as it would require some changes in "AISummarizationTask.py" and/or the server-side code.

I can try to deal with the others, if you agree with them.

mbichara · 2026-01-20T18:35:05Z

Hi @wladimirleite!

Thank you for the valuable comments and suggestions and for helping on this.

I agree on all of them.

About 12, I believe it is fairly doable to generate the "header" of the answer in a standard way.
I can adapt the prompt on the server side to instruct the LLM to answer in a structured form, and check whether the header is correct, regenerating the answer if it is not.
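
A rough sketch of the "check the header, regenerate if it is wrong" idea, in Python. The header layout (a "Period:" line followed by a "Participants:" line), the function names, and the retry limit are all hypothetical; the actual server-side format is not shown in this thread.

```python
import re

# Hypothetical structured header: two labelled lines at the top of each summary.
HEADER_RE = re.compile(r"\APeriod: .+\nParticipants: .+(\n|\Z)")


def has_valid_header(summary):
    """True when the summary starts with the expected two-line header."""
    return HEADER_RE.match(summary) is not None


def summarize_with_retry(generate, chunk, max_attempts=3):
    """Call generate(chunk) until the header validates or attempts run out."""
    last = None
    for _ in range(max_attempts):
        last = generate(chunk)
        if has_valid_header(last):
            return last
    # Attempts exhausted: return the last answer and let the caller decide.
    return last
```

Bounding the number of attempts keeps a stubborn chunk from stalling the whole task; what to do with a non-conforming answer after the last attempt is left open here.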

Regarding 4, I also agree that keeping "ai:chunk_ids" visible in the metadata tab is not ideal.
The reason for saving "ai:chunk_ids" is the next AI feature we are planning after this one, which relies directly on the summaries and their "ids" generated by this task.

Just to describe it briefly, it's a hierarchical RAG architecture that lets the user ask general questions about one or multiple chats at analysis time, in a ChatGPT-like panel in the interface, and get a textual response.
For questions about large or multiple chats, summaries are used, and I instruct the LLM to use chunk_ids to quote the relevant evidence parts (chunks) in its answer. The user can then click on the quotation link (chunk_id) in the answer to see the specific chat chunk directly.
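
To make the quotation mechanism concrete, here is a small Python sketch that extracts cited chunk ids from an answer and looks up the corresponding chunks. The `[chunk:<id>]` citation syntax and the names used are invented for the example; the thread does not say how citations are encoded.

```python
import re

# Hypothetical citation syntax: the answer quotes evidence as [chunk:<id>].
CITATION_RE = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")


def resolve_citations(answer, chunks_by_id):
    """Return (chunk_id, chunk_text) pairs for each known id cited in the answer.

    Order follows first appearance; repeated and unknown ids are dropped.
    """
    seen = set()
    resolved = []
    for chunk_id in CITATION_RE.findall(answer):
        if chunk_id in chunks_by_id and chunk_id not in seen:
            seen.add(chunk_id)
            resolved.append((chunk_id, chunks_by_id[chunk_id]))
    return resolved
```

Dropping ids that are not in the index guards against the model citing a chunk that does not exist.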

I would like to later show you a POC of this next feature, as I think you could also provide some good insights.

Thank you again!

wladimirleite · 2026-01-20T18:40:28Z

Thanks @mbichara!
About 4, let's keep chunk_id's.
I think it is possible just to omit them on the metadata panel.

wladimirleite · 2026-01-27T12:39:36Z

wladimirleite · 2026-01-27T12:42:19Z

I also suggest renaming "chunk_ids" to "chunkIds".
I am adding it to the task list.


wladimirleite · 2026-01-28T20:55:36Z

@mbichara, there are likely some details to fine-tune, but the current version should be functional.
Please try processing a case and let me know if you encounter any issues or have suggestions.


mbichara · 2026-01-29T16:35:32Z

@wladimirleite, very nice, I will process a case here and let you know

mbichara · 2026-02-04T12:53:49Z

Hi @wladimirleite, I processed a couple of cases, it seems fine, good work!
Now I am fixing some problems I found when parsing internal Telegram chats, and will test Threema next.
I am also improving the backend logic.
I will let you know once I get it done. Thank you very much

lfcnassif · 2026-03-25T18:55:25Z

@mbichara, is this still a draft? If not could you convert it to ready for review?

Could anyone finish reviewing and testing and, if OK, approve it? I think this will be very helpful in many cases, such as in CSAM analysis, bank fraud, drugs and weapon dealing, extortion, money counterfeit, etc...


mbichara · 2026-03-30T19:27:37Z

@lfcnassif,

I made some changes and improvements, including on the server side, and I now believe this is ready for review.

@wladimirleite, thanks for collaborating on this. I’ll send you a config that points to the server so you can run it when you have some time. If any other colleague wants to test it, let me know.

Just a heads-up: I’m also adding/testing document summarization to the task, but due to infrastructure constraints, maybe we should start with chats first. What do you think @lfcnassif?


lfcnassif · 2026-04-10T14:44:45Z

> Just a heads-up: I’m also adding/testing document summarization to the task, but due to infrastructure constraints, maybe we should start with chats first. What do you think @lfcnassif?

I agree.

wladimirleite · 2026-04-11T12:20:11Z

> @wladimirleite, thanks for collaborating on this. I’ll send you a config that points to the server so you can run it when you have some time. If any other colleague wants to test it, let me know.

Sorry for the slow response!
I processed a large UFDR yesterday and analysed the results today.
Everything seems fine!

@mbichara, a last suggestion (sorry for not mentioning this before):
As the task took quite some time (about 50% of processing time in this case with a lot of large chats), I think it would be important to have in the processing log some basic performance data:

Number of chats processed,
Total/average number of characters in processed chats,
Average time spent per chat.
Maybe separated for chat analysis and summarization, if these tasks ran as separated steps.
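
One possible shape for such counters, sketched in Python. The class, the method names, and the log line format are assumptions for illustration, not what the task actually logs.

```python
import time


class SummarizationStats:
    """Accumulates basic per-task performance counters."""

    def __init__(self):
        self.chats = 0
        self.chars = 0
        self.seconds = 0.0

    def record(self, num_chars, elapsed_seconds):
        self.chats += 1
        self.chars += num_chars
        self.seconds += elapsed_seconds

    def report(self):
        # Guard against division by zero when no chat was processed.
        avg_chars = self.chars / self.chats if self.chats else 0.0
        avg_secs = self.seconds / self.chats if self.chats else 0.0
        return ("chats=%d totalChars=%d avgChars=%.1f avgSecondsPerChat=%.2f"
                % (self.chats, self.chars, avg_chars, avg_secs))


def timed_summarize(stats, summarize, chat_text):
    """Run summarize(chat_text) and record its size and wall-clock duration."""
    start = time.monotonic()
    result = summarize(chat_text)
    stats.record(len(chat_text), time.monotonic() - start)
    return result
```

Using one `SummarizationStats` instance for summarization and another for chat analysis would give the separate figures suggested above.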

By the way, I added two chat analysis questions, which probably increased the processing time. I am running it again with summarization only.

A minor detail: there were a lot of warnings in the log:

xxxx\iped-4.4.0-SNAPSHOT\python\lib\site-packages\urllib3\connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.61.xx.xx'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
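
For context, this warning is emitted when an HTTPS request is made with certificate verification turned off. A hedged Python sketch of the two usual options follows: verifying against a CA bundle when one is available, or otherwise filtering the warning by its message. The helper name and its use are hypothetical, and how the PR eventually handled this is not shown here.

```python
import warnings


def tls_options(ca_bundle_path=None):
    """Build the `verify` keyword accepted by HTTP clients such as requests.

    With a CA bundle, verification stays on and no warning is raised.
    Without one, verification is disabled and the repeated warning is filtered.
    """
    if ca_bundle_path:
        return {"verify": ca_bundle_path}
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings("ignore", message="Unverified HTTPS request")
    return {"verify": False}
```

The returned dictionary could be passed as keyword arguments to a call such as `requests.post(url, json=payload, **tls_options(path))`. Verifying against the server's certificate is the safer choice whenever the infrastructure allows it.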

> Just a heads-up: I’m also adding/testing document summarization to the task, but due to infrastructure constraints, maybe we should start with chats first. What do you think @lfcnassif?

I also agree that starting with chats only seems a better idea.
After feedback from users in real cases, we can make adjustments and expand to other types of items.

wladimirleite · 2026-04-12T18:59:16Z

> By the way, I added two chat analysis questions, which probably increased the processing time. I am running it again with summarization only.

With summarization only, time spent on this task decreased from 14608 s to 4811 s.

wladimirleite · 2026-04-17T15:51:27Z

After taking a closer look at the processed case and discussing with @felipecampanini, who also processed a large case he is working on with this PR, some final comments and suggestions for future improvements:

Although the summarization content is very accurate, for many chats it is not very concise, looking more like a detailed description. I suggest trying to make it a bit more concise (by default) and, maybe, having a configuration parameter to set the level of conciseness (it could be a number or low/medium/high).
For the questions of the analysis feature, show the score of each chunk in the visualization. That would help a lot to find which parts are important in large chats.
Allow "open" questions, i.e. questions which instead of producing a score, would bring a textual answer.
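
To illustrate the per-chunk score suggestion, here is a Python sketch that maps a numeric score to one of the level names mentioned earlier in the thread (very high to very low) and ranks chunks by score. The 0-100 scale and the cut-off values are invented for the example.

```python
# Hypothetical cut-offs on an assumed 0-100 score scale.
LEVELS = [(80, "very high"), (60, "high"), (40, "medium"), (20, "low")]


def score_to_level(score):
    """Translate a numeric score into a coarse level label."""
    for minimum, label in LEVELS:
        if score >= minimum:
            return label
    return "very low"


def top_chunks(scores_by_chunk, limit=3):
    """Return the highest-scoring chunks first, as (chunk_id, score, level)."""
    ranked = sorted(scores_by_chunk.items(), key=lambda kv: kv[1], reverse=True)
    return [(cid, s, score_to_level(s)) for cid, s in ranked[:limit]]
```

Showing a level next to each chunk, with the top-ranked chunks first, would let a reviewer jump straight to the relevant parts of a long chat.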

lfcnassif · 2026-06-09T01:33:00Z

Hi @mbichara!

I know you are working on a very sensitive case, but could you push the last implemented fixes and commits here? I think this is a very important feature to include into version 4.4.0, hopefully before the middle of this year...

lfcnassif · 2026-06-23T19:45:06Z

@mbichara, do you plan to push more enhancements here?

mbichara · 2026-06-23T19:50:04Z

Hi @lfcnassif. Yes, I was working on another urgent demand over the past few weeks.
I have just returned to this. I am finishing some tests, mostly on the backend, but I also made some enhancements here.
I will push them now.


Copilot

Pull request overview

Adds AI-driven chat summarization (and optional per-chunk “analysis score” attributes) to IPED, plus a dedicated “Summary” viewer tab that renders stored summaries and can navigate to the relevant message in the chat preview. It also extends the AI Filters panel to group chats by analysis score fields discovered in the index.

Changes:

Introduces AISummarizationTask.py + configuration to call a remote AI middleware service, store ai:summary, ai:chunkIds, and ai:analysis:* extra attributes, and register the task in TaskInstaller.xml.
Adds a new SummaryViewer (tab) that displays ai:summary chunks, renders basic markup, shows per-chunk analysis labels, and provides “go to first message” navigation via a new MessageNavigator API.
Extends AI filters to support wildcard expansion for ai:analysis:* fields and adds “Analyzed Chats” / “Summarized Chats” filter entries and localization keys.
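
The wildcard expansion described in the last item can be pictured with a short Python sketch. It is a standalone illustration, not a port of AIFiltersLoader.java; the example field names are hypothetical, and the label rule (drop a trailing "score", capitalize the first letter) follows the Java fragment quoted further down in this review.

```python
def expand_wildcard(prop, indexed_fields):
    """Expand a property such as 'ai:analysis:*' into (field, label) pairs."""
    if not prop.endswith("*"):
        # Plain property: keep it only if the index actually contains it.
        return [(prop, None)] if prop in indexed_fields else []
    prefix = prop[:-1]
    nodes = []
    for field in sorted(indexed_fields):
        if not field.startswith(prefix):
            continue
        name = field[len(prefix):].strip()
        if name.lower().endswith("score"):
            name = name[:-len("score")]
        if name:  # skip fields whose name is empty once the suffix is removed
            nodes.append((field, name[0].upper() + name[1:]))
    return nodes
```

For example, with hypothetical indexed fields `ai:analysis:drugsScore` and `ai:analysis:fraudScore`, the call `expand_wildcard("ai:analysis:*", fields)` yields one node labelled "Drugs" and one labelled "Fraud".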

Reviewed changes

Copilot reviewed 25 out of 32 changed files in this pull request and generated 10 comments.


File	Description
iped-viewers/iped-viewers-impl/src/main/java/iped/viewers/SummaryViewer.java	New viewer to render stored AI summaries, show analysis labels, and link into chat preview.
iped-viewers/iped-viewers-api/src/main/java/iped/viewers/api/MessageNavigator.java	New API hook for viewers to request navigation to a message id.
iped-utils/src/main/java/iped/utils/UiUtil.java	Enhances empty HTML helper to support optional message text and theme colors.
iped-engine/src/main/java/iped/engine/data/SimpleFilterNode.java	Adds `suffix` to support display labeling for wildcard-expanded AI filter nodes.
iped-app/src/main/java/iped/app/ui/ViewerController.java	Registers SummaryViewer, wires navigation callback, loads analysis thresholds, and hides Summary when index lacks summary field.
iped-app/src/main/java/iped/app/ui/ai/AIFiltersTreeCellRenderer.java	Displays node suffix in the AI filters tree.
iped-app/src/main/java/iped/app/ui/ai/AIFiltersLoader.java	Expands wildcard `property` definitions (e.g., `ai:analysis:*`) into concrete filter nodes based on indexed fields.
iped-app/resources/scripts/tasks/AISummarizationTask.py	New processing task that calls remote service, parses chat HTML, and stores summaries + analysis attributes.
iped-app/resources/localization/iped-viewer-messages.properties	Adds SummaryViewer strings (EN).
iped-app/resources/localization/iped-viewer-messages_pt_BR.properties	Adds SummaryViewer strings (pt_BR).
iped-app/resources/localization/iped-viewer-messages_it_IT.properties	Adds SummaryViewer strings (it_IT) placeholders.
iped-app/resources/localization/iped-viewer-messages_fr_FR.properties	Adds SummaryViewer strings (fr_FR) placeholders.
iped-app/resources/localization/iped-viewer-messages_es_AR.properties	Adds SummaryViewer strings (es_AR) placeholders.
iped-app/resources/localization/iped-viewer-messages_de_DE.properties	Adds SummaryViewer strings (de_DE) placeholders.
iped-app/resources/localization/iped-ai-filters.properties	Adds “Analyzed Chats” / “Summarized Chats” filter labels (EN).
iped-app/resources/localization/iped-ai-filters_pt_BR.properties	Adds “Analyzed Chats” / “Summarized Chats” filter labels (pt_BR).
iped-app/resources/localization/iped-ai-filters_it_IT.properties	Adds filter labels (it_IT) placeholders.
iped-app/resources/localization/iped-ai-filters_fr_FR.properties	Adds filter labels (fr_FR) placeholders.
iped-app/resources/localization/iped-ai-filters_es_AR.properties	Adds filter labels (es_AR) placeholders.
iped-app/resources/localization/iped-ai-filters_de_DE.properties	Adds filter labels (de_DE) placeholders.
iped-app/resources/config/IPEDConfig.txt	Adds `enableAISummarization` config flag documentation.
iped-app/resources/config/conf/TaskInstaller.xml	Registers `AISummarizationTask.py` in the processing pipeline.
iped-app/resources/config/conf/AISummarizationConfig.txt	New task configuration file (remote address, timeouts, parser selection, analysis questions).
iped-app/resources/config/conf/AIFiltersConfig.json	Adds “Analyzed Chats” wildcard filter and “Summarized Chats” filter entries.
iped-api/src/main/java/iped/properties/ExtraProperties.java	Adds `SUMMARY` and `CHUNK_IDS` constants for AI summarization attributes.


+        String html = SimpleHTMLEncoder.htmlEncode(text).replace("\n", "<br>");
+        Matcher matcher = BOLD_PATTERN.matcher(html);
+        StringBuffer sb = new StringBuffer();
+        while (matcher.find()) {
+            matcher.appendReplacement(sb, "<strong>" + matcher.group(1) + "</strong>");
+        }
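
The same escape-then-highlight idea, sketched in Python for comparison. The `**bold**` pattern is an assumption, since BOLD_PATTERN is not shown in the hunk. One point worth keeping in mind for the Java version: `Matcher.appendReplacement` treats `$` and `\` in the replacement string specially, so `Matcher.quoteReplacement` is commonly applied to captured text before it is spliced in.

```python
import html
import re

BOLD_RE = re.compile(r"\*\*(.+?)\*\*")  # assumed pattern for **bold** spans


def render_summary_html(text):
    """Escape the summary first, then turn **bold** spans into <strong> tags."""
    escaped = html.escape(text).replace("\n", "<br>")
    # A function replacement inserts the captured text literally.
    return BOLD_RE.sub(lambda m: "<strong>" + m.group(1) + "</strong>", escaped)
```

Escaping before adding markup ensures that any tags coming from the model output are shown as text and not interpreted by the viewer.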


+            sb.append("color:");
+            sb.append(getHexRGB(c));
+            sb.append(";");
+            sb.append("\">").append(msg).append("</p>");


+            # --- summaries ---
+            summary = entry.get("summary")
+            if isinstance(summary, str) and summary.strip():
+                chunk_summaries.append(summary)
+
+            # --- chunk ids ---
+            chunk_id = entry.get("chunk_id")
+            if isinstance(chunk_id, str) and chunk_id.strip():
+                chunk_ids.append(chunk_id)
+
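
In the fragment above, summaries and chunk ids are appended to their lists independently. If other code pairs them by position, an entry carrying only one of the two values would shift the alignment; whether that can happen depends on parts of the task not shown here. A minimal Python sketch that keeps the two lists in step, with a hypothetical function name:

```python
def collect_chunks(entries):
    """Return (summaries, chunk_ids) whose positions always correspond."""
    summaries, chunk_ids = [], []
    for entry in entries:
        summary = entry.get("summary")
        chunk_id = entry.get("chunk_id")
        has_summary = isinstance(summary, str) and bool(summary.strip())
        has_id = isinstance(chunk_id, str) and bool(chunk_id.strip())
        # Keep an entry only when both values are usable.
        if has_summary and has_id:
            summaries.append(summary)
            chunk_ids.append(chunk_id)
    return summaries, chunk_ids
```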


+            String suffix = node.getSuffix();
+            if (suffix != null) {
+                text += " - " + suffix;
+            }


+                    DefaultSingleCDockable dock = dockPerViewer.get(viewer);
+                    dockPerViewer.remove(viewer);
+                    viewers.remove(i);
+                    CControl cControl = dock.getControl();
+                    if (cControl != null) {
+                        cControl.removeDockable(dock);
+                    }


+                            String s = field.substring(prop.length()).trim();
+                            if (s.toLowerCase().endsWith("score")) {
+                                s = Character.toUpperCase(s.charAt(0)) + s.substring(1, s.length() - 5);


+SummaryViewer.NoSummary=No summary available[TBT]
+SummaryViewer.Title=AI-generated summary. Check all information. [TBT]
+SummaryViewer.TabName=Summary[TBT]


+SummaryViewer.NoSummary=No summary available[TBT]
+SummaryViewer.Title=AI-generated summary. Check all information. [TBT]
+SummaryViewer.TabName=Summary[TBT]


+SummaryViewer.NoSummary=No summary available[TBT]
+SummaryViewer.Title=AI-generated summary. Check all information. [TBT]
+SummaryViewer.TabName=Summary[TBT]


+SummaryViewer.NoSummary=No summary available[TBT]
+SummaryViewer.Title=AI-generated summary. Check all information. [TBT]
+SummaryViewer.TabName=Summary[TBT]


mbichara force-pushed the add-aisummarizationtask branch from 1bb7f45 to 8c6e63d Compare October 7, 2025 17:01

mbichara added 8 commits December 10, 2025 13:08

feature sepinf-inc#2641: AISummarization for WhatsApp chats - first commit

36b581f

sepinf-inc#2641: add suport to UFED chats: mime x-ufed-chat-preview

adacca5

sepinf-inc#2641 add missing config prop

6bf7c72

sepinf-inc#2641: Performing the chat msgs parsing in the task.

029dd91

sepinf-inc#2641: Improvements in server communication and chat parsing

8cceea0

sepinf-inc#2641: Add chat analysis (questions) functionality.

3a1bc52

sepinf-inc#2641: Fix boolean conversion and add chunk_ids in metadata

b40d53f

sepinf-inc#2641: After rebase additions

70c7dba

mbichara force-pushed the add-aisummarizationtask branch from 995ce0e to 70c7dba Compare December 10, 2025 19:48

sepinf-inc#2641: Small changes

1c47602

mbichara marked this pull request as ready for review December 12, 2025 19:01

mbichara and others added 3 commits December 22, 2025 14:46

sepinf-inc#2641: Fix sleep constants

89fcad6

'sepinf-inc#2641: Code formatting.

ee9447a

Merge remote-tracking branch 'origin/master' into add-aisummarizationtask

55388db

wladimirleite changed the title ~~#2641: AISummarization for WhatsApp chats - first commit~~ AISummarization for WhatsApp chats (#2641) Jan 17, 2026

'sepinf-inc#2641: Remove unused/commented code.

9f06124

wladimirleite marked this pull request as draft January 20, 2026 15:30

wladimirleite changed the title ~~AISummarization for WhatsApp chats (#2641)~~ AISummarization for Chats (#2641) Jan 27, 2026

wladimirleite added 5 commits January 27, 2026 09:48

'sepinf-inc#2641: Change the property name from "ai:summaries" to "ai:summary".

38beb85

'sepinf-inc#2641: Rename "ai:chunk_ids" to "ai:chunkIds".

b6c7482

'sepinf-inc#2641: Simplify hasSummary() checks and the access to extra attribute.

e233dae

'sepinf-inc#2641: Allow hit navigation (left/right arrows) in the Summary viewer.

ce69444

'sepinf-inc#2641: Simplify "Summary" tab visibility management.

4fb7468

'sepinf-inc#2641: Additional comment about question attributes for chat analysis.

e7c350f

'sepinf-inc#2641: Reverts commit b8e359b.

2ccbac0

lfcnassif mentioned this pull request Feb 28, 2026

AI summarization and AI questions based classification for chats #2641

Open

lfcnassif linked an issue Feb 28, 2026 that may be closed by this pull request

AI summarization and AI questions based classification for chats #2641

Open

lfcnassif changed the title ~~AISummarization for Chats (#2641)~~ AI Summarization and Classification for Chats (#2641) Mar 25, 2026

mbichara and others added 3 commits March 30, 2026 15:56

sepinf-inc#2641: Add connection params, fix attachment parsing and error handling

d2e81ee

sepinf-inc#2641: Put connection params on config

bb9c017

Merge branch 'master' into add-aisummarizationtask

6f23561

mbichara marked this pull request as ready for review March 30, 2026 19:27

Merge remote-tracking branch 'origin/master' into add-aisummarizationtask

1bef43e

mbichara added 5 commits June 23, 2026 16:57

Add stats and fix InsecureRequestWarning

9f2f1cb

Add scores of analysis in SummaryViewer

2ceae08

Add "go to first msg" in summary chunks, and show level of analysis instead of numerical value

68e8526

Add parsersToUse option

5aa84c1

Fix Unicode rendering in summary viewer

9488c03

hauck-jvsh requested a review from Copilot June 30, 2026 16:12

Copilot started reviewing on behalf of hauck-jvsh June 30, 2026 16:12 View session

Copilot AI reviewed Jun 30, 2026
