This repository contains the dataset and supporting scripts used in the paper "SIP-Classifier: Unsupervised Classification of SIP-IMS Signaling with Transformer and Clustering".
Repository Structure
- `sip_dataset.csv`: CSV dataset of SIP signaling flows. Columns:
  - `Signaling Flow`: a string describing the SIP signaling sequence.
  - `Class`: numeric label for the sequence (0 = benign, 1 = anomalous, 2 = unknown).
- `sota/`: folder containing baseline model implementations and results used for comparison in experiments.
  - `CNN.py`: Convolutional Neural Network baseline from the paper "Detection of abnormal SIP signaling patterns: a deep learning comparison" (Computers, 2022).
  - `corr.py`: correlation-based baseline from the paper "Correlation-Based Abnormal SIP Dialog Identification: A Performance Comparison with Bayesian and Deep Learning Approaches" (IEEE Access, 2025).
  - `HMM.py`: Hidden Markov Model baseline from the paper "A Machine Learning Approach for Prediction of Signaling SIP Dialogs" (IEEE Access, 2021).
  - `LSTM1.py` and `LSTM2.py`: LSTM baseline implementations from the paper "Classification of Abnormal Signaling SIP Dialogs Through Deep Learning" (IEEE Access, 2021).
  - `results/`: directory containing the results obtained from running the baseline models on `sip_dataset.csv`.
- `util/`: utility scripts and labeled subsets.
  - `anomalous.csv`: subset of `sip_dataset.csv` containing only anomalous signaling flows, without duplicates.
  - `benign.csv`: subset of `sip_dataset.csv` containing only benign signaling flows, without duplicates.
  - `split.py`: helper script to extract the `anomalous.csv` / `benign.csv` subsets.
  - `stats.py`: script to compute statistics over the dataset `sip_dataset.csv`.
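
As a rough illustration of how the dataset layout described above can be consumed, the following sketch reads rows with the two listed columns, counts flows per class, and extracts a deduplicated per-class subset. It is not the repository's `split.py` or `stats.py`. It assumes the CSV has a header row with exactly the column names `Signaling Flow` and `Class`, and that `Class` holds plain integers. The helper names (`load_flows`, `class_counts`, `unique_flows`) and the sample rows are hypothetical and exist only for this example.

```python
import csv
import io
from collections import Counter

# Column names and label meanings as listed in this README.
# Assumption: the CSV header uses exactly these names.
FLOW_COL = "Signaling Flow"
CLASS_COL = "Class"
LABELS = {0: "benign", 1: "anomalous", 2: "unknown"}


def load_flows(handle):
    """Read (flow, class) pairs from an open CSV file object."""
    reader = csv.DictReader(handle)
    # Assumption: Class values are plain integers such as "0", "1", "2".
    return [(row[FLOW_COL], int(row[CLASS_COL])) for row in reader]


def class_counts(rows):
    """Count how many flows carry each label name."""
    return Counter(LABELS[cls] for _, cls in rows)


def unique_flows(rows, label):
    """Return the flows of one class, duplicates removed, first-seen order kept."""
    return list(dict.fromkeys(flow for flow, cls in rows if cls == label))


# Made-up rows for illustration only; they are not taken from sip_dataset.csv.
SAMPLE = io.StringIO(
    "Signaling Flow,Class\n"
    "INVITE 100 180 200 ACK BYE 200,0\n"
    "INVITE 100 180 200 ACK BYE 200,0\n"
    "INVITE 100 486 ACK,1\n"
    "REGISTER 401 REGISTER 200,2\n"
)

rows = load_flows(SAMPLE)
counts = class_counts(rows)
benign = unique_flows(rows, 0)
anomalous = unique_flows(rows, 1)
```

To try this on the real file, replace `SAMPLE` with `open("sip_dataset.csv", newline="")`, adjusting the column names if the actual header differs.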
If you find this work interesting and use it in your academic research, please cite our paper!