Merged
15 changes: 15 additions & 0 deletions .editorconfig
@@ -0,0 +1,15 @@
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4
trim_trailing_whitespace = true

[*.py]
indent_size = 4

[*.md]
trim_trailing_whitespace = false
44 changes: 44 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,44 @@
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[dev]
          pip install pytest pytest-cov mypy ruff black isort
      - name: Lint
        run: |
          ruff check src
          black --check src tests
          isort --check-only src tests
      - name: Type check
        run: mypy src
      - name: Test
        run: pytest --cov=src --cov-report=xml
      - name: Build distributions
        if: matrix.python-version == '3.12'
        run: |
          pip install build
          python -m build
      - name: Upload artifacts
        if: matrix.python-version == '3.12'
        uses: actions/upload-artifact@v4
        with:
          name: dist-${{ github.sha }}
          path: dist
22 changes: 22 additions & 0 deletions .gitignore
@@ -0,0 +1,22 @@
__pycache__/
*.pyc
*.pyo
*.pyd
*.so
*.dylib
*.egg-info/
.build/
.cache/
.mypy_cache/
.pytest_cache/
.hypothesis/
.coverage
htmlcov/
.dist/
.DS_Store
*.log
!tests/data/*.log
.eggs/
.env
.venv/
venv/
21 changes: 21 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,21 @@
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.6
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
        additional_dependencies: []
  - repo: https://github.com/pre-commit/mirrors-isort
    rev: v5.13.2
    hooks:
      - id: isort
        args: ["--profile", "black"]
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,15 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2024-06-06
### Added
- Initial Python package layout with `syslogcef` module and CLI.
- RFC3164/RFC5424 parsers with key/value and structured data extraction.
- CEF encoder with deterministic severity mapping and escaping.
- Mapping framework plus default, Cisco, Linux, F5 and VMware implementations.
- Streaming CLI with watch mode, worker pool, stats and mapping overrides.
- Test suite covering parsing, encoding, converters and CLI.
- Pre-commit configuration, CI workflow, benchmark helper and updated documentation.
120 changes: 95 additions & 25 deletions README.md
@@ -1,53 +1,123 @@
# syslogcef

`syslogcef` is a lightweight Python package that turns raw RFC3164/RFC5424 syslog or JSON events into ArcSight CEF. It provides a composable mapping layer, streaming command line tools, and a small API for embedding the converter in other services.

- Converts classic syslog and structured JSON events to deterministic CEF output
- Handles timezone normalisation, key/value extraction, and UTF-8 sanitisation
- Ships with vendor-tuned mappings for Cisco, Linux, F5 and VMware plus a sane default
- CLI supports streaming input, tail mode and worker pools for high-throughput ingestion

# JSON & SYSLOG to CEF converter
![Architecture](images/jsoncef.png)

* Author : Tamir Suliman
* Date : 02-09-2023
## Installation


## JSON to CEF
```bash
pip install .
```

The package requires Python 3.10 or later.

## Quickstart

Convert syslog files to CEF and stream the output to stdout:

```bash
syslogcef --input syslog-logs/cisco/cisco-ios.log --source cisco
```

Process JSON events from stdin, using the Linux mapping and capturing statistics:

```bash
cat json-logs/cisco/cisco-ios.json \
| syslogcef --format json --source linux --stats
```

convert your JSON events to CEF format
* Converting from JSON to CEF involves mapping the fields from the JSON data to the fields in the Common Event Format (CEF). CEF is a standardized log format that enables log management systems to process and store logs from various security and network devices.
Watch a log file for updates and write CEF to a file:

* The CEF format consists of a number of key-value pairs that provide information about the log event. The basic structure of a CEF log message is as follows:
```bash
syslogcef --input /var/log/messages --output /tmp/messages.cef --watch --source linux
```
CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|[Extension Key]=[Value] ...
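
The header layout above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's encoder: `escape_header`, `escape_extension` and `build_cef` are hypothetical names, and the sample values are made up. It assumes the usual CEF escaping rules, where backslashes and pipes are escaped in header fields, and backslashes, equals signs and line breaks are escaped in extension values.

```python
def escape_header(value: str) -> str:
    """Escape backslashes and pipes, which delimit CEF header fields."""
    return value.replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value: str) -> str:
    """Escape backslashes and equals signs; encode line breaks as a literal \\n."""
    escaped = value.replace("\\", "\\\\").replace("=", "\\=")
    return escaped.replace("\r\n", "\\n").replace("\n", "\\n").replace("\r", "\\n")


def build_cef(vendor, product, version, signature_id, name, severity, extensions):
    """Join the six header fields after the version prefix, then the key=value extensions."""
    header = "|".join(
        escape_header(str(field))
        for field in (vendor, product, version, signature_id, name, severity)
    )
    ext = " ".join(
        f"{key}={escape_extension(str(val))}" for key, val in extensions.items()
    )
    return f"CEF:0|{header}|{ext}"


# Hypothetical sample values chosen to exercise both escaping rules.
line = build_cef(
    "Cisco", "IOS", "1.0", "106023", "Deny tcp|udp", 5,
    {"src": "10.0.0.1", "msg": "rule=block all"},
)
print(line)
```

Escaping the backslash first matters: doing it last would double the backslashes that the pipe and equals-sign escapes have just added.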

## Library usage

```python
import json

from syslogcef import convert_line, parse_syslog, to_cef
from syslogcef.mappings import get_mapping

# Parse a syslog line, then encode it with the Linux mapping.
syslog_line = "<189>Feb 8 04:00:48 host sshd[123]: user=alice action=login"
parsed = parse_syslog(syslog_line)
event = parsed.as_event()
cef = to_cef(event, vendor="Example", product="Collector", version="1.0", mapping=get_mapping("linux"))
print(cef)

# Convert a JSON event in one step with the high-level helper.
json_event = {"message": "Login", "host": "firewall", "action": "allow"}
cef_line = convert_line(json.dumps(json_event))
print(cef_line)
```
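
The `<189>` prefix in the sample line is the syslog PRI field, which packs facility and severity into one number as `facility * 8 + severity`. The sketch below decodes it by hand to show what a parser has to recover. It is not the package's parser: `decode_pri` and `SEVERITIES` are hypothetical names used for illustration.

```python
import re

# Severity keywords in numeric order, 0 (emerg) to 7 (debug).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(line: str) -> tuple[int, int]:
    """Return (facility, severity) from the leading <PRI> field of a syslog line."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    # PRI = facility * 8 + severity, so divmod splits it back apart.
    return divmod(int(match.group(1)), 8)


facility, severity = decode_pri(
    "<189>Feb 8 04:00:48 host sshd[123]: user=alice action=login"
)
print(facility, SEVERITIES[severity])  # 23 notice
```

Facility 23 is `local7`, so the sample line is a `local7.notice` message.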

* To convert a JSON log to CEF, you would need to extract the relevant information from the JSON data and map it to the appropriate fields in the CEF format. For example, you could extract the "event.code", "event.severity", and "message" fields from the JSON data and map them to the "Signature ID", "Severity", and "Name" fields in the CEF format.
## Mapping architecture

Mappings translate parsed events into CEF signature, name, severity and extension dictionaries. Built-in mappings live under `syslogcef.mappings`:

- `default`: generic conversion that preserves message, host and process information
- `cisco`: tailored to ASA/IOS events with action/severity detection and network fields
- `linux`: surfaces authentication and auditd attributes
- `f5`: maps client/server addressing fields from BIG-IP style logs
- `vmware`: extracts hypervisor user and VM identifiers

Mappings conform to a simple protocol and can be extended with JSON/YAML override files via `--mapping-file`. Overrides support Python format strings using event fields (`src`, `dst`, `msg`, …) and merge with the mapping result.
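
As a rough illustration of how format-string overrides could merge with a mapping result, consider the sketch below. The override file schema is not shown here, so the function name `apply_overrides`, the dictionary shapes and the sample values are all assumptions, not the package's implementation.

```python
class _BlankMissing(dict):
    """Dict that renders unknown placeholders as empty strings."""

    def __missing__(self, key):
        return ""


def apply_overrides(result: dict, overrides: dict, event: dict) -> dict:
    """Render each override template with event fields, then merge over the result."""
    fields = _BlankMissing(event)
    rendered = {key: template.format_map(fields) for key, template in overrides.items()}
    # Override keys win; keys the override does not mention are kept as-is.
    return {**result, **rendered}


# Hypothetical mapping result, override templates and event fields.
result = {"name": "Login", "severity": 3}
overrides = {"name": "Login from {src}", "cs1": "{src}->{dst}"}
event = {"src": "10.0.0.1", "dst": "10.0.0.2", "msg": "ok"}

merged = apply_overrides(result, overrides, event)
print(merged)
```

Using `str.format_map` with a `__missing__` fallback avoids a `KeyError` when a template names a field the event does not carry.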

## CLI reference

* The scripts in this repository help you achieve that. However, because JSON structure and data vary between sources, a template must be created for each data source.
Run `syslogcef --help` for the full option list. Key flags:

* The use case scenario would be :
- `--format {syslog,json}`: force input format instead of auto detection
- `--source`: choose mapping source (`default`, `cisco`, `linux`, `f5`, `vmware`)
- `--watch`: tail the input file for streaming ingestion
- `--workers N`: convert lines in parallel using worker threads
- `--tz Europe/Berlin`: default timezone for naive timestamps
- `--strict`: abort on parse errors; otherwise errors are tagged inside the CEF payload
- `--stats`: print processed/failed counters to stderr
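
To make the `--tz` behaviour concrete, here is a minimal sketch of attaching a default timezone to a naive timestamp with the standard-library `zoneinfo` module. The helper name `localize` is hypothetical, and how the CLI actually applies the flag is an assumption.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def localize(ts: datetime, default_tz: str) -> datetime:
    """Attach default_tz to naive timestamps; leave aware timestamps untouched."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=ZoneInfo(default_tz))
    return ts


naive = datetime(2024, 2, 8, 4, 0, 48)
aware = localize(naive, "Europe/Berlin")
print(aware.isoformat())  # 2024-02-08T04:00:48+01:00
```

The wall-clock time is preserved and only the offset is added, which matches the idea of a default for timestamps that carry no zone of their own.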

![JSON CEF](https://github.com/allamiro/JSON-SYSLOG-TO-CEF/blob/main/images/jsoncef.png)
## Performance tips

- Use `--workers` to overlap I/O-bound work such as reading input and writing output; because the workers are threads, CPython's global interpreter lock keeps CPU-bound pure-Python mappings from scaling with core count.
- Prefer piping data directly to the CLI to avoid storing large intermediate files.
- The `scripts/bench.py` helper exercises conversion throughput:
```bash
python scripts/bench.py tests/data/cisco-ios.log --lines 10000
```
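
A thread-based worker pool along the lines of `--workers` might look like the sketch below. The `convert` function is a hypothetical stand-in for a per-line conversion, and `convert_all` is an illustrative name, not part of the package.

```python
from concurrent.futures import ThreadPoolExecutor


def convert(line: str) -> str:
    # Hypothetical stand-in for a real per-line conversion.
    return line.strip().upper()


def convert_all(lines, workers: int = 4) -> list[str]:
    """Convert lines on a thread pool; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, lines))


out = convert_all(["a\n", "b\n", "c\n"])
print(out)  # ['A', 'B', 'C']
```

Because `Executor.map` returns results in the order of its inputs, output lines stay aligned with the source log even when workers finish out of order.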

### CISCO Logs
## Known limitations

## SYSLOG to CEF
Convert your Syslog format to CEF format
Syslog is an open standard for logging and reporting events from computer systems, network devices, and other IT assets. It is supported by a wide range of network devices and operating systems, which makes it a widely used logging format. Syslog messages contain a priority value, which encodes the facility and severity of the event, and a message body, which provides detailed information about the event.
- The bundled mappings focus on common fields; bespoke environments should extend the mapping set.
- JSON fragments embedded deeply within syslog messages are best extracted upstream for accuracy.
- YAML overrides require PyYAML (optional dependency) when used.

## API reference

![SYSLOG 2 CEF ](https://github.com/allamiro/JSON-SYSLOG-TO-CEF/blob/main/images/Screenshot%202023-02-10%20at%201.41.31%20AM.png)
| Function | Description |
| --- | --- |
| `parse_syslog(line: str) -> ParsedSyslog` | Parse RFC3164/RFC5424 line into structured data |
| `from_json(event: dict) -> ParsedEvent` | Normalise JSON dict to a parsed event |
| `to_cef(event: ParsedEvent, vendor, product, version, mapping)` | Encode a parsed event using the supplied mapping |
| `convert_line(line: str, source: str | None = None, mapping: Mapping | None = None)` | High-level conversion helper |

## Sample data & rsyslog templates

### CISCO Logs
Sample logs live under `tests/data`, sourced from the original `json-logs/` and `syslog-logs/` directories. For rsyslog configuration examples, see [RSYSLOG_TEMPLATES.md](RSYSLOG_TEMPLATES.md).

## Development

- Run formatting and linting: `ruff check src && black src tests`
- Execute the test suite: `pytest`
- Type-check with `mypy src`

### Built With
## Benchmarks

This section should list any major frameworks/libraries used to bootstrap your project. Leave any add-ons/plugins for the acknowledgements section. Here are a few examples.
On a sample dataset (`tests/data/cisco-ios.log`) processed with `python scripts/bench.py tests/data/cisco-ios.log --lines 50000` on a laptop (Apple M2, Python 3.11), the converter sustains ~220k lines/sec in single-threaded mode.

## License

# References
* [1] Log samples used from https://github.com/elastic/beats/tree/main/x-pack/filebeat/module
* [2] https://learn.microsoft.com/en-us/azure/sentinel/cef-name-mapping
* [3] https://www.microfocus.com/documentation/arcsight/arcsight-smartconnectors-8.3/cef-implementation-standard/
Apache License 2.0. See [LICENSE](LICENSE).
75 changes: 5 additions & 70 deletions elk-json-to-cef.py
@@ -1,72 +1,7 @@
#!/usr/bin/python3
# Author Tamir Suliman
# Email allamiro@gmail.com
# Date : 02-09-2023
#!/usr/bin/env python3
"""Compatibility wrapper pointing to the new syslogcef CLI."""

# Import libraries
from syslogcef.cli import main

import json
import re
import os
import socket


# Need to add the ability to read from a socket forward to an object after decoding the data

# CEF_template = "CEF:0|{cisco.ios.facility}|{event.module}|{event.code}|{message}|{event.severity}|"
# Define the CEF header template
cef_header = "CEF:0|Cisco|IOS|1.0|"
cef_data = []

# Read the json files in the directory
with open('data2.json', 'r+') as ciscolog:
    jess_dict = json.load(ciscolog)
    #jess_dict1 = json.dumps(jess_dict, indent=4)
    jess_dict1 = json.dumps(jess_dict)

# Passing the JSON file to a Converter Function
# JSON LOGS PARSED BY ELASTIC FILE BEATS

with open("cisco-cef_logs.log", "w") as f:
    for log in jess_dict:
        cef_log = cef_header + "id=" + str(log.get("event.sequence", "")) + " "
        cef_log += "src=" + log.get("source.address", "") + " "
        cef_log += "spt=" + str(log.get("source.port", "")) + " "
        cef_log += "dst=" + log.get("destination.ip", "") + " "
        cef_log += "dpt=" + str(log.get("destination.port", "")) + " "
        cef_log += "proto=" + log.get("network.transport", "") + " "
        cef_log += "cat=" + ','.join(log.get("event.category", [])) + " "
        cef_log += "dvc=" + ','.join(log.get("log.source.address", [])) + " "
        cef_log += "msg=" + log.get("event.original", "") + " "
        cef_log += "outcome=" + log.get("event.outcome", "") + " "
        cef_log += "cs1=" + log.get("event.code", "") + " "
        cef_log += "cs2=" + log.get("network.community_id", "") + " "
        cef_log += "cs3=" + log.get("cisco.ios.access_list", "") + " "
        cef_log += "severity=" + str(log.get("event.severity", "")) + " "
        print(cef_log)
        f.write(cef_log + "\n")



# MICROSOFT WINDOWS SECURITY LOGS

# cef_data = "CEF:0|source|name|version|signature_id|signature|severity|"
cef_header2 = "CEF:0|Microsoft|Microsoft Windows|Microsoft-Windows-Security-Auditing|1.12.0|"

with open('win-log.json', 'r+') as ciscolog:
    jess_dict = json.load(ciscolog)
    #jess_dict1 = json.dumps(jess_dict, indent=4)
    jess_dict1 = json.dumps(jess_dict)
# Passing the JSON file to a Converter Function
with open("win-cef_logs.log", "w") as f:
    for log in jess_dict:
        cef_log = cef_header2 + "start=" + log.get("@timestamp", "") + " "
        cef_log += "event_id=" + log.get("winlog", {}).get("event_id", "") + " "
        cef_log += "event_message=" + log.get("message","") + " "
        cef_log += "action=" + log.get("event", {}).get("action", "") + " "
        outcome = log.get("event", {}).get("outcome", "") + " "
        print(cef_log)
        f.write(cef_log + "\n")



if __name__ == "__main__":
    raise SystemExit(main())