SHAROZ221/codealpha_Task-Automation


Task Automation with Python — Email Extractor

CodeAlpha Python Programming Internship | Task 3


📌 Goal

Automate the extraction of all email addresses from a .txt file, categorize them, and save a detailed report to a separate output file.


📁 Project Files

File                  Purpose
main.py               Main script: extracts, categorizes, and reports on emails using regex
sample.txt            Sample input file with multiple email addresses
extracted_emails.txt  Auto-generated output file with extracted emails and stats

▶️ How to Run

Step 1 — Install Python

No external libraries are needed; the script uses only built-in Python modules (re, os, collections).

Step 2 — Run the script

python main.py

Step 3 — Enter the filename

Enter the .txt filename (e.g. sample.txt): sample.txt

Step 4 — Check the output

Extracted emails, category breakdown, and domain stats are saved to extracted_emails.txt in the same folder.


🔍 How It Works

  1. Reads the contents of the input .txt file
  2. Uses a regex pattern to find every string that has the shape of an email address
  3. Removes duplicates while preserving order
  4. Categorizes each email as either:
    • Personal/Work — regular human or business addresses
    • System/No-reply — automated addresses (e.g. no-reply@, notifications@, alerts@, newsletter@)
  5. Groups emails by domain and counts how many addresses belong to each domain
  6. Saves a full report to extracted_emails.txt, including:
    • Total email count
    • Category breakdown (Personal/Work vs System/No-reply)
    • Domain breakdown (sorted by frequency)
    • Separate lists of Personal/Work and System/No-reply emails
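
Steps 2 to 5 can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the regex, the prefix list, the case-insensitive comparison, and the function names (extract_emails, categorize, domain_counts) are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumed pattern: a common, deliberately simple email regex.
# The pattern actually used in main.py may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed prefixes that mark an automated sender.
SYSTEM_PREFIXES = ("no-reply", "noreply", "notifications", "alerts", "newsletter")


def extract_emails(text):
    """Return unique addresses in the order they first appear."""
    seen = set()   # lowercased addresses already collected
    unique = []    # keeps first-occurrence order
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # assumption: duplicates are compared case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


def categorize(email):
    """Label an address by the prefix of its local part (the text before '@')."""
    local_part = email.split("@", 1)[0].lower()
    if local_part.startswith(SYSTEM_PREFIXES):
        return "System/No-reply"
    return "Personal/Work"


def domain_counts(emails):
    """Count addresses per domain (everything after '@', lowercased)."""
    return Counter(email.split("@", 1)[1].lower() for email in emails)
```

A set on its own does not remember insertion order, so the sketch pairs it with a list: the set answers "seen before?" quickly, and the list preserves the order of first appearance.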

💡 Key Concepts Used

  • re — regular expressions for pattern matching
  • os — file existence check
  • collections.Counter — counting categories and domains
  • File handling — reading input, writing output
  • Deduplication logic using set() (the order of first appearance is preserved)
  • String matching for email categorization
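
The file existence check and file reading listed above might look something like the sketch below. The helper name read_text_file, the .txt extension check, and the UTF-8 encoding are assumptions for illustration; main.py may handle these differently.

```python
import os


def read_text_file(filename):
    """Return the file's contents, or None if it is not an existing .txt file."""
    # Assumption: only .txt input is accepted, as the prompt suggests.
    if not filename.lower().endswith(".txt"):
        return None
    # os.path.isfile is False for missing paths and for directories.
    if not os.path.isfile(filename):
        return None
    # Assumption: input is UTF-8; undecodable bytes are replaced, not fatal.
    with open(filename, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

Returning None instead of raising lets the caller print a friendly message and ask for the filename again.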

🧪 Sample Output

Console

[+] Found 24 unique email(s):
    michael.ross@company.com
    ...

[*] Category Breakdown:
    Personal/Work: 21
    System/No-reply: 3

[*] Top Domains:
    devteam.org: 5
    company.com: 4
    secops.net: 4
    bigcorp.in: 3
    partnerltd.co.uk: 1

[✓] Saved to: extracted_emails.txt

extracted_emails.txt

Email Extraction Report
========================================
Total emails found: 24
========================================

Category Breakdown:
  Personal/Work emails: 21
  System/No-reply emails: 3

Domain Breakdown:
  devteam.org: 5
  company.com: 4
  secops.net: 4
  bigcorp.in: 3
  ...

----------------------------------------
Personal/Work Emails
----------------------------------------
michael.ross@company.com
rachel.zane@company.com
...

----------------------------------------
System/No-reply Emails
----------------------------------------
no-reply@alerts.system.com
notifications@monitor.net
noreply@newsletter.company.com

🚀 Extra Features Added Beyond the Original Brief

  • Email categorization — automatically flags addresses as Personal/Work or System/No-reply based on common automated-mail prefixes (no-reply, notifications, alerts, newsletter, etc.)
  • Domain-based statistics — counts and ranks how many extracted emails belong to each domain
  • Structured report — output file is organized into category stats, domain stats, and separated email lists for easier review
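
A report with the layout shown in the sample output could be assembled along these lines. The function name build_report and its parameters are hypothetical, and the exact spacing of the real report may differ; only Counter.most_common() is relied on to sort domains by frequency.

```python
from collections import Counter


def build_report(personal, system, domains):
    """Build the report text from two email lists and a Counter of domains."""
    rule = "=" * 40
    dash = "-" * 40
    total = len(personal) + len(system)
    lines = [
        "Email Extraction Report",
        rule,
        f"Total emails found: {total}",
        rule,
        "",
        "Category Breakdown:",
        f"  Personal/Work emails: {len(personal)}",
        f"  System/No-reply emails: {len(system)}",
        "",
        "Domain Breakdown:",
    ]
    # most_common() yields (domain, count) pairs, highest count first.
    lines += [f"  {domain}: {count}" for domain, count in domains.most_common()]
    lines += ["", dash, "Personal/Work Emails", dash, *personal]
    lines += ["", dash, "System/No-reply Emails", dash, *system]
    return "\n".join(lines) + "\n"
```

The returned string can then be written to extracted_emails.txt with a single open(..., "w") call, which keeps formatting separate from file handling.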

