FIT3155 S1/2025: Assignment 2

Due Date: 11:55 pm on Fri 09 May 2025
Weight: 20 marks

Assignment Instructions

Code Requirement: You must write all the code yourself. Only use standard libraries. Standard data structures (like list, dictionary, tuple, set) are allowed as long as they are space/time - efficient for your purpose. Input/output and other unavoidable routines are exempted.
Submission Procedure:
- File Naming and Content: All your scripts must contain your name and student ID.
- Zip File: Upload a .zip archive via Moodle with the filename in the form <student ID>.zip. The archive should extract to a directory named after your student ID. Include a2.py script and a2report.pdf in this directory, and nothing else.
- Submission Method: Submit the archive electronically via Moodle.
- Specification Adherence: Strictly follow the specifications in each question. Do not hard - code input filenames; they should be passed from the command line.
Academic Integrity: Monash University upholds high standards of honesty. All submissions will be scanned via MOSS or JPLAG. Using generative AI to answer this assessed task is prohibited.

Assignment Task

DL - distance ≤1 multiple pattern matching within a collection of texts

Task: Given a collection of text files and multiple pattern files, report all occurrences of each pattern that matches, under the DL - distance ≤1 threshold, the regions of texts in the collection.
Data Structure Requirement: Use the suffix tree data structure as the primary search data structure for DL - distance ≤1 searches. You may also use supplementary data structures or algorithms from this unit or its prerequisite units to optimize performance.
Input Files Details:
- Character Set: Each text/pattern file contains characters from the 7 - bit ASCII printable characters (32–126) and ASCII whitespace control codes (8–13, 32). Assume the ‘$’ symbol does not appear in text/pattern files.
- File Content: Each text/pattern file has at least one character from the specified alphabet. The longest pattern is at most as long as the shortest text. Multiple text files can have identical contents, and a larger text file may contain the contents of a smaller one. The same applies to pattern files.

Program Specification

Program Name: a2.py
Argument: The program accepts one argument, which is the filename of a run - configuration file. The run - configuration file format is:

<N=number of text files> <M=number of pattern files>
1 <text_filename_1.txt>
2 <text_filename_2.txt>
...
N <text_filename_N.txt>
1 <pattern_filename_1.txt>
2 <pattern_filename_2.txt>
...
M <pattern_filename_M.txt>

Command Line Usage: a2.py <run - configuration - fileaname>
Output File Name: output a2.txt
Output Format:

<pattern number> <text number> <position of occurrence> <DL - distance>
...

- **Numbering**: The pattern number and text number in each line should follow the numbering in the run - configuration file. They need not be in sorted order.
- **Position of Occurrence**: It is the 1 - based position where the pattern was observed in the text under a DL - distance of 0 or 1.

Written PDF Report (`a2report.pdf`)

Answer the following questions in the report:

How did you handle multiple texts in your approach?
How did you handle DL - distance = 0 search on multiple patterns in your approach?
How did you handle DL - distance = 1 search on multiple patterns in your approach? Report one subsection for each of the 4 edit operations: (1) substitution, (2) transposition, (3) insertion (of a character in a pattern), and (4) deletion (of a character in a pattern).
Were changes made to the suffix tree data structure (compared to the version delivered in your lecture)? If yes, state the changes and justify them.
Did you use any other supporting data structures or algorithms not already covered in your responses to the above?
What is the worst - case time AND space complexity of your suffix tree construction over multiple files?
What is the worst - case time AND space complexity of your search approach under the DL - distance ≤1 threshold? # FIT3155 a2

加微信 powcoder

QQ 1823890830

Programming Help Add Wechat powcoder

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FIT3155 S1/2025: Assignment 2

Assignment Instructions

Assignment Task

DL - distance ≤1 multiple pattern matching within a collection of texts

Program Specification

Written PDF Report (`a2report.pdf`)

加微信 powcoder

QQ 1823890830

Programming Help Add Wechat powcoder

Email: powcoder@163.com

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

FIT3155 S1/2025: Assignment 2

Assignment Instructions

Assignment Task

DL - distance ≤1 multiple pattern matching within a collection of texts

Program Specification

Written PDF Report (a2report.pdf)

加微信 powcoder

QQ 1823890830

Programming Help Add Wechat powcoder

Email: powcoder@163.com

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Written PDF Report (`a2report.pdf`)

Packages