Skip to content

powcoder/FIT3155-a2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

FIT3155 S1/2025: Assignment 2

  • Due Date: 11:55 pm on Fri 09 May 2025
  • Weight: 20 marks

Assignment Instructions

  1. Code Requirement: You must write all the code yourself. Only use standard libraries. Standard data structures (like list, dictionary, tuple, set) are allowed as long as they are space/time - efficient for your purpose. Input/output and other unavoidable routines are exempted.
  2. Submission Procedure:
    • File Naming and Content: All your scripts must contain your name and student ID.
    • Zip File: Upload a .zip archive via Moodle with the filename in the form <student ID>.zip. The archive should extract to a directory named after your student ID. Include a2.py script and a2report.pdf in this directory, and nothing else.
    • Submission Method: Submit the archive electronically via Moodle.
    • Specification Adherence: Strictly follow the specifications in each question. Do not hard - code input filenames; they should be passed from the command line.
  3. Academic Integrity: Monash University upholds high standards of honesty. All submissions will be scanned via MOSS or JPLAG. Using generative AI to answer this assessed task is prohibited.

Assignment Task

DL - distance ≤1 multiple pattern matching within a collection of texts

  1. Task: Given a collection of text files and multiple pattern files, report all occurrences of each pattern that matches, under the DL - distance ≤1 threshold, the regions of texts in the collection.
  2. Data Structure Requirement: Use the suffix tree data structure as the primary search data structure for DL - distance ≤1 searches. You may also use supplementary data structures or algorithms from this unit or its prerequisite units to optimize performance.
  3. Input Files Details:
    • Character Set: Each text/pattern file contains characters from the 7 - bit ASCII printable characters (32–126) and ASCII whitespace control codes (8–13, 32). Assume the ‘$’ symbol does not appear in text/pattern files.
    • File Content: Each text/pattern file has at least one character from the specified alphabet. The longest pattern is at most as long as the shortest text. Multiple text files can have identical contents, and a larger text file may contain the contents of a smaller one. The same applies to pattern files.

Program Specification

  1. Program Name: a2.py
  2. Argument: The program accepts one argument, which is the filename of a run - configuration file. The run - configuration file format is:
<N=number of text files> <M=number of pattern files>
1 <text_filename_1.txt>
2 <text_filename_2.txt>
...
N <text_filename_N.txt>
1 <pattern_filename_1.txt>
2 <pattern_filename_2.txt>
...
M <pattern_filename_M.txt>
  1. Command Line Usage: a2.py <run - configuration - fileaname>
  2. Output File Name: output a2.txt
  3. Output Format:
<pattern number> <text number> <position of occurrence> <DL - distance>
...
- **Numbering**: The pattern number and text number in each line should follow the numbering in the run - configuration file. They need not be in sorted order.
- **Position of Occurrence**: It is the 1 - based position where the pattern was observed in the text under a DL - distance of 0 or 1.

Written PDF Report (a2report.pdf)

Answer the following questions in the report:

  1. How did you handle multiple texts in your approach?
  2. How did you handle DL - distance = 0 search on multiple patterns in your approach?
  3. How did you handle DL - distance = 1 search on multiple patterns in your approach? Report one subsection for each of the 4 edit operations: (1) substitution, (2) transposition, (3) insertion (of a character in a pattern), and (4) deletion (of a character in a pattern).
  4. Were changes made to the suffix tree data structure (compared to the version delivered in your lecture)? If yes, state the changes and justify them.
  5. Did you use any other supporting data structures or algorithms not already covered in your responses to the above?
  6. What is the worst - case time AND space complexity of your suffix tree construction over multiple files?
  7. What is the worst - case time AND space complexity of your search approach under the DL - distance ≤1 threshold? # FIT3155 a2

加微信 powcoder

QQ 1823890830

Programming Help Add Wechat powcoder

About

FIT3155 编程辅导, Code Help, CS tutor, Wechat: powcoder, powcoder@163.com

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors