
Tokenizing and Parsing - Part I

Introduction¹

A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines).

A lexer is basically a tokenizer, but it usually attaches extra context to the tokens -- this token is a number, that token is a string literal, this other token is an equality operator.

A parser takes the stream of tokens from the lexer and turns it into an abstract syntax tree, a structured representation of the (usually) program described by the original text.

The best book on the subject for many decades was *Compilers: Principles, Techniques, and Tools*, aka "The Dragon Book".

Worksheet

1. Tokenizer

Write a function that takes a string `target` and returns a list of all tokens in it that start with `prefix` and end with `suffix`, inclusive of both delimiters.

Create `tokenizer.py` and add the function as stubbed below.

tokenizer(target, prefix, suffix):
    return list_of_tokens_that_match
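A minimal sketch of one way to fill in the stub, scanning left to right for non-overlapping matches. The worksheet leaves the matching rules open, so treating `prefix` and `suffix` as literal substrings and taking the nearest closing `suffix` are assumptions here:

```python
def tokenizer(target, prefix, suffix):
    """Return every substring of `target` that starts with `prefix` and
    ends with `suffix`, inclusive of both delimiters."""
    tokens = []
    start = 0
    while True:
        begin = target.find(prefix, start)
        if begin == -1:
            break  # no more prefixes
        end = target.find(suffix, begin + len(prefix))
        if end == -1:
            break  # prefix without a closing suffix
        tokens.append(target[begin:end + len(suffix)])
        start = end + len(suffix)  # resume after this token
    return tokens
```

For example, `tokenizer("<a><b>", "<", ">")` yields `["<a>", "<b>"]`.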

2. Better Tokenization

Can you describe three limitations of the function above which, if addressed, would make this code more useful and reusable?

Write your answer in tokenizer.md

3. Scraping a webpage

Write a function that takes a URL and returns a list of all URLs referenced as anchor (`<a href>`) links in the response text. Make sure to use the tokenizer function you wrote in Part 1.

get_url_list(url):
    # do something
    url_list = tokenizer(webpage_source, prefix, suffix)
    # do something else
    return url_list
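One possible sketch of the "do something" steps, assuming the page's links use double-quoted `href="..."` attributes (a real scraper would use an HTML parser rather than string matching). The Part 1 tokenizer is repeated so the example is self-contained:

```python
import urllib.request

def tokenizer(target, prefix, suffix):
    # Part 1 tokenizer, repeated here for self-containment.
    tokens, start = [], 0
    while (begin := target.find(prefix, start)) != -1:
        end = target.find(suffix, begin + len(prefix))
        if end == -1:
            break
        tokens.append(target[begin:end + len(suffix)])
        start = end + len(suffix)
    return tokens

def get_url_list(url):
    """Fetch `url` and return the href targets of its anchor links."""
    with urllib.request.urlopen(url) as response:
        webpage_source = response.read().decode("utf-8", errors="replace")
    # Each token looks like href="..."; slice off the delimiters
    # to leave just the URL itself.
    raw = tokenizer(webpage_source, 'href="', '"')
    return [token[len('href="'):-1] for token in raw]
```

This only catches double-quoted attributes and will also match `href` on non-anchor tags; those gaps tie back to the limitations asked about in Part 2.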

4. Infix to postfix

Take an infix expression such as `(a + b) ^ c - d / q` and render it as a postfix expression.

The output must be a list that can be processed as postfix.

Make no assumptions about spacing or other delimiters.

Add your function to the file tokenizer.py.

infix_to_postfix(infix_expression: str):
    # do stuff
    return postfix_result

Example

>>> infix_to_postfix("a+b*c+d")
["a", "b", "c", "*", "+", "d", "+"]

Footnotes

  ¹ Stack Overflow
