Current limitation: input sentences need to be at least context_length tokens long. Shorter inputs need padding so that the model always receives at least context_length tokens (esp. for GPT).
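
A minimal sketch of what that padding could look like, in plain Python. The function name `pad_to_context_length`, the `pad_id=0` default, the `side` option, and the returned attention mask are all assumptions made for illustration, not part of the existing code. The real pad token id depends on the tokenizer in use.

```python
from typing import List, Sequence, Tuple


def pad_to_context_length(
    token_ids: Sequence[int],
    context_length: int,
    pad_id: int = 0,        # assumption: the real pad id depends on the tokenizer
    side: str = "right",    # assumption: "left" or "right" padding
) -> Tuple[List[int], List[int]]:
    """Pad token_ids with pad_id until it has at least context_length entries.

    Returns (padded_ids, attention_mask), where the mask is 1 for real tokens
    and 0 for padding. Inputs that are already long enough are returned
    unchanged (no truncation is done here).
    """
    ids = list(token_ids)
    missing = max(0, context_length - len(ids))
    padding = [pad_id] * missing
    real_mask = [1] * len(ids)
    pad_mask = [0] * missing

    if side == "right":
        return ids + padding, real_mask + pad_mask
    if side == "left":
        return padding + ids, pad_mask + real_mask
    raise ValueError(f"side must be 'left' or 'right', got {side!r}")


if __name__ == "__main__":
    ids, mask = pad_to_context_length([5, 6, 7], context_length=5)
    print(ids, mask)
```

The mask is returned so that padded positions can be ignored in attention or in the loss. Whether to pad on the left or the right depends on how the model is used, so the sketch leaves that as a parameter.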