Love baseball? Want to code? Start here. Using Python and baseball's biggest stars and stats, we'll learn variables, loops, lists, and more!
This project is in progress and open to feedback. 💡
Why Python?
Python is one of the easiest programming languages to read and write, which makes it perfect for learning to code for the first time. It is also lightweight and great for data. Python doesn’t force you to memorize lots of symbols, so you can focus on what you want to do, not on how complicated the language is.
Python is like learning the rules of baseball before learning advanced analytics — it builds a foundation you can grow on.
Open Terminal
Applications → Utilities → Terminal
Run:
python3 --version
What you’ll see
✅ Python 3.x.x → Python is installed
❌ command not found → we’ll install it
macOS sometimes ships with a system Python — we will NOT use python, only python3.
What is Homebrew?
Homebrew is a popular, free, open-source package manager for macOS and Linux that simplifies installing, updating, and managing software from the command line.
To install Homebrew, open the Terminal application and paste in the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
After it finishes, let's verify that it was installed by checking the version.
brew --version
Next, install Python with Homebrew:
brew install python
Verify:
python3 --version
pip3 --version
We should see something like this:
Python 3.12.x
pip 23.x
In Terminal, we'll first navigate to our Desktop and create a project folder. (You can use another location, but for beginners I recommend the Desktop.)
cd ~/Desktop
mkdir baseball_python
cd baseball_python
The cd command changes our directory to baseball_python.
Create our first file:
touch baseball.py
💥 You created your first Python file! Congrats.
Install VSCode
Download from VSCode
Once downloaded, we can open up our project.
In your terminal window, we can simply type:
code .
The project should open in VSCode.
You can also open VSCode, select Open, and select your baseball.py file.
If code doesn't work, open the Command Palette in VSCode (Cmd + Shift + P) and run Shell Command: Install 'code' command in PATH.
Open baseball.py and type (or copy):
print("Hello, baseball Python!")
❗️ You will need to add the Python extension in VSCode. It should prompt you when you add a Python file.
In VS Code:
- Click the Extensions icon on the left sidebar
- Search for Python
- Install Python
This extension gives us:
- Syntax highlighting
- Helpful error messages
- Easy run buttons
Run your file either in VSCode or in Terminal.
In VSCode, select ▶ Run. The integrated terminal should open and you should see:
Hello, baseball Python!
Or in Terminal, type:
python3 baseball.py
You'll then see:
Hello, baseball Python!
🎉 That’s it. You just ran your first Python file.
Why this matters
- print() shows output
- .py files are Python programs
- python3 filename.py runs the file in Terminal
Python really shines when working with data, and baseball is full of it.
We’ll install a few libraries that help us work with stats.
pip3 install pandas numpy matplotlib
What these do:
- pandas → tables of player stats
- numpy → math and calculations
- matplotlib → charts and graphs
Create a new file:
touch stats.py
Open it and add:
import pandas as pd
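data = {
    "player": ["Judge", "Ohtani", "Trout"],
    "hits": [2, 3, 1],
    "at_bats": [4, 5, 3]
}
df = pd.DataFrame(data)
# Batting average = hits / at-bats, calculated for every row at once
df["average"] = df["hits"] / df["at_bats"]
print(df)
Run it and you should see a table like the one below. (pandas labels the columns player, hits, at_bats, and average, adds a row number, and prints more decimal places.)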
| Player | Hits | At Bats | Batting Average |
|---|---|---|---|
| Judge | 2 | 4 | 0.500 |
| Ohtani | 3 | 5 | 0.600 |
| Trout | 1 | 3 | 0.333 |
👏 Now we're ready to code with baseball ⚾️
Before we can do anything interesting in Python, we need to understand variables.
What is a variable?
A variable is a named container that stores a value.
In baseball terms:
- A stat (hits, at-bats, runs) is a value
- The stat name is the variable
Example: open baseball.py and replace the code with this:
player_name = "Judge"
hits = 2
at_bats = 4
Here's what's happening:
- player_name stores text (a string)
- hits stores a number
- at_bats stores a number
Using variables together
Now let’s calculate a batting average:
batting_average = hits / at_bats
print(player_name)
print(batting_average)
When we run the file, we should see:
Judge
0.5
Making the output easier to read
Python lets us format output so it looks nicer. In the f-string below, :.3f means "show three decimal places":
print(f"{player_name}'s batting average is {batting_average:.3f}")
Output:
Judge's batting average is 0.500
Why variables matter:
- Store baseball stats
- Reuse values
- Change data without rewriting code
If Judge gets another hit, we only change one number:
hits = 3
When we run the file again, everything else updates automatically.
Takeaways
- Variables store information
- Numbers don’t need quotes
- Text does need quotes
- Variables make stats reusable and flexible
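As a small sketch of the "change one number" idea above, the snippet below repeats the same calculation with Judge's updated hit total. The values are the sample numbers from this section, not real stats.

```python
# Sample values from this section (not real stats)
player_name = "Judge"
hits = 3       # was 2; Judge got another hit
at_bats = 4

# The formula itself does not change, only the value stored in hits
batting_average = hits / at_bats

print(f"{player_name}'s batting average is {batting_average:.3f}")
# Judge's batting average is 0.750
```

Only the hits line changed; the calculation and the print line are the same as before.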
Python needs to know what kind of data it’s working with.
These are called data types.
For now, we’ll focus on the two most important ones:
- Strings (text)
- Numbers (integers and decimals)
Strings (text)
A string is any text wrapped in quotes.
Baseball examples:
player_name = "Judge"
team = "Yankees"
position = "RF"
Key rule:
- Strings must be inside quotes (" " or ' ')
If you forget the quotes, Python thinks it’s a variable and will throw an error.
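To see what that error looks like without crashing the program, here is a minimal sketch. It uses try and except, which this guide has not covered; treat them as a safety net that catches the error so we can print it. The error_message variable is a name made up for this example.

```python
# With quotes, "Judge" is a string
player_name = "Judge"
print(player_name)

# Without quotes, Python looks for a variable named Yankees.
# No such variable exists, so Python raises a NameError.
try:
    team = Yankees
except NameError as error:
    error_message = str(error)
    print(f"NameError: {error_message}")
```

Adding the quotes back (team = "Yankees") makes the error go away.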
Numbers
Numbers are used for math and do NOT use quotes.
hits = 2
at_bats = 4
batting_average = 0.500
There are two main number types you’ll see:
Integers (whole numbers)
hits = 2
at_bats = 4
Floats (decimals)
batting_average = 0.500
Why data types matter
Python treats strings and numbers very differently.
This works (math with numbers):
hits = 2
at_bats = 4
print(hits / at_bats)
Output:
0.5
This does NOT work (math with strings):
hits = "2"
at_bats = "4"
print(hits / at_bats)
❌ Python will raise a TypeError because text can’t be divided.
Mixing strings and numbers (the right way)
If you want to print text and numbers together, use an f-string:
player_name = "Judge"
hits = 2
at_bats = 4
average = hits / at_bats
print(f"{player_name} has a batting average of {average:.3f}")
Output:
Judge has a batting average of 0.500
Checking a variable’s data type
You can ask Python what type something is:
print(type(player_name))
print(type(hits))
print(type(average))
Output:
<class 'str'>
<class 'int'>
<class 'float'>
Common beginner mistake 🚨
❌ This looks right but is wrong:
hits = "2"
✅ This is correct:
hits = 2
Remember:
- Quotes = text
- No quotes = number
Key takeaways
- Strings = text (names, teams, positions)
- Integers = whole numbers (hits, at-bats)
- Floats = decimals (averages)
- Python needs correct data types to do math
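Stats sometimes arrive as text, for example when they are typed in by a user or read from a file. One common fix, sketched below, is to convert the text with Python's built-in int() function before doing math. The variable names ending in _text are assumptions for this example.

```python
# Stats stored as text (note the quotes)
hits_text = "2"
at_bats_text = "4"

# int() turns a string of digits into a whole number
hits = int(hits_text)
at_bats = int(at_bats_text)

average = hits / at_bats

print(type(hits_text))   # <class 'str'>
print(type(hits))        # <class 'int'>
print(f"{average:.3f}")  # 0.500
```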
So far, we’ve worked with one player at a time. But baseball is a team sport — we need a way to store multiple players together.
That’s where lists come in.
Creating a List
players = ["Judge", "Ohtani", "Trout"]
Things to remember:
- Lists use square brackets []
- Items are separated by commas
- Order matters
Accessing items in a list
Each item in a list has a position called an index. Indexes start at 0, not 1.
print(players[0])
print(players[1])
print(players[2])
Output:
Judge
Ohtani
Trout
Adding players to a list
You can add a new player to the roster using .append():
players.append("Betts")
print(players)
Output:
['Judge', 'Ohtani', 'Trout', 'Betts']
Counting players on the roster
To see how many players are in the list, use len():
print(len(players))
Output:
4
Lists of numbers (baseball stats)
Lists aren’t just for names — they’re great for stats too:
hits = [2, 3, 1]
at_bats = [4, 5, 3]
Each position lines up with the same player:
- hits[0] → Judge
- hits[1] → Ohtani
- hits[2] → Trout
Using list values in calculations
average = hits[0] / at_bats[0]
print(average)
Output:
0.5
Common beginner mistakes 🚨
❌ Using parentheses instead of brackets:
players = ("Judge", "Ohtani", "Trout")
(This creates a tuple, which can't be changed with .append().)
❌ Forgetting indexes start at 0:
print(players[1])
(This prints the second player, not the first.)
Key takeaways
- Lists store multiple values
- Lists use square brackets
- Indexes start at 0
- Lists are perfect for rosters and stat groups
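As a quick sketch that ties these ideas together, the snippet below uses one index to pull a player's name, hits, and at-bats from three matching lists. It reuses the sample numbers from this section; the index and last variables are names made up for the example.

```python
players = ["Judge", "Ohtani", "Trout"]
hits = [2, 3, 1]
at_bats = [4, 5, 3]

# Index 1 is the second item in every list, so all three values belong to Ohtani
index = 1
average = hits[index] / at_bats[index]
print(f"{players[index]} batting average: {average:.3f}")
# Ohtani batting average: 0.600

# len(players) - 1 is the index of the last item
last = len(players) - 1
print(players[last])
# Trout
```

This only works while the three lists stay in the same order, which is one reason to keep related stats together in a single structure.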
Lists are great for storing multiple values, but they don’t tell us what each value represents.
For example, this works:
hits = [2, 3, 1]
But which number is hits? At-bats? Walks?
That’s where dictionaries come in.
What is a dictionary?
A dictionary stores data as key–value pairs.
Think of a dictionary like a player card:
- The key is the stat name
- The value is the stat itself
Creating a dictionary
Here’s a dictionary for one player:
player = {
    "name": "Judge",
    "hits": 2,
    "at_bats": 4
}
Things to notice:
- Dictionaries use curly braces { }
- Keys are strings
- Each key maps to a value using :
Accessing values in a dictionary
You access values by using the key name:
print(player["name"])
print(player["hits"])
print(player["at_bats"])
Output:
Judge
2
4
Using dictionary values in calculations
average = player["hits"] / player["at_bats"]
print(average)
Output:
0.5
Now the code clearly shows what stat is being used.
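To make the difference concrete, here is a small side-by-side sketch. The list version depends on remembering which position holds which stat (the order chosen here is just an assumption for the example), while the dictionary version names each stat.

```python
# List version: we have to remember that position 0 is hits and position 1 is at-bats
judge_stats = [2, 4]
list_average = judge_stats[0] / judge_stats[1]

# Dictionary version: the keys say what each number means
judge = {"name": "Judge", "hits": 2, "at_bats": 4}
dict_average = judge["hits"] / judge["at_bats"]

print(list_average)  # 0.5
print(dict_average)  # 0.5
```

Both versions give the same answer, but only the second one can be read without a legend.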
Updating values
If a player gets another hit, you can update the dictionary:
player["hits"] = 3
Recalculate:
average = player["hits"] / player["at_bats"]
print(average)
Output:
0.75
Adding new stats
You can add new key–value pairs at any time:
player["walks"] = 1
player["rbi"] = 2
Common beginner mistakes 🚨
❌ Forgetting quotes around keys:
player[hits]
✅ Correct:
player["hits"]
Now we’re going to combine what you’ve learned so far:
- Lists (multiple players)
- Dictionaries (player stats)
This is how you represent a real roster in Python.
What are we building
Instead of one player:
player = {"name": "Judge", "hits": 2, "at_bats": 4}
We'll store many players:
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
Each item in the list is a dictionary representing one player.
Accessing a player
print(roster[0])
Output:
{'name': 'Judge', 'hits': 2, 'at_bats': 4}
Accessing a specific stat
print(roster[0]["name"])
print(roster[0]["hits"])
Output:
Judge
2
Looping through the roster
This is where Python starts to feel powerful.
for player in roster:
    average = player["hits"] / player["at_bats"]
    print(f"{player['name']} batting average: {average:.3f}")
Output:
Judge batting average: 0.500
Ohtani batting average: 0.600
Trout batting average: 0.333
Why this matters
This pattern is used everywhere:
- Sports analytics
- Databases
- APIs
- Real-world applications
If you understand this, you’re officially past the beginner line.
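One way to build on this pattern, sketched below, is to store each player's average back in their dictionary and keep track of the best hitter as the loop runs. The "average" key and the best_player variable are names invented for this example, and the sketch uses an if statement, which this guide has not introduced yet.

```python
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3},
]

# Start by assuming the first player is the best hitter
best_player = roster[0]

for player in roster:
    # Add a new key to this player's dictionary
    player["average"] = player["hits"] / player["at_bats"]
    # Replace best_player whenever we find a higher average
    if player["average"] > best_player["average"]:
        best_player = player

print(f"Best hitter: {best_player['name']} ({best_player['average']:.3f})")
# Best hitter: Ohtani (0.600)
```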
So far, we’ve manually accessed players one at a time. That doesn’t scale. Baseball has lineups, innings, and seasons.
That’s where for loops come in.
What is a for loop?
A for loop lets Python repeat an action for each item in a collection.
In baseball terms:
- A lineup = list
- Each at-bat = one loop
Basic for loop example
players = ["Judge", "Ohtani", "Trout"]
for player in players:
    print(player)
Output:
Judge
Ohtani
Trout
Python reads this as: “For each player in the list, print the player’s name.”
Looping through a roster of player stats
Using our roster from the previous step:
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
for player in roster:
    average = player["hits"] / player["at_bats"]
    print(f"{player['name']} average: {average:.3f}")
Output:
Judge average: 0.500
Ohtani average: 0.600
Trout average: 0.333
Why indentation matters 🚨
Python uses indentation, not braces.
This works:
for player in roster:
    print(player["name"])
This does NOT (Python raises an IndentationError):
for player in roster:
print(player["name"])
Indentation tells Python what belongs inside the loop.
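Here is a small sketch of what "inside the loop" means in practice: the indented line runs once per player, while the unindented print runs a single time after the loop finishes. The total_hits variable is a name made up for this example.

```python
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3},
]

total_hits = 0

for player in roster:
    # Indented: runs once for each player
    total_hits = total_hits + player["hits"]

# Not indented: runs once, after the loop is done
print(f"Total hits: {total_hits}")
# Total hits: 6
```

If the print line were indented too, it would run three times and show the running total after each player.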
Simulating an inning
Think of a loop like an inning where each batter comes up once:
batters = ["Judge", "Ohtani", "Trout"]
for batter in batters:
    print(f"{batter} is at the plate")
Output:
Judge is at the plate
Ohtani is at the plate
Trout is at the plate
Key takeaways
- For loops repeat actions
- Loops work perfectly with lists
- Indentation matters in Python
- Loops model real baseball sequences
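If you also want the batting order number, Python's built-in enumerate() can hand you a counter alongside each item. This is a minimal sketch; the lineup list and the spot variable are names assumed for the example.

```python
batters = ["Judge", "Ohtani", "Trout"]

lineup = []

# start=1 makes the counter begin at 1 instead of 0
for spot, batter in enumerate(batters, start=1):
    line = f"{spot}. {batter}"
    lineup.append(line)
    print(line)
# 1. Judge
# 2. Ohtani
# 3. Trout
```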
So far, we’ve written the same calculations more than once. In real programming, we don’t want to repeat ourselves.
That’s where functions come in.
What is a function?
A function is a reusable block of code that performs a specific task.
In baseball terms:
- A function is like a stat formula
- You give it inputs (hits, at-bats)
- It gives you an output (batting average)
Your first function
def batting_average(hits, at_bats):
    return hits / at_bats
What this means:
- def starts a function
- hits and at_bats are inputs (parameters)
- return sends back a result
Using the function
avg = batting_average(2, 4)
print(avg)
Output:
0.5
Using the function with player data
player = {"name": "Judge", "hits": 2, "at_bats": 4}
avg = batting_average(player["hits"], player["at_bats"])
print(f"{player['name']} average: {avg:.3f}")
Output:
Judge average: 0.500
Using functions inside loops
roster = [
    {"name": "Judge", "hits": 2, "at_bats": 4},
    {"name": "Ohtani", "hits": 3, "at_bats": 5},
    {"name": "Trout", "hits": 1, "at_bats": 3}
]
for player in roster:
    avg = batting_average(player["hits"], player["at_bats"])
    print(f"{player['name']} average: {avg:.3f}")
Output:
Judge average: 0.500
Ohtani average: 0.600
Trout average: 0.333
Why functions matter
Functions:
- Prevent duplicated code
- Make programs easier to read
- Let you change logic in one place
- Mirror real baseball formulas
Key takeaways
- Functions package logic into reusable blocks
- Inputs go in parentheses
- return sends data back
- Functions model real baseball stats
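One thing the batting_average function above does not handle is a player with zero at-bats: dividing by zero makes Python raise a ZeroDivisionError. A possible guard is sketched below. The name safe_batting_average and the choice to return 0.0 are assumptions for this example, not a rule of baseball or of Python.

```python
def safe_batting_average(hits, at_bats):
    # A player with no at-bats has no average yet, so this sketch returns 0.0
    if at_bats == 0:
        return 0.0
    return hits / at_bats


print(safe_batting_average(2, 4))  # 0.5
print(safe_batting_average(0, 0))  # 0.0
```

Because the check lives inside the function, every place that calls it gets the same protection.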
Batting average is useful, but baseball uses multiple stats to measure performance.
In this step, we’ll:
- Create multiple stat formulas
- Use functions for each one
- Combine stats like real baseball analytics
The stats we’ll calculate
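We’ll calculate four common ones:
- Batting Average (BA): BA = hits / at_bats
- On-Base Percentage (OBP): OBP = (hits + walks) / (at_bats + walks)
- Slugging Percentage (SLG): SLG = total_bases / at_bats
- OPS: OPS = OBP + SLG
The OBP formula here is simplified; the official version also counts hit-by-pitches and sacrifice flies.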
Batting Average function (review)
def batting_average(hits, at_bats):
    return hits / at_bats
On-Base Percentage (OBP)
def on_base_percentage(hits, walks, at_bats):
    return (hits + walks) / (at_bats + walks)
Slugging Percentage (SLG)
def slugging_percentage(total_bases, at_bats):
    return total_bases / at_bats
OPS (combining stats)
def ops(obp, slg):
    return obp + slg
Using the functions with a player
player = {
    "name": "Judge",
    "hits": 2,
    "walks": 1,
    "at_bats": 4,
    "total_bases": 5
}
ba = batting_average(player["hits"], player["at_bats"])
obp = on_base_percentage(player["hits"], player["walks"], player["at_bats"])
slg = slugging_percentage(player["total_bases"], player["at_bats"])
player_ops = ops(obp, slg)
print(f"{player['name']} BA: {ba:.3f}")
print(f"{player['name']} OBP: {obp:.3f}")
print(f"{player['name']} SLG: {slg:.3f}")
print(f"{player['name']} OPS: {player_ops:.3f}")
Output:
Judge BA: 0.500
Judge OBP: 0.600
Judge SLG: 1.250
Judge OPS: 1.850
Why this matters
This is real-world programming:
- Small, focused functions
- Reusable logic
- Clear formulas
- Readable output
This is how analytics code is actually written.
Key takeaways
- One stat = one function
- Functions can build on each other
- Code mirrors real baseball formulas
- This is foundational analytics logic
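To see these functions work across a whole roster, here is a sketch that combines them with a for loop. The function definitions repeat the ones from this section so the block runs on its own. The walks and total_bases values for Ohtani and Trout are made-up sample numbers, not real stats, and the results dictionary is a name invented for the example.

```python
def on_base_percentage(hits, walks, at_bats):
    return (hits + walks) / (at_bats + walks)


def slugging_percentage(total_bases, at_bats):
    return total_bases / at_bats


def ops(obp, slg):
    return obp + slg


# Sample data only: every number here is invented for the example
roster = [
    {"name": "Judge", "hits": 2, "walks": 1, "at_bats": 4, "total_bases": 5},
    {"name": "Ohtani", "hits": 3, "walks": 0, "at_bats": 5, "total_bases": 6},
    {"name": "Trout", "hits": 1, "walks": 1, "at_bats": 3, "total_bases": 1},
]

# Maps each player's name to their OPS
results = {}

for player in roster:
    obp = on_base_percentage(player["hits"], player["walks"], player["at_bats"])
    slg = slugging_percentage(player["total_bases"], player["at_bats"])
    results[player["name"]] = ops(obp, slg)
    print(f"{player['name']} OPS: {results[player['name']]:.3f}")
# Judge OPS: 1.850
# Ohtani OPS: 1.800
# Trout OPS: 0.833
```

Each function stays small, and the loop decides which player's numbers to feed into them.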
