Patrick Carlberg – Data Science Learning Journey (2020–2025)

Strategic skill development period for deep technical expertise, project-based learning, and modern AI pipelines.

Executive Summary

This repository documents my systematic journey from domain expert in nanotechnology to full-stack, production-ready data scientist. Over five years, I completed 70+ hands-on projects, 50+ advanced certifications, and delivered modern machine learning solutions using the latest open-source libraries and cloud platforms.

Foundation Building: Intensive upskilling in Python, statistics, ML, and cloud
Practical Application: End-to-end projects from data wrangling to web deployment
Modern Workflows: Productionization via Docker, FastAPI, LangChain, and more
Learning-in-Public: Transparent record of raw code, refactoring, and skill development

All raw project code is documented as-is to reflect genuine skill progression. Folders are organized to highlight learning phases and technology stacks, with honest badges indicating code maturity (see TIMELINE.md).

Interactive Project Timeline

🚀 View Interactive Timeline

Click above to see the full D3.js interactive visualization

Repository Structure

learning-journey/
│
├── README.md                    # Executive summary and navigation
├── TIMELINE.md                  # Chronological list of 70+ projects
├── GroupTimeline.md             # Projects sorted by library or subject 
├── 01_coursera_certificates/    # Sample of Coursera certificates 
├── 02_foundations/              # Phase 1 – Python, Stats, Early ML (2020–2021)
├── 03_machine_learning/         # Phase 2 – Core ML, XGB, PyTorch, TF (2021–2022)
├── 04_web_deployment/           # Phase 3 – Flask/FastAPI, APIs, web apps (2021–2023)
├── 05_advanced_computing/       # GPU/CUDA, Big Data, Optimization (2022–2024)
├── 06_quantitative_finance/     # Financial ML, RL, time series (2020–2024)
├── 07_kaggle_competitions/      # Competitive ML & public benchmarks (2021–2025)
├── 08_modern_ml_ai/             # Modern AI (LLMs, LangChain, RAG, NLP) (2024–2025)
└── 09_portfolio_showcase/       # Presentable, end-to-end project demos

Phases of Development

Phase 1: Foundations (2020–2021)

Upgraded Python, statistics, and data visualization skills
Completed foundations via Coursera specializations

Phase 2: Applied ML (2021–2022)

Built and iterated on Kaggle, UCI, and public datasets
First production ML deployments

Phase 3: Deployment + Performance (2021–2024)

Moved projects to web, experimented with containerization
GPU, big-data, and scalable ML implementations

Phase 4: Modern AI (2024–2025)

Integrated LLMs, LangChain, generative AI
Competed in advanced Kaggle and NeurIPS challenges

Featured Code Snippets

🐍 Phase 1: Early Python (2020)

# First data visualization - humble beginnings
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('stock_data.csv')
df['price'].plot(title='My First Stock Chart')
plt.show()

🤖 Phase 2: Machine Learning Pipeline (2021)

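# XGBoost model with proper validation
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be a prepared feature matrix and a
# binary label vector; loading them is not shown in this snippet.
model = XGBClassifier(
    n_estimators=1000,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    random_state=42
)

# 5-fold cross-validated ROC AUC, reported as mean +/- two standard deviations
scores = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc')
print(f"CV AUC: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")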
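The reporting line above can be reproduced with the standard library alone. This is a minimal sketch, assuming five per-fold scores; the values in `fold_scores` are hypothetical and are not results from this repository, and `summarize_cv` is an illustrative name. It shows that the "+/-" figure is twice the population standard deviation, which is what NumPy's `std()` computes by default.

```python
import statistics

# Hypothetical per-fold ROC AUC values (illustration only, not real results).
fold_scores = [0.91, 0.89, 0.93, 0.90, 0.92]

def summarize_cv(scores):
    """Return (mean, two_sigma) in the same form the print statement reports."""
    mean = statistics.fmean(scores)
    # ndarray.std() defaults to the population formula (ddof=0),
    # so statistics.pstdev is the standard-library equivalent.
    two_sigma = 2 * statistics.pstdev(scores)
    return mean, two_sigma

mean, two_sigma = summarize_cv(fold_scores)
print(f"CV AUC: {mean:.4f} (+/- {two_sigma:.4f})")
# prints: CV AUC: 0.9100 (+/- 0.0283)
```

With only five folds the spread is a rough stability indicator rather than a formal confidence interval.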

🚀 Phase 3: Production FastAPI (2022–2023)

# Full-stack ML serving with FastAPI
from typing import List

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import torch

app = FastAPI(title="ML Model API", version="2.0.0")

class PredictionRequest(BaseModel):
    features: List[float]
    model_version: str = "v1.2"

@app.post("/predict")
async def predict(request: PredictionRequest):
    try:
        # load_model_async is not defined in this snippet; it is assumed to
        # return a model object exposing a predict() method.
        model = await load_model_async(request.model_version)
        prediction = model.predict(torch.tensor(request.features))
        # The confidence value is a hard-coded placeholder, not a model output.
        return {"prediction": prediction.item(), "confidence": 0.95}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
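The endpoint above awaits a `load_model_async` helper that the snippet does not show. One plausible shape for such a helper is a per-version cache guarded by an `asyncio.Lock`, so that concurrent requests trigger only one load. The sketch below is an assumption and not the repository's implementation: `ModelCache`, `DummyModel`, and `load_count` are hypothetical names, and the "model" simply averages its inputs.

```python
import asyncio

class DummyModel:
    """Hypothetical stand-in for a real model object."""
    def __init__(self, version):
        self.version = version

    def predict(self, features):
        # Placeholder logic: the mean of the features, not a real prediction.
        return sum(features) / len(features)

class ModelCache:
    """Loads each model version at most once, even under concurrent requests."""
    def __init__(self):
        self._models = {}
        self._lock = asyncio.Lock()
        self.load_count = 0  # counts real loads so the caching is observable

    async def load_model_async(self, version):
        async with self._lock:
            if version not in self._models:
                await asyncio.sleep(0)  # stands in for non-blocking I/O
                self._models[version] = DummyModel(version)
                self.load_count += 1
            return self._models[version]

async def main():
    cache = ModelCache()  # created inside the running event loop
    # Ten concurrent requests for the same version share a single load.
    models = await asyncio.gather(
        *(cache.load_model_async("v1.2") for _ in range(10))
    )
    return cache.load_count, models[0].predict([1.0, 2.0, 3.0])

load_count, prediction = asyncio.run(main())
print(load_count, prediction)
# prints: 1 2.0
```

A single lock serializes all loads; a per-version lock would let different versions load in parallel at the cost of a little more bookkeeping.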

🧠 Phase 4: Modern AI Integration (2024–2025)

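# LangChain RAG system with custom retrieval
# Import paths for LangChain >= 0.1, where integrations live in langchain_community
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA

class CustomRAGSystem:
    # load_documents() and setup_llm() are helper methods not shown in this snippet
    def __init__(self, docs_path: str):
        self.embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
        self.vectorstore = Chroma.from_documents(
            documents=self.load_documents(docs_path),
            embedding=self.embeddings
        )
        # Retrieve the 3 most similar chunks for each question
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.setup_llm(),
            retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3})
        )

    async def query(self, question: str) -> str:
        # ainvoke replaces the deprecated arun; the answer is under "result"
        response = await self.qa_chain.ainvoke({"query": question})
        return response["result"]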
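The retriever in the snippet above returns the `k` stored chunks most similar to the question. The idea can be sketched without any external library, assuming a toy bag-of-words "embedding" in place of a sentence-transformer model and a plain list in place of Chroma. The functions `embed`, `cosine`, and `retrieve` and the sample documents are hypothetical, chosen only to illustrate top-k similarity search, and do not reflect how Chroma is implemented.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase token counts (a real system uses dense vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=3):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical corpus for illustration.
docs = [
    "xgboost uses gradient boosted trees",
    "fastapi serves models over http",
    "chroma stores embedding vectors",
    "docker packages apps into containers",
]

top = retrieve("how does fastapi serve models", docs, k=2)
print(top[0])
# prints: fastapi serves models over http
```

In a full RAG chain the retrieved chunks are then passed to the LLM as context for answering the question.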

Raw/Unpolished Code & Growth Mindset

Some code in early folders is left intentionally raw or only lightly refactored. This is deliberate: it documents practical learning cycles, iterative improvements, and technological catch-up after a career pivot.

See TIMELINE.md for project-by-project progress, with major milestones and evolving code quality tagged along the way.

Quick Links

Complete Project Timeline: All projects, with dates and themes
Group Timeline – by Tech/Library
Coursera Certificates (Summary)
Best Portfolio Projects
GitHub Profile

Value Proposition

Self-driven, systematic skill acquisition from first principles to production
End-to-end project delivery, with honesty about unfinished/raw work
Strong documentation practices, even for learning-phase code

Contact: https://github.com/CJRockball

