
🚀 Launching Discoveries with Data-Driven Code

Patrick Carlberg | Scientist


🔬 Semiconductor & Materials Science Background
PhD-level nanotechnology scientist with 10+ years of experience in statistical analysis of semiconductor manufacturing processes, material characterization, and device measurements. Expertise spans hypothesis testing, ANOVA, time-series analysis, and Six Sigma methodologies applied to tool development and process optimization.

📊 Data Science & ML Engineering
Specialized in tabular-data analysis and time-series modeling using the Python ecosystem (scikit-learn, XGBoost, PyTorch, PyMC3).

🔧 Data Science in Production
Experience building production ML systems with FastAPI, applying MLOps practices, and deploying scalable data solutions—from experimental design and statistical process control to predictive modeling and automated decision systems.
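As one concrete example of statistical process control, here is a minimal sketch of individuals and moving-range (I-MR) chart limits. It assumes one measurement per sample and estimates sigma as the average moving range divided by d2 = 1.128, the standard constant for moving ranges of span 2. The function names and sample values are illustrative and not taken from any of the projects listed here.

```python
from statistics import mean

# d2 constant for moving ranges of span 2 (standard Shewhart table value)
D2_SPAN2 = 1.128


def imr_limits(values):
    """Return (center, lcl, ucl) for an individuals (I) chart.

    Sigma is estimated from the average moving range divided by d2,
    the usual approach for individuals charts.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Absolute differences between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_hat = mean(moving_ranges) / D2_SPAN2
    center = mean(values)
    return center, center - 3 * sigma_hat, center + 3 * sigma_hat


def out_of_control(values, limits):
    """Return indices of points outside the 3-sigma control limits."""
    _, lcl, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical baseline measurements (e.g. film thickness, arbitrary units)
baseline = [10.0, 10.2, 9.9, 10.1, 10.0]
limits = imr_limits(baseline)
print(limits)
print(out_of_control([10.0, 11.0, 9.0], limits))
```

Limits are computed from a baseline period and then applied to new data, which keeps the chart from adapting to the very shifts it is meant to detect.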

💻 Technical Problem Solver
Projects range from CUDA-accelerated Mandelbrot visualizations to WebSocket streaming dashboards and microservice-based prediction APIs. Advocate of clean code, thorough documentation, and user-centric design.

🔄 Continuous Learning & Experimentation
Data Science Learning Portfolio: a documented five-year upskilling journey with 70+ projects and 50+ certifications (Imperial College, Google Cloud, AWS), progressing from Python foundations to MLOps, production LLMs, and agentic AI systems. Includes an interactive D3.js project timeline. → learning journey

🤖 Agents & LangGraph
Building stateful AI agents is my newest focus. I’m using the open-source LangGraph framework to prototype multi-step, tool-using agents that persist state, support human-in-the-loop checkpoints, and recover from failures. My companion repo agent-lab (work in progress) collects reusable patterns, such as ReAct-style planners, streaming memory nodes, and LangGraph + FastAPI microservices, for anyone exploring agentic workflows.

🔍 Technical Interests

  • Scientific Computing · NumPy, SciPy, mathematical modeling, algorithm optimization
  • ML & Statistics · Experimental design, time series, Bayesian methods, ensemble models
  • Visualization · Interactive plotting, real-time dashboards, GPU-accelerated graphics
  • Web Development · FastAPI, WebSocket streaming, responsive data applications

🛠 Current Interests · Scientific computing • Algorithm optimization • Multi-angle dataset exploration • Async web APIs • Automated validation pipelines • Agentic systems

📖 Tech Stack Snapshot · Python • PyTorch • GBDT • scikit-learn • PyMC3 • NumPy/SciPy • Pandas • FastAPI • WebSocket • Docker • SQL • LangGraph • LangChain

📚 Past Exposure · MATLAB • R • C/C++ • Markdown • CSS • PHP • Assembly • Java


🔭 Demo Corner

🥧 ML Modelling
End-to-end analysis of the Adult income dataset, covering EDA, modelling, and evaluation. → dataset eval

Basic model examples built on the Kaggle S5E12 competition dataset. → raw models

🛎️ Services
FastAPI microservice that predicts restaurant tabs (Docker-ready). → fast api1

FastAPI microservice that predicts personality (Docker-ready). → fast api2

Streamlit RAG agent: a more complete service with tests and CI. → served chat agent

📊 Statistics
Notebooks on generalized linear mixed models (GLMMs). → GLMM

🌀 Mandelbrot GPU
CUDA-powered fractal explorer with real-time zoom. → CUDA
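The per-pixel work in a Mandelbrot explorer is the escape-time iteration z → z² + c, stopped once |z| exceeds 2. The sketch below is a plain-Python CPU reference, not the project's CUDA kernel; the names `escape_time` and `render` and the default view window are illustrative assumptions. Because each pixel is computed independently, this inner loop is the part a GPU version would typically map to one thread per pixel.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Return the iteration at which z escapes |z| > 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width, height, x_min=-2.0, x_max=1.0, y_min=-1.5, y_max=1.5, max_iter=100):
    """Compute escape times on a width x height grid (both must be >= 2)."""
    rows = []
    for j in range(height):
        # Map row index to the imaginary axis
        y = y_min + (y_max - y_min) * j / (height - 1)
        row = []
        for i in range(width):
            # Map column index to the real axis
            x = x_min + (x_max - x_min) * i / (width - 1)
            row.append(escape_time(complex(x, y), max_iter))
        rows.append(row)
    return rows


# Coarse ASCII preview: '#' marks points that never escaped
for row in render(60, 24):
    print("".join("#" if n == 100 else "." for n in row))
```

Zooming amounts to calling `render` with a narrower window, which is why GPU parallelism pays off: deep zooms need higher `max_iter` for every pixel.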

📈 Streaming Plot
WebSocket server streaming data into interactive Bokeh charts. → Live plotting


🛠️ Main Languages

Python HTML SQL

🧰 Python Frameworks and Libraries

XGBoost PyTorch FastAPI NumPy Pandas scikit-learn LangGraph Statsmodels PyMC Dask CUDA

🗄️ Databases & Cloud Hosting

SQLite PostgreSQL GitHub Docker Heroku

💻 Software & Tools

Git Jupyter VS Code MINITAB JMP Tableau Raspberry Pi


Pinned

  1. adult_dataset_analysis_notebooks · Public

    EDA and modeling of the Adult dataset

    Jupyter Notebook

  2. stat_glmm · Public

    Notebooks with GLMM models

    Jupyter Notebook

  3. kaggles5e12_diabetes · Public

    Basic modelling for Kaggle S5E12

    Python

  4. Deep_Agents · Public

    Explore and develop deep agents with LangGraph

    Python

  5. learning_journey · Public

    Machine learning journey

    Jupyter Notebook

  6. xgb-predict · Public

    Example prediction service

    Python