Scientific Machine Learning documentation #84

@EnlNovius

GeoSciML, May 7, 2026: discussion on the Scientific Machine Learning documentation.

The Scientific Machine Learning documentation is intended for two audiences with different needs:

  • Newcomers to the IGE who have little or no experience with machine learning.
  • Newcomers to the IGE who already have a background in machine learning.

Introductory Documentation

  • Links to machine learning tutorials, such as CNRS Fiddles; documentation in PDF or other formats would also be helpful.
  • Highlight key considerations (with links to tutorials for each point):
    • How to verify that your model is learning.
    • Loss function: how to choose one, etc.
    • Normalizing and selecting data.
    • Handling missing data.
    • Which format to use for input data (pipeline?).
    • The importance of the different dataset splits: train / validation / test.
  • Links to supercomputer pages (Kraken, Jean Zay, etc.), plus the advantages of each (complexity of access, speed of access, available resources, etc.).
  • Links on how to work with GPUs (nvidia-smi, etc.), plus when and why to use a GPU and how to verify that you are using the GPU correctly.
  • References to the PING documentation on conda environments.
  • References to solutions for model monitoring (TensorBoard, Weights & Biases).
  • References on best practices for model dissemination: OpenScience, etc.
  • Reference articles (Transformers, etc.).
  • Links to courses on the fundamentals (variational optimization, etc.).
  • Link to the intranet page listing who is working on machine learning at IGE.
  • References to machine learning libraries (PyTorch, TensorFlow, JAX, etc.).
  • Links to lists of datasets.
  • Links to community models (YOLO, etc.).
  • IDE and notebooks: installation, storage access, remote work, etc.

Documentation for Advanced Users

  • Access to GPUs.
  • What is PING and how to join it.
  • Resources on best practices for model sharing: OpenScience, etc.
  • JAX vs PyTorch vs TensorFlow.
  • Carbon footprint of training runs/models.

@auraoupa Maybe we could review all the proposals with the GeoSciML group on a Thursday at 3:00 p.m.? We could also use the coworking time (3:00–4:00 p.m.) to start drafting the selected pages.
