GeoSciML, May 7, 2026: discussion on the Scientific Machine Learning documentation.
The Scientific Machine Learning documentation is intended for two audiences with different needs:
- Newcomers to the IGE who have little or no experience with machine learning.
- Newcomers to the IGE who already have a background in machine learning.
Introductory Documentation
- Links to machine learning tutorials, such as CNRS Fiddles; documentation in PDF or another format would also be helpful.
- Highlight key considerations (with links to tutorials for each point):
  - How to verify that your model is learning
  - Loss function: how to choose one, etc.
  - Normalizing and selecting data
  - Handling missing data
  - Which format to use for input data (pipeline?)
  - The importance of the different datasets: train / validation / test
- Links to supercomputer pages (Kraken, Jean Zay, etc.), with the advantages of each (complexity and speed of access, available resources, etc.).
- Links on how to work with GPUs (nvidia-smi, etc.), when and why to use a GPU, and how to verify that you are using the GPU correctly.
- References to the PING documentation on conda environments.
- References to solutions for model monitoring (TensorBoard, Weights & Biases).
- References on best practices for model dissemination: OpenScience, etc.
- Reference articles (Transformers, etc.).
- Links to courses on the fundamentals (variational optimization, etc.)
- Link to the intranet page listing who is working on machine learning at IGE.
- References to machine learning libraries (PyTorch, TensorFlow, JAX, etc.).
- Links to lists of datasets.
- Links to community models (YOLO, etc.).
- IDE and notebooks: installation, storage access, remote work, etc.
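To give an idea of what a tutorial for the "train / validation / test" and "normalizing data" items could contain, here is a minimal sketch in plain Python. The function names (`split_dataset`, `fit_normalizer`, `normalize`) and the 70/15/15 split are illustrative assumptions, not an agreed convention; the point it tries to show is that the normalization statistics are computed on the training set only and then reused for the other two sets.

```python
import random
import statistics


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_normalizer(train_values):
    """Compute mean and standard deviation from the training set only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant feature, which would give a zero std.
    return mean, (std if std > 0 else 1.0)


def normalize(values, mean, std):
    """Apply the training-set statistics to any of the three sets."""
    return [(v - mean) / std for v in values]


# Hypothetical toy data: 100 scalar samples.
data = [float(i) for i in range(100)]
train, val, test = split_dataset(data)
mean, std = fit_normalizer(train)
train_n = normalize(train, mean, std)
val_n = normalize(val, mean, std)
test_n = normalize(test, mean, std)
print(len(train), len(val), len(test))
```

For real geoscience data, a random shuffle is often not appropriate (time series and spatially correlated samples usually need a split by period or region), so this would only be the simplest case to present.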
Documentation for advanced users
- Access to GPUs.
- What is PING and how to join it.
- Resources on best practices for model sharing: OpenScience, etc.
- JAX vs. PyTorch vs. TensorFlow.
- Carbon footprint of training runs/models.
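For the GPU pages, a short check like the one below could accompany the links. It is only a sketch using the Python standard library and the `nvidia-smi` command-line tool; the helper name `list_gpus` is hypothetical, and it only reports which NVIDIA GPUs are visible on the machine, not whether a given framework is actually running on them.

```python
import shutil
import subprocess


def list_gpus():
    """Return one text line per visible NVIDIA GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        # nvidia-smi is not installed or not on PATH (for example, on a login node).
        return []
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.used,memory.total,utilization.gpu",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    for line in gpus:
        print(line)
else:
    print("No NVIDIA GPU visible from this session.")
```

Running it once before and once during training shows whether memory use and utilization rise on the GPU, which is a quick first sign that the job is really using it.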
@auraoupa Maybe we could review all the proposals with the GeoSciML group on a Thursday at 3:00 p.m.? We could also use the coworking time (3:00–4:00 p.m.) to start drafting the selected pages.