YESS Learning Groups on Machine Learning applications to Earth System Science
There is growing interest in Machine Learning (ML) methods applied to Earth System science. Alongside more traditional machine learning methods such as classification and regression, newer techniques such as artificial neural networks and causal discovery algorithms are showing great potential to solve challenges in our field. Learning to apply these methods would benefit Early Career Researchers (ECRs), which is why YESS is organizing a learning activity to bring together members of our community who want to apply these methods to their own data and problems.
A huge amount of resources is available online, including workshops, open-source software, code, and datasets, and it can be hard to know where to start. These learning groups are intended to give ECRs the opportunity to engage in a guided, collaborative learning process by participating in small groups. Additionally, we expect the groups to foster discussion on the interpretation of results and on combining ML methods with physics-based methods.
The proposed learning groups cover the following methods:
- Classification methods (e.g., decision trees, support vector machine classification)
- Regression methods (e.g., random forest regression, linear regression)
- Neural Networks
- Causal discovery algorithms
- Supervised learning methods for spatial data:
  - Regression methods
  - Classification methods
  - Interpolation methods
The learning activities will take place between July and December and will ideally require an availability of at least 8 hours a month. The main goal is to develop skills in applying one machine learning method to a real-world dataset, including learning how to generate predictions and interpret the models. We will also discuss papers in which these methods are applied to problems in our fields of research. We will encourage the groups to publish their results. There will be a closing session in which each group will be able to present its results.
The objectives of the learning group activities can be summarized as follows:
- Learn machine learning fundamentals;
- Learn how to pre-process environmental science data in order to apply machine learning (ML) methods;
- Build and train an ML (or deep learning) model;
- Evaluate and discuss model predictions and physical interpretation of the ML models with colleagues; and
- Solidify understanding of the above topics through group discussion in a closing session.
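As a rough illustration of the pre-process, train, and evaluate steps listed above, the following minimal sketch fits a linear regression using only the Python standard library. Everything in it is an assumption made for illustration: the synthetic dataset, the function names, and the choice of a 25% test split are hypothetical and are not part of the learning group material. Groups would normally work with a real dataset and an established ML library.

```python
import math
import random
import statistics


def make_synthetic_data(n=200, seed=0):
    """Hypothetical data: y is a noisy linear function of one predictor x."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


def standardize(values, mean, std):
    """Pre-processing step: rescale to zero mean and unit variance."""
    return [(v - mean) / std for v in values]


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(
        statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])
    )


xs, ys = make_synthetic_data()
x_train, y_train, x_test, y_test = train_test_split(xs, ys)

# Scaling statistics come from the training set only, so no information
# from the held-out data leaks into the model.
mean, std = statistics.fmean(x_train), statistics.pstdev(x_train)

slope, intercept = fit_linear(standardize(x_train, mean, std), y_train)
y_pred = [slope * x + intercept for x in standardize(x_test, mean, std)]
test_rmse = rmse(y_test, y_pred)

print(f"slope per original unit of x: {slope / std:.3f}")
print(f"test RMSE: {test_rmse:.3f}")
```

Evaluating on held-out data, rather than on the data used for fitting, is what makes the error estimate meaningful. The same pattern carries over to the other methods named above.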
Description of Learning Group activities:
The first activity will be a short webinar series providing an introduction to ML methods applied to climate and environmental sciences. Two researchers with experience in applying these methods will talk about their work and the applications of ML.
Following these webinars, participants will register for the Learning Group activities through a form where they can state their specific interests and time availability. Based on the answers, the organizing team will arrange the participants into Learning Groups of no more than 10 people. Participants will interact through an online workspace and will be given the following guidelines for their learning process:
- Read and discuss material for a theoretical introduction to the methods.
- Decide on a dataset and a problem to which the method will be applied (these can be shared within the group or chosen individually). (August)
- Implement the chosen technique and share the challenges with the group. (August – November)
- Share results with the group members at regular meetings (the frequency will be decided by the group members; we suggest monthly meetings). (August – November)
- Present results in a closing session with the other learning groups. (November)
- Collectively choose a researcher to invite to the closing webinar series. In this final webinar series, we hope participants will be able to raise their questions about applying these techniques. (November)
- If the results are interesting enough, organizers will encourage the preparation of a publication. (2022)
Computing resources will not be provided by the organizing team, so participants will need to arrange their own access to datasets and computing resources. Open-access resources such as Google Colaboratory (Colab) will be recommended.
Programming experience is required. Python experience will be beneficial but is not mandatory.