{ "cells": [ { "cell_type": "markdown", "id": "4f4804ec-bfb1-46e5-bb90-41d39702b319", "metadata": { "tags": [] }, "source": [ "
\n", "\n", "

ICON Training - Hands-on Session

\n", "\n", "# Optional exercise 2: Parallelization and Run-Time Performance\n", "\n", "
\n", "Figure: Domain decomposition and halo region (schematic)\n", "
\n", "\n", "---\n", "\n", "The course exercises revisit the topics of the individual tutorial chapters and range from easy tests to the setup of complex forecast simulations.\n", "\n", "In this particular exercise you will learn how to \n", "\n", "- make use of the built-in **timer module for performance diagnostics**\n", "- specify details of the **parallel model execution**\n", "\n", "\n", "**Note:** This script is not suited for operational use; it is part of the step-by-step tutorial.\n", "Furthermore, we will omit some of ICON's less important input and output channels here, e.g. the restart files. This exercise focuses on command-line tools and rudimentary visualization with Python scripts. Professional post-processing and visualization tools are beyond the scope of this tutorial.\n", "\n", "---\n", "\n", "### Setup\n", "\n", "We will start from the global setup (without nest), which has been used in exercise 2 \"Global Real Data Run\". \n", "\n", "| Configuration | Value |\n", "| :--- | :--- |\n", "| mesh size | 40 km (R2B6) |\n", "| model top height | 75 km |\n", "| no. of levels | 90 |\n", "| no. of cells (per level) | 327680 |\n", "| time step | 360 s |\n", "| duration | 12 h |\n", "\n", "All input and configuration files have already been prepared for you. These include\n", "\n", "* the master namelist (**icon_master.namelist**)\n", "* the model namelist (**NAMELIST_NWP**)\n", "* the ICON batch job file (**icon.sbatch**)\n", "* the horizontal grid and external parameter file\n", "* the analysis file.\n", "\n", "When executing the two cells below, the experiment directory $EXPDIR will be prepared for you, including all the required files listed above.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "93edfe2f-d23e-4317-b7a7-631345b808c1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", " Exercise: \n", "
\n", " Execute the following cell in order to set some exercise-specific environment variables\n", "
" ] }, { "cell_type": "code", "execution_count": 3, "id": "32a376fc-a7f7-4200-95e2-5c2cd4bcb096", "metadata": {}, "outputs": [], "source": [ "# base directory for ICON sources and binary:\n", "ICONDIR=/pool/data/ICON/ICON_training/icon/\n", "\n", "# directory with input grids and external data:\n", "GRIDDIR=/pool/data/ICON/ICON_training/exercise_realdata/grids\n", "# directory with initial data:\n", "DATADIR=/pool/data/ICON/ICON_training/exercise_realdata/data/ini\n", "\n", "# absolute path to directory with plenty of space:\n", "SCRATCHDIR=/scratch/${USER::1}/$USER\n", "EXPDIR=$SCRATCHDIR/exercise_parallelization\n", "\n", "# absolute path to files needed for radiation\n", "RADDIR=${ICONDIR}/externals/ecrad/data\n", "\n", "# path to prepared namelists\n", "NMLDIR=/pool/data/ICON/ICON_training/exercise_realdata/setup_global" ] }, { "cell_type": "markdown", "id": "5fb6ae1d-f411-423e-be95-3c6a55c756e7", "metadata": {}, "source": [ "
\n", " Exercise: \n", "
\n", " Execute the cell below, in order to copy/link input data and configuration files to the experiment directory $EXPDIR: grids, external parameters, initial conditions, namelists.
\n", "The directory for the experiment will be created, if not already there.\n", "
" ] }, { "cell_type": "code", "execution_count": 4, "id": "394c985d-9dc0-4c8e-a2ff-1144818e4f15", "metadata": {}, "outputs": [], "source": [ "if [ ! -d $EXPDIR ]; then\n", " mkdir -p $EXPDIR\n", "fi\n", "cd ${EXPDIR}\n", "\n", "# grid files: link to output directory\n", "ln -sf ${GRIDDIR}/iconR*.nc .\n", "# external parameter files: link to output directory\n", "ln -sf ${GRIDDIR}/extpar*.nc .\n", "# data files: link to output directory\n", "ln -sf ${DATADIR}/*.grb .\n", "\n", "# Dictionary for the mapping: DWD GRIB2 names <-> ICON internal names\n", "ln -sf ${ICONDIR}/run/ana_varnames_map_file.txt map_file.ana\n", "\n", "# For Output: Dictionary for the mapping: names specified in the output nml <-> ICON internal names\n", "ln -sf ${ICONDIR}/run/dict.output.dwd dict.output.dwd\n", "\n", "# copy the master namelist, model namelist and job script to the output directory\n", "cp ${NMLDIR}/icon_master.namelist .\n", "cp ${NMLDIR}/NAMELIST_NWP_global NAMELIST_NWP\n", "cp ${NMLDIR}/icon.sbatch ." ] }, { "cell_type": "markdown", "id": "204cb996-ef6c-406a-92be-a3a93e81cd67", "metadata": {}, "source": [ "## Measuring the run-time performance\n", "\n", "In this exercise you will learn how to make use of the ICON timer module for basic performance measurement.\n", "\n", "
\n", " Exercise (Performance assessment using the built-in timer output):\n", " \n", "\n", "At the end of the model run, a log file `slurm.XYZ.out` is created. It can be found in your experiment directory `$EXPDIR`.
Scroll to the end of this file. You should find wall clock timer output comparable to that listed in Section 8.4. Try to identify\n", "\n", "- the total run-time\n", "- the time needed by the radiation module\n", "
\n" ] }, { "cell_type": "markdown", "id": "2e3de0e9-97ac-4713-9340-32161e66a869", "metadata": {}, "source": [ "Submit your ICON batch job" ] }, { "cell_type": "code", "execution_count": null, "id": "970a5fbf-26cf-4113-96a6-68f77834c435", "metadata": {}, "outputs": [], "source": [ "export ICONDIR=$ICONDIR\n", "cd $EXPDIR && sbatch icon.sbatch" ] }, { "cell_type": "markdown", "id": "9a4ae162-b90f-4a46-99a2-06ff7cb52bb6", "metadata": {}, "source": [ "Check the job status via squeue" ] }, { "cell_type": "code", "execution_count": null, "id": "bf28f8f8-7c73-4b5f-962a-f810408eb491", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "raw", "id": "a8306c5c-cdf6-4a9c-86f2-c664ba53df3a", "metadata": {}, "source": [ "your answer" ] }, { "cell_type": "markdown", "id": "63720dd9-34f9-4fe6-b734-ef5ddc7637f9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "Solution\n", " \n", "Namelist changes\n", "```\n", "&run_nml\n", " ltimer = .TRUE.\n", " timers_level = 10\n", "/\n", "```\n", "\n", "Wall clock timer output\n", "```\n", "- total run-time: ~ 85s\n", "- radiation run-time: ~ 11s \n", "```\n", "\n", "
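
Instead of scrolling through the log file, the relevant timer rows can be filtered on the command line. The following is only a minimal sketch: the helper name `timer_rows` is hypothetical, and it assumes that the radiation timer rows contain the string `radiation`. Check the actual timer names in your own log.

```bash
# Print the timer rows of interest from an ICON log file.
# Assumption: the rows for the radiation scheme contain the string "radiation";
# the default pattern is a guess and can be overridden by the second argument.
timer_rows() {
    local logfile="$1"
    local pattern="${2:-total|radiation}"
    grep -iE -- "$pattern" "$logfile"
}

# Apply it to every Slurm log in the current directory (skipped if none exist).
for f in slurm.*.out; do
    if [ -e "$f" ]; then
        timer_rows "$f" || echo "no matching timer rows found in $f"
    fi
done
```

Because the pattern matches `total` anywhere in a line, the header row with the `total min` and `total max` columns should show up as well, which helps when reading the numbers.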
" ] }, { "cell_type": "markdown", "id": "01c2246f-eea9-4f54-8d53-57e29e26c038", "metadata": {}, "source": [ "###" ] }, { "cell_type": "markdown", "id": "33a8137b-6fde-4cb0-9861-944af163bdef", "metadata": {}, "source": [ "## Specifying details of parallel model execution\n", "\n", "The next exercise focuses on the mechanisms for parallel execution of the ICON model. These settings become important for performance scalability, when increasing the model resolution and core counts.\n", "\n", "
\n", " Exercise (Changing the number of MPI tasks):
\n", " If your computational resources are sufficient, one way to speed up your model run is to increase the number of MPI tasks. Double the number of MPI tasks in the batch job file icon.sbatch and repeat the run.\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "44ea4d0a-a34a-4d07-969f-7b27b6099503", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "Solution\n", "\n", "```\n", "#SBATCH --nodes=10\n", "#SBATCH --ntasks-per-node=128\n", "```\n", "\n", "
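
To get a feeling for what this setting means for the domain decomposition, the average number of grid cells per MPI task can be estimated from the cell count given in the setup table. This is a rough sketch under stated assumptions: the helper name `cells_per_task` is hypothetical, halo cells are ignored, and all tasks are assumed to be compute tasks (no dedicated I/O tasks).

```bash
# Average number of grid cells per MPI task, ignoring halo cells.
# Assumption: all tasks act as compute tasks (no dedicated I/O tasks).
cells_per_task() {
    local ncells=$1 nodes=$2 tasks_per_node=$3
    echo $(( ncells / (nodes * tasks_per_node) ))
}

# R2B6 grid (327680 cells) with the settings from the solution above
cells_per_task 327680 10 128   # prints 256

# Hypothetical reference run with half the number of nodes
cells_per_task 327680 5 128    # prints 512
```

The smaller the subdomain per task, the larger the share of halo cells and communication, which is one reason why the speedup usually stays below the ideal value.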
" ] }, { "cell_type": "markdown", "id": "8cbb62c8-f25f-4ded-899c-00f0c9fa2fc9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### " ] }, { "cell_type": "markdown", "id": "2afb4321-9f41-4808-84dd-6967f0590982", "metadata": {}, "source": [ "Submit your ICON batch job" ] }, { "cell_type": "code", "execution_count": null, "id": "d72ffe13-dafc-4716-bdc9-8576c52e2162", "metadata": {}, "outputs": [], "source": [ "export ICONDIR=$ICONDIR\n", "cd $EXPDIR && sbatch icon.sbatch" ] }, { "cell_type": "markdown", "id": "9a7af8f3-a60c-4928-82f8-3c2d2b3c15d0", "metadata": {}, "source": [ "Check the job status via squeue" ] }, { "cell_type": "code", "execution_count": null, "id": "659706cf-4d3e-4b65-bb64-5a245f25b9da", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "69144fd4-85a7-476e-8922-d3e970f8f737", "metadata": {}, "source": [ "###" ] }, { "cell_type": "markdown", "id": "0a0ee522-05d8-4f83-a6fe-60a5955b81f5", "metadata": {}, "source": [ "
\n", " Exercise (Computing the speedup):
\n", " Compare the timer output of the dynamical core, nh_solve, and the transport module, transport, with the timer output of your previous run.\n", "\n", "- What do you think is a more sensible measure of the effective cost: `total min` or `total max`?\n", "- Compute the speedup that you gained from doubling the number of MPI tasks. Which speedup did you achieve and what would you expect from \"theory\"?\n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "9d1eb256-2cdd-42ca-aba5-bddeeb97b3b2", "metadata": {}, "source": [ " Speedup achieved $=\\frac{T_{\\mathrm{x}}}{T_{\\mathrm{2x}}} = \\ldots$" ] }, { "cell_type": "raw", "id": "884be74e-dfbb-46e1-bb74-ce9777acdc14", "metadata": {}, "source": [ " Your Answer" ] }, { "cell_type": "markdown", "id": "2395f43c-c9fd-4208-9807-35f91926f7c1", "metadata": {}, "source": [ "Sensible measure of effective cost ..." ] }, { "cell_type": "raw", "id": "57a61ee6-536a-4ce0-82e5-fc5343bab0c4", "metadata": {}, "source": [ " Your Answer" ] }, { "cell_type": "markdown", "id": "221f1b5d-341e-48df-9fc8-3825b31e21f6", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "Solution\n", " \n", "Answer:\n", "\n", "- **Sensible measure of effective cost:** total max (processes are typically waiting for the slowest one at barriers)\n", "\n", "- **approximate speedup:** nh_solve: 46.5s/24.9s = 1.87; transport: 10.1s/5.4s = 1.87\n", "\n", "- **theoretical expectation:** speedup factor 2\n", "\n", "
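
The arithmetic behind these numbers can be reproduced with a few lines of shell. This is a sketch only: the helper names `speedup` and `efficiency_percent` are hypothetical, the timings are the example values from the solution above, and the parallel efficiency (speedup divided by the factor by which the task count was increased) is an additional measure that is not part of the original answer.

```bash
# speedup = T_x / T_2x
speedup() {
    awk -v t_ref="$1" -v t_new="$2" 'BEGIN { printf "%.2f\n", t_ref / t_new }'
}

# parallel efficiency in percent = 100 * speedup / factor,
# where factor is the increase in the number of MPI tasks (here: 2)
efficiency_percent() {
    awk -v t_ref="$1" -v t_new="$2" -v factor="$3" \
        'BEGIN { printf "%.1f\n", 100 * t_ref / (t_new * factor) }'
}

speedup 46.5 24.9               # nh_solve:  prints 1.87
speedup 10.1 5.4                # transport: prints 1.87
efficiency_percent 46.5 24.9 2  # nh_solve:  prints 93.4
```

Replace the example timings with the `total max` values from your own two runs.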
" ] }, { "cell_type": "markdown", "id": "a166a9e8-4570-4410-a9c1-a87511031584", "metadata": {}, "source": [ "
\n", "\n", "| grid | namelist parameter |\n", "| --- | --- |\n", "| dynamics grid(s) | `dynamics_grid_filename` (namelist `grid_nml`) |\n", "| radiation grid | `radiation_grid_filename` (namelist `grid_nml`) |\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "aa2b88df-62e2-4262-9f8b-7f6af8ec0e3a", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### " ] }, { "cell_type": "markdown", "id": "92a6c3df-6f0e-4953-bdee-1282eecfc849", "metadata": { "tags": [] }, "source": [ "---" ] }, { "cell_type": "markdown", "id": "cb126281-ee81-48c5-88c2-09e40d2a858d", "metadata": {}, "source": [ "

Congratulations! You have successfully completed optional Exercise 2.

" ] }, { "cell_type": "markdown", "id": "3e3437d7-533d-4d03-a6c8-a8d6a87f6bb4", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "5ead2f4c-a2f9-4cbe-9598-38e371b21448", "metadata": {}, "source": [ "## Further Reading and Resources\n", "\n", "- ICON Tutorial, Ch. 5: https://www.dwd.de/DE/leistungen/nwv_icon_tutorial/nwv_icon_tutorial.html\n", "
A new draft version of the ICON Tutorial is available here: https://icon-training-2025-scripts-rendering-cc74a6.gitlab-pages.dkrz.de/index.html. It is currently being finalized and will be published soon." ] }, { "cell_type": "markdown", "id": "cee681f0-8568-4254-b946-7e96a7c88d6d", "metadata": {}, "source": [ "---\n", "\n", "*Author info: Deutscher Wetterdienst (DWD) 2025 :: icon@dwd.de. For a full list of contributors, see CONTRIBUTING in the root directory. License info: see LICENSE file.*" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 5 }