{"id":255,"date":"2018-06-28T16:14:44","date_gmt":"2018-06-28T15:14:44","guid":{"rendered":"https:\/\/computenodes.net\/?p=255"},"modified":"2025-01-17T16:04:06","modified_gmt":"2025-01-17T16:04:06","slug":"building-hpl-an-atlas-for-the-raspberry-pi","status":"publish","type":"post","link":"https:\/\/computenodes.net\/2018\/06\/28\/building-hpl-an-atlas-for-the-raspberry-pi\/","title":{"rendered":"Building HPL and ATLAS for the Raspberry Pi"},"content":{"rendered":"

Following on from work by Simon Cox and colleagues building the first Raspberry Pi Cluster<\/a>, we are building a new cluster and want to compare the performance of the two.\u00a0 To do this we are once again going to run Linpack.\u00a0 Versions of MPI and HPL have moved on since then.\u00a0 When Wee Archie Blue<\/a> was being developed, issues were found with the off-the-shelf ATLAS version, so we are building it from scratch. Here’s an updated set of install instructions for running on Raspbian.<\/p>\n

    \n
  1. Install the dependencies<\/li>\n<\/ol>\n
    \r\nsudo apt install gfortran automake\r\n<\/pre>\n

    2. Download ATLAS from https:\/\/sourceforge.net\/projects\/math-atlas\/<\/a>. At the time of writing this is version 3.10.3; your version might be different.<\/p>\n

    \r\ntar xjvf atlas3.10.3.tar.bz2\r\n<\/pre>\n

    3. Create a directory to build in (it’s recommended to not build in the source hierarchy), and cd into it<\/p>\n

    \r\n\r\nmkdir atlas-build\r\ncd atlas-build\/\r\n\r\n<\/pre>\n

    4. Disable CPU throttling on the Pi – the ATLAS configure step will not start if it detects throttling.<\/p>\n

    \r\necho performance | sudo tee \/sys\/devices\/system\/cpu\/cpu0\/cpufreq\/scaling_governor\r\n<\/pre>\n

    This stops throttling. I found it helped to have a fan blowing air over the CPU to make sure it didn’t overheat.<\/p>\n
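    The command above writes the governor for cpu0 only. A quick way to confirm what every core reports before starting a long build is sketched below. The helper name show_governors and its optional directory argument are our own illustration, not part of Raspbian.

```shell
# Print the cpufreq governor reported by each core.
# $1 (optional, hypothetical parameter): sysfs cpu directory to inspect.
show_governors() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    for f in $cpu_dir/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip cores that do not expose a cpufreq entry
        if [ -r $f ]; then
            echo ${f#$cpu_dir/}: $(cat $f)
        fi
    done
}

show_governors
```

    Every line should end in performance before you move on to the build.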

    5. Configure and build ATLAS. These steps will take a while. Do NOT use the -j flag to parallelize the make process, as this will cause inconsistent results. Where possible, the build runs operations in parallel automatically.<\/p>\n

    \r\n..\/ATLAS\/configure\r\nmake\r\n<\/pre>\n
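    The HPL makefile later in this post links against lib/libf77blas.a and lib/libatlas.a inside the build directory, so it is worth confirming that both exist once make has finished. A minimal sketch, in which the helper name check_atlas_libs and its argument are hypothetical:

```shell
# Check that the two static libraries HPL will link against were built.
# $1 (hypothetical parameter): the ATLAS build directory.
check_atlas_libs() {
    for lib in libf77blas.a libatlas.a; do
        if [ ! -f $1/lib/$lib ]; then
            echo missing: $1/lib/$lib
            return 1
        fi
    done
    echo ATLAS libraries found in $1/lib
}

# Usage, assuming the build directory from step 3 sits in your home directory:
#   check_atlas_libs $HOME/atlas-build
```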

    6. Download MPI<\/a> (MPICH in our case) and install it<\/p>\n

    \r\ncd\r\nwget http:\/\/www.mpich.org\/static\/downloads\/3.2\/mpich-3.2.tar.gz\r\ntar xzvf mpich-3.2.tar.gz\r\ncd mpich-3.2\r\n.\/configure\r\nmake -j 4\r\nsudo make install\r\n<\/pre>\n

    7. Download HPL, extract it and create a starting configuration<\/p>\n

    \r\ncd\r\nwget http:\/\/www.netlib.org\/benchmark\/hpl\/hpl-2.2.tar.gz\r\ntar xzvf hpl-2.2.tar.gz\r\ncd hpl-2.2\r\ncd setup\r\nsh make_generic\r\ncp Make.UNKNOWN ..\/Make.rpi\r\ncd ..\r\n<\/pre>\n

    8. Edit Make.rpi to reflect where things are installed.\u00a0 In our case the following lines were changed from the defaults; their positions in the file may differ in future versions.<\/p>\n

    \r\nARCH = rpi\r\n<\/pre>\n

    […]<\/p>\n

    \r\nTOPdir       = $(HOME)\/hpl-2.2\r\n<\/pre>\n

    […]<\/p>\n

    \r\nMPdir        = \/usr\/local\r\nMPinc        = -I $(MPdir)\/include\r\nMPlib        = \/usr\/local\/lib\/libmpich.so\r\n<\/pre>\n

    […]<\/p>\n

    \r\nLAdir        = \/home\/pi\/atlas-build\r\nLAinc        =\r\nLAlib        = $(LAdir)\/lib\/libf77blas.a $(LAdir)\/lib\/libatlas.a\r\n<\/pre>\n

    9. Compile HPL<\/p>\n

    \r\nmake arch=rpi\r\n<\/pre>\n

    Congratulations<\/strong>. You should now have a working HPL install. Let’s test it.<\/p>\n

    10. Change into the working directory and create the configuration needed to test the system. As the Pi has 4 cores, you need to tell MPI to assign 4 tasks to the host. Depending on the ambient temperature, you may need to add a fan to stop the Pi CPU overheating, as these tests are very demanding.<\/p>\n

    \r\ncd bin\/rpi\r\ncat << EOF > nodes-1pi\r\nlocalhost\r\nlocalhost\r\nlocalhost\r\nlocalhost\r\nEOF\r\n<\/pre>\n
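    If you want the task count to follow the number of cores rather than typing the lines by hand, the same file can be generated. This is only a sketch: the helper name make_nodes_file and its two arguments are hypothetical. Bear in mind that HPL needs at least P x Q processes, so the process grid in HPL.dat has to be kept in step with the task count.

```shell
# Write one localhost line per MPI task.
# $1 = number of tasks, $2 = output file (both hypothetical parameters).
make_nodes_file() {
    i=0
    : > $2                      # create or truncate the output file
    while [ $i -lt $1 ]; do
        echo localhost >> $2
        i=$((i + 1))
    done
}

# Usage on the Pi, from bin/rpi:
#   make_nodes_file $(nproc) nodes-1pi
```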

    Customise the HPL.dat input file. The file below is the starting point we use<\/p>\n

    \r\nHPLinpack benchmark input file\r\nInnovative Computing Laboratory, University of Tennessee\r\nHPL.out      output file name (if any)\r\n6            device out (6=stdout,7=stderr,file)\r\n1            # of problems sizes (N)\r\n5120         Ns\r\n1            # of NBs\r\n128          NBs\r\n0            PMAP process mapping (0=Row-,1=Column-major)\r\n1            # of process grids (P x Q)\r\n2            Ps\r\n2            Qs\r\n16.0         threshold\r\n1            # of panel fact\r\n2            PFACTs (0=left, 1=Crout, 2=Right)\r\n1            # of recursive stopping criterium\r\n4            NBMINs (>= 1)\r\n1            # of panels in recursion\r\n2            NDIVs\r\n1            # of recursive panel fact.\r\n1            RFACTs (0=left, 1=Crout, 2=Right)\r\n1            # of broadcast\r\n1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)\r\n1            # of lookahead depth\r\n1            DEPTHs (>=0)\r\n2            SWAP (0=bin-exch,1=long,2=mix)\r\n64           swapping threshold\r\n0            L1 in (0=transposed,1=no-transposed) form\r\n0            U  in (0=transposed,1=no-transposed) form\r\n1            Equilibration (0=no,1=yes)\r\n8            memory alignment in double (> 0)\r\n<\/pre>\n
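    The main value to tune is Ns, the problem size. The matrix holds N x N double precision values of 8 bytes each, so memory limits how large N can be. A common rule of thumb, which is an assumption on our part and not something measured for this post, is to size the matrix to roughly 80% of RAM and round N down to a multiple of NB. A minimal sketch, with the hypothetical helper name suggest_n:

```shell
# Suggest a problem size N from the memory size and block size NB.
# $1 = memory in KiB (for example MemTotal from /proc/meminfo), $2 = NB.
# The 0.8 fraction is an assumed rule of thumb, not a value from this post.
suggest_n() {
    awk -v kb=$1 -v nb=$2 'BEGIN {
        n = int(sqrt(0.8 * kb * 1024 / 8))   # N*N doubles of 8 bytes each
        print int(n / nb) * nb               # round down to a multiple of NB
    }'
}

suggest_n 1048576 128   # a nominal 1 GiB of RAM
```

    For a nominal 1 GiB and NB of 128 this prints 10240. The memory actually available to Linux on a Pi is lower than that, so treat the result as an upper bound to work down from.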

    Run the test<\/p>\n

    \r\nmpiexec -f nodes-1pi .\/xhpl\r\n<\/pre>\n

    If all goes to plan, your output should look similar to this:<\/p>\n

    ================================================================================\r\nHPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016\r\nWritten by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK\r\nModified by Piotr Luszczek, Innovative Computing Laboratory, UTK\r\nModified by Julien Langou, University of Colorado Denver\r\n================================================================================\r\n\r\nAn explanation of the input\/output parameters follows:\r\nT\/V    : Wall time \/ encoded variant.\r\nN      : The order of the coefficient matrix A.\r\nNB     : The partitioning blocking factor.\r\nP      : The number of process rows.\r\nQ      : The number of process columns.\r\nTime   : Time in seconds to solve the linear system.\r\nGflops : Rate of execution for solving the linear system.\r\n\r\nThe following parameter values will be used:\r\n\r\nN      :    5120\r\nNB     :     128\r\nPMAP   : Row-major process mapping\r\nP      :       2\r\nQ      :       2\r\nPFACT  :   Right\r\nNBMIN  :       4\r\nNDIV   :       2\r\nRFACT  :   Crout\r\nBCAST  :  1ringM\r\nDEPTH  :       1\r\nSWAP   : Mix (threshold = 64)\r\nL1     : transposed form\r\nU      : transposed form\r\nEQUIL  : yes\r\nALIGN  : 8 double precision words\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n- The matrix A is randomly generated for each test.\r\n- The following scaled residual check will be computed:\r\n      ||Ax-b||_oo \/ ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )\r\n- The relative machine precision (eps) is taken to be               1.110223e-16\r\n- Computational tests pass if scaled residuals are less than                16.0\r\n\r\n================================================================================\r\nT\/V                N    NB     P     Q               Time                 
Gflops\r\n--------------------------------------------------------------------------------\r\nWR11C2R4        5120   128     2     2              25.11              3.565e+00\r\nHPL_pdgesv() start time Wed May 16 10:35:46 2018\r\n\r\nHPL_pdgesv() end time   Wed May 16 10:36:11 2018\r\n\r\n--------------------------------------------------------------------------------\r\n||Ax-b||_oo\/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.2389736 ...... PASSED\r\n================================================================================\r\n\r\nFinished      1 tests with the following results:\r\n              1 tests completed and passed residual checks,\r\n              0 tests completed and failed residual checks,\r\n              0 tests skipped because of illegal input values.\r\n--------------------------------------------------------------------------------\r\n\r\nEnd of Tests.\r\n================================================================================\r\n<\/pre>\n
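    When comparing several runs or clusters, only the result line matters: it starts with the encoded variant (WR11C2R4 above) and carries N, NB, P, Q, the time and the Gflops figure. A small sketch for pulling those out of the output; the helper name hpl_results is hypothetical.

```shell
# Print N, time in seconds and Gflops for each result line of an HPL run.
# Reads HPL output on stdin; result lines begin with the encoded variant (WR...).
hpl_results() {
    awk '/^WR/ { print $2, $6, $7 }'
}

# Usage:
#   mpiexec -f nodes-1pi ./xhpl | hpl_results
```

    For the run shown above this would give 5120 25.11 3.565e+00.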

    The above steps were sufficient to get HPL running with ATLAS on a Pi 3B+. Testing has shown that on a model 3B it may crash for N values above about 6000. This appears to be a problem with the hardware of the 3B, as described in a post on the pi forum<\/a>. Adding the following line to \/boot\/config.txt, as described in that post, allowed problem sizes up to and including 10240 to run.<\/p>\n

    \r\nover_voltage=2\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"

    Following on from work by Simon Cox and colleagues building the first Raspberry Pi Cluster we are building a new cluster and want to compare the performance of the two.\u00a0 To do this we are once again going to run Linpack.\u00a0 Versions of MPI and HPL have moved on since then.\u00a0 When Wee Archie Blue
    Continue reading Building HPL and ATLAS for the Raspberry Pi<\/span> <\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,8],"tags":[],"class_list":["post-255","post","type-post","status-publish","format-standard","hentry","category-cluster","category-raspberry-pi","col-1"],"_links":{"self":[{"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/posts\/255","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/comments?post=255"}],"version-history":[{"count":16,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/posts\/255\/revisions"}],"predecessor-version":[{"id":280,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/posts\/255\/revisions\/280"}],"wp:attachment":[{"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/media?parent=255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/categories?post=255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computenodes.net\/wp-json\/wp\/v2\/tags?post=255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}