One of the biggest challenges in using grids for scientific computing is the complexity of the environment. The goal of GridJM is to make large, heterogeneous, distributed computing infrastructures running the Advanced Resource Connector (ARC) middleware, developed at the www.nordugrid.org forum, conveniently available to their users. GridJM tries to achieve this by automating many of the tasks that would otherwise fall on the users.
What Is GridJM?
GridJM is a grid job manager for executing parallel computations using the ARC middleware. The basic ARC tools allow users to submit jobs to computational grids, but they lack features for managing the job lifecycle from the submit command to retrieving the results of a successfully completed job. To make grid computing easier, GridJM provides a layer of job management services on top of the ARC tools. These services include, for example, babysitting jobs (monitoring their progress), re-submitting failed jobs, and retrieving results automatically.
How can GridJM be used?
GridJM can be installed as a daemon on the client machine, where it accepts distributed job executions to manage. While running, GridJM listens on a local socket through which users can pass ARC-compatible job descriptions to the manager. A job is passed simply by writing the contents of its job description (.xrsl) file to the socket that GridJM listens on.
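As a rough illustration of passing a job this way, the following Python sketch writes one job description to an already connected socket. The function names and the socket path in the comment are hypothetical, and the trailing 0xff byte follows the end-of-xrsl marker described in the Usage section. Byte 0xff never occurs in UTF-8 encoded text, so the marker cannot collide with the description itself.

```python
import socket

# End-of-xrsl marker as described in the Usage section (hexadecimal 0xff).
XRSL_END = b"\xff"


def connect_local(path):
    """Connect to a Unix domain socket at `path`.

    The path is whatever your GridJM instance was started with; a name
    such as /tmp/gridjm.sock would be a purely hypothetical example.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def send_job(sock, xrsl_text):
    """Write one job description, followed by the end marker, to `sock`.

    Returns the number of bytes written.
    """
    payload = xrsl_text.encode("utf-8") + XRSL_END
    sock.sendall(payload)
    return len(payload)
```

With a connected socket, the contents of an .xrsl file can be passed as `send_job(sock, open("job.xrsl").read())`, where job.xrsl stands for your own job description file.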
What has GridJM been used for?
GridJM has been developed mainly by Antti Hyvärinen to manage distributed SAT solving on computational grids. GridJM has also been used to distribute the feature extraction of large image databases in the MedGIFT project at the University Hospitals of Geneva.
gridjm-0.6
GridJM is available from here, and is distributed under the GNU General Public License.
Documentation
The most comprehensive documentation for GridJM is currently the Usage section of this page. The distribution also contains man pages and an example client, which you will hopefully find useful.
Compiling
To compile gridjm, you need a working source tree of nordugrid-arc. To compile that, in turn, you need a working installation of the Globus Toolkit and an ARC-enabled gSOAP.
For example, you might use globus 4.0.5 from this location. The source code compiles with
% ./configure --prefix=/opt/globus-4.0.5
# make prews
# make install
# cd /opt/; ln -s globus-4.0.5 globus
Then you might want to use gsoap 2.7.8a-2ng from this location (nordugrid site). Compiling it needs a bit more work:
% rpm2cpio gsoap-2.7.8a-2ng.src.rpm > gsoap-2.7.8a-2ng.src.cpio
% cpio -i < gsoap-2.7.8a-2ng.src.cpio
% tar zxf gsoap_2.7.8a.tar.gz
% patch -p0 < gsoap_shared.patch
% patch -p0 < gsoap_build.patch
% patch -p0 < gsoap_openssl098.patch
% cd gsoap-2.7
% rm -rf autom4te.cache
% ./configure --prefix=/opt/gsoap-2.7.8a-2ng --libdir=/opt/gsoap-2.7.8a-2ng/lib
% make -C soapcpp2 libgsoap++.la
# make install
# cd /opt; ln -s gsoap-2.7.8a-2ng gsoap
Finally, the nordugrid-arc sources can be downloaded from here (this is a very temporary solution, I couldn't find 0.6.0.2 on the nordugrid site today), and compiled with
% ./configure --prefix=/opt/nordugrid-arc-0.6.0.2_gt-4.0.5/ --with-gsoap-location=/opt/gsoap
# make install
# cd /opt; ln -s nordugrid-arc-0.6.0.2_gt-4.0.5 nordugrid
Now you need to install the site certificates in /opt/nordugrid/share/certificates (or /etc/grid-security/certificates), and then you are ready to compile and run gridjm.
Get the gridjm tarball and run
% ./configure --with-gsoap-location=/opt/gsoap --with-arc-location=/opt/nordugrid
% make install
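Before running configure for gridjm, it can help to confirm that the directories used in the steps above are actually in place. The following is only a sketch of such a check in Python: the directory lists are copied from the example commands and will differ if you chose other prefixes, and the function names are made up for this illustration.

```python
import os

# Install locations used in the example commands; adjust to your own layout.
EXPECTED_DIRS = [
    "/opt/globus",
    "/opt/gsoap",
    "/opt/nordugrid",
]

# The site certificates may live in either of these directories.
CERT_DIRS = [
    "/opt/nordugrid/share/certificates",
    "/etc/grid-security/certificates",
]


def missing_prefixes(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


def has_certificates(cert_dirs):
    """Return True if at least one directory exists and is not empty."""
    for d in cert_dirs:
        try:
            if os.path.isdir(d) and os.listdir(d):
                return True
        except OSError:
            # Unreadable directory: treat it as if it held no certificates.
            continue
    return False


if __name__ == "__main__":
    for path in missing_prefixes(EXPECTED_DIRS):
        print("missing:", path)
    if not has_certificates(CERT_DIRS):
        print("no site certificates found in", " or ".join(CERT_DIRS))
```

The check only looks for directories. It does not verify that the libraries inside them were built correctly.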
Usage
GridJM requires a complete NorduGrid ARC (client) installation, and the ARC headers for compiling.
GridJM has an extensive help listing of its command line options. The distribution also contains man pages. Just type gridjm --help or man 8 gridjm.
Once started, GridJM opens a TCP port (Unix domain sockets are also supported) for communication with the user. GridJM uses the port to request jobs when it considers that there is free capacity in the grid. It indicates this by sending the message
# free n
where n is the number of free CPUs GridJM thinks the grid has at the time it sends the message. Users can answer this request by sending an xrsl job description (see the NorduGrid documentation). The end of the xrsl is marked by an end-of-file character (hexadecimal 0xff). When GridJM notices that a job has finished successfully, it starts downloading the results. Once all files associated with the job have been downloaded, it prints the message
# dlfinish path
where path is the path in which the files are located.
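A client has to tell these two messages apart. The Python sketch below shows one way to do that. It assumes each message arrives as a single line of text, and because it is not clear from the description whether the leading "#" is part of the message, it accepts lines with or without it. The function name parse_message is hypothetical and not part of GridJM.

```python
def parse_message(line):
    """Classify one line received from GridJM.

    Returns ("free", n) for a job request, ("dlfinish", path) for a
    finished download, and ("unknown", line) for anything else.
    """
    text = line.strip()
    # Accept the message with or without a leading "#" (assumption).
    if text.startswith("#"):
        text = text[1:].strip()
    keyword, _, rest = text.partition(" ")
    rest = rest.strip()
    if keyword == "free" and rest.isdigit():
        return ("free", int(rest))
    if keyword == "dlfinish" and rest:
        return ("dlfinish", rest)
    return ("unknown", line)
```

A submitter could call this on every line it reads, send an xrsl when the result is a "free" message, and start post-processing the files under path when it is a "dlfinish" message.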
If you need something simple to start with, you can use this python script to communicate with gridjm. It should provide a good starting point for different submitters. A more complicated example is the submitfiles python script in the distribution.
Here you can find a short presentation on the subject. The following works have used GridJM extensively in their experimental work.
Last modified Tuesday, 27-May-2008 13:47:24 EEST