Invoke Server Developer$B!G(Bs Manual

This document describes how to develop a Ninf-G Invoke Server.

Contents
1. Introduction
1.1 Role
1.2 Requirements for underlying middleware
1.3 Implementation Overview
1.4 Execution flow
2. Specification
2.1 Details of Invoke Server
2.2 Protocol between Ninf-G Client and Invoke Server
Appendix A. How to specify the Invoke Server
Appendix B. Miscellaneous information


1. Introduction
Ninf-G Client invokes a Ninf-G Executable on the server machine when a function that initializes function/object handles, such as grpc_function_handle_init(), is called.  Ninf-G Version 2 implements remote process invocation using Globus Toolkit's Pre-WS GRAM.  Because it was implemented using the Globus API, the invocation mechanism was embedded in Ninf-G.  In order to utilize other systems such as WS GRAM, UNICORE, and Condor for remote process invocation, Ninf-G Version 4 implements the invocation mechanism as a separate module called "Invoke Server".  This design enables users and developers to implement and add a new Invoke Server which utilizes any job invocation mechanism.

Ninf-G Version 4.1.0 includes the following Invoke Servers:
  - Invoke Server for WS GRAM implemented in Python (GT4py)
  - Invoke Server for WS GRAM implemented in Java (GT4java)
  - Invoke Server for UNICORE (UNICORE)
  - Invoke Server for Pre-WS GRAM implemented in C (GT2c)

Invoke Servers for Condor and SSH will be included in the next release of Ninf-G Version 4 (Version 4.2.0).  An Invoke Server for NAREGI Super Scheduler is already implemented, but it is not included in the Ninf-G package.

1.1 Role
Here is a typical flow of Ninf-G Client application: 

(1) grpc_initialize()
Initializes data structures used by Ninf-G Client.

(2) grpc_function_handle_init()
Creates a function/object handle which requests remote process invocation.  The request will be processed and a Ninf-G Executable will be created on the server machine.  When the Ninf-G Executable is created, it connects to the Ninf-G Client to establish a TCP connection between the Ninf-G Executable and the Ninf-G Client.

(3) grpc_call() $B$d(B grpc_call_async()/grpc_wait_any()
Calls the remote function, i.e. (1) the Ninf-G Client sends arguments to the Ninf-G Executable, (2) the Ninf-G Executable does computation, and (3) the Ninf-G Executable sends results to the Ninf-G Client.

(4) grpc_function_handle_destruct()
Requests the Ninf-G Executable to terminate its process.  If an error occurs during the termination, the Ninf-G Client requests the Invoke Server to kill the Ninf-G Executable.

(5) grpc_finalize()
Frees data structures used by the Ninf-G Client.

Invoke Server is required to implement initialization and finalization of function/object handles which are described in steps (2) and (4).

1.2 Requirements for underlying middleware 
The only requirement for the underlying middleware is that it must be capable of remote process invocation.  Examples of such middleware include Globus Toolkit Pre-WS GRAM, Globus Toolkit WS GRAM, UNICORE, Condor, and SSH.

1.3 Implementation overview 
Invoke Server is an adaptor for the underlying middleware and it handles requests from Ninf-G Client.  Invoke Server analyzes and processes each request sent from Ninf-G Client and replies to the Ninf-G Client.  For example, if Invoke Server receives a JOB_CREATE request from Ninf-G Client, the Invoke Server creates a Job ID, returns the Job ID to the Ninf-G Client, and invokes job processes according to the request.
Invoke Server can be implemented in any language.  The details of the protocol between the Ninf-G Client and Invoke Server are described in Section 2.

1.4 Execution flow 
This section describes a sample flow of RPC to the server 'serverA' via the Invoke Server 'IS_SAMPLE'.

(Prerequisite)
(1) Prepare a client configuration file which describes that IS_SAMPLE is used for RPC to serverA.

(grpc_function_handle_init())
(2) The Ninf-G Client requests the IS_SAMPLE to create a function/object handle.

(3) If this is the first time for IS_SAMPLE to create a function/object handle, an IS_SAMPLE process is spawned by the Ninf-G Client on the same machine.  ${NG_DIR}/bin/ng_invoke_server.IS_SAMPLE is the command for spawning IS_SAMPLE.

(4) The Ninf-G Client and the IS_SAMPLE communicate using three pipes (stdin, stdout, and stderr).

(5) When grpc_function_handle_init() is called, the Ninf-G Client sends JOB_CREATE to IS_SAMPLE followed by necessary information (e.g. hostname and port number of the remote server) and JOB_CREATE_END.

(6) When IS_SAMPLE receives JOB_CREATE, IS_SAMPLE returns "S" to the Ninf-G Client, which indicates that the request has been received by the Invoke Server.

(7) IS_SAMPLE generates a new Job ID that corresponds to the Request ID which was transferred with the JOB_CREATE request and returns the Job ID to the Ninf-G Client. Then, IS_SAMPLE invokes remote processes (Ninf-G Executable) on serverA using its underlying middleware.

(8) The Ninf-G Client waits for the reply from IS_SAMPLE.  When the Ninf-G Client receives the reply, it resumes execution without waiting for the actual job invocation at serverA.

(grpc_call())
(9) When the Ninf-G Executable is invoked on serverA, it connects to the Ninf-G Client using Globus IO.  The connection is used for the communication (e.g. argument transfers from the Ninf-G Client to the Ninf-G Executable) between the Ninf-G Client and the Ninf-G Executable.  IS_SAMPLE does nothing for grpc_call().  If the underlying middleware for IS_SAMPLE returns an error on remote process invocation, IS_SAMPLE must notify the Ninf-G Client that the job invocation failed.

(grpc_function_handle_destruct())
(10) When grpc_function_handle_destruct() is called, the Ninf-G Client requests the Ninf-G Executable to exit.  This communication is done between the Ninf-G Client and the Ninf-G Executable.  The Ninf-G Client does not wait for the Ninf-G Executables to terminate.

(11) When the Ninf-G Executable exits, the job status managed by IS_SAMPLE should be changed to DONE and IS_SAMPLE notifies the Ninf-G Client of DONE.

(12) The Ninf-G Client sends a JOB_DESTROY request to IS_SAMPLE.

(13) IS_SAMPLE returns "S" to the Ninf-G Client when it receives the JOB_DESTROY request.

(14) IS_SAMPLE returns DONE to the Ninf-G Client if the state of the corresponding job is DONE.  Otherwise, IS_SAMPLE cancels the job and notifies the Ninf-G Client of DONE when the cancellation has completed and the status of the job has become DONE.
	
(grpc_finalize())
(15) When grpc_finalize() is called, the Ninf-G Client sends an EXIT request to IS_SAMPLE.

(16) IS_SAMPLE returns "S" to the Ninf-G Client when it receives the EXIT request.

(17) IS_SAMPLE cancels all jobs and waits until all jobs have terminated.

(18) When the Ninf-G Client receives "S" from IS_SAMPLE, it continues its execution and does not wait for the termination of all jobs.

The following figure illustrates the interaction between Ninf-G Client, Invoke Server, and Ninf-G Executable.

  +-----------+       +---------------+  WS GRAM   +------------+
  |Ninf-G     |  -->  | Invoke Server | ---------> |  Ninf-G    |-+
  |Client     |  <--  | (WS GRAM)     | <--------- | Executable | |
  |           |  <--  |               |            +------------+ | ...
  |           |       +---------------+              +------------+
  |           |
  |           |       +---------------+  UNICORE   +------------+
  |           |  -->  | Invoke Server | ---------> |  Ninf-G    |-+
  |           |  <--  | (UNICORE)     | <--------- | Executable | |
  |           |  <--  |               |            +------------+ | ...
  +-----------+       +---------------+              +------------+

  |<------ Client Side --------------->|<-Network->|<- Server Side ->|


2. Specification of Invoke Server
This section describes the details of Invoke Server and the protocol between Ninf-G Client and Invoke Server.

2.1 Details of Invoke Server

(1) Invoke Server is invoked when Ninf-G Client initializes a function/object handle on a remote server for which the Ninf-G Client is configured to use the Invoke Server.

(2) The maximum number of jobs per Invoke Server is limited.  If the number of jobs exceeds the limit, a new Invoke Server is invoked.

(3) Invoke Server exits if it receives an EXIT request from the Ninf-G Client.  The request is sent when the Ninf-G Client calls grpc_finalize().  Invoke Server also exits if it has managed the maximum number of jobs and all jobs have terminated.

(4) Ninf-G Client and Invoke Server communicate using three pipes which are created by the Ninf-G Client when the Invoke Server is invoked.

(5) Ninf-G Client does not wait for the termination of Invoke Server after the Ninf-G Client sends an EXIT request to the Invoke Server.

(6) If Ninf-G Client abnormally exits, pipes will be disconnected.  When Invoke Server detects that the pipes have been disconnected, Invoke Server must cancel all jobs and exit.

(7) Invoke Server is implemented as a Unix executable or script file which should be located in the ${NG_DIR}/bin directory.  It can be located in another directory if the Invoke Server is specified by an absolute path to the executable file.

(8) The file name of Invoke Server must follow the naming convention of "ng_invoke_server" + suffix, where the suffix corresponds to the underlying middleware for remote process invocation.

(9) A log file for Invoke Server can be specified as an optional argument of the Invoke Server command.
     -l [Log file name]
If this option is specified, Invoke Server outputs logs to the file specified by this argument.  Otherwise, logs are not recorded.
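
Items (7) to (9) can be illustrated with a short Python sketch.  It is only an illustration under stated assumptions: the function names are hypothetical, the "." separator between "ng_invoke_server" and the suffix is taken from the ng_invoke_server.IS_SAMPLE example in Section 1.4, and the program name "ng_invoke_server.SAMPLE" is made up for the example.

```python
import argparse
import os


def invoke_server_command(ng_dir, server_type, log_file=None, path=None):
    """Build the argv a client could use to spawn an Invoke Server.

    Hypothetical helper: 'path' stands for an absolute path given explicitly,
    otherwise the default location ${NG_DIR}/bin is used (item 7).
    """
    # Item (8): file name is "ng_invoke_server" + suffix.  The "." separator
    # is an assumption based on the IS_SAMPLE example in Section 1.4.
    executable = path or os.path.join(ng_dir, "bin", "ng_invoke_server." + server_type)
    argv = [executable]
    if log_file is not None:
        argv += ["-l", log_file]  # item (9): optional log file argument
    return argv


def parse_server_options(argv):
    """Parse the optional -l argument on the Invoke Server side."""
    parser = argparse.ArgumentParser(prog="ng_invoke_server.SAMPLE")
    # When -l is absent, log_file stays None and no log is recorded.
    parser.add_argument("-l", dest="log_file", default=None)
    return parser.parse_args(argv)
```

For example, parse_server_options(["-l", "/tmp/is.log"]).log_file yields the log file name, and an empty argument list leaves it unset.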

2.2 Protocol between Ninf-G Client and Invoke Server
2.2.1 Overview
Ninf-G Client and Invoke Server exchange three types of messages: Request, Reply, and Notify.  A Request message is sent from Ninf-G Client to Invoke Server.  Reply and Notify messages are sent from Invoke Server to Ninf-G Client.  Ninf-G Client assumes that a Reply message will always be returned from Invoke Server when the Ninf-G Client sends a Request message.  A Notify message is used to send messages from Invoke Server to Ninf-G Client asynchronously.  A separate pipe is used for each of the three message types.

                                    direction
      + Request : stdin    | Ninf-G | ----> | Invoke |
      + Reply   : stdout   | Client | <---- | Server |
      + Notify  : stderr   |        | <---- |        |

2.2.2 Protocol
All messages are sent as plain text.  The return code (<RET>) is 0x0d0a (CR followed by LF); it is the delimiter that determines the unit of messages.  Job IDs are generated by Invoke Server.
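
The framing rule can be sketched in Python as follows.  The helper names are hypothetical, and the use of ASCII is an assumption; the text above only says that messages are plain text.

```python
RET = b"\r\n"  # <RET>: 0x0d 0x0a


def encode_message(text):
    """Encode one protocol line and terminate it with <RET>."""
    return text.encode("ascii") + RET  # ASCII encoding is an assumption


def split_messages(buffer):
    """Split received bytes into complete lines and the unread remainder.

    The remainder is whatever follows the last <RET>; it should be kept and
    prepended to the next chunk read from the pipe.
    """
    *complete, rest = buffer.split(RET)
    return [line.decode("ascii") for line in complete], rest
```

Keeping the remainder matters because a read from a pipe may end in the middle of a line.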

2.2.2.1 Request
Four Request messages, JOB_CREATE, JOB_STATUS, JOB_DESTROY, and EXIT are supported.

(1) JOB_CREATE
- Format
          JOB_CREATE <Request ID><RET>
          executable .....<RET>
          vsite .....<RET>
          arg .....<RET>
          JOB_CREATE_END<RET>

- Explanation
This request is used to create and invoke a new job.  Necessary information for job invocation is described as a set of attributes which is transferred along with JOB_CREATE request.  The details of the attributes are described in 2.2.2.4.  JOB_CREATE is the only request which is described by multiple lines.  Other requests can be described by a single line.

Ninf-G Client transfers a Request ID to Invoke Server.  Invoke Server generates a unique Job ID and returns it to the Ninf-G Client.  The Job ID is used by the Ninf-G Client to specify the job.

When Invoke Server receives JOB_CREATE, it must send a Reply message to the Ninf-G Client.  Then, Invoke Server generates a unique Job ID and notifies the Ninf-G Client of the Job ID.  Finally, Invoke Server requests job invocation on remote servers using the underlying middleware for the Invoke Server.
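
The order of these three steps can be sketched in Python.  This is an illustration only: the class, the "submit" hook standing in for the underlying middleware, the "jobN" ID scheme, and the initial PENDING status are all hypothetical.  Reporting a failed submission as a STATUS_NOTIFY with FAILED is likewise an assumption about how the failure notification of Section 1.4 step (9) could be realized.

```python
import itertools


class SampleInvokeServer:
    """Records what would be written to each pipe, in order."""

    def __init__(self, submit):
        self.submit = submit            # hypothetical middleware hook
        self.sent = []                  # (channel, message) pairs
        self.jobs = {}                  # Job ID -> status
        self._ids = itertools.count(1)

    def handle_job_create(self, request_id, attributes):
        # 1. Reply first, so the client knows the request was received.
        self.sent.append(("reply", "S"))
        # 2. Generate a unique Job ID and notify the client of it.
        job_id = f"job{next(self._ids)}"
        self.jobs[job_id] = "PENDING"
        self.sent.append(("notify", f"CREATE_NOTIFY {request_id} S {job_id}"))
        # 3. Only then ask the middleware to invoke the job.
        try:
            self.submit(job_id, attributes)
        except Exception as exc:
            self.jobs[job_id] = "FAILED"
            self.sent.append(("notify", f"STATUS_NOTIFY {job_id} FAILED {exc}"))
        return job_id
```

Because the Reply and the CREATE_NOTIFY are sent before submission, the client does not have to wait for the actual job invocation.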

(2) JOB_STATUS
- Format

          JOB_STATUS <Job ID><RET>

- Explanation
This request queries Invoke Server for the status of a job.  Ninf-G Version 4.1.0 does not use the JOB_STATUS message.

(3) JOB_DESTROY
- Format

          JOB_DESTROY <Job ID><RET>

- Explanation
This request is used to terminate and destroy jobs.  Invoke Server cancels all corresponding jobs if they are not completed when it receives the request.  Once Invoke Server confirms that all of these jobs are cancelled, it sends DONE to the Ninf-G Client.

(4) EXIT
- Format

          EXIT<RET>

- Explanation
This request is used to terminate Invoke Server.  If Invoke Server receives an EXIT request, it must cancel all outstanding jobs and wait for their termination.
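
The first line of each of the four requests can be recognized with a small parser.  The Python sketch below is only one way to do it; the function name and the choice to reject malformed lines with ValueError are assumptions.

```python
def parse_request_line(line):
    """Split the first line of a Request into (name, argument).

    The argument is the Request ID for JOB_CREATE, the Job ID for
    JOB_STATUS and JOB_DESTROY, and None for EXIT.
    """
    name, _, argument = line.strip().partition(" ")
    if name == "EXIT" and not argument:
        return name, None
    if name in ("JOB_CREATE", "JOB_STATUS", "JOB_DESTROY") and argument:
        return name, argument
    # Rejecting anything else is an assumption; the caller could answer
    # such a line with an "F <Error String>" reply.
    raise ValueError(f"malformed request: {line!r}")
```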

2.2.2.2 Reply
Invoke Server must send Reply message to Ninf-G Client if Invoke Server receives a Request message from the Ninf-G Client.

The Reply to a JOB_CREATE, JOB_DESTROY, or EXIT message is:

          [S   | F <Error String>]<RET>

where S is sent in case of success.  Otherwise, F is returned followed by <Error String>.

The Reply to a JOB_STATUS message is:

          [S <Status>  | F <Error String>]<RET>

where <Status> is denoted as:
          <Status> : [PENDING | ACTIVE | DONE | FAILED]

The meaning of each status is:
           PENDING : Ninf-G Executable is waiting for invocation.
           ACTIVE : Ninf-G Executable is already invoked.
           DONE : Ninf-G Executable is already done.
           FAILED : Ninf-G Executable abnormally exited.
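
As an illustration, the two Reply formats could be produced and read back as in the following Python sketch.  The function names are hypothetical.

```python
STATUSES = ("PENDING", "ACTIVE", "DONE", "FAILED")


def format_reply(error=None, status=None):
    """Build a Reply line (without <RET>).

    error  -> "F <Error String>"
    status -> "S <Status>" (Reply to JOB_STATUS)
    neither -> "S" (Reply to JOB_CREATE, JOB_DESTROY, EXIT)
    """
    if error is not None:
        return f"F {error}"
    if status is not None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        return f"S {status}"
    return "S"


def parse_reply(line):
    """Return (ok, detail); detail is a status, an error string, or None."""
    flag, _, detail = line.partition(" ")
    if flag == "S":
        return True, detail or None
    if flag == "F":
        return False, detail
    raise ValueError(f"malformed reply: {line!r}")
```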

2.2.2.3 Notify
Notify message is used to send asynchronous message from Invoke Server to Ninf-G Client.  Two types of Notify message are provided.

(1) CREATE_NOTIFY
- Format

          CREATE_NOTIFY <Request ID> [S <Job ID> | F <Error String>]<RET>

- Explanation
This message is used to notify the Ninf-G Client of the Job ID.  The Job ID is case sensitive and cannot include invisible characters.

(2) STATUS_NOTIFY
- Format

          STATUS_NOTIFY <Job ID> <Status> <String><RET>

          <Status> : [PENDING | ACTIVE | DONE | FAILED]

- Explanation
This message is used to notify the Ninf-G Client that the status of a job has changed.
<String> can be any string, and it is stored in the output log.  Note that the status of a job can change directly from PENDING to DONE.
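
The two Notify messages can be composed as in the Python sketch below.  The function names are hypothetical, and treating "invisible characters" as whitespace and non-printable characters is an interpretation, not something this specification spells out.

```python
STATUSES = ("PENDING", "ACTIVE", "DONE", "FAILED")


def is_valid_job_id(job_id):
    """Check that a Job ID is non-empty and has only visible characters.

    Assumption: "invisible" means whitespace or non-printable characters.
    """
    return bool(job_id) and all(
        ch.isprintable() and not ch.isspace() for ch in job_id
    )


def format_create_notify(request_id, job_id=None, error=None):
    """Build a CREATE_NOTIFY line (without <RET>)."""
    if error is not None:
        return f"CREATE_NOTIFY {request_id} F {error}"
    if not is_valid_job_id(job_id):
        raise ValueError(f"invalid Job ID: {job_id!r}")
    return f"CREATE_NOTIFY {request_id} S {job_id}"


def format_status_notify(job_id, status, message):
    """Build a STATUS_NOTIFY line (without <RET>)."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return f"STATUS_NOTIFY {job_id} {status} {message}"
```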

2.2.2.4 JOB_CREATE Request
This section describes the details of JOB_CREATE Request.
- Format

          JOB_CREATE <Request ID><RET>
          executable .....<RET>
          vsite .....<RET>
          arg .....<RET>
          ... (snip)
          JOB_CREATE_END<RET>

Attributes are put between JOB_CREATE<RET> and JOB_CREATE_END<RET>.  Each attribute must be on its own line, and each line must include one and only one attribute.  Attributes can be listed in any order.  There are two types of attributes: mandatory attributes and optional attributes.  Invoke Server must return an error if mandatory attributes are not included.  Any unknown attributes must be ignored.

Attributes
The following is a list of attributes which are supported by Ninf-G.  Some of these attributes are basically provided for Globus Toolkit's Pre-WS GRAM and WS GRAM.  Any new attribute can be defined using the "invoke_server_option" attribute.

     name          mandatory      meaning
  hostname            yes    Host name of the server
  port                yes    Port number
  jobmanager          no     Job Manager
  subject             no     Subject of the GRAM
  client_name         yes    Host name of Ninf-G Client
  executable_path     yes    Path of Ninf-G Executable
  backend             yes    Backend of the remote function (e.g. MPI)
  count               yes    Number of Ninf-G Executables 
  staging             yes    A flag indicating staging will be used or not
  argument            yes    Arguments for Ninf-G Executable
  work_directory      yes    Working directory of the remote function
  gass_url            yes    URL of GASS
  redirect_enable     yes    A flag indicating redirection of stdout/stderr 
  stdout_file         no     file name of stdout
  stderr_file         no     file name of stderr
  environment         no     Environment variables (can be put multiple times)
  status_polling      yes    Interval of polling the status 
  refresh_credential  yes    Interval of credential refresh
  max_time            no     Maximum execution time
  max_wall_time       no     Maximum wall clock time
  max_cpu_time        no     Maximum CPU time
  queue_name          no     Name of the queue
  project             no     Name of the project
  host_count          no     Number of executables per host
  min_memory          no     Minimum size of requested memory
  max_memory          no     Maximum size of requested memory
  rsl_extensions      no     RSL extension
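
The rules for JOB_CREATE attributes and the table above can be combined into a parser.  The Python sketch below is only an illustration: the function name is hypothetical, splitting each line at the first space is an assumption about the attribute syntax, and storing repeatable attributes as lists is a design choice.  invoke_server_option is accepted as an optional, repeatable attribute because it is mentioned in the paragraph above the table.

```python
MANDATORY = {
    "hostname", "port", "client_name", "executable_path", "backend", "count",
    "staging", "argument", "work_directory", "gass_url", "redirect_enable",
    "status_polling", "refresh_credential",
}
OPTIONAL = {
    "jobmanager", "subject", "stdout_file", "stderr_file", "environment",
    "max_time", "max_wall_time", "max_cpu_time", "queue_name", "project",
    "host_count", "min_memory", "max_memory", "rsl_extensions",
    "invoke_server_option",
}
REPEATABLE = {"argument", "environment", "invoke_server_option"}


def parse_job_create(lines):
    """Parse the lines of one JOB_CREATE request (<RET> already removed).

    Returns (request_id, attributes).  Raises ValueError if the request is
    malformed or a mandatory attribute is missing.
    """
    lines = iter(lines)
    keyword, _, request_id = next(lines, "").partition(" ")
    if keyword != "JOB_CREATE" or not request_id:
        raise ValueError("expected 'JOB_CREATE <Request ID>'")
    attributes = {}
    for line in lines:
        if line == "JOB_CREATE_END":
            break
        name, _, value = line.partition(" ")
        if name not in MANDATORY and name not in OPTIONAL:
            continue  # unknown attributes must be ignored
        if name in REPEATABLE:
            attributes.setdefault(name, []).append(value)
        else:
            attributes[name] = value
    else:
        # The loop ended without seeing JOB_CREATE_END.
        raise ValueError("JOB_CREATE_END not found")
    missing = sorted(MANDATORY - set(attributes))
    if missing:
        raise ValueError("missing mandatory attributes: " + ", ".join(missing))
    return request_id, attributes
```

A ValueError raised here would typically be turned into an "F <Error String>" reply by the caller.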

Detailed description
- hostname
  Host name of the server machine.

- port
  The server port number on which the Globus gatekeeper is listening.  The default value is 2119.

- jobmanager
  The job manager used on the server machine.

- subject [subject]
  Subject of resource manager contact.

- client_name [client name]
  Host name of the client machine.

- executable_path [path to the executable]
  Absolute path of Ninf-G Executable.
  The path represents remote path if staging is on.  Otherwise, the path represents local path.

- backend [backend]
  The method for launching Ninf-G Executable is specified as backend.  The value is either NORMAL, MPI, or BLACS.  If MPI or BLACS is specified, Ninf-G Executable must be invoked via the mpirun command.

- count [N]
  Number of Ninf-G Executables to be invoked.  If backend is MPI or BLACS, count means the number of nodes.

- staging [true/false]
  The value is true if staging is on and Invoke Server must transfer Ninf-G Executable file from the local machine to the remote machine.

- argument [argument]
  An argument for the Ninf-G Executable is specified by this attribute.  Each attribute specifies exactly one argument; multiple arguments must be specified by repeating this attribute, one argument per line.  The arguments must be passed to the Ninf-G Executable as its arguments.
    Example:
         argument --client=$B!D(B
         argument --gass_server=$B!D(B

- work_directory [directory]
  This attribute specifies the directory in which Ninf-G Executable is invoked.

- gass_url
  This attribute specifies the URL of the GASS server on the Client machine.  This attribute is used for Globus Toolkit's Pre-WS GRAM.

- redirect_enable [true/false]
  This attribute is set to true if stdout/stderr of Ninf-G Executable is requested to be transferred to the Ninf-G Client.

- stdout_file [filename]
  If redirect_enable is set to true, this attribute specifies the name of the output file of stdout.  Invoke Server must output stdout to this file.  Ninf-G Client reads this file as an output file and writes the contents of the file to stdout of the Ninf-G Client.

- stderr_file [filename]
  If redirect_enable is set to true, this attribute specifies the name of the output file of stderr.  Invoke Server must output stderr to this file.  Ninf-G Client reads this file as an output file and writes the contents of the file to stderr of the Ninf-G Client.

- environment [ENV=VALUE]
  An environment variable for Ninf-G Executable is passed by this attribute.  The environment variable and its value are connected by =.  Only the variable is specified if it does not take a value.  Multiple environment variables must be specified one by one.

- status_polling [interval]
  Invoke Server may need to check the status of jobs by polling.  This attribute specifies the interval of the polling.  The value is in seconds, and if it is not specified, the default value 0 is used.

- refresh_credential [interval]
  This attribute specifies the interval of refreshing credentials.  The value is in seconds, and if it is not specified, the default value 0 is used.

- max_time [time]
  This attribute specifies the maximum time of the job.

- max_wall_time [time]
  This attribute specifies the maximum wall clock time of the job.

- max_cpu_time [time]
  This attribute specifies the maximum CPU time of the job.

- queue_name [queue]
  This attribute specifies the name of the queue to which the Ninf-G Executable should be submitted.

- project [projectname]
  This attribute specifies the name of the project.

- host_count [number of nodes]
  This attribute specifies the number of nodes.

- min_memory [memory size (MB)]
  This attribute specifies the minimum requirements for the memory size of the job.

- max_memory [memory size (MB)]
  This attribute specifies the maximum memory size of the job.

- rsl_extensions [RSL extension]
  This attribute can be used to specify an RSL extension which is available for Globus Toolkit's WS GRAM.
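
A few of the attribute values described above need interpretation before they can be used.  The Python sketch below shows one possible way to do this; the function names are hypothetical, and mapping a variable without a value to an empty string is an assumption.

```python
def parse_environment(entries):
    """Turn 'ENV=VALUE' or bare 'ENV' entries into a dict.

    'entries' is the list of values of all environment attributes.
    A variable given without a value is mapped to "" (assumption).
    """
    env = {}
    for entry in entries:
        # Split at the first "=" only, so values may contain "=".
        name, sep, value = entry.partition("=")
        env[name] = value if sep else ""
    return env


def parse_flag(value):
    """Interpret a true/false attribute such as staging or redirect_enable."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected true or false, got {value!r}")


def parse_interval(value=None):
    """Interpret status_polling or refresh_credential, in seconds.

    The default of 0 follows the attribute descriptions above.
    """
    seconds = 0 if value is None else int(value)
    if seconds < 0:
        raise ValueError("interval must not be negative")
    return seconds
```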


Appendix A. How to specify the Invoke Server
The Invoke Server to be used by the Ninf-G Client is specified in the client configuration file.

A.1  How to specify Invoke Server
Invoke Server is specified by the invoke_server attribute in the <SERVER> section.

  invoke_server [type]

Type specifies the type of the Invoke Server, such as GT4py or UNICORE.

A.2. How to pass information to Invoke Server
Invoke Server may require options for its execution.  Such options can be specified by the option attribute in the <INVOKE_SERVER> section or the invoke_server_option attribute in the <SERVER> section.

    option [String]
    invoke_server_option [String]

Multiple attributes can be specified in <SERVER> or <INVOKE_SERVER> section.

A.3. Polling interval
Invoke Server needs to check the status of jobs, and this may be implemented using polling.  The polling interval can be specified by the status_polling attribute in the <INVOKE_SERVER> section.

    status_polling [interval(second)]

A.4 Logfile
Filename of Invoke Server$B!G(Bs execution log can be specified by invoke_server_log attriute in <CLIENT> section.

    invoke_server_log [filename]

If this attribute is specified, Invoke Server outputs logs to the file whose name is composed of the specified filename and the type of the Invoke Server.

The log_filePath attribute in the <INVOKE_SERVER> section can be used to specify the log file for a specific Invoke Server.

    log_filePath [Log file name]

A.5. Maximum number of jobs per Invoke Server
The maximum number of jobs per Invoke Server can be limited by the max_jobs attribute in the <INVOKE_SERVER> section.  If the number of requested jobs exceeds this value, Ninf-G Client invokes a new Invoke Server and requests it to manage the new jobs.

    max_jobs [number of maximum jobs]

A.6. How to specify the path of the Invoke Server
If Invoke Server is not located in the pre-defined directory, the path attribute in the <INVOKE_SERVER> section can be used to specify the path of the Invoke Server.

    path [path of the Invoke Server]

Appendix B. Miscellaneous Information

B.1. Job Timeout
Job timeout is managed by Ninf-G Client.  Invoke Server is not responsible for the timeout.

B.2. Redirection of stdout/stderr is implemented using files.
- Ninf-G Client passes the filename to Invoke Server as an attribute of the JOB_CREATE request.
- Invoke Server outputs stdout/stderr of the Ninf-G Executable to the file.
- Ninf-G Client outputs the contents of the file to its stdout/stderr.

