Integrated load balancer description

Because 'node.js' runs single threaded, several instances of 'node.js' have to be started to use the full capacity of a multiprocessor machine. In addition, a cluster of machines can be set up for larger customer sites; such a cluster requires a load balancing mechanism that distributes requests to the different 'node.js' instances available on these machines.

How the load balancer works

All 'node.js' instances running on the same physical server are started and stopped by a nanny process, which also restarts a 'node.js' instance when it crashes. Each nanny process can additionally act as a load balancer: the load balancer is simply the nanny process to which outside requests are directed.

The first request of a new session is distributed to a 'node.js' instance by a round-robin mechanism based on the number of sessions in each instance's session table: among all 'node.js' instances, those with the least number of sessions are listed in a fixed order, and the first instance in this list receives the new session. This protocol distributes new sessions evenly across the 'node.js' instances. Instances on the same machine as the load balancer have priority over others because their requests can be handled faster. All requests of the same session are executed on the same 'node.js' instance.
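The selection can be pictured with a small sketch; the instance records and field names below (host, sessionCount) are illustrative assumptions, not the actual Syracuse data structures:

function pickInstance(instances, localHostname) {
  // Smallest session count over all instances.
  var min = Math.min.apply(null, instances.map(function (i) { return i.sessionCount; }));
  // Least-loaded instances, kept in their fixed list order.
  var candidates = instances.filter(function (i) { return i.sessionCount === min; });
  // Among those, prefer instances on the local machine (handled faster).
  var local = candidates.filter(function (i) { return i.host === localHostname; });
  return (local.length > 0 ? local : candidates)[0];
}

// Two instances tie with 2 sessions; the one on the local machine wins.
var chosen = pickInstance([
  { host: 'srv-b', sessionCount: 2 },
  { host: 'srv-a', sessionCount: 2 },
  { host: 'srv-b', sessionCount: 5 }
], 'srv-a');
console.log(chosen.host); // 'srv-a'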

Requests are transparently forwarded to the server that executes them; the client keeps a single, stable URL, which allows links to be bookmarked.

An entity named "host" includes all information about the installed servers in the cluster. Refer to the description in Host entity for details.

Nanny process functionality

Via command line

  • Installation (usually during setup): Add an entry to the "host" entity for the local host. Sample command line: node nanny install 8000 2 for 2 child processes. This does not start the nanny process; it only performs the database operation. This step is usually done by the Syracuse setup. Starting with AVENGER or AMBASSADOR, an additional optional parameter can be given for the number of web service child processes, for example node nanny install 8000 2 3 for 2 child processes and 3 web service processes.
  • Run (usually triggered by a Windows or Linux service): Read the host entity, start the associated 'node.js' instances, and inform all other nanny processes that this process has started. The notification also exchanges the code version number to ensure that each cluster node runs the same code version, and it tells the load balancer that 'node.js' processes on this host can now be used. Sample command line: node nanny.
  • Stop (usually triggered by a Windows or Linux service): Inform the other nanny processes that this nanny process is stopping and can no longer be used for load balancing, then shut down the nanny process on the local host. Sample command line: node nanny stop.
  • Remove: Remove the corresponding entry from the host table. Sample command line: node nanny remove. (A typical lifecycle combining these commands is sketched below.)
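Putting these commands together, a typical lifecycle on one cluster node could look as follows (port and process counts are the sample values from above; the '#' comments are annotations, not part of the commands):

node nanny install 8000 2   # register the local host in the "host" entity (no process is started)
node nanny                  # start the nanny and its child processes, join load balancing
node nanny stop             # leave load balancing and shut down the local nanny
node nanny remove           # delete the host entry from the "host" entity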

When a nanny process starts, it notifies the other active nannies that it has started and receives their versions as well as the main version. When the local nanny process has the same version as the main version, its child processes are started. Once all child processes are started and ready to accept requests, the nanny process notifies the other nannies again.
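The version check during this handshake can be sketched as follows; all names here (shouldStartChildren, mainVersion, the peer records) are invented for illustration and do not correspond to actual Syracuse internals:

// Children are only started when the local code version matches the cluster's main version.
function shouldStartChildren(self, peers) {
  // Assume the peers report the main version; how it is determined is not shown here.
  var mainVersion = peers.length > 0 ? peers[0].mainVersion : self.version;
  return self.version === mainVersion;
}

var self = { hostname: 'srv-a', version: '2.999.2.42-0' };
var peers = [{ hostname: 'srv-b', version: '2.999.2.42-0', mainVersion: '2.999.2.42-0' }];

if (shouldStartChildren(self, peers)) {
  // 1) start the child processes
  // 2) once all children accept requests, notify the other nannies again
  console.log('versions match: starting child processes on', self.hostname);
}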

Special setting in nodelocal.js

When load balancing should stay local (that is, requests should only go to Syracuse processes on the local server), the following must be set in 'nodelocal.js' (all other settings omitted):

exports.config = {
  hosting: {
    localBalancer: true  // distribute requests only to Syracuse processes on this server
  }
}

This is useful when an external load balancer distributes the requests to the servers. That external load balancer must then make sure that all requests of the same session are sent to the same server.
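To illustrate the affinity requirement, an external balancer might, for example, always map the same session identifier to the same server; this is only a sketch of the idea, not a recipe for any particular load balancer product:

// Hypothetical sticky routing: the same session ID always hashes to the same server.
function targetServer(sessionId, servers) {
  var hash = 0;
  for (var i = 0; i < sessionId.length; i++) {
    hash = (hash * 31 + sessionId.charCodeAt(i)) >>> 0; // simple, stable string hash
  }
  return servers[hash % servers.length];
}

// The same session always lands on the same server.
console.log(targetServer('session-42', ['srv-a:8000', 'srv-b:8000']));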

Via HTTP

When you append the path /nannyCommand/info to the URL, you can query the status of all nanny processes in the cluster without having to start a session. Sample URL: http://server:8000/nannyCommand/info. The output is rather technical and mainly useful for error detection. Whenever possible, look directly at the host entity instead, which displays the same data in a much more user-friendly way.
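For scripted checks, the same information can be fetched with a few lines of node.js; 'server' and the port are the placeholders from the sample URL and must be replaced with a real host:

var http = require('http');

http.get('http://server:8000/nannyCommand/info', function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () { console.log(body); });   // print the technical status dump
}).on('error', function (err) {
  console.error('nanny not reachable:', err.message);
});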

Here are some commented sample outputs (comments start with '#'); real data might not contain all of these items.

hostname: 'VIL-004674-NB',   # host name of that server, as returned by hostname command, is key in host entity
    connectionData:          # overview of all ports on which the nanny can listen
     [ { port: 8111,         # port number
         active: true,       # nanny will listen on that port?
         ssl: false,         # SSL connection?
         clientAuth: false,  # SSL with client authentication?
         serverCert: null,   # reference to instance of server certificate - not readable here
         clientCert: null,   # reference to instance of client certificate - not readable here
         host: [Object],     # for internal use
          _uuid: 'ffb9be61-278c-467c-a3c4-019d52e1d484' } ], # for internal use
    children: 1,             # number of child processes
    wsChildren: 0,           # number of child processes for web service execution
    deactivated: false,      # server has explicitly been deactivated
    started: true,           # server has been started (using node nanny) and not yet stopped (using node nanny stop)
    respawnCount: 10,        # number of attempts to restart child processes before giving up
    respawnTime: 120,        # maximum time in seconds for start attempts for child process
    returnRequestTimeout: 30,  # number of seconds that a Syracuse process waits for an internal answer from the nanny process
    tcpHostName: 'vil-004674-nb', # TCP name or IP address (obtained from server certificate) for HTTP requests to that server
    pid: 3376,               # process ID
    status: -5,              # status (see explanation in host entity): 5: finish all, 4: finishing, 3: OK, 2: starting, 1: init, 0: inactive, -1: low version, -2: wrong version, -3: time difference, -4: respawn limit, -5: unknown, -6: unreachable, -7: not started, -8: no database, -9: no license
    challenge: 'adcfe27d-db5c-43fe-889f-94e94ea9e3e0', # for internal use
    pendingRequest: true,     # for internal use
    dhKey: '***',             # secure connection established to that host: certificate transfer possible
    version: '2.999.2.42-0',  # internal version number
    local: true,              # request has been executed on that server
    missingCert: [],          # missing certificates on that server
    missingCA: [],            # missing CA certificates on that server
    untrusted: [ 'VIL-SRV-DX3-2' ]  # hosts to which no secure connection has been established

Example: http://PO027493:8124/nannyCommand/info
PO027493:
[ { hostname: 'PO027493',
 connectionData: 
 [ { port: 8124,
 active: true,
 ssl: false,
 clientAuth: false,
 serverCert: null,
 clientCert: null,
 host: [Object],
 _uuid: 'b4b99912-8899-4d1d-9efb-467b7d419962' } ],
 children: 2,
 wsChildren: 1,
 deactivated: false,
 started: true,
 respawnCount: 10,
 respawnTime: 120,
 returnRequestTimeout: 30,
 tcpHostName: 'po027493',
 pid: 16980,
 local: true,
 status: 3,
 version: '2.999.1617-0',
 missingCert: [ 'my' ],
 missingCA: [],
 untrusted: [],
 pendingRequest: false } ]
N0: 0/0 0;0.84024|0, N1: 0/0 0;0.83999|1, W0: 0/0

When you append the path /nannyCommand/notifyNannies/details to the URL, you can query the status of all nodes in the cluster without having to start a session. Sample URL: http://server:8000/nannyCommand/notifyNannies/details.
PO027493:
[{"hostname":"PO027493","version":"2.999.1617-0","status":3,"missingCert":[],"loaddata0":["0.83968|0","0.83968|1.37742"],"missingCA":["ca"],"untrusted":["PO027493"]}]
{"hostname":"PO027493","port":"N1","upTime":{"d":0,"h":0,"m":4,"s":29},"message":"Client 80 - C--;\n26 - S--;\n"}
{"hostname":"PO027493","port":"N0","upTime":{"d":0,"h":0,"m":4,"s":43},"message":"Client 9 - C--;\n"}
{"hostname":"PO027493","port":"W0","upTime":{"d":0,"h":0,"m":4,"s":29},"message":"Client 2 - C--;\n"}

The most important fields of this output are:

version

Syracuse internal version number.

status

Example: status: 3. Possible values:

5: finish all
4: finishing
3: OK
2: starting
1: init
0: inactive
-1: low version
-2: wrong version
-3: time difference
-4: respawn limit (respawn limit exceeded; applications cannot be started)
-5: unknown
-6: unreachable
-7: not started
-8: no database
-9: no license

upTime

Time for which the node has been in operation: {"d": days, "h": hours, "m": minutes, "s": seconds}