To facilitate faculty and student research, the College operates two computing clusters for different purposes:
- A general-purpose computing cluster that can distribute multiple copies of an application over many computing cores, to run either:
  - independently, each with different parameters, or
  - in coordination, passing messages among themselves
- A Hadoop cluster designed for big data that is distributed over many computing cores and processed using either the MapReduce or the Spark framework.
Such distributed computing is related to the ideas of parallel computing (which could occur within a single computer with multiple CPUs) and grid computing (describing a large number of mostly independent computers that cooperate on a project).
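As a rough illustration of the MapReduce model mentioned above, the following sketch counts words in a single Python process. It is not how jobs are run on the Hadoop Cluster; the function names and sample records are hypothetical, chosen only to show the map, shuffle, and reduce steps that a framework would spread across many cores.

```python
from collections import defaultdict

# Hypothetical sample input; on a real cluster the records would be
# blocks of a large data set stored across many nodes.
records = ["big data on many cores", "many cores many jobs"]


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


# Each record could be mapped on a different core, because the map
# step never looks at more than one record at a time.
mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, a framework is free to run them on whichever cores hold the relevant data.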
Features and Benefits
To request an account on the Computing Clusters, please contact Andy Anderson. He can also assist with the implementation of your project.
The General-Purpose Cluster
The general-purpose Cluster is accessible over the Internet through its head node, using Remote Desktop Connection, an X11 connection, or Secure Shell (ssh, which is standard in the Mac Terminal; on Windows, use PuTTY). The head node is the only machine on which you should develop software and from which you should submit and control jobs.
Most traditional Cluster users use the Condor system to define how each instance of the software should be run (called a job), distribute the jobs automatically to the available nodes, and ensure sharing of the computing resources amongst all of the users.
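To make the idea of a job more concrete, here is a minimal sketch that generates a Condor submit description for a small parameter sweep, with one job per parameter value. The executable name `simulate.py`, the parameter values, and the output file names are all hypothetical, and the settings the Cluster actually requires may differ, so treat this as an illustration of the general shape rather than a working configuration.

```python
def build_submit_description(executable, parameter_values):
    """Return submit-description text that queues one job per parameter value."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "log = sweep.log",  # hypothetical name for the shared job event log
        "",
    ]
    for index, value in enumerate(parameter_values):
        lines += [
            f"arguments = {value}",        # passed to the executable on its command line
            f"output = run_{index}.out",   # standard output of this job
            f"error = run_{index}.err",    # standard error of this job
            "queue",                       # queue one job with the settings above
            "",
        ]
    return "\n".join(lines)


# Hypothetical executable and parameter values, for illustration only.
submit_text = build_submit_description("simulate.py", [0.1, 0.5, 1.0])
print(submit_text)
```

The resulting text would typically be saved to a file on the head node and handed to the `condor_submit` command, after which Condor decides which nodes run each job.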
There may be some situations where you won't want to use Condor, e.g. when using Mathematica's built-in parallel computing features.
More information and some examples can be found in the Knowledge Base.
A complete example that uses Condor with a problem coded in the Python programming language can be found here.
The Hadoop Cluster
The Hadoop Cluster is accessible by several means:
There is more information on the HUE page about various tools as well as direct database connections with ODBC.