Department of Information Technology

Software-as-a-Service for Analysis of Quantitative Trait Loci

Sub-project of: Computational Genomics

Participants

Summary

We have developed QTL as a Service (QTLaaS), built on the PruneDIRECT algorithm. QTLaaS automatically deploys an R cluster for running PruneDIRECT, or any other statistical analysis in R, on your desired infrastructure.


First, the user installs Ansible. The Ansible node then handles the orchestration of the computational environment using our code. The code for automatically deploying our architecture on any infrastructure is available at:
https://github.com/QTLaaS/QTLaaS

Three files are required for this method: ansible_install.sh, setup_var.yml, and spark_deployment.yml.

  1. Install Ansible using the bash script ansible_install.sh, then configure the Ansible hosts.
  2. Modify the environment variables in setup_var.yml, if needed.
  3. Run spark_deployment.yml as root to deploy the setup; this playbook contains all the installation instructions for the components of our architecture. For example: # ansible-playbook -s spark_deployment.yml, where -s is the sudo flag.
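The three steps above amount to a handful of commands. The sketch below records and echoes the commands instead of executing them, so it is safe to run anywhere; on a real Ansible node you would run each command directly. The "edit" line is a placeholder for your editor of choice, and a checkout of the QTLaaS repository in the current directory is assumed.

```shell
# Dry-run sketch of the deployment steps. Commands are collected in PLAN
# and echoed rather than executed.
PLAN=""
plan() { PLAN="$PLAN$*"$'\n'; echo "step: $*"; }

plan bash ansible_install.sh                   # 1. install Ansible
plan edit setup_var.yml                        # 2. adjust variables if needed
plan ansible-playbook -s spark_deployment.yml  # 3. deploy as root (-s = sudo flag)
```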

We will soon provide a demo on this webpage using SNIC cloud resources, where any user can try QTLaaS on a few nodes in our cloud setting. For larger computations, one can download QTLaaS from the GitHub repository; it automatically deploys the desired number of nodes on an infrastructure.

Setup details

  1. Set up at least 3 nodes: one for the Ansible Master, one for the Spark Master, and at least one for a Spark Worker.
  2. Install Ansible using the bash script in the file: ansible_install.sh.
  3. Add the IP addresses/hostnames of the Spark Master and Spark Worker nodes to /etc/hosts on the Ansible Master node.
  4. Generate an SSH key and copy its public part to ~/.ssh/authorized_keys on all the Spark nodes.
  5. Edit /etc/ansible/hosts using the example-hosts-file available in the repository. (Add [sparkmaster] followed by the name of the Spark Master node on the next line; add [sparkworker] followed by the names of the Spark Workers on the next lines, one per line.)
  6. Modify the environment variables available in the file: setup_var.yml, if needed.
  7. Run ansible-playbook -s spark_deployment.yml, where -s is the sudo flag.
  8. Make sure the following ports are open on the Spark Master node: 60060 for JupyterHub, 7077 for the Spark context, and 8080 for the Spark Web UI.
  9. Now you can access the following services: http://sparkmaster:60060 and http://sparkmaster:8080
  10. Use the example-sparkR file to make sure your setup is working.
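Steps 4 and 5 above can be sketched as follows. The hostnames (sparkmaster, sparkworker1, sparkworker2) are placeholders, and the inventory is written to a local example-hosts file purely for illustration; on a real Ansible Master it belongs in /etc/ansible/hosts. The SSH commands are shown commented out because they require live nodes.

```shell
# Step 4 (commented: needs live nodes): create a key pair and push the
# public key to every Spark node so Ansible can log in without a password.
# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# ssh-copy-id sparkmaster
# ssh-copy-id sparkworker1
# ssh-copy-id sparkworker2

# Step 5: an inventory in the layout described above, written to a local
# file for illustration (real target: /etc/ansible/hosts).
cat > example-hosts <<'EOF'
[sparkmaster]
sparkmaster

[sparkworker]
sparkworker1
sparkworker2
EOF
cat example-hosts
```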

After all the steps above, JupyterHub, Spark Master, and R will be installed on the Spark Master node, and Spark Worker and R will be installed on all Spark Worker nodes.

How to add nodes

To add new nodes to an already configured cluster, simply add the new hosts to the Ansible hosts file under the [sparkworker] tag and re-run the deployment playbook.

Demo Service

You can access a demo service of QTLaaS at the following URL:
http://130.238.28.241:60060/

This demo is for testing purposes only, with a limited amount of resources.

Updated  2017-02-05 11:35:37 by Kurt Otto.