The Refinery Platform documentation is split into three major sections: one for users, one for administrators, and one for developers.
This section of the Refinery Platform documentation is intended for users of the web application.
This section of the Refinery Platform documentation is intended for administrators of a Refinery instance.
Refinery requires Python 2.7.3. For package dependencies please see requirements.txt. Requirements can be installed using pip as follows:
> pip install -U -r requirements.txt
You might need to install NumPy manually before running the above command (you can find the version in requirements.txt). For example:
> pip install numpy==1.7.0
We highly recommend creating a virtualenv for your Refinery installation.
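For example (the environment name is just a placeholder):

> virtualenv refinery-env
> source refinery-env/bin/activate

With the virtualenv activated, run the pip commands shown above.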
Refinery uses Solr for searching and faceted browsing.
We recommend running Solr using the bundled Jetty web server. The Solr example configuration included in the standard download is sufficient and can be started like this:
cd <solr-download-directory>
java -Dsolr.solr.home=<refinery-installation-directory>/solr/ -jar start.jar > <path-to-solr-log-file> 2>&1 &
By default, Jetty will allow connections to Solr from any IP address. This is neither secure nor required to run Refinery. We recommend allowing connections to Solr only from localhost. Note that this requires Solr to run on the same host as Refinery. If Solr needs to run on another host, change the IP address used below accordingly.
To configure Jetty to only accept connections from localhost do the following:
Go to <solr-download-directory>/etc.
Open jetty.xml.
Locate <Call name="addConnector"> in jetty.xml. Be aware that the default jetty.xml file contains an addConnector block that is commented out.
Supply a default value of “127.0.0.1” for the jetty.host system property used to configure Host as follows:
<Set name="Host"><SystemProperty name="jetty.host" default="127.0.0.1"/></Set>
Make sure that the jetty.host system property is not set elsewhere (for example, passing -Djetty.host=0.0.0.0 on the Jetty command line would override the default above and re-open Solr to all interfaces).
Restart Jetty using the command shown above.
In the settings_local.py of your Refinery installation configure REFINERY_SOLR_BASE_URL as follows:
REFINERY_SOLR_BASE_URL = "http://localhost:8983/solr/"
Restart the WSGI server running Refinery to reload your settings.
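To confirm that Solr is reachable from the Refinery host, you can, for example, request the Solr base URL locally; with the localhost-only Jetty configuration above, the same request from another machine should be refused:

> curl http://localhost:8983/solr/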
RabbitMQ is the preferred message broker for the Celery distributed task queue. Refinery uses Celery and RabbitMQ to handle long-running tasks.
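As a sketch, a RabbitMQ broker for Celery could be configured in settings_local.py as follows (the URL shown uses RabbitMQ's default guest account and is a placeholder, not a Refinery default):

# Celery message broker URL; replace credentials and vhost with your own
BROKER_URL = 'amqp://guest:guest@localhost:5672//'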
Refinery settings are configured in settings_local.py.
Note
To avoid conflicts when upgrading, never edit settings directly in settings.py; make all changes in settings_local.py.
Commonly customized settings include the Django DATABASES configuration and the following (shown with example values):

EMAIL_HOST = 'localhost'
EMAIL_PORT = 25
DEFAULT_FROM_EMAIL = 'webmaster@localhost'
ISA_TAB_DIR = ''
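For example, a minimal sketch of a DATABASES configuration, assuming a PostgreSQL database (all names and credentials are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',  # PostgreSQL via psycopg2
        'NAME': 'refinery',        # database name (placeholder)
        'USER': 'refinery',        # database user (placeholder)
        'PASSWORD': 'secret',      # placeholder
        'HOST': 'localhost',
        'PORT': '',                # empty string selects the default port
    }
}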
Example configuration for user authentication via LDAP using django-auth-ldap:

import ldap
from django_auth_ldap.config import LDAPSearch

# Baseline configuration
AUTH_LDAP_SERVER_URI = "ldap://ldap.example.com"
AUTH_LDAP_BIND_DN = ""
AUTH_LDAP_BIND_PASSWORD = ""
AUTH_LDAP_USER_SEARCH = LDAPSearch("OU=Domain Users,DC=rc,DC=Domain",
                                   ldap.SCOPE_SUBTREE, "(uid=%(user)s)")

# Populate the Django user from the LDAP directory
AUTH_LDAP_USER_ATTR_MAP = {
    "first_name": "givenName",
    "last_name": "sn",
    "email": "mail",
}

settings.AUTHENTICATION_BACKENDS += (
    'refinery.core.models.RefineryLDAPBackend',
)
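With a configuration like this, django-auth-ldap will by default create and populate a Django user record on the first successful LDAP login, using AUTH_LDAP_USER_ATTR_MAP to fill in the name and email fields.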
Refinery is a Django-based web application implemented in Python and JavaScript. For a full list of all external dependencies please see Dependencies.
The easiest method to install Refinery is to follow the instructions in the README file. Note: the installation process will fail if any of the ports forwarded from the VM are in use on the host machine (please see the Vagrantfile for the list of ports). After the installation has finished, you will need to create a Django superuser:
> python manage.py createsuperuser
To instantiate administrator-modifiable content on the Refinery website, e.g., the contents of the “About” page, load the default content into the database:
> python manage.py loaddata core/fixtures/default-pages.json
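To verify the installation, you can start the Django development server (for development use only; production deployments run Refinery under a WSGI server, as noted above):

> python manage.py runserver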
The source code for Refinery can be downloaded from the GitHub repository either by cloning the repository or by downloading a zip archive.
Before Refinery can be installed, a number of variables (so-called “settings”) have to be configured. In addition to the settings discussed here, please also see the complete list of all Refinery Settings that can be customized.
Galaxy is required to run analyses in Refinery and to provide support for archiving.
Refinery running in the VM can access a Galaxy instance running on the host at http://192.168.50.1:8080.
On the host, you will need to configure Galaxy so that it accepts connections from the VM.
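A minimal sketch, assuming a Galaxy release configured through universe_wsgi.ini (newer Galaxy versions use a different configuration file and option names):

# universe_wsgi.ini -- example values
[server:main]
# Listen on all interfaces so that the VM can reach Galaxy at 192.168.50.1:8080
host = 0.0.0.0
port = 8080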
To import a Galaxy workflow into Refinery, you first have to annotate the workflow. The amount of annotation required is minimal, and you can conveniently add it in the Galaxy workflow editor.
In a nutshell, you have to provide simple Python dictionaries (see examples below if you are not familiar with Python) in the “annotation” text fields for the workflow and corresponding tools. These fields can be found on the right side of the workflow editor.
Annotation fields must either be empty or contain correctly formatted annotation dictionaries as described below. If other information is found in an annotation field, you will not be able to import the workflow into Refinery.
For Refinery to recognize a Galaxy workflow as a Refinery Workflow, you need to provide a set of simple annotations in the workflow annotation field in the Galaxy workflow editor. The annotation field is listed under “Edit Attributes” on the right side of the workflow editor.
Note
The annotation fields in the Galaxy workflow editor behave slightly differently for workflow-level and tool-level annotations. In order to confirm changes to a workflow-level annotation, move the cursor to the end of the input field and hit the Return key. This is not required in tool-level annotation fields. Be sure to save the workflow after editing an annotation field.
The workflow-level annotation is a Python dictionary with the following keys:

refinery_type (required)
    The type of the workflow, e.g. "analysis" for analysis workflows or "download" for download workflows (see the examples below).
refinery_relationships (optional)
    This field is used to describe relationships between inputs of the workflow. For example, a workflow that performs peak-calling on ChIP-seq data requires that each ChIP file is associated with one input file (= genomic background). Each relationship is described using a dictionary with three fields: category, set1, and set2 (see the schematic below).
Schematic workflow annotation (indentation only for better readability):
{
"refinery_type": "<workflow_type>",
"refinery_relationships": [
{
"category": "<relationship_type>",
"set1": "<name_of_input_1>",
"set2": "<name_of_input_2>"
}
]
}
A standard analysis workflow with a single input would be annotated as follows:
{
"refinery_type": "analysis"
}
A download workflow would be annotated like this:
{
"refinery_type": "download"
}
A more complex analysis workflow with two inputs and a 1-1 relationship between them would be annotated as follows (the name fields of the two input datasets are set to “ChIP file” and “input file”, respectively, and must match the set1 and set2 values):
{
"refinery_type": "analysis",
"refinery_relationships": [
{
"category": "1-1",
"set1": "ChIP file",
"set2": "input file"
}
]
}
In order to import output files generated by a tool in the workflow into Refinery, the tool has to be annotated. To access the annotation field for a tool, click on the tool representation in the workflow editor. The annotation field is named “Annotation / Notes”.
Note
You have to annotate at least one tool and one output file. Workflows that do not declare outputs for import into Refinery will not be imported.
As in workflow-level annotations, the annotation needs to be provided as a Python dictionary. In order to import output files of the tool back into Refinery, the tool-level annotation dictionary needs to contain a key that matches an output declared by the tool, for example "output_file".
This key must be associated with a further dictionary that provides a name that will be used to import the file into Refinery. Optionally, a description can be provided to further explain the content of the output file, as well as a file type if the file extension provided by Galaxy is not sufficient to detect the actual file type automatically. This is typically the case when Galaxy uses “data” as the file extension.
Schematic tool annotation (indentation only for better readability):
{
"<tool_output_1>": {
"name": "<filename_1>",
"description": "<description_1>",
"type": "<extension_1>"
},
"<tool_output_2>": {
"name": "<filename_2>",
"description": "<description_2>",
"type": "<extension_2>"
}
}
The following example uses indentation for better readability. Indentation is not required.
{
"output_narrow_peak": {
"name": "spp_narrow_peak",
"description": "",
"type": "bed"
},
"output_region_peak": {
"name": "spp_region_peak",
"description": "",
"type": "bed"
},
"output_plot_file": {
"name": "spp_plot_file",
"description": "",
"type": "pdf"
}
}
Before you can import Workflows from a Galaxy installation into Refinery, the following requirements have to be met:
You have to add a Galaxy Instance for the Galaxy installation in question to Refinery through the admin UI.
You have to create a Workflow Engine for this Galaxy Instance using the create_workflowengine command, which requires a Galaxy Instance id and the name of a group that should own the workflow engine, e.g. “Public”.
> python manage.py create_workflowengine <instance_id> "<group_name>"
Alternatively, you can create a workflow engine through the admin UI; in that case, however, you have to manually assign ownership to the managers of the group that should own the workflow engine.
You have to annotate all workflows in the Galaxy installation that you want to import.
Once these requirements have been met, run the import_workflows command:
> python manage.py import_workflows
This command will attempt to import Workflows from all Workflow Engines registered in your Refinery server. All Galaxy workflows that are annotated as Refinery Workflows will be parsed and imported if annotated correctly. Annotation errors will be reported, as well as the total number of Workflows imported from each Workflow Engine.
Existing Workflows in your Refinery server will be deactivated but not deleted. Deactivated workflows can no longer be executed but their information can be accessed through the Analyses in which they were run.
Adding a genome build to Refinery is a two-step process: first, you add the taxon information for your organism, and then you create the associated genome build. Both steps are done through the admin interface.
Before logging into the admin interface, however, we need to look up the taxonomy information for our organism, so go to NCBI’s taxonomy browser (http://www.ncbi.nlm.nih.gov/taxonomy) and search for your organism. You should eventually end up on a page that looks something like this:
Keep the page open and go to the Refinery admin interface. After logging in, navigate to the Annotation Server and click on Taxons, then click on Add taxon. From there, you will be brought to a form with four fields:
The NCBI taxonomy page in your browser will help you fill in all of the values. See the picture below for which information on the page goes where. Please note that you need to create a new entry for every name that you use. So in our example below, if you wished to put all of these names in the database, you would create an entry for Homo sapiens, human, man, and Homo sapiens Linnaeus, 1758, in addition to any other names you might wish to create (e.g. H. sapiens).
In the above image, the important pieces of information have been highlighted in colored boxes. Below are two examples.
Because the scientific name has no type associated with it, please annotate the type field with “scientific name.” This is the official type designated by NCBI.
Even though H. sapiens is not on the taxonomy page, many people use a species’ abbreviated name when annotating their data, so fill out the form accordingly.
Now that the taxon information has been filled in for your organism, you can enter the information for the genome build you’d like to support. Click Annotation server again in the admin interface and this time click Genome builds. Fill in the fields to the best of your knowledge, making sure that the species field points to the taxon that uses the full scientific name. Below are two examples. Please make sure that only one genome build per organism is selected as the default.
Please note that while you are not required to fill in a UCSC equivalent for non-UCSC genome builds, we currently consider the UCSC genome builds to be the standard, so we’d prefer that it be provided.
This section of the Refinery Platform documentation is intended for developers who are contributing to the Refinery core and extensions.
The source code for the Refinery Platform is available in the repository.
This section of the Refinery Platform documentation describes setting up Eclipse for Refinery development.
Main Module:
${workspace_loc:refinery-platform}/${DJANGO_MANAGE_LOCATION}
Program arguments:
runserver --noreload
Working directory:
${workspace_loc:}
Make sure to use the SSH repository URL (instead of HTTPS) if you want to push code to GitHub without entering a username and password.
> git remote set-url origin git@github.com:parklab/refinery-platform.git
The Refinery Platform license is very similar to the MIT License but contains an additional clause that prohibits the use of the names of the copyright holders in most circumstances.
Copyright (c) 2011-2013 The President and Fellows of Harvard College.
All rights reserved. Copyright (c) 2011-2013 Boston Children's
Hospital. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
Except as contained in this notice, the name of Harvard University
and Boston Children's Hospital or any affiliate shall not be used in
advertising, publicity, news release or otherwise to promote the
sale, use or other dealings in this Software without prior written
authorization by Harvard University and Boston Children's Hospital.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.