Managing your tools
Tools and pipelines play a crucial role in the field of bioinformatics, enabling scientists to extract valuable insights and unravel underlying mechanisms from vast amounts of data. With thousands of tools available, applying them presents various challenges. Let's explore some of these challenges:
- Tools are often developed in different programming languages and environments, requiring specific dependencies. This can make the installation and smooth execution of tools a complex task.
- While workflow and pipeline orchestration tools assist in building pipelines, it can still be challenging, especially for nonprofessionals, to construct and execute pipelines effectively.
- Managing in-house tools locally poses difficulties in terms of accessibility, reproducibility, and maintenance.
Bioinfopipe offers effective solutions to overcome these challenges:
- In Bioinfopipe, each native tool or environment is packaged within an ECR image, ensuring reliable and consistent execution. Leveraging Docker container technology, native tools are wrapped into tool objects that can be easily configured with a user-friendly form interface. This allows users to directly apply the tools within the analysis session's GUI environment, streamlining the tool usage experience.
- When setting up an analysis job, users can swiftly create instant pipelines by linking the outputs and inputs of applied tools, supported by the pipeline flowchart. Under the hood, Bioinfopipe generates a Nextflow pipeline script for the analysis job, utilizing AWS Batch as the executor. These pipelines can be reused for future jobs and even marked as favourites for quick access.
- With Bioinfopipe, the management of in-house tools/scripts becomes significantly more manageable. Once your in-house tools are integrated into Bioinfopipe, team members can readily incorporate them into their pipelines and collectively maintain them, alleviating the challenges associated with local tool management.
By leveraging the capabilities of Bioinfopipe, users can overcome the complexities of tool installation, pipeline construction, and in-house tool management, streamlining their bioinformatics workflows and enhancing collaboration among team members.
In Bioinfopipe, tool objects are categorized into two types based on code locations: image-tool and script-tool.
Image-tool: These tool objects contain both the executable and code packed within the Docker image. Image-tools are suitable for cases where the tool's functionality is encapsulated within the image itself. When using image-tools, the entire environment required for tool execution is packaged, ensuring consistent and reliable performance.
Script-tool: On the other hand, script-tools separate the tool's scripts from their running environments. This approach is advantageous when you need to create numerous scripts that run in similar environments. With script-tools, you gain more flexibility in modifying a tool's functionality. You can simply update the script code without the need to create new environments. This not only saves ECR space but also allows for efficient management and modification of tool functionalities.
By offering both image-tools and script-tools, Bioinfopipe accommodates different scenarios and provides users with options to effectively manage and utilize tools in their bioinformatics workflows.
1. Browsing tools
Open the tool browser (Menu bar -> Tools -> Browse tools) to view the list of tools in the 'Standalone tools' tab, shown in 'Cards' view by default. You can switch to 'Table' view by clicking the 'Table' label on the view switch button.
In 'Cards' view, each card shows the tool's name and version in the card head, and its description and categories in the card body. The 'Table' view shows 4 columns: Name, Version, Created, and Categories (described below). You can view a tool's details by clicking the 'View details' icon button; the properties are described as follows:
ID : Tool ID assigned for users.
Name : Tool name.
Version : Tool version.
Categories : List of categories assigned to the tool.
Container image : The corresponding container image attached to the tool.
Description : Brief description of the tool.
Number of parameters : The number of parameters created for the tool.
Input files : List of tags for input file maps.
Output files : List of tags for output file maps.
Created at : The time when the tool was created.
Updated at : The time when the tool was last updated.
Created by : The username who created the tool.
Package : The package the tool belongs to.
License : The software license type assigned by the tool's authors.
Owner : The owner of the wrapped tool object.
Shared : Indicates whether the tool is shared.
You can access a tool's help documentation by clicking the 'Read more' link in the tool card or the 'View help' icon button in the table view. Additionally, if you own a tool, you can open its configuration page by clicking the 'Config' icon button. You can also mark a tool as a favourite by clicking the 'Set favourite' icon button.
The 'Standalone tools' tab shows all standalone tools, which are individual tools manually wrapped into Bioinfopipe and can be used to create instant pipelines with other tools. The 'nf-core pipelines' tab shows all nf-core or nf-core-compatible pipelines, which can only be applied as independent pipelines and cannot be linked to other tools.
Tools can be organized into collections. By navigating to the 'Collections' tab, you can see a list of available collections. In the table view, you can click the 'View tools' icon button to see the list of tools/pipelines included in a collection. This will open a sub-table displaying the tools/pipelines below the collection row. From there, you can view the tool/pipeline details and access their configuration pages if you are the owner.
Each page displays 25 items by default, and you can navigate through the pages using the 'First', 'Previous', 'Next', and 'Last' icon buttons. By default, only public items maintained by Bioinfopipe are shown. However, you can choose to view your own tools or tools shared with you by selecting the 'Owner <id>' or 'Shared with me' links from the dropdown button. To quickly find specific tools, you can use the search functionality by entering the tool name in the 'Search...' input box. Additionally, you can filter tools by public and private categories. When a category is selected, its category label will appear above the main button row, and the filtered tools belonging to that category or its descendant categories will be displayed. If multiple categories are selected, the intersection of tools between those categories will be shown. You can remove a category filter by clicking the cross icon button within the category label, and only the tools matching the remaining category labels will be displayed.
1.1. Creating a collection
To create a new collection, just click the 'Create collection' button in the tool browser; it will open a form titled 'Create a new collection'. The fields are explained as follows:
Name : Specify a name for the collection.
Order : Set an integer for the order in which collections are shown. (default=0)
Description : Give a brief introduction of the collection under 530 characters, which will be shown in collection cards.
Invisible : Tick it if you don't want your collection to show up in the collection list in the tool browser.
Click the 'Create' button and the collection object will be created. You will find the collection in the 'Collections' tab. You can always edit the collection by clicking the 'Edit collection' icon button, or delete it by clicking the 'Delete collection' icon button.
1.2. Sharing tools and collections
If you are an Org user, you can share your tools, pipelines, and collections with other users in your organisation.
To share a tool/pipeline/collection, just click the 'Share' icon button; it will open a modal popup titled 'Share a tool/pipeline/collection' with the following fields:
Share to : The users or teams you would like to share with; you can put user email addresses or team IDs, all separated by ';'.
Share group : Alternatively, you can select a sharing group predefined in your account admin, which will override the 'Share to' field.
Days for sharing : Set the sharing time span in days; it is 30 days by default.
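For illustration, a hypothetical 'Share to' value might look like this (the addresses and team ID are made up):
alice@example.org; bob@example.org; team_42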
2. Importing and configuring an nf-core pipeline
nf-core is a pipeline framework developed by the Nextflow community. It enables users to create complex analysis pipelines with predefined parameters and configurations. With Bioinfopipe, users can import and automatically wrap nf-core pipelines as separate tools. This also allows users to apply the nf-core framework to create custom pipelines hosted on GitHub, which can then be easily imported into Bioinfopipe for use.
2.1. Importing an nf-core pipeline
To import an nf-core pipeline, just click the 'Import nf-core' button in the tool browser; it will open a form page titled 'Import nf-core pipeline'. The form is divided into 3 sub-forms: 'General settings', 'Pipeline settings', and 'Describe pipeline'. The fields are explained as follows:
Name : Specify a name for the pipeline. Spaces and hyphens are not allowed; connect words with underscores, e.g. 'Pipeline_build'.
Version : Specify the pipeline's version, as shown in the tag list of its GitHub repository.
Link : The pipeline's GitHub repository URL for the specified version.
Command : The pipeline's executable command, e.g. "nf-core/rnaseq".
Samplesheet columns : The columns in the main data input file, 'samplesheet.csv', separated by commas. If left blank, Bioinfopipe will check for a 'samplesheet.csv' file in the pipeline's 'assets' folder and extract its columns.
Basic parameter groups : Specify the parameter groups that will be set as basic parameters, which will be displayed in the 'Basic settings' section of the tool form interface. If left blank, the top 2 definitions from the 'nextflow_schema.json' file will be used as basic parameter groups. Alternatively, you can manually select a few definitions, separated by commas.
Description : Give a brief introduction of the pipeline under 530 characters, which will be shown in its tool card in the tool browser.
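As an illustration, importing the public nf-core/rnaseq pipeline might be filled in roughly as follows (the version shown is only an example; use a tag listed in the repository):
Name : Rnaseq
Version : 3.14.0
Link : https://github.com/nf-core/rnaseq
Command : nf-core/rnaseq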
Click the 'Import' button and the pipeline object will be created, along with all its parameters and its doc; a tool configuration page will then open where you can view the parameters and input/output file maps.
You can always edit the tool details by clicking the 'Edit tool' button, where you can add relevant categories for the pipeline. The samplesheet columns will be stored in the 'Argument' field of the 'input' parameter, where you can specify properties for the columns, such as whether a column is required (the default) or optional, or whether it represents a file. To add properties, simply put the attributes after the column name, separated from the name by ' : ' and from each other by commas, for example 'column : file, optional', as illustrated below.
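For illustration, with the standard nf-core/rnaseq samplesheet columns (sample, fastq_1, fastq_2, strandedness), a column that holds an optional file could be annotated as:
fastq_2 : file, optional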
You can set specific parameter values that differ from the defaults in the 'Set value' field for parameters.
3. Handling tool categories
Categorizing tools is an effective method for organizing and managing a large number of tools, allowing users to quickly access tools using category filters. Bioinfopipe offers public categories that users can directly assign to their tools.
To view tool categories, simply click the "Tool categories" button in the tool browser. This action will open a page titled "Manage tool categories" where you can browse through public and private categories. If you are a Pro/Org user, you also have the ability to create your own categories for in-house tools. This feature enhances the organization and accessibility of tools within the Bioinfopipe platform.
Tool categories can be viewed in 'Table' or 'Tree' view by clicking the switch button. The columns in the table view are described as follows:
Numbering : Indicates the level of the tool category in the tree structure.
Name : The name of the tool category.
Parent : The parent name of the tool category.
Created at : The time when the tool category was created.
Category : Indicates whether the tool category is of category type; a category-type entry can only be used to group a set of related tool categories, and cannot itself be used as a tool category.
By clicking the 'View details' icon button, you can view more properties of the selected tool category; apart from the above, the other properties are:
ID : Pseudo ID of the tool category for a user.
Order : The order under its parent.
Description : A brief description for this tool category.
Invisible : Indicates whether this tool category is allowed to be shown in modal popups.
Owner : The owner of this tool category.
3.1. Creating a root category
Before creating your own tool categories, you need to create root categories to hold the sub tool categories. For example, in the public categories there are 4 root categories: 'DataType', 'Functionality', 'Omics' and 'BasicTool'.
To create a root category, click the 'Create root category' button, which will pop up a form containing the following fields:
Name : The name of the root category.
Parent : The parent name of the root category which is fixed as your user ID.
Description : Put a brief description for the root category.
Order : The order of the root category; it is 0 by default, which means it will automatically be set to the current largest order number plus one.
Category : Fixed as category type for root categories.
Invisible : Indicates whether this root category is allowed to be shown in the application for selection.
You can edit root categories by clicking the corresponding 'Edit' icon buttons, and delete them with the corresponding 'Delete' icon buttons.
3.2. Creating a child category
To create a child tool category under a root category or a parent tool category, just click the corresponding 'Add child' icon button; it will pop up a modal form titled 'Add child category', which contains the following fields:
Name : The name of the child tool category.
Parent : The parent name of the child tool category which is fixed as the selected parent tool category.
Description : Put a brief description for the child tool category.
Order : The order of the child tool category; it is 0 by default, which means it will automatically be set to the current largest order number plus one.
Category : Check it if you want the tool category to be a category type.
Invisible : Indicates whether this tool category is allowed to be shown in the application for selection.
You can edit child categories by clicking the corresponding 'Edit' icon buttons, and delete them with the corresponding 'Delete' icon buttons.
4. Creating and configuring a tool
Bioinfopipe serves as a tool wrapper, enabling the wrapping of native tools into Bioinfopipe tool objects (referred to as tool objects or tool wrappers) for easy application.
Pro/Org users have the privilege to create their own tool objects within Bioinfopipe, which can be directly applied in analysis jobs. This feature allows you to establish your own set of in-house tools in your core facility or team, which can be shared with others. Consequently, your in-house tools become more manageable, accessible, productive, and reproducible. For example, you can create multiple versions of tools and document them, enabling collective maintenance.
When configuring your tools, you can set various parameters that will be presented as form fields in the analysis job. This simplifies the tool's usability for your users. Additionally, you have the option to use a command-line style to set tool parameters. If you prefer working with the command-line interface, you can focus on configuring input/output parameters and leave other parameters to be set within the command-line box.
Once you have created and configured your tool, you can create a tool document for it. This allows users to access the relevant sections of the help documentation directly by clicking the "Help doc" icon buttons positioned alongside parameter input boxes, tool titles, or outputs. This streamlined approach enhances the user experience and facilitates easy access to comprehensive tool documentation.
4.1. Creating a tool
To create a new tool object for wrapping a native tool, just click the 'Create tool' button in the tool browser; it will open a form page titled 'Create a new tool'. The form is divided into 5 sub-forms: 'General settings', 'Tool settings', 'Describe tool', 'Set parameter groups' and 'Other settings'. The fields are explained as follows:
Name : Specify a name for the tool. Spaces and hyphens are not allowed; connect words with underscores, e.g. 'Bowtie2_build'.
Version : Specify the tool's version, preferably using semantic versioning, e.g. '2.3.4'.
Category : Specify one or multiple categories related to the tool, so the tool can be categorised for easy access.
Package : You can organise your tools into a package you created. For instance, assign all of Samtools's sub-command tools to a package 'Samtools'.
Container image : Select a container image from the tool repositories. It can be an image containing the installed tool, or an environment with installed packages for a script to run in.
Docker image URL : If you use a public Docker image repository, you can directly specify the Docker image URL. The container image will be used if provided; otherwise, it will look for the Docker image URL.
Command : The tool's executable command and sub-command if needed, e.g. 'samtools mpileup'. Leave it blank if it is a script-tool.
Description : Give a brief introduction of the tool under 530 characters, which will be shown in its tool card in the tool browser.
Parameter group : Define the group names of parameters and the group order, one group per line. Always follow the order: 'Input' group, 'Output' group, then other groups.
Created by : Put the names of the authors who contributed to the tool.
License : The tool's license type, typically a free software license such as a GPL-compatible license or the MIT License.
Invisible : Tick it if you don't want your tool to show up in the tool browser and in analysis sessions.
Click the 'Create' button and the tool object will be created; a tool configuration page will open where you can configure parameters, add input/output file maps, add validation rules, and also put in your script code if it is a script-tool running on an environment. You can always edit the tool details by clicking the 'Edit tool' button.
Note: For tools with sub-commands, e.g. Samtools, treat each sub-command as an individual tool, and set its command as 'Main_command sub-command', just as it is run on the command line. The tool name can be '<toolname>_<sub-command>', as in the example below.
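For example, two Samtools sub-commands would be wrapped as two separate tool objects:
Name 'Samtools_sort', Command 'samtools sort'
Name 'Samtools_index', Command 'samtools index'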
4.2. Creating a parameter
Once a tool object is created, configuring parameters is necessary to create a user-friendly setting form so users can easily apply the tool in an analysis job. It is not mandatory to configure all parameters of the native tool. Instead, focus on frequently used parameters such as input parameters, output parameters, and other useful options. You can configure just a minimal set for quick setup, for instance only the input/output parameters, and leave the rest to the command-line settings box in analysis jobs.
To create a parameter, just click the 'Create parameter' button in the tool configuration page; it will open a form page titled 'Create a new parameter' with 5 sub-forms: 'General settings', 'Set command-line', 'Set arguments', 'Help information' and 'Other settings'. The fields and how to set them are described as follows:
Tag name : Specify a short name that must be unique within the tool, e.g. just use the flag name (without hyphens) if it is unique. This tag can be used in validation rules to refer to the parameter.
Fullname : A descriptive name under 35 characters, which will be shown in the setting form in analysis jobs.
Level : Specify the level of use frequency for this parameter. You can put common parameters at the 'Basic' level and other parameters at the 'Advanced' level. It is best practice to classify the parameters into 3 classes: basic ones, advanced ones, and those that go to the command line.
Group : Assign a parameter group for this parameter, which will be shown within that group's field-set block in the setting form in analysis jobs. You can edit the groups by clicking the 'Edit group' icon button; one group per line, and the order of the groups will be their order in the setting form.
Group position : A number determining the order of the parameter within the group. Leave it at '0' to automatically assign the next order number, placing the parameter last. Alternatively, insert it at a given position, and the parameters at the same or larger positions will shift back. Always order parameters from most used to least used within a group, especially for the 'Input' group, since its order determines the naming order for the output files of batch runs.
Command position : Specify the parameter's position in the command line. It is 2 by default. For example, if parameter p1 must be at the head and p2 at the end, as in 'p1 ... p2', set p1's position to 1, p2's position to 3, and leave the other parameters' positions at 2 (see the worked example after this list).
Flag : The parameter flag used in the command line. Leave it blank for auxiliary parameters.
Flag-Arg separator : Specify a separator between the flag and its argument in the command line. If left blank, a space is used by default for flags starting with '-' and '=' for other flags. Use '\s' for one space.
Required : Check it if you want the parameter to be required in the setting form; an error pop-up will occur if the parameter has not been set.
Auxiliary : Check it if this is an auxiliary parameter, which will not be used in the tool's command line when constructing the job script. You can create an auxiliary category parameter for switching between sets of other parameters. You can create a 'Batch run' auxiliary parameter to set up a batch of runs with the same tool settings, which is useful for tools with random outputs. You can also create an auxiliary parameter of type InputFile to fetch files that must be in the same folder as the main input file.
Type : Specify a parameter type, which determines the parameter's field type in the setting form. The types are described as follows:
- String : The argument is an arbitrary string, or mixed numbers and symbols.
- Integer : The argument is an integer.
- Float : The argument is a floating-point number.
- CategoryDrop : The argument comes from a limited number of category options; the parameter field will be shown as a dropdown with options in the setting form.
- CategoryRadio : The argument comes from a limited number of category options; the parameter field will be shown as a list of radio options in the setting form.
- Boolean : The parameter is a Boolean type, e.g. a parameter whose flag switched on means apply it (true) and off means do not apply it (false). The parameter field will be shown as a checkbox in the setting form.
- BooleanRadio : The parameter is a Boolean type, as above, but the parameter field will be shown as 2 radio options, True and False, in the setting form.
- InputFile : The argument is an input file, which can be one file or multiple files.
- InputDir : The argument is an input directory containing the input files.
- InputDirPfx : Used for input in the format 'input-directory-path/prefix', which needs the input path and file prefix, e.g. a Bowtie2 index input.
- OutputFile : The argument is in the format 'output_folder/output_file_name'.
- OutputDir : The argument is an output directory path, which often contains different types of output files and sub-folders.
- OutputDirPfx : The argument is in the format 'output_directory/prefix', which often contains different types of output files and sub-folders.
Arguments : Here you can define the list of category options for category parameters, one option per line. You can add an empty option with '---', which gives an option to unset this parameter; the empty option shows as '---------' in the setting form. If the parameter is of type OutputDir, check whether the tool can create the output directory itself; if not, put 'mkdir' in the Arguments field to make Bioinfopipe create an output directory for the tool. To display different text instead of the option itself, add the display text in the format 'option : showing_text'.
Default : Specify the default value used by the tool; it will be shown in the parameter field in the tool setting form.
Set value : Set a value different from the default for the parameter, which will be used in constructing the command line. It is also recommended to put a default output format for outputs, e.g. 'bowtie2_$$.sam' for OutputFile, 'bowtie2_$$' for OutputDir, 'bowtie2_$$/Prefix' for OutputDirPfx, where the symbol '$$' represents the auto-generated basename for batch runs (see details in section 'Adding an output file map').
Multiple : Check it if the parameter of type 'InputFile' can be set to multiple files, e.g. 'file1,file2,file3'. By default the files are separated by commas. You can specify other separators in the Arguments field: '\s' for space-separated, 'sq' for space-separated with quotes, 'b' for bar-separated, 'f' for repeating the same flag for each file.
Batchable : Check it if the parameter is allowed to be set to a batch of values; the analysis job will then run the tool in batch based on the combinations of inputs and batch parameters. Inputs/outputs are always batchable.
Help text : The help text will be shown under the parameter field in the setting form.
Placeholder : A hint text shown in the parameter field in the setting form. Leave it blank if you want the default value to be shown in the parameter field.
Attach to options : You can attach the parameter to one or multiple options of an attachable parameter, so that it pops up immediately after that option is clicked in the setting form. This keeps the setting form neat when parameters are grouped by separate use cases or relate exclusively to particular options.
Attachable : Check it if you want this parameter to be an attachable parameter; its options will then be shown in the 'Attach to' field of other parameters. An attachable parameter should be of type 'CategoryDrop', so that it can group other parameters into category groups that are mutually exclusive in terms of settings. Clicking a category option shows the corresponding attached parameters and hides the other groups.
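To illustrate how these settings combine on the command line, here is a hedged sketch for a hypothetical tool 'my_aligner' (all names, flags and files are made up). Suppose it has three configured parameters:
- 'index' : type InputDirPfx, flag '-x', command position 1
- 'reads' : type InputFile, flag '-U', 'Multiple' checked (comma-separated), command position 2
- 'out' : type OutputFile, flag '-S', Set value 'my_aligner_$$.sam', command position 3
For a run with two input files, the constructed command line would then look like:
my_aligner -x index_dir/prefix -U sample1.fastq,sample2.fastq -S my_aligner_<job-basename>.sam
where '<job-basename>' is substituted for '$$' as described in the section 'Adding an output file map'.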
Click the 'Create' button and the parameter object will be created; you will be redirected to its parameter page. It will also show up in the 'Parameters' tab of the tool configuration page and in the left pane for quick access. You can always edit the parameter details by clicking the 'Edit parameter' button, delete the parameter with the 'Delete parameter' button, or continue adding parameters by clicking the 'Add another parameter' button.
4.3. Adding an input file map
For parameters of type InputFile, you need to create one input file map, which is an object that defines which file types and dataset type are allowed for the input parameter. To create an input file map, go to a parameter page and click the 'Add input files' button; it will open a form page titled 'Add a new input object' with the sub-forms 'General settings' and 'Describe object'. The fields are described as follows:
Name : The name of the input file map, it is the parameter name by default.
Tag : Specify a tag name under 12 characters containing only letters, underscores and hyphens. This tag will be shown in the tool's box in the pipeline flowchart. You can use its major file type, e.g. 'FASTQ-PE1'.
File Type : Select file types to define the scope of input files. The principle is to select file types as general as possible, to ensure no supported file type is missed; for instance, if the input supports any sequence file, select SEQ rather than FASTA or FASTQ.
Dataset type : Choose the input dataset type it supports, which enables users to select a dataset of the specified type as input; if unsure, simply choose 'BasicSamples'.
Description : Give a brief introduction of the input file under 300 characters.
The input file map will be created by clicking the 'Create' button. It will be shown in its related parameter page and in the 'Input files' tab of the tool configuration page. You can view details, edit, and delete it by clicking the corresponding icon buttons in the table.
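For example, an input file map for a paired-end reads parameter might be: Name 'reads1', Tag 'FASTQ-PE1', File Type 'FASTQ', Dataset type 'PairedEndSeq' (the values are illustrative).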
Note: You don't need to add an input file map for the types InputDir and InputDirPfx.
4.4. Adding an output file map
For parameters of type OutputFile, OutputDir or OutputDirPfx, you need to create at least one output file map, which is an object that defines which file types and dataset type apply to the output parameter. To create an output file map, go to a parameter page and click the 'Add output files' button; it will open a form page titled 'Add a new output map' with the sub-forms 'General settings', 'Describe object' and 'Dataset settings'. The fields are described as follows:
Name : The name of the output file map, which should differ from the other file map names of the parameter.
Tag : Specify a tag name under 12 characters containing only letters, underscores and hyphens. This tag will be shown in the tool's box in the pipeline flowchart. You can use its major file type, e.g. 'FASTQ-PE1'.
File Type : Select file types to define the scope of output files. The principle is to select file types as general as possible, to ensure no supported file type is missed; for instance, if the output can be any sequence file, select SEQ rather than FASTA or FASTQ.
Path to file : A path-to-file for parameter types OutputDir and OutputDirPfx, which will be appended to the output directory. Asterisks and question marks can be used as wildcard specifiers for multiple files, e.g. 'sub-folder/sample_*.fastq'. Leave it empty if you want all output files/folders to fall into this file map. It is recommended to set major output files as separate file maps so that they can be fed into downstream tools in a pipeline.
Apply it : Indicates whether to apply this output file map in the setting form. If applied in an analysis job, its output files will be shown in the analysis outputs.
Description : Give a brief introduction of the file under 200 characters, which will be shown in the related output card in the analysis outputs.
Dataset type : If you choose an output dataset type for its output files, the analysis job will generate a corresponding dataset of the selected type for the batch of output files produced by a batch analysis job. Leave it blank if you don't need a dataset.
Dataset name : The analysis job will automatically generate a dataset name, or you can put a name here to override it. For paired-end output files, you must set the same dataset name for both output file maps.
The output file map will be created by clicking the 'Create' button. It will be shown in its related parameter page and in the 'Output files' tab of the tool configuration page. You can view details, edit, and delete it by clicking the corresponding icon buttons in the table.
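For example, for a MEGAHIT-style OutputDir parameter, a map capturing the assembled contigs might be: Name 'contigs', Tag 'CONTIGS', File Type 'FASTA', Path to file 'final.contigs.fa' (the names are illustrative; MEGAHIT writes its assembly to 'final.contigs.fa' inside the output directory).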
For parameters of type 'OutputFile', the default should be in the format 'Optional_folder/Prefix_$$_Suffix.ext', where '$$' is a reserved placeholder representing the auto-generated basename for the job (the job-basename) in an analysis job. The job-basename is constructed from the input file labels and the labels of batch parameters, in the format 'Main_input_file_label#other_input_label#param_tag=value': first the input file label of the first input parameter (according to the group order and the parameter positions within the group), followed by the other input file or folder labels, separated by '#', and then by the batch parameters (those set with batch values) in the format 'parameter_tag=value', also separated by '#'. Here are examples of an output file and an output folder:
bowtie2_sample1_label#REF#p1=xx#p2=xx.sam
megahit_sample1_label#p1=xx#p2=xx/final.contigs.fa
4.5. Creating a validation rule
For some tools, parameters can depend on each other, or have more complicated relationships or constrained setting spaces among them. In the tool configuration, users can create validation rules to capture those relationships, so that they are alerted when a relationship is not satisfied while reviewing job settings. Wrong settings can then be corrected before submitting the job, saving time and computation cost.
To create a validation rule, just click 'Create validation rule' in the tool configuration page; it will open a modal popup titled 'Add validation rule' with 2 fields as follows:
Rule : A Boolean expression consisting of parameters as variables, Boolean operators, and round/square brackets, all separated by single spaces; see more details below.
Message : Put an informative message to be sent out if the rule is violated when reviewing job settings. You can leave it blank, in which case the default message is "The validation rule '<the rule>' was not met with <args in rule>". You can also insert the rule formula with '%(rule)s' and the argument list with '%(args)s' in your customised message, such as "Parameters p1 and p2 did not meet the rule '%(rule)s' with %(args)s".
Rule
The elements of the Boolean expression of a rule are described as follows:
- $<parameter_tag> : parameter variable, e.g. $a, $b, $c.
- #<output map id> : variable of output map, e.g. #om1, #om2.
- and : logic operator 'AND'.
- or : logic operator 'OR'.
- not : logic operator 'NOT'.
- == : equal.
- != : not equal.
- ~= : matches a Python regex pattern.
- '<string>' : string value.
- in : check if in a list of values.
- -> : logical implication.
- <-> : logical biconditional (if and only if).
- > : greater than a number.
- >= : greater than or equal to a number.
- < : less than a number.
- <= : less than or equal to a number.
- ( ) : round brackets.
- [ ] : square brackets.
Following are some example rules:
$p1 -> $p2 # p2 must be set up if p1 has been set up.
( $p1 and $p2 ) or ( $p3 and $p4 ) # The tool must have both p1 and p2 set up, or both p3 and p4 set up.
not ( $p1 and $p2 ) # You can't set up p1 and p2 at the same time.
$p1 < 10 -> $p2 > 5 # If p1's value is less than 10, then p2's value must be larger than 5.
$p1 == 'c1' -> $p2 != 'c2' # If p1 equals 'c1', then p2 must not equal 'c2'.
$p1 == 'c1' -> $p2 ~= '.+.png$' # If p1 equals 'c1', then p2 must match the regex '.+.png$'.
$p1 == 'c1' -> $p2 in [ 'c2', 'c3' ] # If p1 equals 'c1', then p2 can only take the value 'c2' or 'c3'.
$p1 >= $p2 # p1 must be larger than or equal to p2.
$p1 == 'a' <-> $p2 == 'b' # p1 equals 'a' if and only if p2 equals 'b'.
$p1 == 'c1' -> #om1 # If p1 equals 'c1', then output map 'om1' must be applied.
By clicking the 'Create' button, the validation rule will be created. It will be shown in the 'Validation rules' tab of the tool configuration page. You can edit and delete it by clicking the corresponding icon buttons in the table.
4.6. Importing a tool configuration
Once a tool wrapper is created, users need to configure its parameters. Instead of setting parameters by filling out the form one by one, users can also create parameters in batch by creating a JSON script containing the parameter configurations.
To import a configuration, simply click the 'Import Config' button on the tool configuration page. A form titled 'Import Tool Configuration' will appear, where you can edit the parameter settings in JSON format. An example configuration JSON script is shown below.
{
    "param_group": {
        "name": "test param group",
        "level": "Basic",
        "parameters": {
            "mode": {
                "name": "Choose the setting mode",
                "type": "CategoryRadio",
                "flag": "--mode",
                "argument": ["mode-1", "mode-2"],
                "default": "mode-1",
                "help_text": "This is help text.",
                "attachable": true,
                "auxiliary": true,
                "batchable": false
            },
            "infile": {
                "name": "Input sequence file",
                "type": "InputFile",
                "flag": "--infile",
                "help_text": "This is help text.",
                "placeholder": "",
                "attachable": false,
                "attach_to": "mode : mode-1",
                "required": true,
                "auxiliary": false,
                "batchable": true,
                "multiple": false,
                "cmd_position": 2,
                "inputmap": {
                    "name": "inputmap name",
                    "tag": "myinput",
                    "file_type": ["FASTA", "FASTQ"],
                    "description": "This is test input file",
                    "dataset_type": "PairedEndSeq"
                }
            },
            "outdir": {
                "name": "Output directory",
                "type": "OutputDir",
                "grp_position": 0,
                "cmd_position": 2,
                "flag": "--outdir",
                "separator": "",
                "default": "results",
                "set_value": "results",
                "help_text": "This is help text.",
                "placeholder": "",
                "attachable": false,
                "attach_to": "mode : mode-2",
                "required": true,
                "auxiliary": false,
                "batchable": true,
                "multiple": false,
                "outputmap": {
                    "fastq": {
                        "name": "output file 1",
                        "file_type": ["FASTQ"],
                        "path": "output_folder/*.fastq",
                        "description": "This is output sequence file",
                        "dataset_type": "PairedEndSeq",
                        "dataset_name": "testdataset",
                        "save": true
                    },
                    "data": {
                        "name": "output file 2",
                        "file_type": ["CSV"],
                        "path": "output_folder/*.csv",
                        "description": "This is output data file",
                        "save": false
                    }
                }
            }
        }
    }
}
By clicking the 'Import' button, Bioinfopipe will read the JSON script and create the configured parameter group and parameters if they do not already exist.
By clicking the 'AI' button, Bioinfopipe can automatically generate a configuration JSON script using AI LLMs.
- For script-based tools, the parameter configuration is generated based on the script itself.
- For image-based tools, the configuration is generated based on your tool description, so it's recommended to provide a clear description and include links to the tool's manual.
The generated configuration will appear in the 'Config' field, where you can review and modify it as needed.
4.7. Creating a script-tool
For script-tools, you will need to create a script to run in an environment. To start writing your script, go to the 'Script' tab in the tool configuration page and click the 'Edit' icon button; you can then edit your script. You need to choose a script language so that the editor will highlight the syntax of the selected language.
The script should contain 3 main parts: the shebang, the parameter declarations, and the body of business logic. The top line should be the shebang, which points to the location of the interpreter executable, for example:
#!/usr/bin/Rscript
#!/usr/bin/python
#!/usr/bin/perl
#!/usr/bin/env python3
To make the script work as a command-line tool, you need to declare its parameters. You can declare positional parameters directly in most script languages, or use a flag-option style through a library, e.g. the 'getopt' package in Python and R. The following example shows an R script with positional parameters:
#!/usr/bin/Rscript
# Read the positional arguments from the command line
args <- commandArgs(trailingOnly=TRUE)
library(NOISeq)
# Load the count matrix and the feature annotation tables
count <- get(load(args[1]))
mylen <- read.table(file=args[2], sep='\t', header=T, row.names=1, check.names=F, stringsAsFactors=F)
mygc <- read.table(file=args[3], sep='\t', header=T, row.names=1, check.names=F, stringsAsFactors=F)
mychr <- read.table(file=args[4], sep='\t', header=T, row.names=1, check.names=F, stringsAsFactors=F)
mybio <- read.table(file=args[5], sep='\t', header=T, row.names=1, check.names=F, stringsAsFactors=F)
myfactors <- data.frame(TreatRun=c('5T','16T','5C','16C','5B'), Treat=c('T','T','C','C','B'), Run=c('5','16','5','16','5'))
data <- readData(data=count, length=mylen, gc=mygc, biotype=mybio, chromosome=mychr, factors=myfactors)
# Compute the saturation diagnostic, then save the plot and the result object
saturation <- dat(data, k=0, ndepth=as.numeric(args[8]), type="saturation")
png(file=args[6], width=20, height=20, units="in", res=100)
explo.plot(saturation, toplot="protein_coding", samples=1:4)
dev.off()
save(saturation, file=args[7])
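Assuming the positional script above is saved as 'saturation_plot.R' (the file names below are illustrative), it would be invoked with its eight arguments in order:
Rscript saturation_plot.R counts.RData lengths.tsv gc.tsv chroms.tsv biotypes.tsv saturation.png saturation.RData 10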
This example shows an R script with flag-option style parameters:
#!/usr/bin/Rscript
library("getopt")
library(NOISeq)
optspec <- matrix(c(
'data', 'd', 1, 'character', '/path/to/data/',
'out', 'o', 1, 'character', 'Path to output file',
'factor', 'f', 1, 'character', 'param factor',
'nss', 'n', 1, 'integer', 'param nss'
),byrow=TRUE,ncol=5)
opt <- getopt(optspec)
mydata=get(load(opt$data))
myout=noiseq(mydata, factor=opt$factor, k=NULL, norm='n', pnr=0.2, nss=opt$nss, v=0.02, lc=1, replicates='no')
save(myout, file=opt$out)
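Assuming the flag-option script above is saved as 'noiseq_de.R' (the values below are illustrative), a typical invocation might be:
Rscript noiseq_de.R --data counts.RData --out noiseq_result.RData --factor Treat --nss 5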
4.8. Creating a script-tool using AI
Another great way to quickly generate an analysis script is by using AI LLMs. Simply click the 'AI' button under the 'Script' tab, and this will open a form where you can define your script, including the following fields.
Language : Choose a programming language for the script.
Objective : Put a short description of the script's purpose. For example:
- Perform data processing and filtering for single-cell RNA-seq analysis.
Inputs : Specify the list of input files. For example:
- An AnnData object containing raw counts.
Outputs : Specify the list of output files. For example:
- A filtered AnnData object saved to file.
- Quality control and statistics plots.
Functionality : Describe the main functions/analyses required for the script. For example:
- Filter low-quality cells and unwanted cells based on best practices.
- You must explore all possible filtering options thoroughly.
After clicking the 'Generate' button, it may take a moment to create the script. It will appear in the script editor area, where you can review and adapt it to suit your needs.
4.9. Documenting a tool
After completing the tool configuration, it is recommended to create a document for the tool. The tool document provides instant help links for users, appearing around parameter fields, tool titles, and outputs in the analysis job.
To create a tool document, simply click the 'Create doc' button in the tool configuration. This action will automatically generate an article and lead you to the article console page. The tool article consists of four main sections: 'Introduction,' 'Settings,' 'Computation,' and 'Use cases.' Within the 'Settings' section, there are sub-sections for 'Parameters,' 'Output,' and 'Parameters in command-line.'
You can add content to each section by clicking the corresponding section name. If the tool is a different version from another tool and shares most of its content, there is no need to add the same content separately. Instead, you can merge the other tool's document by clicking the "Merge article" button in the article console and only add the contents that differ from the other tool. You are also welcome to add additional sections to the tool article, such as benchmark tests between similar tools.
If you add new parameters, remove parameters, or change the tool's name and version after creating its article, the tool document will automatically add new parameter sections, remove sections of deleted parameters, or update the tool's name and version accordingly. You only need to add content to the new parameter sections.
Note: don't change the section names that were automatically created, because they are used to attach the help links to the icon buttons.
By using AI LLMs, Bioinfopipe can automatically generate the entire tool documentation by clicking the 'AI' button in the 'Head' section. You can also generate content for individual sections by clicking the 'AI' button within each section.
4.10. Saving a tool as new
If you need to create a separate new version of an already created tool, the quick way is to copy the existing tool and then modify it. Open the tool configuration of the existing tool, then click the 'Save as new' button; it will pop up a form titled 'Copy tool' with 2 fields as follows:
Name : You can change the tool name, but normally keep the same name and just change the version.
Version : Change the version here.
Click the 'Save as new' button to create the new tool; it will open a new tool configuration page where you can make further modifications. After finishing the modifications, you can create its document and merge in the contents of the existing tool's article.
4.11. Publishing your tools
You are welcome to publish your in-house tools, so that they can be used worldwide and shine.
To apply for publishing a tool, navigate to your tool's configuration page and click the 'Publish' button. This will open a modal where you can click 'Apply to Publish Tool' to submit your publishing application. After submitting, the 'Publish' button will change to 'Review', indicating that your tool is under review for publishing.
During the review process, we will:
- Test your tool using test data.
- Check the tool's configurations to ensure proper setup.
- Review the code if it is a script-based tool.
If your tool passes the review, both the tool and its help document will be published by copying them as the published version. We will contact you if any issues arise during the review.
Once the tool is successfully published, the 'Publish' button will change to 'Update', allowing you to submit updates if you make significant improvements or upgrades to your tool. By clicking the 'Update' button, you can apply for updating your published tool. As with the initial publishing process, the updated tool will undergo review. If approved, your published tool and help document will be updated, or a new version of your tool will be created if there are major upgrades.