GenoFig - version 1.1

Tool for the graphical visualization of annotated genetic regions and the comparison of homologous regions. It is an independent recoding of Easyfig 2, initially developed at the S. Beatson Lab [https://mjsull.github.io/Easyfig/]
Authors: LECLERCQ Sébastien - BRANGER Maxime
License
Copyright 2023 INRAE, Université de Tours
The GenoFig logo is trademark of INRAE, Université de Tours
Some parts of the logo are under a CC BY 3.0 license: Dna Icon Designed by vecteezy
Citation
Maxime Branger, Sébastien O Leclercq, GenoFig: a user-friendly application for the visualization and comparison of genomic regions, Bioinformatics, Volume 40, Issue 6, June 2024, btae372, [https://doi.org/10.1093/bioinformatics/btae372]
Installation
For MACOS users: The mouse pointer icon may behave oddly, due to the system-specific implementation of GTK3 libraries. This should not affect functionality.
For MACOS users (bis): Freezes have been reported when using GenoFig on a laptop with a shared screen. If sequences do not import properly or if you cannot create new features, try moving the GenoFig window to the laptop screen.
Compiled versions
Compiled versions of release 1.1 for Windows and MacOS can be found here
For Mac users working on new architectures (M1/M2/M3), consider installing Rosetta 2 if the compiled version doesn't start.
These versions do not include the latest bug fixes and improvements. Note also that they may take a while to start, without any prompt (~1 min, please be patient after clicking on the icon!).
Conda Installation
On Linux/MacOS
Download the GenoFig source code using the 'Download' button at the top of this page. Cloning is currently not available to people who are not members of the French institution INRAE. After decompression, open a terminal in the folder containing the decompressed files and run:
conda env create -f extras/requirements.yml
extras/SETUP.sh
For MAC users on M1/M2/M3: Blast is not yet available in bioconda for these platforms. Please use requirement_mac_arm64.yml instead of requirements.yml, and install blast from source here: Blast+ on NCBI
You can then run GenoFig with:
./Genofig
You can also add the GenoFig directory to your PATH by executing
echo 'export PATH="'$(pwd)':$PATH"' >> ~/.bashrc
If you are using a zsh rather than a bash terminal (mostly macOS users), run this instead:
echo 'export PATH="'$(pwd)':$PATH"' >> ~/.zshrc
After opening a new terminal, this will allow you to run the program from any directory by simply typing 'Genofig'.
For Ubuntu users: The "Application" menu may not display properly under Wayland with NVIDIA drivers (it is either absent or shows only the "Quit" option). To solve the problem, disable Wayland: Revert Wayland to X
On Windows
On Windows, Blast is not included in conda. You will need to install it yourself: Blast+ on NCBI
Please pay attention to the Configuration section of the NCBI tutorial to configure your installation correctly.
The blast executables must be in your PATH environment variable.
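A quick way to confirm that the blast executables are visible is to look them up on the PATH from Python. This is a minimal sketch, not part of GenoFig: it assumes the standard BLAST+ executable names `blastn` and `tblastx`, and the helper name `find_blast_tools` is made up for the example.

```python
import shutil


def find_blast_tools(names=("blastn", "tblastx")):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_blast_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, review the Configuration section of the NCBI tutorial and open a new terminal before trying again.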
You first need to install Miniconda: Conda installation for windows.
Then, download the GenoFig source code using the Download button at the top of this page and decompress it. Cloning is currently not available to people who are not members of the French institution INRAE. Open an Anaconda prompt (i.e. a PowerShell terminal), navigate to the folder containing the decompressed GenoFig code and run:
conda env create -f extras\requirements_windows.yml
Launch GenoFig from an Anaconda PowerShell terminal in the directory where GenoFig was decompressed, with:
conda activate genofig
python Genofig.py
conda deactivate
These steps can be inserted in an executable .bat file.
Interface
GenBank format
GenoFig has a powerful selection menu (the feature panel) to display genes and other features of a genomic sequence using annotations provided in the GenBank format. This format, developed by the NCBI, describes genetic loci using a dictionary of predefined features, each having different attributes, also called fields.
For instance, in the GenBank sequence shown below, there are four features: two genes, one CDS, and one tRNA. Each feature has a genomic position and different fields.
- The CDS feature has 8 fields, each starting with '/' and having a value. The 'product' field of the CDS feature has the value "Gamma-glutamyl phosphate reductase".
- The tRNA feature has 5 fields, and its 'gene' field value is "thrW".

GenoFig scans the values of feature fields to decide how to display features (see the Add features tutorial and the Features panel description for more information). The most useful fields are 'gene', 'product', 'note', and 'any'. This last option includes all searchable fields.
In prokaryotic sequences, gene features generally overlap other features completely, in which case they have the same locus_tag and gene fields as the feature they overlap. It is therefore usually better not to display gene features in GenoFig to avoid redundancy (except for advanced display tricks).
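To make the feature/field vocabulary concrete, here is a small standard-library sketch that reads a GenBank feature table excerpt into features and fields. It is not GenoFig's parser. The excerpt is abbreviated, its coordinates and the gene name `proA` are invented for illustration, and only single-line fields are handled.

```python
import re

# Illustrative, abbreviated excerpt: coordinates and 'proA' are invented.
SAMPLE = """\
     gene            1..1254
                     /gene="proA"
     CDS             1..1254
                     /gene="proA"
                     /product="Gamma-glutamyl phosphate reductase"
     tRNA            1300..1375
                     /gene="thrW"
"""

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)$")         # feature key + location
FIELD_RE = re.compile(r"^\s+/([^=\s]+)(?:=(.*))?$")     # /field="value"


def parse_features(text):
    """Return a list of {'type', 'location', 'fields'} dictionaries."""
    features = []
    for line in text.splitlines():
        feature = FEATURE_RE.match(line)
        if feature:
            features.append(
                {"type": feature.group(1), "location": feature.group(2), "fields": {}}
            )
            continue
        field = FIELD_RE.match(line)
        if field and features:
            value = (field.group(2) or "").strip('"')
            features[-1]["fields"][field.group(1)] = value
    return features


for feat in parse_features(SAMPLE):
    print(feat["type"], feat["location"], feat["fields"])
```

In this excerpt the gene and CDS features share the same location and 'gene' value, which is the redundancy described above.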
Tutorial
Import sequences
The first step is to collect the sequences you want to display with GenoFig. GenoFig accepts sequences in fasta format, but provides a better experience with sequences in GenBank format. GenBank files (.gb, .gbk, .gbff, .flat) can be downloaded as part of a genome assembly, produced locally by automatic annotation software, or obtained from the NCBI Nucleotide website.
This last option is the preferred one, since GenoFig is designed to compare small genomic regions (a few hundred Kbp at most). The tool can handle the comparison of two or three complete bacterial genomes (although the homology search will take a very long time to run), but users are strongly discouraged from using more.
In this example we will compare the Icm/Dot secretion system of two Legionella species, downloaded from the NCBI website
- Load the region for Legionella quateirensis found at position 121560 to 145670 on contig LNYR01000001:

then download the region using the "Send to > File > GenBank (full) > Create File" button
- The download can also be performed from the graphical view of NCBI by selecting the region of interest and clicking on "Download > GenBank Flat File > Visible range". This is done here for the region for Legionella santicrucis on the contig LNYU01000091, position 271306 to 295700:

A good practice is to move all the downloaded sequences into a new folder dedicated to the project (for instance 'LegionellaT4SS').
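The same sub-regions can also be fetched from a script rather than through the website. The sketch below is an optional alternative, not something GenoFig does itself: it builds an NCBI E-utilities `efetch` URL for a region of a nucleotide record. The helper names are made up, and the choice of `rettype=gbwithparts` is an assumption; check the downloaded file before importing it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_url(accession, start, stop):
    """Build an efetch URL for a GenBank flat file limited to start..stop."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts",  # assumption: full record with sequence
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    return f"{EFETCH}?{urlencode(params)}"


def download_region(accession, start, stop, out_path):
    """Download the region to out_path (requires network access)."""
    with urlopen(region_url(accession, start, stop)) as response:
        data = response.read()
    with open(out_path, "wb") as handle:
        handle.write(data)


# Region used in this tutorial for Legionella quateirensis:
print(region_url("LNYR01000001", 121560, 145670))
```

Calling `download_region("LNYR01000001", 121560, 145670, "LegionellaT4SS/quateirensis.gb")` would then save the file in the project folder; the output file name here is only an example.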
After launching GenoFig, import the downloaded sequences using the plus icon in the Sequence panel and navigating to the newly created folder.

The two sequences can already be displayed by pressing the CREATE FIGURE button. A prompt will ask you for a name and path to save the figure.
Linux users can save in SVG format, while MacOS/Windows users may prefer the PNG format, which is easier to open with the default image viewers on these systems.
The figure should look like this:

By default, all CDS are printed in gray and labelled according to their product, except CDS coding for hypothetical proteins which are printed in light gray.
Add features
Here we can see that most CDS encode Icm/Dot proteins related to the secretion system, so we will colour them in yellow.
For this, go to the Features panel, add a new feature with the plus icon, and type 'icm' in the 'filter' column of the new feature.
Select 'product' in the 'in field' column and change the 'color' column to yellow by clicking on the box.
This action tells GenoFig to colour in yellow all features of all sequences that have the term 'icm' in their product GenBank field.
The figure needs to be created again by pressing the CREATE FIGURE button to include modifications made in the GenoFig interface.


As you can see, LphA/DotK was not coloured because its product does not include 'icm'.
Since the filter engine of GenoFig accepts regular expressions, replace 'icm' with 'icm|dot' in the 'filter' column to search for either 'icm' OR 'dot'.
If you want to emphasize the IcmE/DotG protein, you can add a new feature with a more specific filter, by entering 'icmE' and colouring it in orange. The size of the CDS can also be enlarged by a ratio of 1.5, and the line width set to 3.
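The effect of the two filters can be reproduced with Python's `re` module. This is only a sketch of case-insensitive regex matching as described in this README, not GenoFig's code; the `matches` helper and the product strings are illustrative.

```python
import re


def matches(filter_expr, field_value):
    """True when the regex filter is found anywhere in the field value."""
    return re.search(filter_expr, field_value, flags=re.IGNORECASE) is not None


# Illustrative product values, not copied from the GenBank files.
products = ["IcmE/DotG", "LphA/DotK", "hypothetical protein"]

for product in products:
    print(product, matches("icm", product), matches("icm|dot", product))
```

With 'icm' alone only the first product matches, while 'icm|dot' also catches "LphA/DotK" because the match ignores case.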


Improve representation
We can now clearly see the IcmE-encoding gene in both species, but the figure is still not very clear. Some improvements can be made:
In the Sequence panel:
- reverse one sequence so that the operons are in the same direction, by checking one box in the 'reverse' column
- change sequence identifiers to their respective species names, by choosing 'organism' instead of 'locus' in the 'seq. label type' column. The checkbox in column 'I' (for italic) can also be selected.
- move the species names to the left by selecting 'left' instead of 'top/left' in the 'seq. label position' column. To prevent the name from crossing the image border, increase the left image margin from 200 to 400 (in the top panel).
- since most features are repeated, we may want to display their products only for the top sequence. In the 'feat. label' column, uncheck the box for General and check it for the first sequence.

In the Feature panel:
- change feature names for clarity, for instance rename 'Feature 3' to 'Secretion system' and 'Feature 4' to 'icmE (dotG)' by clicking on the name. These names will be used to create the legend in the next section.
- remove labels for every feature except the icmE feature by deselecting boxes in the 'label' column.
- change the label color, size, rotation and font type (I for italic, B for bold) for the icmE feature.

In the Legends panel:
- activate the Display features button
- increase the scaling to 0.75 and the font size to 35 for the Display features. Only features selected in the 'in legend' column of the Feature panel will be put in the legend. In our example, the icmE feature will not appear.
- change the scaling of the 'Display scale' to 0.22 to set the scale to 5 Kbp. For now, the scale is computed as a fraction of the image width, and it takes some trial and error to get the expected value.

After creating the figure again, it will look like this. Better! :

Note that the displayed gene name is dotG and not icmE as expected. This is because the CDS is annotated as dotG in the GenBank file, and this cannot be changed interactively in GenoFig. One solution is to hide the feature label and display the icmE feature in the legend, or to manually edit the created figure with a drawing application or PowerPoint. Another possibility is to modify the downloaded GenBank file (with a basic text editor) and replace dotG with icmE. In this case, the annotation needs to be loaded again, by clicking on the info column (the bulb icon) next to the sequence name in the Sequence panel, and pressing 'Reload'.
Search homologies
Since both secretion systems should be homologous, a final step is to display homologies between the two species.
To do that, go to the Homologies panel and click the Run BlastN button. A prompt will ask you to save the blast results in a file, which you can put in the same directory as the sequences.
DO NOT run a homology search if you imported more than 2 complete genomes in GenoFig, even if you display only a small sub-region (with min/max options). The blast will still be performed between all complete genomes and will take an extremely long time. Instead, download only the sub-regions of interest as explained in the Import sequences section above.
If you re-create the figure, you will see that few genes appear homologous. To display more homologies, decrease the 'minimum similarity' option to 50 in the Homologies panel.
Minimum and maximum colors can also be changed to create a color gradient, and the 'show labels' checkbox can be selected to display the similarity.

The figure can be refined further by setting the 'minimum length' option to '500' and the 'minimum similarity' option to '68' in the Homologies panel. This will hide all the small hits caused by the repeated nature of the icmE gene.
A last improvement is to increase the space between sequences by entering '300' instead of '200' in the 'space below' column of the Sequence panel.

Add some decoration
To help other people locate the T4SS region in the Legionella genome, its position will be added to the figure. For this, go to the Decoration panel and click the plus button.
Then change the height ratio to 2 and select the 'use original GenBank position' checkbox.
Finally, click on the 'show on sequences' icon and select only the first sequence.
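When a sub-region is downloaded from NCBI, its original coordinates are recorded on the ACCESSION line of the GenBank header as a REGION entry. The sketch below shows how local positions map back to the original contig; it is an illustration under that assumption, not GenoFig's implementation. It ignores reverse-complemented regions, and the function names are made up.

```python
import re

REGION_RE = re.compile(
    r"^ACCESSION\s+\S+\s+REGION:\s*(\d+)\.\.(\d+)", re.MULTILINE
)


def region_start(genbank_text):
    """Return the first coordinate of the REGION entry, or 1 if there is none."""
    found = REGION_RE.search(genbank_text)
    return int(found.group(1)) if found else 1


def original_position(local_pos, start):
    """Convert a 1-based position in the sub-region to the original contig."""
    return start + local_pos - 1


# Header line as expected for the first tutorial sequence.
header = "ACCESSION   LNYR01000001 REGION: 121560..145670\n"
start = region_start(header)
print(original_position(1, start), original_position(1000, start))
```

A file without a REGION entry keeps its own numbering, since the start then defaults to 1.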

And that's it, the figure is now ready for a presentation or a publication.

Save the project
A final step is to save your project, i.e. the GenoFig settings used to produce the figure. For this, click on the menu button and choose 'save'. Best practice is to save the project file (a .genofig file) in the same directory as the blast output file and the downloaded sequences.
Depending on the system, the menu button can be the icon at the top left of the application (Windows), the application name (or 'python') in the taskbar (MacOS), or the Application menu in the taskbar or at the top left of the application (Linux).
Many more options are available in GenoFig, which are detailed in the sections below.
Panels presentation
General

General settings are available at the top of the application.
You can manually set the figure output width (the height is automatically determined by the number of sequences to print).
Left/right/top/bottom margins can also be set independently.
The output figure path and name can be set by clicking on the 'Draw figure in' icon.
The CREATE FIGURE button generates the figure in SVG or PNG format. If no figure path and name are set when pressing it, a prompt window will ask you for them.
See Descriptions options for more information
Saving/Loading

By clicking on the Application menu, you can save your current project (all sequences and display settings) with 'Save'/'Save as...', or only the display settings (features, blast, legends and decorations) with 'Save configuration'.
You can also load previously saved projects with 'Open' or configuration files with 'Load configuration'. Loading a configuration file keeps the already imported sequences and applies the saved display settings to them.
Under LINUX, a specific configuration can also be saved as default, and will be automatically loaded when GenoFig starts.
Status bar

Information about actions is displayed in the status bar at the bottom of the application. Standard information is shown in black; the text turns green when the figure was successfully created and red when an error occurred.
Sequences panel

This panel provides all parameters for each imported sequence. The General line defines default values that are used when no specific parameter is set for a sequence (i.e. when '-' is set in a field).
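The fallback to the General line can be pictured as a simple lookup. This sketch only illustrates the rule stated above; the option values are hypothetical and `resolve_row` is not a GenoFig function.

```python
# Hypothetical General values; the keys mirror columns of the Sequences panel.
GENERAL = {"height": 40, "space below": 200, "line width": 2}


def resolve_row(row, general=GENERAL):
    """Replace every '-' in a sequence row with the value from the General line."""
    return {
        key: (general[key] if value == "-" else value)
        for key, value in row.items()
    }


row = {"height": "-", "space below": 300, "line width": "-"}
print(resolve_row(row))
```

Only 'space below' keeps its sequence-specific value here; the two '-' fields inherit from the General line.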
Sequences in fasta or GenBank format can be imported using the plus icon. Only nucleotide/genomic sequences are accepted. Multi-fasta and multi-GenBank files are accepted; each internal sequence will be imported as an independent sequence. The sequence name is taken from the LOCUS field in the GenBank file and cannot be modified in GenoFig. The sequence identifier displayed in the figure can be changed using the 'label type' field (species, description, size,...).
GenBank sequence information is shown by clicking on the 'infos' icon. If any information (except the LOCUS field) is modified in a GenBank file while GenoFig is running, the file must be loaded again using the 'reload' button in the sequence information pop-up.
Sequences can be deleted using the cross icon on the left of the sequence name. Beware, there is currently no warning before a sequence is deleted!
You can enable/disable the drawing of a sequence with the 'active' field. Sequence ordering can be changed by dragging the sequence name up or down.
Vertical space between sequences can be changed using the 'space below' field.
For MACOS users: drag-and-drop will copy the sequence instead of moving it, due to the system-specific implementation of GTK3 libraries. You will need to delete the original sequence using the cross icon.
See Descriptions options for detailed information on each field
Features panel

This panel helps you display the features included in the GenBank annotation. You can add as many feature definitions as you want using the plus icon. As in the Sequences panel, the General line defines common parameters used when no specific value is set for a feature's field.
You can choose whether or not to display a feature with the 'selected' field.
Feature definitions can be deleted or copied using the cross and copy icons on the left of the feature name. Feature names are editable.
The most important parameter for Features is the 'filter' field. It defines which GenBank features the selected parameters apply to. For instance, defining a CDS feature with 'fill color' set to green and 'filter' set to "kinase" will print in green all CDS having the term "kinase" in any of the fields of the GenBank file. The 'in field' field allows you to restrict the search to a specific field of the GenBank feature (gene, product, etc.).
Values in the 'filter' field are interpreted using regex syntax, in a case-insensitive way. This means that the OR operator '|' or metacharacters such as \d+ will work. More information on regex syntax here: https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html
Feature definitions are interpreted from top to bottom, and only the last valid definition will be selected for each feature in the figure. For instance, a feature definition with 'filter' set to "kinase" will never be applied if another feature definition with 'filter' set to "kin" is present lower in the list. Feature ordering can be changed by dragging the feature name up or down.
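The "last valid definition wins" rule can be sketched as a loop that keeps overwriting its choice. This is an illustration of the behaviour described above, not GenoFig's code; the definition names and the `select_definition` helper are hypothetical.

```python
import re


def select_definition(definitions, field_value):
    """Return the name of the last definition whose filter matches, else None."""
    chosen = None
    for name, pattern in definitions:  # read from top to bottom
        if re.search(pattern, field_value, flags=re.IGNORECASE):
            chosen = name  # a lower definition replaces an earlier match
    return chosen


# "kin" placed below "kinase" shadows it completely.
shadowed = [("Kinases", "kinase"), ("Kin-like", "kin")]
# Putting the more specific filter last lets both definitions apply.
reordered = [("Kin-like", "kin"), ("Kinases", "kinase")]

print(select_definition(shadowed, "histidine kinase"))
print(select_definition(reordered, "histidine kinase"))
print(select_definition(reordered, "kinesin"))
```

In practice this means placing general filters at the top of the list and specific ones below them.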
For MACOS users: drag-and-drop will copy the feature instead of moving it, due to the system-specific implementation of GTK3 libraries. You will need to delete the original feature using the cross icon.
See Descriptions options for more information
Homologies panel

Homologies between sequences can be calculated using BlastN or TBlastX. When all sequences are loaded, the Run BlastN button produces an all-vs-all blast output, which needs to be saved on the disk. Blast needs to be run again only if new sequences are imported. Various parameters allow you to filter which blast hits to display from the total hits stored in the Blast file.
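Filtering hits by length and similarity can be sketched as follows. The format of the file saved by GenoFig is not specified in this README, so the example assumes the standard BLAST tabular layout (query, subject, percent identity, alignment length, ...). The sample rows are invented and the helper names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    length: int      # alignment length in nucleotides


def parse_tabular(text):
    """Parse BLAST tabular lines, keeping the first four columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), int(cols[3])))
    return hits


def filter_hits(hits, min_length=0, min_similarity=0.0):
    """Keep hits that reach both the length and the similarity thresholds."""
    return [
        hit for hit in hits
        if hit.length >= min_length and hit.identity >= min_similarity
    ]


# Invented rows, for illustration only.
SAMPLE = "\n".join([
    "seqA\tseqB\t92.50\t1800\t135\t0\t100\t1899\t200\t1999\t0.0\t2500",
    "seqA\tseqB\t71.00\t120\t35\t0\t5000\t5119\t7000\t7119\t1e-10\t60.0",
    "seqA\tseqB\t66.00\t900\t306\t0\t9000\t9899\t3000\t3899\t1e-50\t300",
])

kept = filter_hits(parse_tabular(SAMPLE), min_length=500, min_similarity=68)
print(kept)
```

Because filtering only selects among stored hits, thresholds can be changed without running blast again.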
Homologies can be displayed between adjacent sequences, between all sequences, or all against a single sequence using the 'Matches selection' area. User-defined homology comparisons can also be set up using the 'Custom' option.
The TBlastX area is hidden by default and can be displayed by clicking on the TBLASTX square button.
DO NOT run a homology search if you imported more than 2 complete genomes in GenoFig, even if you display only a small sub-region (with min/max options). The blast will still be performed between all complete genomes and will take an extremely long time. Instead, download only the sub-regions of interest as explained in the Import sequences section of the tutorial.
See Descriptions options for more information
Legends panel

This panel allows you to manage the legends displayed on the figure. By default, only the scale is printed. You can easily enable or disable each type of legend and choose where to print it. The 'Scaling' parameter is a factor of the figure size. Blast legends use the vertical figure size while features and scale legends use the horizontal figure size. The scale is automatically reduced to the closest informative size (1Kbp, 2Kbp, 10Kbp, 20Kbp,...).
The features legend uses feature names as identifiers. Spaces are allowed, but try to avoid other special characters as much as possible.
Beware that legends are displayed in the margins of the figure. If a legend overlaps the sequences, you can enlarge the corresponding margin in the figure's General options.
See Descriptions options for more information
Decorations panel

This panel allows you to manage some additional sequence information, like GC%, GC skew or sequence-specific scales.
It works like the Sequences and Features panels.
For GC% and GC skew information, the value is calculated every X nucleotides, given by the 'step' field, on a fragment of size Y given by the 'window' field.
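A sliding-window computation of this kind can be sketched as below. GC skew is taken here as the usual (G - C) / (G + C). How GenoFig positions each window relative to the step (centred or starting at it) is not stated in this README, so starting the window at the step position is an assumption, and `gc_windows` is a made-up name.

```python
def gc_windows(seq, step, window):
    """Return (1-based start, GC percent, GC skew) for each window."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        fragment = seq[start:start + window]
        g = fragment.count("G")
        c = fragment.count("C")
        gc_percent = 100.0 * (g + c) / len(fragment)
        # Skew is undefined without G or C; report 0.0 in that case.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start + 1, gc_percent, skew))
    return results


for start, gc_percent, skew in gc_windows("GGGCATATGGCC", step=4, window=4):
    print(start, gc_percent, skew)
```

A 'step' smaller than 'window' gives overlapping windows and a smoother curve, at the cost of more computed points.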
For the Scale information, the 'step' and 'window' fields represent minor and major tick bars, respectively.
By default, decorations are displayed for all sequences. They can be restricted to some sequences by using the 'show on sequences' icon, which opens a pop-up menu for sequence selection.
For MACOS users: drag-and-drop will copy the decoration instead of moving it, due to the system-specific implementation of GTK3 libraries. You will need to delete the original decoration using the cross icon.
See Descriptions options for more information
Example of generated figure
This example shows the figure generated with the display options presented in the above example images

Description of all options
General panel
Option | Description | Values |
---|---|---|
Image width | Define the exact size of the image width in pixels. Image height will be automatically computed | Integer |
Figure path | Define the output file path for the figure. | SVG and PNG formats are supported |
Margins | Define the top/right/bottom/left margins of the figure in pixels. | Integer |
Background color | Define the background color of the image. | RGB color or transparent |
Enhance graph | Add some shadow effects to features | Checkbox |
Sequences panel
Option | Description | Values |
---|---|---|
Sequence name | Name of the sequence | Automatically set to the GenBank/fasta locus name |
infos | Provide information about the sequence | - |
active | Draw/hide the sequence | Checkbox |
position | Determine the horizontal position of the sequence | left/right/center/best blast/custom |
min | Start to draw the sequence at this position (default=1) | Integer |
max | End to draw the sequence at this position (default=max) | Integer or 'max' |
reverse | Reverse complement the sequence | Checkbox |
height | Define the height of features on the sequence | pixel |
space below | Define the space below the current sequence | pixel |
line width | Define the sequence line width | pixel |
line color | Define the color of the sequence line | - |
seq. label | Print the sequence label | Checkbox |
label type | Select which information to print as label. Information is extracted from the GenBank definition and may be empty for FASTA sequences | locus/accession/organism/strain/description/size or a combination |
label position | Select the label position relative to the sequence | left/right/top/bottom or a combination |
label offset | distance between the sequence and its label | pixel |
label color | Sequence label color | - |
label size | Sequence label size | pixel |
B | Sequence label in bold | Checkbox |
I | Sequence label in italic | Checkbox |
feat. label | Display features labels in this sequence | Checkbox |
feat. label type | Select which information to print as features label | gene/product/note/locus_tag/mobile_element_type |
feat. label position | Label position relative to the features | top/middle/bottom |
feat color | Features label color | - |
feat label size | Features label size | pixel |
feat label rot | Features label rotation (default=0) | angle |
Features panel
Option | Description | Values |
---|---|---|
selected | activate the feature's definition | Checkbox |
in legend | Determine if this feature will be shown in the feature legend | Checkbox |
type | Type of the feature as defined in GenBank | CDS/gene/mobile_element/tRNA/rRNA/misc_feature |
filter | Value to select some specific features on | Regex command |
in field | Field on which to apply the regex filter | any/gene/product/note/mobile_element_type/operon/regulatory_class/rpt_type |
strand | Display feature according to its strand orientation | none/lead up/lag up |
shape | Shape of the feature in the picture | arrow/rectangle/frame/signal/range/rangeL |
height ratio | Size ratio of the feature compared to the sequence height value | Float |
fill | Fill the feature with color | Checkbox |
fill color | Filling color | - |
line width | feature line width | pixel |
line color | feature line color | - |
hatching | Fill the feature with hatching, using line color and width | several choices |
label | Print the feature label | Checkbox |
label type | Which GenBank field to use as label | gene/product/note/locus_tag/mobile_element_type |
label position | Label position relative to the feature | Opposite/top/middle/bottom |
label color | Feature label color | - |
label size | Feature label size | pixel |
label rot | Feature label rotation | angle |
Homologies panel
Hits correspond to homologous regions detected by Blast.
Option | Description | Values |
---|---|---|
Run BlastN/TBlastX | Execute blast comparisons between all sequences from the Sequence panel. Save the results in an output file | - |
Load file | Load a previous Blast file run. Warning: only hits matching sequences (according to the LOCUS entry) from the sequence panel will be considered | - |
minimum length | minimum hit size to display | nucleotides |
minimum similarity | minimum hit nucleotide similarity to display | percent |
minimum e-value | minimum hit e-value to display | Float |
distance to sequence | distance of the beginning of the hit to the Sequence line center. | pixel |
color (min/max) | Color gradient for blast hits display, according to min and max hit similarities | - |
reversed color | Allow the definition of an alternative gradient for hits in reversed orientation | Checkbox |
opacity | transparency of displayed hits | Float |
outline matches | draw a thin outer line around hits | Checkbox |
show labels | display hit nucleotide similarity in the center of the hit | Checkbox |
min. match size | Minimum hit size to display the label | nucleotides |
label color | hit label color | - |
label size | hit label size | pixel |
decimals | number of decimals to display | Integer |
Matches selection | Select between which sequences to display homologies | None/adjacent/All vs All/All vs one/Custom |
Legends panel
Option | Description | Values |
---|---|---|
General display | Display/Hide this legend | Checkbox |
Horizontal position | Horizontal position of the legend in the picture | left/middle/right |
Vertical position | Vertical position of the legend in the picture | top/middle/bottom |
Scaling | Scale ratio for this legend relative to the total picture size (default=0.3) | Float |
Font Size | Font size of the legend | pixel |
Font color | Font color of the legend | - |
Cols number | Number of columns into which feature legends should be distributed (default=1) | Integer |
Decorations panel
Option | Description | Values |
---|---|---|
selected | Show/hide the decoration when drawing | Checkbox |
show on sequences | Select on which sequences to apply the decoration | Switches on popup |
type | Type of decoration | GCskew/GC%/scale |
step | GC% and GC-skew window step OR Scale minor tick interval | nucleotides |
window | GC% and GC-skew window size OR Scale major tick interval | nucleotides |
position | Position of the decoration relative to the sequence | On sequence/above/below |
height ratio | Size ratio of the decoration compared to the sequence height value | Float |
line width | Decoration line width | pixel |
line color | Decoration line color | - |
reverse | Reverse min/max position | Checkbox |
print label | Print the GC%/GC-skew min max values OR positions of the scale major tick bars | Checkbox |
label color | Decoration label color | - |
label size | Decoration label size | pixel |
label position | Position of the label relative to the decoration | left/right/middle |
use original GenBank position | Display scale positions relative to the GenBank original region, when the sequence originates from a GenBank file with a REGION property (typically after downloading a subpart of a large GenBank annotated sequence) | Checkbox |