4 Compute genomic features of SARS-CoV-2 lineages and sublineages

The LinToFeats.pl utility computes high-level genomic features of SARS-CoV-2 lineages. A complete list of these features, each with a brief description, is provided in the features.csv file in the main GitHub repo of HaploCoV. The tool derives its scores from pre-computed annotations of SARS-CoV-2 variants obtained with CorGAT. These annotations are available from the globalAnnot file, and are updated on a bi-weekly basis. At every execution the most recent version of the annotations is downloaded automatically. Users are encouraged not to change this behaviour without a very good reason.

LinToFeats.pl takes the output of augmentClusters.pl as its main input. The output file is a simple tab-delimited table in which genomic features are reported for every lineage/group in the input.

Options

The program takes only 4 parameters:

  • --infile file with lineages/groups and their characteristic genomic variants, 1 lineage per line (main output of augmentClusters.pl);

  • --annotfile file with CorGAT annotations of SARS-CoV-2 variants. Defaults to globalAnnot;

  • --update update globalAnnot to the most recent version? T=true, F=false. Default=T;

  • --outfile name of the output file.
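For users who call the tool from a script, the options listed above can be assembled programmatically. The following is a minimal Python sketch and is not part of HaploCoV: the function name build_lintofeats_cmd is hypothetical, and the sketch assumes that perl is on the PATH and that LinToFeats.pl sits in the working directory. Only the four options documented above are used.

```python
def build_lintofeats_cmd(infile, outfile, annotfile=None, update=None):
    """Return the argument list for a LinToFeats.pl run (hypothetical helper).

    annotfile and update are omitted when None, so the tool falls back to
    its documented defaults (globalAnnot and T).
    """
    cmd = ["perl", "LinToFeats.pl", "--infile", infile, "--outfile", outfile]
    if annotfile is not None:
        cmd += ["--annotfile", annotfile]
    if update is not None:
        # The option list documents the values T (true) and F (false).
        cmd += ["--update", "T" if update else "F"]
    return cmd


cmd = build_lintofeats_cmd("lvar.txt", "lvar_feats.tsv")
print(" ".join(cmd))
# To actually run the tool (requires Perl and a HaploCoV checkout):
# import subprocess; subprocess.run(cmd, check=True)
```

Leaving annotfile and update unset keeps the default behaviour, in which the most recent annotations are used.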

Execution

An example of a valid command line for the execution of LinToFeats.pl is:

perl LinToFeats.pl --infile lvar.txt --outfile lvar_feats.tsv

The main output file, lvar_feats.tsv, will contain genomic features in tabular format for all SARS-CoV-2 groups/lineages and newly formed groups/sub-lineages in the input. The output consists of a simple table where, for every lineage/group, the numeric value associated with each feature is reported.
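The output table can be inspected with any tool that reads tab-delimited text. As an illustration, the Python sketch below loads such a table into a dictionary. It assumes a header row, a first column holding the lineage/group name, and numeric values in all remaining columns. This layout, the function name read_features and the feature names in the sample data are assumptions made for the example, and should be checked against an actual output file and features.csv.

```python
import csv
import io


def read_features(handle):
    """Map each lineage/group name to a {feature: value} dictionary."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)        # assumed: first row holds the column names
    feature_names = header[1:]   # assumed: first column is the lineage name
    table = {}
    for row in reader:
        if not row:
            continue             # skip blank lines
        values = [float(v) for v in row[1:]]
        table[row[0]] = dict(zip(feature_names, values))
    return table


# Made-up sample standing in for lvar_feats.tsv (values are not real results)
sample = "lineage\tfeatA\tfeatB\nB.1.1.7\t3\t0.5\nBA.2\t7\t1.25\n"
feats = read_features(io.StringIO(sample))
print(feats["BA.2"]["featA"])
```

For a real file, the same function can be called as read_features(open("lvar_feats.tsv", newline="")).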