dbigcg
A GCG-format database consists of *.seq and *.ref files. dbigcg uses only the *.seq files, whose data is often compressed.
The index files that dbigcg writes use the same format as the EMBL database CD-ROM distribution. Besides EMBOSS, this format is read by the CD-ROM software and by the Staden package, and it appears to be the most widely used publicly available index format for these databases.
% dbigcg
Index a GCG formatted database
Database name: EMBL
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GENBANK : Genbank, DDBJ
PIR : NBRF
Entry format [EMBL]: EMBL
Database directory [.]: embl
Wildcard database filename [*.seq]:
Release number [0.0]:
Index date [00/00/00]:
General log output file [outfile.dbigcg]:
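The same run can also be scripted without prompts. The Python sketch below assembles a command line from the qualifiers documented in the dbigcg help text and appends -auto to turn off prompting. The helper name `dbigcg_command` is hypothetical, and the values mirror the interactive session above (database EMBL, entry format EMBL, directory embl).

```python
import shlex


def dbigcg_command(dbname, idformat="EMBL", directory=".", filenames="*.seq",
                   release="0.0", date="00/00/00", outfile=None):
    """Build a non-interactive dbigcg argument list (illustrative sketch)."""
    args = ["dbigcg", "-dbname", dbname, "-idformat", idformat,
            "-directory", directory, "-filenames", filenames,
            "-release", release, "-date", date]
    if outfile is not None:
        # Otherwise dbigcg picks its default log file name (*.dbigcg).
        args += ["-outfile", outfile]
    # -auto turns off prompts, so nothing is asked interactively.
    args.append("-auto")
    return args


cmd = dbigcg_command("EMBL", directory="embl")
# shlex.join quotes the wildcard so a shell does not expand *.seq itself.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting altogether; `shlex.join` is only used here to show the equivalent shell command.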
Index a GCG formatted database
Version: EMBOSS:6.3.0
Standard (Mandatory) qualifiers:
[-dbname] string Database name (Any string from 2 to 19
characters, matching regular expression
/[A-z][A-z0-9_]+/)
-idformat menu [EMBL] Entry format (Values: EMBL (EMBL);
SWISS (Swiss-Prot, SpTrEMBL, TrEMBLnew);
GENBANK (Genbank, DDBJ); PIR (NBRF))
-directory directory [.] Database directory
-filenames string [*.seq] Wildcard database filename (Any
string)
-release string [0.0] Release number (Any string up to 9
characters)
-date string [00/00/00] Index date (Date string dd/mm/yy)
-outfile outfile [*.dbigcg] General log output file
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers:
-fields menu [acc] Index fields (Values: acc (acnum
accession number index); sv (seqvn sequence
version and gi number index); des (des
description index); key (keyword keywords
index); org (taxon taxonomy and organism
index))
-exclude string Wildcard filename(s) to exclude (Any string)
-maxindex integer [0] Maximum index length (Integer 0 or more)
-sortoptions string [-T . -k 1,1] Sort options, typically '-T .'
to use current directory for work files and
'-k 1,1' to force GNU sort to use the first
field (Any string)
-[no]systemsort boolean [Y] Use system sort utility
-[no]cleanup boolean [Y] Clean up temporary files
-indexoutdir outdir [.] Index file output directory
Associated qualifiers:
"-directory" associated qualifiers
-extension string Default file extension
"-outfile" associated qualifiers
-odirectory string Output directory
"-indexoutdir" associated qualifiers
-extension string Default file extension
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write first file to standard output
-filter boolean Read first file from standard input, write
first file to standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options and exit. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
-version boolean Report version number and exit
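Several standard qualifiers carry constraints in the help text: -dbname must be 2 to 19 characters matching /[A-z][A-z0-9_]+/, -release may be at most 9 characters, -date is a dd/mm/yy string, and -maxindex is an integer of 0 or more. The sketch below pre-checks those values before calling dbigcg. The function name `check_qualifiers` is hypothetical, and treating the regular expression as a whole-string match is an assumption; the date check only tests the dd/mm/yy shape, not whether the date is real.

```python
import re

# Patterns copied from the dbigcg help text; full-string matching is assumed.
DBNAME_RE = re.compile(r"[A-z][A-z0-9_]+")
DATE_RE = re.compile(r"\d\d/\d\d/\d\d")


def check_qualifiers(dbname, release="0.0", date="00/00/00", maxindex=0):
    """Return a list of problems found (empty when all values look valid)."""
    problems = []
    if not (2 <= len(dbname) <= 19 and DBNAME_RE.fullmatch(dbname)):
        problems.append("dbname must be 2-19 characters matching /[A-z][A-z0-9_]+/")
    if len(release) > 9:
        problems.append("release must be at most 9 characters")
    if not DATE_RE.fullmatch(date):
        problems.append("date must have the form dd/mm/yy")
    if maxindex < 0:
        problems.append("maxindex must be 0 or more")
    return problems
```

Note that the range [A-z] also covers the ASCII punctuation characters between Z and a (such as ^ and the backquote), so a name like A^ passes this literal reading of the pattern.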
| Qualifier | Type | Description | Allowed values | Default |
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | | | | |
| [-dbname] (Parameter 1) | string | Database name | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | Required |
| -idformat | list | Entry format | EMBL (EMBL); SWISS (Swiss-Prot, SpTrEMBL, TrEMBLnew); GENBANK (Genbank, DDBJ); PIR (NBRF) | EMBL |
| -directory | directory | Database directory | Directory | . |
| -filenames | string | Wildcard database filename | Any string | *.seq |
| -release | string | Release number | Any string up to 9 characters | 0.0 |
| -date | string | Index date | Date string dd/mm/yy | 00/00/00 |
| -outfile | outfile | General log output file | Output file | <*>.dbigcg |
| Additional (Optional) qualifiers | | | | |
| (none) | | | | |
| Advanced (Unprompted) qualifiers | | | | |
| -fields | list | Index fields | acc (acnum accession number index); sv (seqvn sequence version and gi number index); des (des description index); key (keyword keywords index); org (taxon taxonomy and organism index) | acc |
| -exclude | string | Wildcard filename(s) to exclude | Any string | |
| -maxindex | integer | Maximum index length | Integer 0 or more | 0 |
| -sortoptions | string | Sort options, typically '-T .' to use current directory for work files and '-k 1,1' to force GNU sort to use the first field | Any string | -T . -k 1,1 |
| -[no]systemsort | boolean | Use system sort utility | Boolean value Yes/No | Yes |
| -[no]cleanup | boolean | Clean up temporary files | Boolean value Yes/No | Yes |
| -indexoutdir | outdir | Index file output directory | Output directory | . |
| Associated qualifiers | | | | |
| "-directory" associated directory qualifiers | | | | |
| -extension | string | Default file extension | Any string | |
| "-outfile" associated outfile qualifiers | | | | |
| -odirectory | string | Output directory | Any string | |
| "-indexoutdir" associated outdir qualifiers | | | | |
| -extension | string | Default file extension | Any string | |
| General qualifiers | | | | |
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N |
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
| -warning | boolean | Report warnings | Boolean value Yes/No | Y |
| -error | boolean | Report errors | Boolean value Yes/No | Y |
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y |
| -version | boolean | Report version number and exit | Boolean value Yes/No | N |
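The -fields qualifier selects which extra indices are built, and the help text pairs each field value with an index name (acc with acnum, sv with seqvn, des with des, key with keyword, org with taxon). The sketch below encodes that pairing and validates a requested field list. The helper names are hypothetical, and joining several values with commas for the -fields argument is an assumption about how the list qualifier accepts multiple selections.

```python
# Pairing taken from the -fields help text: field value -> index name.
FIELD_INDEX = {
    "acc": "acnum",    # accession number index
    "sv": "seqvn",     # sequence version and gi number index
    "des": "des",      # description index
    "key": "keyword",  # keywords index
    "org": "taxon",    # taxonomy and organism index
}


def fields_argument(fields):
    """Validate field values and join them (comma separation is assumed)."""
    unknown = [f for f in fields if f not in FIELD_INDEX]
    if unknown:
        raise ValueError(f"unknown -fields value(s): {', '.join(unknown)}")
    return ",".join(fields)


def index_names(fields):
    """Return the index name that the help text associates with each field."""
    return [FIELD_INDEX[f] for f in fields]
```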
The four index files contain non-printing characters and so cannot be displayed here.
########################################
# Program: dbigcg
# Rundate: Fri 15 Jan 2010 12:00:00
# Dbname: EMBL
# Release: 0.0
# Date: 15/01/10
# CurrentDirectory: /homes/user/test/qa/dbigcg-ex-keep/
# IndexDirectory: ./
# IndexDirectoryPath: /homes/user/test/qa/dbigcg-ex-keep/
# Maxindex: 0
# Fields: 2
# Field 1: id
# Field 2: acc
# Directory: /homes/user/test/embl/
# DirectoryPath: /homes/user/test/embl/
# Filenames: *.seq
# Exclude:
# Files: 9
# File 1: /homes/user/test/embl/eem_ba1.seq
# File 2: /homes/user/test/embl/eem_est.seq
# File 3: /homes/user/test/embl/eem_fun.seq
# File 4: /homes/user/test/embl/eem_htginv1.seq
# File 5: /homes/user/test/embl/eem_hum1.seq
# File 6: /homes/user/test/embl/eem_in.seq
# File 7: /homes/user/test/embl/eem_ov.seq
# File 8: /homes/user/test/embl/eem_ro.seq
# File 9: /homes/user/test/embl/eem_vi.seq
########################################
# Commandline: dbigcg
# -dbname EMBL
# -idformat EMBL
# -directory ../../embl
########################################
filename: '/homes/user/test/embl/eem_ba1.seq'
id: 10
acc: 14
filename: '/homes/user/test/embl/eem_est.seq'
id: 1
acc: 1
filename: '/homes/user/test/embl/eem_fun.seq'
id: 1
acc: 1
filename: '/homes/user/test/embl/eem_htginv1.seq'
id: 5
acc: 5
filename: '/homes/user/test/embl/eem_hum1.seq'
id: 15
acc: 18
filename: '/homes/user/test/embl/eem_in.seq'
id: 2
acc: 2
filename: '/homes/user/test/embl/eem_ov.seq'
id: 2
acc: 2
filename: '/homes/user/test/embl/eem_ro.seq'
id: 3
acc: 3
filename: '/homes/user/test/embl/eem_vi.seq'
id: 1
acc: 2
Index acc: maxlen 8 items 48
Total 9 files 40 entries (0 duplicates)
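The per-file counts in this log can be cross-checked against the summary lines: the id counts sum to the 40 entries in the Total line, and the acc counts sum to the 48 items reported for the acc index (presumably more than the entry count because some entries carry more than one accession number). The sketch below parses the per-file section of the example log and totals each field; the function name `summarise` is hypothetical.

```python
import re

# Per-file section copied from the example log above.
LOG = """\
filename: '/homes/user/test/embl/eem_ba1.seq'
id: 10
acc: 14
filename: '/homes/user/test/embl/eem_est.seq'
id: 1
acc: 1
filename: '/homes/user/test/embl/eem_fun.seq'
id: 1
acc: 1
filename: '/homes/user/test/embl/eem_htginv1.seq'
id: 5
acc: 5
filename: '/homes/user/test/embl/eem_hum1.seq'
id: 15
acc: 18
filename: '/homes/user/test/embl/eem_in.seq'
id: 2
acc: 2
filename: '/homes/user/test/embl/eem_ov.seq'
id: 2
acc: 2
filename: '/homes/user/test/embl/eem_ro.seq'
id: 3
acc: 3
filename: '/homes/user/test/embl/eem_vi.seq'
id: 1
acc: 2
"""


def summarise(log_text):
    """Count 'filename:' lines and sum every 'field: N' line per field."""
    files = 0
    totals = {}
    for line in log_text.splitlines():
        if line.startswith("filename:"):
            files += 1
            continue
        m = re.fullmatch(r"(\w+): (\d+)", line.strip())
        if m:
            field, count = m.group(1), int(m.group(2))
            totals[field] = totals.get(field, 0) + count
    return files, totals


files, totals = summarise(LOG)
print(files, totals)
```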
dbigcg creates four index files. All are binary, but their format is simple.
Once the EMBOSS indices have been created, the database can be defined in the emboss.defaults file with an entry such as:
DB embl [ type: N format: embl method: gcg directory: /data/gcg/gcgembl ]
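When several GCG databases are indexed, the matching emboss.defaults entries can be generated rather than typed by hand. The sketch below reproduces the entry shown above from a database name and directory; the function `db_definition` is hypothetical, and it covers only the attributes that appear in this example (type N for nucleotide, format, method gcg, directory).

```python
def db_definition(name, directory, fmt="embl", seqtype="N"):
    """Format an emboss.defaults DB entry for a dbigcg-indexed database."""
    return (f"DB {name} [ type: {seqtype} format: {fmt} "
            f"method: gcg directory: {directory} ]")


print(db_definition("embl", "/data/gcg/gcgembl"))
```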
| Program name | Description |
|---|---|
| dbiblast | Index a BLAST database |
| dbifasta | Index a fasta file database |
| dbiflat | Index a flat file database |
| dbxfasta | Index a fasta file database using b+tree indices |
| dbxflat | Index a flat file database using b+tree indices |
| dbxgcg | Index a GCG formatted database using b+tree indices |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.