Ulrich, L.E. and Zhulin, I.B. Bioinformatics (2014)

SeqDepot

Introduction

Screen scraping

Many database projects consist of at least two major layers: 1) the underlying database itself and 2) a custom front end with very specific views of the database contents. These views typically decorate the raw data within an HTML page for presentation purposes, and this is often the only publicly accessible representation of the data. For example, the MiST database website displays a large amount of annotation detail about individual genes and their corresponding protein sequences; however, to use this information in other contexts, users must resort to "screen scraping" scripts that parse the HTML and extract the desired information, because there is no direct means of retrieving the data.

While this works to varying degrees, "screen scraping" suffers from several problems:

  • Brittle: these scripts stop working whenever the website is updated, because they rely on a particular HTML document structure for data extraction.
  • Error-prone: parsing HTML is complicated and requires careful attention to detail, and the markup itself may vary with rendering conditions.
  • Missing data: many web pages only display a portion of the actual data in the database. Data not exposed on the website is simply not accessible.
  • Formatting issues: reshaping data for presentation purposes may make its HTML form unsuitable for downstream use (e.g. precision loss due to numerical rounding).
  • Inefficient: in addition to fetching the raw data, the web server transforms and encapsulates this content in HTML.

REST: the way forward

More recent databases have begun to provide direct access via a RESTful interface, which offers a straightforward system for communication between a client and a server. Such interfaces use a simple, reusable vocabulary for interacting with "resources" (compare with the relatively encumbered SOAP protocol). The REST architecture and its philosophies have been thoroughly documented by Roy Fielding in his doctoral thesis and on Wikipedia.

Structure of a REST URI

Clients perform actions by requesting a resource identified by a given URI with an HTTP method. For example:

Example REST URI: GET http://seqdepot.net/api/v1/aseqs/lXFJ1g8Tyb15AP8olkZ1eQ.json?fields=l,s
  1. GET: HTTP method; other common methods include POST, DELETE, PUT
  2. http://seqdepot.net/api/v1: Base API URL, v1 indicates API version 1
  3. aseqs: references the amino acid sequences resource
  4. lXFJ1g8Tyb15AP8olkZ1eQ: identifier for the requested aseq
  5. .json: data representation (other formats may also be supported, e.g. xml or png)
  6. fields=l,s: additional parameters for tweaking the response
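
The anatomy above can also be checked mechanically. The following is a minimal Python sketch, using only the standard library, that splits a request line assembled from the six components listed above back into those parts. The function name `split_rest_request` is hypothetical and not part of SeqDepot; treating a missing extension as JSON is an assumption based on the optional `.json` suffix documented later.

```python
from urllib.parse import urlsplit, parse_qs

def split_rest_request(request_line):
    """Split 'METHOD URI' into the components described in the list above."""
    method, uri = request_line.split(" ", 1)
    parts = urlsplit(uri)
    # The path is expected to look like /api/v1/<resource>/<id>.<format>
    segments = parts.path.strip("/").split("/")
    resource, last = segments[-2], segments[-1]
    identifier, _, fmt = last.partition(".")
    return {
        "method": method,
        "base": f"{parts.scheme}://{parts.netloc}/" + "/".join(segments[:-2]),
        "resource": resource,
        "id": identifier,
        "format": fmt or "json",  # assumption: no extension means JSON
        "params": parse_qs(parts.query),
    }

example = split_rest_request(
    "GET http://seqdepot.net/api/v1/aseqs/lXFJ1g8Tyb15AP8olkZ1eQ.json?fields=l,s"
)
```

Here `example["resource"]` is `aseqs` and `example["params"]` holds the `fields` value `l,s`, mirroring items 3 and 6 of the list.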

Simple REST example

The figure below illustrates many of the basic concepts of a simple REST request and response:

Example REST URL
  1. Acting as the client, the web browser makes a GET HTTP request to seqdepot.net (the server) for the aseq resource identified by lXFJ1g8Tyb15AP8olkZ1eQ.

  2. The server responds with a successful HTTP status code (200), a Content-Type header indicating that the response body is encoded in JSON, and finally the JSON-encoded representation of this aseq.

Note: The client may be any program capable of sending HTTP requests and is not restricted to web browsers.
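
To illustrate that the client need not be a browser, here is a minimal sketch of the same exchange from a Python script, using only the standard library. The helper names `build_aseq_request` and `fetch_aseq` are hypothetical. `fetch_aseq` needs network access and a reachable server, so it is defined but not called here.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://seqdepot.net/api/v1"

def build_aseq_request(aseq_id):
    """Prepare (but do not send) a GET request for one aseq as JSON."""
    return urllib.request.Request(f"{BASE_URL}/aseqs/{aseq_id}.json", method="GET")

def fetch_aseq(aseq_id, timeout=30):
    """Send the request; return (status code, Content-Type, decoded body)."""
    try:
        with urllib.request.urlopen(build_aseq_request(aseq_id), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type"), json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: error responses carry a JSON body, as documented for 404.
        return err.code, err.headers.get("Content-Type"), json.loads(err.read())

request = build_aseq_request("lXFJ1g8Tyb15AP8olkZ1eQ")
# fetch_aseq("lXFJ1g8Tyb15AP8olkZ1eQ") would perform the actual network call.
```

On success, the three returned values correspond to the status code, Content-Type header, and JSON body described in step 2.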

Alternate representations

Different representations of resources may be requested by changing the extension appended to the id. The following returns a graphical representation of the aseq:

A few notable points about RESTful interfaces:

  • Typically implemented over HTTP, leveraging its existing verbs (e.g. GET, POST), status codes, and so forth. The SeqDepot REST interface solely utilizes HTTP.
  • Resources are user-defined entities that reflect a meaningful asset. For example, the aseqs resource represents specific amino acid sequences and associated features.
  • Resources are accessed via URIs.

SeqDepot is solely accessible via its REST API

All access to SeqDepot must use the REST interface documented here. All database interaction from this HTML5 website solely utilizes the RESTful interface. Minor processing is done on the client side before sending requests to the server. The results are then visualized and rendered.

To help users become familiar with this API, the Request / Response tab on the home page shows the specific request, the server response, and any relevant headers for each query performed via this website (see figure below).

Request / Response tab of home page

Works with any programming language

Any programming / scripting language may be used to construct queries. Simply construct the appropriate HTTP request and then consume the response. Most responses return JSON (even error messages) and since virtually every programming language has libraries for decoding JSON into native data structures, parsing is reduced to at most a few lines of code.
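
As a small illustration of the "few lines of code" claim, the Python sketch below decodes an abbreviated copy of the response shown in the Examples section. The trimmed `body` string is a hand-shortened excerpt, not a complete server response.

```python
import json

# Abbreviated copy of the aseq response shown in the Examples section
body = '{"l":131,"t":{"smart":[["SM00091","PAS",4,70,5.9e-13]]},"id":"npORB8GGfXLBYjo0s3_QNQ"}'

aseq = json.loads(body)   # JSON objects become dicts, arrays become lists
length = aseq["l"]
# Each SMART result is an array: accession, name, start, stop, evalue
accession, name, start, stop, evalue = aseq["t"]["smart"][0]
```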

Aseqs

The aseqs resource aggregates basic attributes (e.g. length), precomputed attributes (e.g. transmembrane regions), and cross-references for unique amino acid sequences.

Schema

Field Type Description
id string 22-character primary key identifier composed of alphanumeric characters (A-Z, a-z, 0-9), underscores (_), and hyphens (-).
_s string Encodes the status of which computational tools have analyzed this amino acid sequence. The length of this field equals the number of tools in SeqDepot with each character indicating the status of a different tool. The nth character corresponds to the nth tool element in the array returned when querying the tools resource. Possible values for each status character include: - (not yet analyzed), d (analyzed, but no results predicted), or T (analyzed and has at least one result).
l integer Sequence length
s string Ungapped, upper-case full-length sequence
t hash Contains all pre-computed tool data using a key-value dictionary structure. Each key corresponds to a tools.id. Most tools typically have multiple results per sequence. Thus, associated values typically consist of an array of results, where each result is usually an array of tool-dependent strings and/or numbers. For example, a single sequence often contains multiple Pfam domain matches and each match contains many fields such as name, score, E-value, and so forth.
Key Value
agfam1 Agile Genomics family models - version 1.0
array of results; each result is an array of the following fields:
name, start, stop, extent, hmm_start, hmm_stop, hmm_extent, score, evalue
coils Predicts coiled coil regions in protein sequences (Russell and Lupas, 1999)
array of results; each result is an array of the following fields:
start, stop
das DAS-TMfilter version 5.0: Predict transmembrane regions
array of results; each result is an array of the following fields:
start, stop, peak, peak_score, evalue
ecf Predict Extracytoplasmic Function (ECF) domains
array of results; each result is an array of the following fields:
name, start, stop, extent, hmm_start, hmm_stop, hmm_extent, score, evalue
gene3d Structures assigned to genomes
array of results; each result is an array of the following fields:
code, description, start, stop, evalue
hamap High-quality Automated and Manual Annotation of Proteins
array of results; each result is an array of the following fields:
rule, description, start, stop, evalue
panther Protein ANalysis THrough Evolutionary Relationships
array of results; each result is an array of the following fields:
accession, start, stop, evalue
patscan ProSite motif patterns
array of results; each result is an array of the following fields:
accession, description, start, stop
pfam26 Pfam-A hidden Markov model database version 26.0 (November 2011, 13672 families)
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
pir Protein Information Resource HMMs
array of results; each result is an array of the following fields:
accession, description, start, stop, evalue
prints Compendium of protein fingerprints
array of results; each result is an array of the following fields:
accession, description, start, stop, evalue
proscan ProSite profile scan
array of results; each result is an array of the following fields:
accession, description, start, stop, evalue
segs Predicts regions of low-complexity
array of results; each result is an array of the following fields:
start, stop
signalp Signal peptide prediction
hash; the keys - e, gn, and gp - are numeric values corresponding to the signal peptide cleavage site for eukaryotic, Gram-negative, and Gram-positive species, respectively.
smart Simple Modular Architecture Research Tool
array of results; each result is an array of the following fields:
accession, name, start, stop, evalue
superfam Database of structural and functional annotation for all proteins and genomes
array of results; each result is an array of the following fields:
accession, description, start, stop, evalue
targetp Predicts subcellular location of eukaryotic proteins
hash; the keys - p and np - are numeric values corresponding to the plant and non-plant predictions, respectively.
tigrfam HMM resource to support automated annotation of proteins
array of results; each result is an array of the following fields:
accession, name, start, stop, evalue
tmhmm Prediction of transmembrane helices in proteins
array of results; each result is an array of the following fields:
start, stop
pfam27 Pfam-A hidden Markov model database version 27.0 (March 2013, 14831 families)
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
tigrfam14 TIGRFAM 14.0 HMM resource to support automated annotation of proteins
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
pfam28 Pfam-A hidden Markov model database version 28.0 (May 2015, 16230 families)
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
tigrfam15 TIGRFAM 15.0 HMM resource to support automated annotation of proteins
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
pfam29 Pfam-A hidden Markov model database version 29.0 (November 2015, 16295 families)
array of results; each result is an array of the following fields:
name, start, stop, extent, bias, hmm_start, hmm_stop, hmm_extent, env_start, env_stop, env_extent, score, c_evalue, i_evalue, acc
x hash Key-value dictionary structure containing cross-references to this amino acid sequence.
Key Value
gi array of Genbank identifiers (GI)
pdb array of PDB identifiers
uni array of UniProt identifiers
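
The positional encoding of the _s field is easiest to see in code. The Python sketch below pairs each status character with a tool id. The `TOOL_IDS` list is hard-coded from the 19-tool example response of the tools resource shown later in this document, purely for illustration; a real client would presumably fetch the list from the tools resource instead. The function names are hypothetical.

```python
# Tool order copied from the example tools response in this document;
# an assumption for illustration only, since the live list may differ.
TOOL_IDS = [
    "agfam1", "coils", "das", "ecf", "gene3d", "hamap", "panther",
    "patscan", "pfam26", "pir", "prints", "proscan", "segs", "signalp",
    "smart", "superfam", "targetp", "tigrfam", "tmhmm",
]

STATUS_MEANING = {
    "-": "not yet analyzed",
    "d": "analyzed, but no results predicted",
    "T": "analyzed and has at least one result",
}

def decode_status(status, tool_ids):
    """Map each tool id to the meaning of its status character."""
    if len(status) != len(tool_ids):
        raise ValueError("status length must equal the number of tools")
    return {tool: STATUS_MEANING[flag] for tool, flag in zip(tool_ids, status)}

def tools_with_results(status, tool_ids):
    """Return the tool ids whose status character is 'T'."""
    return [tool for tool, flag in zip(tool_ids, status) if flag == "T"]

status = "ddddTdT-TddTddTTdTd"  # _s value from the Examples section
decoded = decode_status(status, TOOL_IDS)
hits = tools_with_results(status, TOOL_IDS)
```

For this status string, `hits` lists the same seven tools that appear as keys of the t field in the corresponding example response.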

REST URIs

[GET] http://seqdepot.net/api/v1/aseqs(/{id}(.json|.png|.svg))(?{parameters})*
General form.
* Portions enclosed in parentheses are optional; elements separated by pipe symbols (|) indicate a boolean OR.
[GET] http://seqdepot.net/api/v1/aseqs/{aseq-id}(.json)
Fetch all fields for the aseq identified by {aseq-id} as a JSON encoded hash. If {aseq-id} is not found, the server will respond with a status code of 404 and an error message encoded in JSON.
[GET] http://seqdepot.net/api/v1/aseqs/{aseq-id}.png
Visualizes the domain architecture for the aseq identified by {aseq-id} as a PNG image. If {aseq-id} is not found, the server will respond with a status code of 404 and an error message encoded in JSON.
[GET] http://seqdepot.net/api/v1/aseqs/{aseq-id}.svg
Visualizes the domain architecture for the aseq identified by {aseq-id} as an SVG image. If {aseq-id} is not found, the server will respond with a status code of 404 and an error message encoded in JSON.
Queries may also be performed with supported external database identifiers (GI, PDB, and UniProt identifiers). Using the same REST-URIs as above, replace {aseq-id} with the external database identifier and specify the relevant external database using the {type} parameter.
[GET] http://seqdepot.net/api/v1/aseqs(/{external-database-id}(.json|.png|.svg))?type={external-database-type}
General form for retrieving data for the sequence identified by {external-database-id}, where {external-database-type} corresponds to the source external database (gi for Genbank, pdb for RCSB Protein Data Bank, and uni for UniProt). Alternatively, an aseq may be identified by the hexadecimal MD5 digest of its ungapped, uppercase sequence, with md5_hex as the {external-database-type}.
[POST] http://seqdepot.net/api/v1/aseqs
Batch querying: submits up to 1,000 aseq queries in a single HTTP request and returns the results in tab-delimited format (with embedded JSON). See the Batch query section for details.
Note: Though not explicitly shown, {parameters} may be applied to each of the above URIs to modify the result.
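
For the md5_hex lookup described above, the identifier can be computed locally from a raw sequence. The following Python sketch shows one way to do so. Both function names are hypothetical, and the gap characters stripped here ('-' and '.') are an assumption, since the documentation only says "ungapped".

```python
import hashlib

def md5_hex_identifier(sequence):
    """Hexadecimal MD5 digest of the ungapped, upper-case sequence."""
    # Assumption: gaps are written as '-' or '.'; whitespace is dropped too.
    cleaned = "".join(
        ch for ch in sequence if ch not in "-." and not ch.isspace()
    ).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()

def aseq_uri_from_sequence(sequence, extension=".json"):
    """Build a lookup URI of the documented form with type=md5_hex."""
    digest = md5_hex_identifier(sequence)
    return f"http://seqdepot.net/api/v1/aseqs/{digest}{extension}?type=md5_hex"

uri = aseq_uri_from_sequence("mret-hlrs ILHT")
```

Because the sequence is normalized before hashing, differently formatted copies of the same sequence map to the same URI.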

Variables

{aseq-id}
22-character primary key identifier for an amino acid sequence (e.g. yg8A8H8N-4x1Ezf8WW-YbA)
{external-database-id}
GI, PDB, UniProt identifier, or hexadecimal MD5 digest
{external-database-type}
String indicating the external database for {external-database-id}. Must be one of the following: gi, pdb, uni, or md5_hex
{parameters}

The following key-value pairs (key=value) are supported:

  • type: string indicating the type of {id}. Allowed values include aseq, gi, pdb, uni, and md5_hex. If omitted, an Aseq ID is expected.
  • fields: comma-separated list of aseq field names; indicates which fields to return. If not specified, all fields are returned. Specify subfields by appending a pipe-delimited list of subfield names enclosed in parentheses to the parent field name. For example, to access pfam26 and das results, use: t(pfam26|das).
  • pretty: include in the URL without a value; pretty-prints any JSON results by adding spacing and indentation
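
The three parameters above can be combined into a query string programmatically. The Python sketch below is one possible approach. The helper names `fields_spec` and `aseq_query` are hypothetical. Leaving parentheses, commas, and pipes unescaped is a choice made to mirror the URLs in this document; whether the server also accepts percent-encoded forms is not stated here.

```python
from urllib.parse import quote

def fields_spec(fields):
    """Render e.g. {"t": ["pfam26", "das"], "l": None} as 't(pfam26|das),l'."""
    parts = []
    for name, subfields in fields.items():
        if subfields:
            parts.append(f"{name}({'|'.join(subfields)})")
        else:
            parts.append(name)
    return ",".join(parts)

def aseq_query(fields=None, id_type=None, pretty=False):
    """Build the '?...' suffix for an aseqs URI; empty string if no parameters."""
    params = []
    if fields:
        params.append("fields=" + quote(fields_spec(fields), safe="(),|"))
    if id_type:
        params.append("type=" + quote(id_type))
    if pretty:
        params.append("pretty")  # flag parameter, sent without a value
    return "?" + "&".join(params) if params else ""

query = aseq_query({"t": ["superfam", "pfam26", "das"], "l": None},
                   id_type="gi", pretty=True)
```

Appending `query` to `http://seqdepot.net/api/v1/aseqs/441604264` reproduces one of the request forms listed in the Examples section.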

Batch query

Multiple aseq queries may be performed in a single HTTP POST request by sending each query (and any associated user data) in the request body to the URI:

http://seqdepot.net/api/v1/aseqs

Request body format

The request body must consist of one or more tab-delimited lines. Each line must begin with the identifier to be queried and may be followed by zero or more user fields (which will be returned in the response).

Note: As described above, supported identifiers include Aseq, GI, PDB, or UniProt identifiers or MD5 hexadecimal digests.
  1. If querying with identifiers other than Aseq IDs, you must specify the type of identifier with the {external-database-type} parameter (see aseqs parameters)
  2. All identifiers must be of the same type.
A maximum of 1,000 queries may be sent per request. Any query lines beyond this limit will be ignored.
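
A request body in this format can be assembled with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the newline line separator is an assumption (the documentation does not name one), and the sample user fields are made up.

```python
MAX_QUERIES_PER_REQUEST = 1000  # limit stated above

def build_batch_body(queries):
    """Render (identifier, user_fields) pairs as tab-delimited lines."""
    lines = []
    for identifier, user_fields in queries:
        cells = [str(identifier)] + [str(field) for field in user_fields]
        if any("\t" in cell or "\n" in cell for cell in cells):
            raise ValueError("identifiers and user fields must not contain tabs or newlines")
        lines.append("\t".join(cells))
    if len(lines) > MAX_QUERIES_PER_REQUEST:
        raise ValueError("too many queries for one request; split them first")
    # Assumption: lines are separated by a single newline character.
    return "\n".join(lines)

def split_into_batches(queries, size=MAX_QUERIES_PER_REQUEST):
    """Yield successive slices of at most `size` queries."""
    queries = list(queries)
    for start in range(0, len(queries), size):
        yield queries[start:start + size]

body = build_batch_body([("6980436", ["sampleA", "run1"]), ("12084284", [])])
batch_sizes = [len(b) for b in split_into_batches([(str(i), []) for i in range(2500)])]
```

Splitting on the client side avoids silently losing the query lines that the server ignores beyond the limit.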

Response

The server will return the results formatted in tab-delimited lines - one result line per query line and in the same order as in the request. Each line contains the following tab-delimited fields:

  1. query identifier: the identifier used to search for matches
  2. HTTP status code: an integer indicating whether the query succeeded or not. Specifically, 200 denotes that a sequence matched that identifier; 404 indicates that the identifier was not found; and 500 indicates that a server error occurred. If this value is not 200, the aseq_id and json fields will both be empty.
  3. aseq_id: Aseq ID corresponding to this sequence record.
  4. json: all requested JSON-encoded data fields (if sequence is found)
  5. user fields: all fields associated with the query identifier in the request
Note: By default all data fields are returned for each sequence. Unless all fields are necessary, it is faster and more efficient to request only those fields of interest by defining the {fields} parameter.
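
The response layout described above maps naturally onto a small record type. The following Python sketch parses such a response. The names `BatchResult` and `parse_batch_response` are hypothetical, and the `sample` text is hand-written to follow the documented column order rather than captured from the server.

```python
import json
from typing import NamedTuple, Optional

class BatchResult(NamedTuple):
    query: str
    status: int
    aseq_id: Optional[str]
    data: Optional[dict]
    user_fields: list

def parse_batch_response(text):
    """Parse tab-delimited result lines into BatchResult records."""
    results = []
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if not line:
            continue
        cells = line.split("\t")
        cells += [""] * (4 - len(cells))  # tolerate missing trailing columns
        query, status, aseq_id, payload, *user_fields = cells
        found = int(status) == 200
        results.append(BatchResult(
            query,
            int(status),
            aseq_id if found else None,
            json.loads(payload) if found else None,  # json column is empty otherwise
            user_fields,
        ))
    return results

# Hand-written sample following the documented column order
sample = '6980436\t200\tnpORB8GGfXLBYjo0s3_QNQ\t{"l":131}\tsampleA\n0000\t404\t\t\tsampleB\n'
results = parse_batch_response(sample)
```

Because results arrive in request order, they can also be matched back to the queries by position; the user fields offer a second, explicit way to do so.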

For detailed examples, please visit the home page, which demonstrates the batch query API documented here by using AJAX to communicate with the SeqDepot database.

Examples

Visual illustration of many common tasks

visual overview of common rest requests
Fetch all fields for an amino acid sequence

[GET] http://seqdepot.net/api/v1/aseqs/npORB8GGfXLBYjo0s3_QNQ
[GET] http://seqdepot.net/api/v1/aseqs/npORB8GGfXLBYjo0s3_QNQ.json
[GET] http://seqdepot.net/api/v1/aseqs/6980436?type=gi
[GET] http://seqdepot.net/api/v1/aseqs/6980436.json?type=gi
[GET] http://seqdepot.net/api/v1/aseqs/1drm?type=pdb
[GET] http://seqdepot.net/api/v1/aseqs/1drm.json?type=pdb
[GET] http://seqdepot.net/api/v1/aseqs/F5CGH2_9HIV1?type=uni
[GET] http://seqdepot.net/api/v1/aseqs/F5CGH2_9HIV1.json?type=uni
[GET] http://seqdepot.net/api/v1/aseqs/9e939107c1867d72c1623a34b37fd035?type=md5_hex
[GET] http://seqdepot.net/api/v1/aseqs/9e939107c1867d72c1623a34b37fd035.json?type=md5_hex

All of the above produce the same result:

{"_s":"ddddTdT-TddTddTTdTd","l":131,"s":"MRETHLRSILHTIPDAMIVIDGHGIIQLFSTAAERLFGWSELEAIGQNVNILMPEPDRSRHDSYISRYRTTSDPHIIGIGRIVTGKRRDGTTFPMHLSIGEMQSGGEPYFTGFVRDLTEHQQTQARLQELQ","t":{"gene3d":[["3.30.450.20","",2,118,2.1e-23]],"panther":[["PTHR24423",2,131,3.9e-34],["PTHR24423:SF165",2,131,3.9e-34]],"pfam26":[["PAS",5,116,"..",0.007,2,112,"..",4,117,"..",82.031,3.841e-26,1.601e-23,0.973],["PAS_9",14,119,"..",0.003,1,104,"[]",14,119,"..",52.262,1.019e-16,4.076e-14,0.922],["PAS_8",5,60,"..",0.006,2,53,"..",4,71,"..",31.386,1.114e-10,7.426e-8,0.844],["PAS_4",10,120,"..",0.009,1,108,"[.",10,122,"..",27.74,2.983e-9,1.356e-6,0.808]],"proscan":[["PS50112","PAS",2,55,16.25],["PS50113","PAC",70,129,8.727]],"smart":[["SM00091","PAS",4,70,5.9e-13]],"superfam":[["SSF55785","PYP-like sensor domain (PAS domain)",15,130,9.4e-31]],"tigrfam":[["TIGR00229","sensory_box",2,127,1.0e-37]]},"x":{"uni":["F5CGH2_9HIV1"],"pdb":["1dp6","1dp8","1dp9","1drm","1lsv","1lsw","1lsx","1lt0"],"gi":[6980436,12084284,12084285,12084286,27065445,27065447,27065448,27065451]},"id":"npORB8GGfXLBYjo0s3_QNQ"}
Same as the above except with pretty printing

[GET] http://seqdepot.net/api/v1/aseqs/npORB8GGfXLBYjo0s3_QNQ?pretty
...

{
    "_s": "ddddTdT-TddTddTTdTd",
    "l": 131,
    "s": "MRETHLRSILHTIPDAMIVIDGHGIIQLFSTAAERLFGWSELEAIGQNVNILMPEPDRSRHDSYISRYRTTSDPHIIGIGRIVTGKRRDGTTFPMHLSIGEMQSGGEPYFTGFVRDLTEHQQTQARLQELQ",
    "t": {
        "gene3d": [
            [
                "3.30.450.20",
                "",
                2,
                118,
                2.1e-23
            ]
        ],
        "panther": [
            [
                "PTHR24423",
                2,
                131,
                3.9e-34
            ],
            [
                "PTHR24423:SF165",
                2,
                131,
                3.9e-34
            ]
        ],
        "pfam26": [
            [
                "PAS",
                5,
                116,
                "..",
                0.007,
                2,
                112,
                "..",
                4,
                117,
                "..",
                82.031,
                3.841e-26,
                1.601e-23,
                0.973
            ],
            [
                "PAS_9",
                14,
                119,
                "..",
                0.003,
                1,
                104,
                "[]",
                14,
                119,
                "..",
                52.262,
                1.019e-16,
                4.076e-14,
                0.922
            ],
            [
                "PAS_8",
                5,
                60,
                "..",
                0.006,
                2,
                53,
                "..",
                4,
                71,
                "..",
                31.386,
                1.114e-10,
                7.426e-8,
                0.844
            ],
            [
                "PAS_4",
                10,
                120,
                "..",
                0.009,
                1,
                108,
                "[.",
                10,
                122,
                "..",
                27.74,
                2.983e-9,
                1.356e-6,
                0.808
            ]
        ],
        "proscan": [
            [
                "PS50112",
                "PAS",
                2,
                55,
                16.25
            ],
            [
                "PS50113",
                "PAC",
                70,
                129,
                8.727
            ]
        ],
        "smart": [
            [
                "SM00091",
                "PAS",
                4,
                70,
                5.9e-13
            ]
        ],
        "superfam": [
            [
                "SSF55785",
                "PYP-like sensor domain (PAS domain)",
                15,
                130,
                9.4e-31
            ]
        ],
        "tigrfam": [
            [
                "TIGR00229",
                "sensory_box",
                2,
                127,
                1.0e-37
            ]
        ]
    },
    "x": {
        "uni": [
            "F5CGH2_9HIV1"
        ],
        "pdb": [
            "1dp6",
            "1dp8",
            "1dp9",
            "1drm",
            "1lsv",
            "1lsw",
            "1lsx",
            "1lt0"
        ],
        "gi": [
            6980436,
            12084284,
            12084285,
            12084286,
            27065445,
            27065447,
            27065448,
            27065451
        ]
    },
    "id": "npORB8GGfXLBYjo0s3_QNQ"
}
Fetch the length, superfam, pfam26, and das fields for a specific amino acid sequence

[GET] http://seqdepot.net/api/v1/aseqs/naytI0dLM_rK2kaC1m3ZSQ?fields=t(superfam|pfam26|das),l&pretty
[GET] http://seqdepot.net/api/v1/aseqs/naytI0dLM_rK2kaC1m3ZSQ.json?fields=t(superfam|pfam26|das),l&pretty
[GET] http://seqdepot.net/api/v1/aseqs/441604264?fields=t(superfam|pfam26|das),l&type=gi&pretty
[GET] http://seqdepot.net/api/v1/aseqs/441604264.json?fields=t(superfam|pfam26|das),l&type=gi&pretty
[GET] http://seqdepot.net/api/v1/aseqs/C9R0S4_ECOD1?fields=t(superfam|pfam26|das),l&type=uni&pretty
[GET] http://seqdepot.net/api/v1/aseqs/C9R0S4_ECOD1.json?fields=t(superfam|pfam26|das),l&type=uni&pretty
[GET] http://seqdepot.net/api/v1/aseqs/9dacad23474b33facada4682d66dd949?fields=t(superfam|pfam26|das),l&type=md5_hex&pretty
[GET] http://seqdepot.net/api/v1/aseqs/9dacad23474b33facada4682d66dd949.json?fields=t(superfam|pfam26|das),l&type=md5_hex&pretty

All of the above produce the same result:

{
    "l": 894,
    "t": {
        "das": [
            [
                403,
                423,
                411,
                4.116,
                0.0006308
            ],
            [
                425,
                445,
                434,
                5.243,
                1.185e-5
            ],
            [
                448,
                464,
                456,
                3.252,
                0.01334
            ],
            [
                476,
                493,
                485,
                4.305,
                0.0003238
            ],
            [
                850,
                851,
                851,
                2.544,
                0.1621
            ]
        ],
        "pfam26": [
            [
                "KdpD",
                21,
                230,
                "..",
                0.006,
                2,
                211,
                ".]",
                20,
                230,
                "..",
                329.179,
                4.617e-103,
                4.617e-99,
                0.995
            ],
            [
                "HATPase_c",
                778,
                881,
                "..",
                0.003,
                6,
                110,
                "..",
                774,
                882,
                "..",
                84.573,
                1.307e-26,
                2.421e-24,
                0.965
            ],
            [
                "DUF4118",
                407,
                499,
                "..",
                9.917,
                5,
                103,
                "..",
                402,
                501,
                "..",
                54.331,
                1.598e-18,
                7.991e-15,
                0.836
            ],
            [
                "HisKA",
                664,
                730,
                "..",
                1.214,
                2,
                68,
                ".]",
                663,
                730,
                "..",
                43.184,
                6.792e-14,
                1.887e-11,
                0.878
            ],
            [
                "GAF_3",
                528,
                644,
                "..",
                0.002,
                2,
                129,
                ".]",
                527,
                644,
                "..",
                38.631,
                4.443e-13,
                6.347e-10,
                0.855
            ],
            [
                "Usp",
                251,
                365,
                "..",
                0.429,
                3,
                133,
                "..",
                249,
                373,
                "..",
                21.742,
                1.263e-7,
                0.0001149,
                0.847
            ]
        ],
        "superfam": [
            [
                "SSF52402",
                "Adenine nucleotide alpha hydrolases-like",
                248,
                378,
                5.2e-6
            ],
            [
                "SSF55781",
                "GAF domain-like",
                508,
                659,
                2.8e-6
            ],
            [
                "SSF47384",
                "Homodimeric domain of signal transducing histidine kinase",
                645,
                732,
                1.4e-15
            ],
            [
                "SSF55874",
                "ATPase domain of HSP90 chaperone\/DNA topoisomerase II\/histidine kinase",
                719,
                893,
                1.5e-41
            ]
        ]
    },
    "id": "naytI0dLM_rK2kaC1m3ZSQ"
}
Visualize the domain architecture (as PNG image) for a specific amino acid sequence

[GET] http://seqdepot.net/api/v1/aseqs/fiUs-3vh34LxGVAdbheipg.png
[GET] http://seqdepot.net/api/v1/aseqs/CHEA_ECOLI.png?type=uni
[GET] http://seqdepot.net/api/v1/aseqs/16129840.png?type=gi
[GET] http://seqdepot.net/api/v1/aseqs/7e252cfb7be1df82f119501d6e17a2a6.png?type=md5_hex

All of the above produce the same result:

Visualize the domain architecture (as SVG image) for a specific amino acid sequence

Exactly like the above URLs, except replace the .png extension with .svg:

[GET] http://seqdepot.net/api/v1/aseqs/fiUs-3vh34LxGVAdbheipg.svg
[GET] http://seqdepot.net/api/v1/aseqs/CHEA_ECOLI.svg?type=uni
[GET] http://seqdepot.net/api/v1/aseqs/16129840.svg?type=gi
[GET] http://seqdepot.net/api/v1/aseqs/7e252cfb7be1df82f119501d6e17a2a6.svg?type=md5_hex


To dynamically interact and perform many other queries, check out the home page interface.

Tools

The tools resource provides metadata (e.g. field names and description) describing the various computational tools used to analyze each Aseq within SeqDepot. The predictive results of each tool are associated with each Aseq in its t field.

Exposing the available tools via REST makes it possible to determine programmatically which tools are present within SeqDepot, along with the field names and descriptions of their predicted results.

Schema

Field Type Description
id string Primary key identifier; also used as the key name for any results located in the Aseqs t field hash
d string Short description documenting this tool
f Array of strings Column names for each row of data in the Aseqs t field
h string Human friendly alias for this tool
hf Array of strings Human friendly column names; same number of entries as the f field
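
Since the f field names the columns of each result row, the positional arrays in an Aseq's t field can be turned into labeled records. The Python sketch below does this for one tool. `label_results` is a hypothetical helper; the field names and row are copied from the smart entries shown elsewhere in this document.

```python
def label_results(field_names, rows):
    """Pair each value in a result row with its column name."""
    return [dict(zip(field_names, row)) for row in rows]

smart_fields = ["accession", "name", "start", "stop", "evalue"]  # tool "f" field
smart_rows = [["SM00091", "PAS", 4, 70, 5.9e-13]]               # aseq t.smart
labeled = label_results(smart_fields, smart_rows)
```

This applies to tools whose results are arrays of arrays; the schema describes the signalp and targetp values as hashes, which are already keyed by name.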

REST URIs

[GET] http://seqdepot.net/api/v1/tools(/{tool-id}(.json))(?{parameters})*
General form.
* Portions enclosed in parentheses are optional.
[GET] http://seqdepot.net/api/v1/tools(.json)
Returns all tools and their associated fields as a JSON encoded array of hashes.
[GET] http://seqdepot.net/api/v1/tools/{tool-id}(.json)
Fetch all fields for a specific tool identified by {tool-id} as a JSON encoded hash. If {tool-id} is not found, the server will respond with a status code of 404 and an error message encoded in JSON.
Note: Though not explicitly shown, {parameters} may be applied to each of the above URIs to modify the result.

Variables

{tool-id}
String identifier (e.g. pfam26)
{parameters}

The following key-value pairs (key=value) are supported:

  • fields: comma-separated list of tool field names; indicates which fields to return. If not specified, all fields are returned.
  • pretty: include in the URL without a value; pretty-prints the JSON results by adding spacing and indentation

Examples

Fetch all tools in SeqDepot in JSON

[GET] http://seqdepot.net/api/v1/tools
[GET] http://seqdepot.net/api/v1/tools.json

{"results":[{"d":"Agile Genomics family models - version 1.0","f":["name","start","stop","extent","hmm_start","hmm_stop","hmm_extent","score","evalue"],"h":"AGfam 1","hf":["Name","Start","Stop","Extent","HMM start","HMM stop","HMM extent","Score","E-value"],"id":"agfam1"},{"d":"Predicts coiled coil regions in protein sequences (Russell and Lupas, 1999)","f":["start","stop"],"h":"Coiled-coils","hf":["Start","Stop"],"id":"coils"},{"d":"DAS-TMfilter version 5.0: Predict transmembrane regions","f":["start","stop","peak","peak_score","evalue"],"h":"Transmembrane","hf":["Start","Stop","Peak","Peak Score","E-value"],"id":"das"},{"d":"Predict Extracytoplasmic Function (ECF) domains","f":["name","start","stop","extent","hmm_start","hmm_stop","hmm_extent","score","evalue"],"h":"ECF","hf":["Name","Start","Stop","Extent","HMM start","HMM stop","HMM extent","Score","E-value"],"id":"ecf"},{"d":"Structures assigned to genomes","f":["code","description","start","stop","evalue"],"h":"Gene 3D","hf":["Code","Description","Start","Stop","E-value"],"id":"gene3d"},{"d":"High-quality Automated and Manual Annotation of Proteins","f":["rule","description","start","stop","evalue"],"h":"HAMAP","hf":["Rule","Description","Start","Stop","E-value"],"id":"hamap"},{"d":"Protein ANalysis THrough Evolutionary Relationships","f":["accession","start","stop","evalue"],"h":"Panther","hf":["Accession","Start","Stop","E-value"],"id":"panther"},{"d":"ProSite motif patterns","f":["accession","description","start","stop"],"h":"Patterns","hf":["Accession","Description","Start","Stop"],"id":"patscan"},{"d":"Pfam-A hidden Markov model database version 26.0 (November 2011, 13672 families)","f":["name","start","stop","extent","bias","hmm_start","hmm_stop","hmm_extent","env_start","env_stop","env_extent","score","c_evalue","i_evalue","acc"],"h":"Pfam 26","hf":["Name","Start","Stop","Extent","Bias","HMM start","HMM stop","HMM extent","Env start","Env stop","Env extent","Score","Cond. E-value","Ind. E-value","Acc"],"id":"pfam26"},{"d":"Protein Information Resource HMMs","f":["accession","description","start","stop","evalue"],"h":"PIR","hf":["Accession","Description","Start","Stop","E-value"],"id":"pir"},{"d":"Compendium of protein fingerprints","f":["accession","description","start","stop","evalue"],"h":"PRINTS","hf":["Accession","Description","Start","Stop","E-value"],"id":"prints"},{"d":"ProSite profile scan","f":["accession","description","start","stop","evalue"],"h":"Profiles","hf":["Accession","Description","Start","Stop","E-value"],"id":"proscan"},{"d":"Predicts regions of low-complexity","f":["start","stop"],"h":"Low-complexity segments","hf":["Start","Stop"],"id":"segs"},{"d":"Signal peptide prediction","f":["gp","gn","e"],"h":"SignalP","hf":["Gram+","Gram-","Eukaryotic"],"id":"signalp"},{"d":"Simple Modular Architecture Research Tool","f":["accession","name","start","stop","evalue"],"h":"SMART","hf":["Accession","Name","Start","Stop","E-value"],"id":"smart"},{"d":"Database of structural and functional annotation for all proteins and genomes","f":["accession","description","start","stop","evalue"],"h":"SuperFamily","hf":["Accession","Description","Start","Stop","E-value"],"id":"superfam"},{"d":"Predicts subcellular location of eukaryotic proteins","f":["p","np"],"h":"TargetP","hf":["Plant","Non-plant"],"id":"targetp"},{"d":"HMM resource to support automated annotation of proteins","f":["accession","name","start","stop","evalue"],"h":"TIGRFAM","hf":["Accession","Name","Start","Stop","E-value"],"id":"tigrfam"},{"d":"Prediction of transmembrane helices in proteins","f":["start","stop"],"h":"TM-HMM","hf":["Start","Stop"],"id":"tmhmm"}],"count":19}
Fetch all tools and pretty-print

[GET] http://seqdepot.net/api/v1/tools?pretty

{"results": [
    {
        "d": "Agile Genomics family models - version 1.0",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "score",
            "evalue"
        ],
        "h": "AGfam 1",
        "hf": [
            "Name",
            "Start",
            "Stop",
            "Extent",
            "HMM start",
            "HMM stop",
            "HMM extent",
            "Score",
            "E-value"
        ],
        "id": "agfam1"
    },
    {
        "d": "Predicts coiled coil regions in protein sequences (Russell and Lupas, 1999)",
        "f": [
            "start",
            "stop"
        ],
        "h": "Coiled-coils",
        "hf": [
            "Start",
            "Stop"
        ],
        "id": "coils"
    },
    {
        "d": "DAS-TMfilter version 5.0: Predict transmembrane regions",
        "f": [
            "start",
            "stop",
            "peak",
            "peak_score",
            "evalue"
        ],
        "h": "Transmembrane",
        "hf": [
            "Start",
            "Stop",
            "Peak",
            "Peak Score",
            "E-value"
        ],
        "id": "das"
    },
    {
        "d": "Predict Extracytoplasmic Function (ECF) domains",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "score",
            "evalue"
        ],
        "h": "ECF",
        "hf": [
            "Name",
            "Start",
            "Stop",
            "Extent",
            "HMM start",
            "HMM stop",
            "HMM extent",
            "Score",
            "E-value"
        ],
        "id": "ecf"
    },
    {
        "d": "Structures assigned to genomes",
        "f": [
            "code",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "Gene 3D",
        "hf": [
            "Code",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "gene3d"
    },
    {
        "d": "High-quality Automated and Manual Annotation of Proteins",
        "f": [
            "rule",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "HAMAP",
        "hf": [
            "Rule",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "hamap"
    },
    {
        "d": "Protein ANalysis THrough Evolutionary Relationships",
        "f": [
            "accession",
            "start",
            "stop",
            "evalue"
        ],
        "h": "Panther",
        "hf": [
            "Accession",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "panther"
    },
    {
        "d": "ProSite motif patterns",
        "f": [
            "accession",
            "description",
            "start",
            "stop"
        ],
        "h": "Patterns",
        "hf": [
            "Accession",
            "Description",
            "Start",
            "Stop"
        ],
        "id": "patscan"
    },
    {
        "d": "Pfam-A hidden Markov model database version 26.0 (November 2011, 13672 families)",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "bias",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "env_start",
            "env_stop",
            "env_extent",
            "score",
            "c_evalue",
            "i_evalue",
            "acc"
        ],
        "h": "Pfam 26",
        "hf": [
            "Name",
            "Start",
            "Stop",
            "Extent",
            "Bias",
            "HMM start",
            "HMM stop",
            "HMM extent",
            "Env start",
            "Env stop",
            "Env extent",
            "Score",
            "Cond. E-value",
            "Ind. E-value",
            "Acc"
        ],
        "id": "pfam26"
    },
    {
        "d": "Protein Information Resource HMMs",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "PIR",
        "hf": [
            "Accession",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "pir"
    },
    {
        "d": "Compendium of protein fingerprints",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "PRINTS",
        "hf": [
            "Accession",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "prints"
    },
    {
        "d": "ProSite profile scan",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "Profiles",
        "hf": [
            "Accession",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "proscan"
    },
    {
        "d": "Predicts regions of low-complexity",
        "f": [
            "start",
            "stop"
        ],
        "h": "Low-complexity segments",
        "hf": [
            "Start",
            "Stop"
        ],
        "id": "segs"
    },
    {
        "d": "Signal peptide prediction",
        "f": [
            "gp",
            "gn",
            "e"
        ],
        "h": "SignalP",
        "hf": [
            "Gram+",
            "Gram-",
            "Eukaryotic"
        ],
        "id": "signalp"
    },
    {
        "d": "Simple Modular Architecture Research Tool",
        "f": [
            "accession",
            "name",
            "start",
            "stop",
            "evalue"
        ],
        "h": "SMART",
        "hf": [
            "Accession",
            "Name",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "smart"
    },
    {
        "d": "Database of structural and functional annotation for all proteins and genomes",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "h": "SuperFamily",
        "hf": [
            "Accession",
            "Description",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "superfam"
    },
    {
        "d": "Predicts subcellular location of eukaryotic proteins",
        "f": [
            "p",
            "np"
        ],
        "h": "TargetP",
        "hf": [
            "Plant",
            "Non-plant"
        ],
        "id": "targetp"
    },
    {
        "d": "HMM resource to support automated annotation of proteins",
        "f": [
            "accession",
            "name",
            "start",
            "stop",
            "evalue"
        ],
        "h": "TIGRFAM",
        "hf": [
            "Accession",
            "Name",
            "Start",
            "Stop",
            "E-value"
        ],
        "id": "tigrfam"
    },
    {
        "d": "Prediction of transmembrane helices in proteins",
        "f": [
            "start",
            "stop"
        ],
        "h": "TM-HMM",
        "hf": [
            "Start",
            "Stop"
        ],
        "id": "tmhmm"
    }
],
"count": 19
}
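The response wraps the tool records in a "results" array and reports their number in "count". A minimal Python sketch of consuming that shape follows. The sample string is a hand-trimmed copy of two records from the output above (so its count is 2, not 19), and index_tools is a hypothetical helper name used only for illustration; it is not part of SeqDepot.

```python
import json

# Hand-trimmed sample: two records copied from the response above.
SAMPLE = '''
{"results": [
    {"d": "Predicts coiled coil regions in protein sequences (Russell and Lupas, 1999)",
     "f": ["start", "stop"],
     "h": "Coiled-coils",
     "hf": ["Start", "Stop"],
     "id": "coils"},
    {"d": "Signal peptide prediction",
     "f": ["gp", "gn", "e"],
     "h": "SignalP",
     "hf": ["Gram+", "Gram-", "Eukaryotic"],
     "id": "signalp"}
],
"count": 2}
'''


def index_tools(payload):
    """Return a dict mapping each tool id to its record (hypothetical helper)."""
    tools = payload["results"]
    # Sanity check: the reported count should match the number of records.
    if payload["count"] != len(tools):
        raise ValueError("count does not match the number of results")
    return {tool["id"]: tool for tool in tools}


tools_by_id = index_tools(json.loads(SAMPLE))
print(sorted(tools_by_id))
print(tools_by_id["signalp"]["hf"])
```

Indexing by id makes later lookups (for example, finding the field names of one tool) a single dictionary access instead of a scan over the array.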
Fetch the description (d) and field names (f) for all tools

[GET] http://seqdepot.net/api/v1/tools?fields=d,f&pretty

{"results": [
    {
        "d": "Agile Genomics family models - version 1.0",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "score",
            "evalue"
        ],
        "id": "agfam1"
    },
    {
        "d": "Predicts coiled coil regions in protein sequences (Russell and Lupas, 1999)",
        "f": [
            "start",
            "stop"
        ],
        "id": "coils"
    },
    {
        "d": "DAS-TMfilter version 5.0: Predict transmembrane regions",
        "f": [
            "start",
            "stop",
            "peak",
            "peak_score",
            "evalue"
        ],
        "id": "das"
    },
    {
        "d": "Predict Extracytoplasmic Function (ECF) domains",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "score",
            "evalue"
        ],
        "id": "ecf"
    },
    {
        "d": "Structures assigned to genomes",
        "f": [
            "code",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "gene3d"
    },
    {
        "d": "High-quality Automated and Manual Annotation of Proteins",
        "f": [
            "rule",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "hamap"
    },
    {
        "d": "Protein ANalysis THrough Evolutionary Relationships",
        "f": [
            "accession",
            "start",
            "stop",
            "evalue"
        ],
        "id": "panther"
    },
    {
        "d": "ProSite motif patterns",
        "f": [
            "accession",
            "description",
            "start",
            "stop"
        ],
        "id": "patscan"
    },
    {
        "d": "Pfam-A hidden Markov model database version 26.0 (November 2011, 13672 families)",
        "f": [
            "name",
            "start",
            "stop",
            "extent",
            "bias",
            "hmm_start",
            "hmm_stop",
            "hmm_extent",
            "env_start",
            "env_stop",
            "env_extent",
            "score",
            "c_evalue",
            "i_evalue",
            "acc"
        ],
        "id": "pfam26"
    },
    {
        "d": "Protein Information Resource HMMs",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "pir"
    },
    {
        "d": "Compendium of protein fingerprints",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "prints"
    },
    {
        "d": "ProSite profile scan",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "proscan"
    },
    {
        "d": "Predicts regions of low-complexity",
        "f": [
            "start",
            "stop"
        ],
        "id": "segs"
    },
    {
        "d": "Signal peptide prediction",
        "f": [
            "gp",
            "gn",
            "e"
        ],
        "id": "signalp"
    },
    {
        "d": "Simple Modular Architecture Research Tool",
        "f": [
            "accession",
            "name",
            "start",
            "stop",
            "evalue"
        ],
        "id": "smart"
    },
    {
        "d": "Database of structural and functional annotation for all proteins and genomes",
        "f": [
            "accession",
            "description",
            "start",
            "stop",
            "evalue"
        ],
        "id": "superfam"
    },
    {
        "d": "Predicts subcellular location of eukaryotic proteins",
        "f": [
            "p",
            "np"
        ],
        "id": "targetp"
    },
    {
        "d": "HMM resource to support automated annotation of proteins",
        "f": [
            "accession",
            "name",
            "start",
            "stop",
            "evalue"
        ],
        "id": "tigrfam"
    },
    {
        "d": "Prediction of transmembrane helices in proteins",
        "f": [
            "start",
            "stop"
        ],
        "id": "tmhmm"
    }
],
"count": 19
}
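The request URLs in this section share one pattern: the base path /api/v1/tools, an optional tool id as a further path segment, an optional comma-separated "fields" parameter, and an optional bare "pretty" flag. The Python sketch below assembles such URLs. The names tools_url and fetch_json are hypothetical, the code assumes ids and field names are plain identifiers that need no URL escaping, and fetch_json is defined but not called because it needs network access and a reachable server.

```python
import json
import urllib.request

BASE_URL = "http://seqdepot.net/api/v1/tools"  # taken from the examples above


def tools_url(tool_id=None, fields=None, pretty=False):
    """Build a request URL in the style of the examples in this section."""
    url = BASE_URL if tool_id is None else BASE_URL + "/" + tool_id
    query = []
    if fields:
        # Field names are joined with commas, e.g. fields=d,f
        query.append("fields=" + ",".join(fields))
    if pretty:
        # 'pretty' appears in the examples as a bare flag without a value
        query.append("pretty")
    return url + ("?" + "&".join(query) if query else "")


def fetch_json(url, timeout=10):
    """Issue a GET request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(tools_url(fields=["d", "f"], pretty=True))
print(tools_url("gene3d", pretty=True))
```

The two printed URLs correspond to the "fields=d,f" request above and the single-tool Gene3D request shown in this section.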
Fetch all details about Gene3D

[GET] http://seqdepot.net/api/v1/tools/gene3d?pretty

{
    "d": "Structures assigned to genomes",
    "f": [
        "code",
        "description",
        "start",
        "stop",
        "evalue"
    ],
    "h": "Gene 3D",
    "hf": [
        "Code",
        "Description",
        "Start",
        "Stop",
        "E-value"
    ],
    "id": "gene3d"
}
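In the record above, the "f" and "hf" arrays run in parallel: the entry at a given position in "hf" is the display label for the field name at the same position in "f". That pairing is inferred from the examples in this section rather than stated explicitly, so the following Python sketch treats it as an assumption and checks the lengths before relying on it. The helper name field_labels is hypothetical.

```python
# Gene3D record copied from the response above.
gene3d = {
    "d": "Structures assigned to genomes",
    "f": ["code", "description", "start", "stop", "evalue"],
    "h": "Gene 3D",
    "hf": ["Code", "Description", "Start", "Stop", "E-value"],
    "id": "gene3d",
}


def field_labels(tool):
    """Map each field name in 'f' to the label at the same position in 'hf'."""
    if len(tool["f"]) != len(tool["hf"]):
        raise ValueError("'f' and 'hf' differ in length")
    return dict(zip(tool["f"], tool["hf"]))


labels = field_labels(gene3d)
print(labels["evalue"])  # E-value
```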
Fetch the description (d) and human-friendly column names (hf) for SMART

[GET] http://seqdepot.net/api/v1/tools/smart?pretty&fields=d,hf

{
    "d": "Simple Modular Architecture Research Tool",
    "hf": [
        "Accession",
        "Name",
        "Start",
        "Stop",
        "E-value"
    ],
    "id": "smart"
}
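In both "fields" examples in this section, the response carries "id" in addition to the requested keys. The Python sketch below mimics that selection on the client side, which can be handy when a full record is already in hand. It uses the SMART record as it appears in the full tools listing; select_fields is a hypothetical helper, and the rule that "id" is always kept is inferred from the examples rather than documented behavior.

```python
# SMART record as it appears in the full tools listing earlier in this section.
smart = {
    "d": "Simple Modular Architecture Research Tool",
    "f": ["accession", "name", "start", "stop", "evalue"],
    "h": "SMART",
    "hf": ["Accession", "Name", "Start", "Stop", "E-value"],
    "id": "smart",
}


def select_fields(record, fields):
    """Keep 'id' plus the requested keys, skipping any the record lacks."""
    wanted = ["id"] + [name for name in fields if name != "id"]
    return {key: record[key] for key in wanted if key in record}


subset = select_fields(smart, ["d", "hf"])
print(sorted(subset))
```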