| ID-translation {TCGAutils} | R Documentation |
These functions allow the user to enter a character vector of
identifiers and use the GDC API to translate from TCGA barcodes to
Universally Unique Identifiers (UUIDs) and vice versa. These relationships
are not one-to-one. Therefore, a data.frame is returned for all
inputs. The UUID to TCGA barcode translation only applies to file and case
UUIDs. Two-way UUID translation is available from 'file_id' to 'case_id'
and vice versa. Please double check any results before using these
features for analysis. Case / submitter identifiers are translated by
default; see the id_type argument for details. All identifiers are
converted to lower case.
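Since all identifiers are converted to lower case, it can help to normalize and sanity-check UUID input before translating it. The following is a minimal sketch in base R, not part of TCGAutils: `normalize_uuids` is a hypothetical helper name, and the pattern only checks the usual 8-4-4-4-12 hexadecimal layout of a UUID.

```r
# Hypothetical helper (not a TCGAutils function): lower-case UUIDs and
# drop entries that do not match the 8-4-4-4-12 hexadecimal layout.
normalize_uuids <- function(ids) {
    ids <- tolower(trimws(ids))
    pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
    ok <- grepl(pattern, ids)
    if (!all(ok))
        warning("not UUID-shaped: ", paste(ids[!ok], collapse = ", "))
    ids[ok]
}

# One upper-case UUID and one TCGA barcode passed in by mistake
ids <- c("0001801B-54B0-4551-8D7A-D66FB59429BF", "TCGA-B0-5117")
normalize_uuids(ids)
```

The barcode is dropped with a warning, so only UUID-shaped input would be passed on to a translation function.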
UUIDtoBarcode(id_vector, id_type = c("case_id", "file_id"),
    end_point = "participant", legacy = FALSE)
UUIDtoUUID(id_vector, to_type = c("case_id", "file_id"),
    legacy = FALSE)
barcodeToUUID(barcodes, id_type = c("case_id", "file_id"),
    legacy = FALSE)
filenameToBarcode(filenames, legacy = FALSE)
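id_vector |
A character vector of UUIDs corresponding to either files or cases |
id_type |
Either "case_id" or "file_id", indicating the type of id_vector entered |
end_point |
The cutoff point of the barcode that should be returned;
only applies to "file_id" type translations |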
legacy |
(logical default FALSE) whether to search the legacy archives |
to_type |
The desired UUID type to obtain, can either be "case_id" or "file_id" |
barcodes |
A character vector of TCGA barcodes |
filenames |
A character vector of file names |
The end_point options reflect endpoints in the Genomic Data Commons
API. These are summarized as follows:
participant: This default snippet of information includes project, tissue source site (TSS), and participant number (barcode format: TCGA-XX-XXXX)
sample: This adds the sample information to the participant barcode (TCGA-XX-XXXX-11X)
portion, analyte: Either of these options adds the portion and analyte information to the sample barcode (TCGA-XX-XXXX-11X-01X)
plate, center: Additional plate and center information is returned, i.e., the full barcode (TCGA-XX-XXXX-11X-01X-XXXX-XX)
Supply one of these keywords to target the corresponding barcode endpoint.
These endpoints only apply to "file_id" type translations to TCGA barcodes
(see id_type argument).
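The barcode cutoffs listed above can be illustrated offline with base R. This is only a sketch of the barcode layout, not how TCGAutils or the GDC API performs the translation: `field_counts` and `truncate_barcode` are hypothetical names introduced here, and the field counts are read off the barcode formats in the list above.

```r
# Hypothetical lookup: number of dash-separated fields kept for each
# end_point keyword, read off the barcode formats listed above.
field_counts <- c(
    participant = 3L,  # TCGA-XX-XXXX
    sample      = 4L,  # TCGA-XX-XXXX-11X
    portion     = 5L,  # TCGA-XX-XXXX-11X-01X
    analyte     = 5L,
    plate       = 7L,  # full barcode
    center      = 7L
)

# Hypothetical helper: cut each barcode at the requested endpoint
truncate_barcode <- function(barcodes, end_point = "participant") {
    end_point <- match.arg(end_point, names(field_counts))
    n <- field_counts[[end_point]]
    fields <- strsplit(barcodes, "-", fixed = TRUE)
    vapply(fields, function(x) {
        if (length(x) < n)
            stop("barcode has fewer fields than '", end_point, "' requires")
        paste(x[seq_len(n)], collapse = "-")
    }, character(1L))
}

full <- c("TCGA-B0-5117-11A-01D-1421-08", "TCGA-E9-A295-10A-01D-A16D-09")
truncate_barcode(full, "participant")
truncate_barcode(full, "sample")
truncate_barcode(full, "analyte")
```

Within TCGAutils itself, the TCGAbarcode function used in the examples serves a similar purpose for barcodes that are already at hand.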
A data.frame of TCGA barcode identifiers and UUIDs
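Because the relationships are not one-to-one, a returned data.frame may hold several rows for one input identifier, or none at all. The sketch below shows one way to check for this using a mock data.frame; the column names and values are hypothetical stand-ins, not actual GDC output, so adjust them to the columns returned for your query.

```r
# Mock result standing in for a translation data.frame; column names and
# values are hypothetical, not actual GDC output.
res <- data.frame(
    case_id = c("case-a", "case-a", "case-b"),
    file_id = c("file-1", "file-2", "file-3"),
    stringsAsFactors = FALSE
)
queried <- c("case-a", "case-b", "case-c")

# Count how many rows each identifier maps to
n_matches <- table(res$case_id)

# Identifiers with more than one match deserve a manual look
multi <- names(n_matches)[as.vector(n_matches) > 1L]
multi

# Identifiers that were queried but did not come back at all
missing_ids <- setdiff(queried, res$case_id)
missing_ids

# Collect all matches per identifier in a list
split(res$file_id, res$case_id)
```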
Sean Davis, M. Ramos
library(TCGAutils)

## Translate UUIDs >> TCGA Barcode
uuids <- c("0001801b-54b0-4551-8d7a-d66fb59429bf",
    "002c67f2-ff52-4246-9d65-a3f69df6789e",
    "003143c8-bbbf-46b9-a96f-f58530f4bb82")
UUIDtoBarcode(uuids, id_type = "file_id", end_point = "sample")
UUIDtoBarcode("ae55b2d3-62a1-419e-9f9a-5ddfac356db4", id_type = "case_id")
## Translate file UUIDs >> case UUIDs
uuids <- c("0001801b-54b0-4551-8d7a-d66fb59429bf",
    "002c67f2-ff52-4246-9d65-a3f69df6789e",
    "003143c8-bbbf-46b9-a96f-f58530f4bb82")
UUIDtoUUID(uuids)
## Translate TCGA Barcode >> UUIDs
fullBarcodes <- c("TCGA-B0-5117-11A-01D-1421-08",
    "TCGA-B0-5094-11A-01D-1421-08",
    "TCGA-E9-A295-10A-01D-A16D-09")
sample_ids <- TCGAbarcode(fullBarcodes, sample = TRUE)
barcodeToUUID(sample_ids)
participant_ids <- c("TCGA-CK-4948", "TCGA-D1-A17N",
    "TCGA-4V-A9QX", "TCGA-4V-A9QM")
barcodeToUUID(participant_ids)
## Translate file names >> TCGA Barcode
library(GenomicDataCommons)
fquery <- files() %>%
    filter(~ cases.project.project_id == "TCGA-COAD" &
        data_category == "Copy Number Variation" &
        data_type == "Copy Number Segment")
fnames <- results(fquery)$file_name[1:6]
filenameToBarcode(fnames)