XStringSet-io           package:Biostrings           R Documentation

_R_e_a_d/_w_r_i_t_e _a_n _X_S_t_r_i_n_g_S_e_t _o_r _X_S_t_r_i_n_g_V_i_e_w_s _o_b_j_e_c_t _f_r_o_m/_t_o _a _f_i_l_e

_D_e_s_c_r_i_p_t_i_o_n:

     Functions to read/write an XStringSet or XStringViews object
     from/to a file.

_U_s_a_g_e:

       ## XStringSet object:
       read.BStringSet(file, format)
       read.DNAStringSet(file, format)
       read.RNAStringSet(file, format)
       read.AAStringSet(file, format)
       write.XStringSet(x, file="", append=FALSE, format, width=80)

       ## XStringViews object:
       read.XStringViews(file, format, subjectClass, collapse="")
       write.XStringViews(x, file="", append=FALSE, format, width=80)

       ## FASTQ utilities:
       fastq.geometry(file)

       ## Some related helper functions:
       FASTArecordsToCharacter(FASTArecs, use.names=TRUE)
       CharacterToFASTArecords(x)
       FASTArecordsToXStringViews(FASTArecs, subjectClass, collapse="")
       XStringSetToFASTArecords(x)

_A_r_g_u_m_e_n_t_s:

    file: A character vector with no NAs. If '""' (the default for
          'write.XStringSet' and 'write.XStringViews'), then the
          functions write to the standard output connection (the
          console) unless redirected by 'sink'. 

  format: Either '"fasta"' or '"fastq"'. Note that 'write.XStringSet'
          and 'write.XStringViews' only support '"fasta"' for now. 

       x: For 'write.XStringSet' and 'write.XStringViews', the object
          to write to 'file'. For 'CharacterToFASTArecords', the
          (possibly named) character vector to be converted to a list
          of FASTA records as one returned by 'readFASTA'. For
          'XStringSetToFASTArecords', the XStringSet object to be
          converted to a list of FASTA records as one returned by
          'readFASTA'. 

  append: 'TRUE' or 'FALSE'. If 'TRUE' output will be appended to
          'file'; otherwise, it will overwrite the contents of 'file'.
          See '?cat' for the details. 

   width: Only relevant if 'format' is '"fasta"'. The maximum number of
          letters per line of sequence. 

subjectClass: The class to be given to the subject of the XStringViews
          object created and returned by the function. Must be the name
          of one of the direct XString subclasses i.e. '"BString"',
          '"DNAString"', '"RNAString"' or '"AAString"'. 

collapse: An optional character string to be inserted between the views
          of the XStringViews object created and returned by the
          function. 

FASTArecs: A list of FASTA records as one returned by 'readFASTA'. 

use.names: Whether or not the description line preceding each FASTA
          records should be used to set the names of the returned
          object. 

_D_e_t_a_i_l_s:

     Only FASTA and FASTQ files are supported for now. The identifiers
     and qualities stored in the FASTQ records are ignored (only the
     sequences are returned).

     Reading functions 'read.BStringSet', 'read.DNAStringSet',
     'read.RNAStringSet', 'read.AAStringSet' and 'read.XStringViews'
     load sequences from a file into an XStringSet or XStringViews
     object. Only one FASTA file, but more than one FASTQ file, can be
     read at a time (by passing a character vector of length > 1). In
     that case, all the FASTQ files must have the same "width" (i.e.
     all their sequences must have the same length) and the sequences
     from all the files are stored in the returned object in the order
     they were read.

     The 'fastq.geometry' utility returns an integer vector describing
     the "geometry" of the FASTQ files i.e. a vector of length 2 where
     the first element is the total number of sequences contained in
     the FASTQ files and the second element the "width" of these files
     (this width is 'NA' if the files have different "widths").

     Writing functions 'write.XStringSet' and 'write.XStringViews'
     write an XStringSet or XStringViews object to a file or
     connection. They only support the FASTA format for now.

     'FASTArecordsToCharacter', 'CharacterToFASTArecords',
     'FASTArecordsToXStringViews' and 'XStringSetToFASTArecords' are
     helper functions used internally by 'write.XStringSet' and
     'read.XStringViews' for switching between different
     representations of the same object.

_S_e_e _A_l_s_o:

     'fasta.info', 'readFASTA', 'writeFASTA', XStringSet-class,
     XStringViews-class, BString-class, DNAString-class,
     RNAString-class, AAString-class

_E_x_a_m_p_l_e_s:

       ## ---------------------------------------------------------------------
       ## A. READ/WRITE FASTA FILES
       ## ---------------------------------------------------------------------
       file <- system.file("extdata", "someORF.fa", package="Biostrings")
       x <- read.DNAStringSet(file, "fasta")
       x
       write.XStringSet(x, format="fasta")  # writes to the console

       ## ---------------------------------------------------------------------
       ## B. READ FASTQ FILES
       ## ---------------------------------------------------------------------
       file <- system.file("extdata", "s_1_sequence.txt", package="Biostrings")
       fastq.geometry(file)
       read.DNAStringSet(file, "fastq")  # only the FASTQ sequences are returned
                                         # (identifiers and qualities are dropped)

       ## ---------------------------------------------------------------------
       ## C. SOME RELATED HELPER FUNCTIONS
       ## ---------------------------------------------------------------------
       ## Converting 'x'...
       ## ... to a list of FASTA records (as one returned by the "readFASTA" function)
       x1 <- XStringSetToFASTArecords(x)
       ## ... to a named character vector
       x2 <- FASTArecordsToCharacter(x1) # same as 'as.character(x)'

