rawFasta (deprecated) - Sylvain Mareschal's portfolio

Sylvain Mareschal, Ph.D.

Bioinformatics engineer

March 8, 2013 at 14:56

rawFasta (deprecated)

This package implements memory-efficient storage of letter sequences (DNA, RNA, protein...) in R, coding each sequence element on 8 bits or fewer (1, 2, 3, 4, 5, 6 or 8). It was mainly developed as a showpiece of R's ability to handle binary data, as more fully featured classes in Biostrings achieve the same purpose. Support for the R package hosted on CRAN was discontinued in 2015, since Rgb now offers a better interface to FASTA files (see ?track.fasta).

The package is built around a main rawFasta interface shared by several S4 classes, each handling sequences coded on 8 bits or fewer. Objects are instantiated from FASTA files via the rawFasta parser, which chooses the appropriate implementation. A common extract method then subsets the sequence by coordinates.
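
The design described above (one interface, several width-specific classes, a parser that picks among them, and a shared extract method) can be illustrated with a minimal S4 sketch. This is not the rawFasta source: every name below (SketchSeq, chooseBits, sketchFasta, extractSketch) is hypothetical, the selection rule is an assumption, and for brevity the codes are kept one per byte rather than actually bit-packed.

```r
library(methods)

# Code widths listed in the package description
supported <- c(1L, 2L, 3L, 4L, 5L, 6L, 8L)

# Assumed rule: smallest supported width able to code every letter of 'alpha'
chooseBits <- function(alpha) {
  supported[2^supported >= nchar(alpha)][1]
}

# Virtual parent class acting as the common interface (hypothetical names)
setClass("SketchSeq", representation("VIRTUAL", codes = "raw", alphabet = "character"))

# One concrete class per supported width
for (b in supported) setClass(paste0("SketchSeq", b), contains = "SketchSeq")

# Common method to subset by coordinates, inherited by every width
setGeneric("extractSketch", function(object, from, to) standardGeneric("extractSketch"))
setMethod("extractSketch", "SketchSeq", function(object, from, to) {
  paste(object@alphabet[as.integer(object@codes[from:to]) + 1L], collapse = "")
})

# Stand-in for the parser: picks the implementation from the alphabet
sketchFasta <- function(symbols, alpha = "ACGTN-") {
  alphabet <- strsplit(alpha, "")[[1]]
  codes <- match(symbols, alphabet) - 1L      # 0-based codes
  if (anyNA(codes)) stop("letter outside the alphabet")
  new(paste0("SketchSeq", chooseBits(alpha)), codes = as.raw(codes), alphabet = alphabet)
}

obj <- sketchFasta(c("A", "C", "G", "T", "N"))
print(class(obj))
print(extractSketch(obj, 2, 4))
```

With the 6-letter default alphabet the sketch selects the 3-bit class, and a 4-letter alphabet such as "ACGT" selects the 2-bit one.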

It is intended to hold very large sequences (such as whole chromosomes) in memory so that they can be subset by coordinates. The default 3-bit implementation handles the 4 DNA letters, "N" ambiguities and "-" gaps in a memory space 3 times smaller than what can be achieved with a standard character vector.
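
To make the 3-bit idea concrete, here is a minimal base R sketch of packing a 6-letter alphabet into a raw vector and reading it back. It only illustrates the principle: pack3 and unpack3 are hypothetical helpers, and the bit layout (least significant bit first, letters stored back to back) is an assumption rather than the format rawFasta actually uses.

```r
# 6 symbols fit in 3 bits, since 2^3 = 8
alphabet <- c("A", "C", "G", "T", "N", "-")

# Hypothetical helper: pack letters into a raw vector, 3 bits per letter
pack3 <- function(symbols, alphabet) {
  codes <- match(symbols, alphabet) - 1L      # 0-based codes, 0..5
  if (anyNA(codes)) stop("letter outside the alphabet")
  # intToBits() yields 32 bits per integer, least significant first;
  # keep only the 3 lowest bits of each code
  bits <- matrix(intToBits(codes), nrow = 32)[1:3, , drop = FALSE]
  bits <- as.vector(bits)
  # Pad with zero bits up to a whole number of bytes
  bits <- c(bits, raw((-length(bits)) %% 8))
  packBits(bits, type = "raw")
}

# Hypothetical helper: decode the first n letters of a packed vector
unpack3 <- function(packed, n, alphabet) {
  bits <- as.integer(rawToBits(packed))[seq_len(3 * n)]
  bits <- matrix(bits, nrow = 3)
  codes <- bits[1, ] + 2L * bits[2, ] + 4L * bits[3, ]
  alphabet[codes + 1L]
}

set.seed(1)
dna <- sample(alphabet, size = 1000, replace = TRUE)
packed <- pack3(dna, alphabet)
print(length(packed))                                   # bytes used for 1000 letters
print(identical(unpack3(packed, length(dna), alphabet), dna))
```

Here 1000 letters occupy 3000 bits, that is 375 bytes, against 1000 bytes when each letter takes a full byte.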

Typical use

# Generate a dummy FASTA file
seq <- sample(c("A","C","G","T"), size=1000, replace=TRUE)
cat(">Random DNA sequence\n", file="test.fa")
write(seq, ncolumns=100, sep="", file="test.fa", append=TRUE)

# Default (DNA allowing ambiguities and gaps)
object <- rawFasta("test.fa")
print(object)
print(extract(object, 1, 10))

# Unambiguous DNA
object <- rawFasta("test.fa", alpha="ACGT")
print(object)
print(extract(object, 1, 10))