Daniel Dvorkin (danielmedic) wrote in stat_geeks,

yet another "how to do this in R" question

Okay, here's what I'm trying to do. This is a genomics-related question, but it's a general problem. I have a two-column matrix or data frame representing gene start and stop positions. I want a vector of length n (the total number of base-pair positions on the chromosome) in which each element is 1 if that position lies within a gene and 0 otherwise. For a very simple example, suppose n = 10 (surely this organism has the smallest genome ever!) and I have the following data frame "gene":

start stop
2 5
7 8

and I want the vector "isGene":

0111101100
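For anyone who wants to play along, here is a minimal sketch of the toy example as R objects. It assumes the column names `start` and `stop` shown above and 1-based, inclusive positions; the target vector is simply typed in by hand.

```r
# Toy example: a 10-bp "chromosome" carrying two genes (1-based, inclusive)
n = 10
gene = data.frame(start = c(2, 7), stop = c(5, 8))

# Desired indicator: one element per base-pair position, 1 = inside a gene
isGene = c(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)

paste(isGene, collapse = "")  # "0111101100"
```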

Now, the mindlessly inefficient way to do this would be:

isGene = rep(0, n)
for (i in seq_len(nrow(gene)))
{
  # mark every position from this gene's start through its stop as genic
  isGene[gene$start[i]:gene$stop[i]] = 1
}

but surely there must be a better way? I'm dealing with real chromosomes here, not toy examples, and this kind of clumsy iteration eats up a lot of computing cycles.
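One possible vectorized approach, offered as an illustrative sketch rather than anything from the original thread, is a difference-array (cumulative-sum) trick built from base R's `tabulate()` and `cumsum()`. It assumes integer, 1-based, inclusive start/stop coordinates with every stop no larger than n; the data are the toy example again.

```r
# Toy data from the post; for a real chromosome n is its length in bp
n = 10
gene = data.frame(start = c(2, 7), stop = c(5, 8))

# Difference array: +1 at each gene start, -1 one position past each stop.
# tabulate() counts occurrences of each integer 1..nbins in one call,
# so there is no R-level loop over genes.
delta = tabulate(gene$start, nbins = n + 1) -
        tabulate(gene$stop + 1, nbins = n + 1)

# The running sum is the number of genes covering each position;
# anything above zero is genic, which also handles overlapping genes.
isGene = as.integer(cumsum(delta)[1:n] > 0)

paste(isGene, collapse = "")  # "0111101100"
```

The work is proportional to n plus the number of genes, and overlapping genes are counted once because only coverage greater than zero matters.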

(x-posted to statisticians)