Daniel Dvorkin (danielmedic) wrote in stat_geeks,

yet another "how to do this in R" question

Okay, here's what I'm trying to do. This is a genomics-related question, but it's a general problem. I have a two-column matrix or data frame giving gene start and stop positions. I want a vector of length n, where n is the total number of base-pair positions on the chromosome, in which each element is 1 if that position lies in a gene and 0 otherwise. For a very simple example, suppose n = 10 (surely this organism has the smallest genome ever!) and I have the following data frame "gene":

start stop
2 5
7 8

and I want the vector "isGene":

0 1 1 1 1 0 1 1 0 0

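To make the toy example concrete in R, here is a minimal sketch that builds `gene` and derives `isGene` from it. It assumes `start`/`stop` are 1-based, inclusive positions. `mapply(seq, ...)` expands each start/stop pair into its positions, and a single indexed assignment sets all of them to 1. It still loops internally, so treat it as a compact restatement of the target rather than a speedup.

```r
# Toy chromosome from the post: n = 10 base pairs, two genes.
# Assumption: start/stop are 1-based and inclusive.
n <- 10
gene <- data.frame(start = c(2, 7), stop = c(5, 8))

# Expand each (start, stop) pair into the positions it covers;
# SIMPLIFY = FALSE keeps a list even if all genes have equal length.
positions <- unlist(mapply(seq, gene$start, gene$stop, SIMPLIFY = FALSE))

# Start with all zeros, then mark every covered position in one assignment.
isGene <- integer(n)
isGene[positions] <- 1L
print(isGene)
```

In R 4.0.0 and later, `sequence(gene$stop - gene$start + 1, from = gene$start)` should yield the same `positions` without the R-level `mapply` call.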
Now, the mindlessly inefficient way to do this would be:

isGene = rep(0, n)
for (i in seq_len(nrow(gene))) {
  geneRow = gene[i, ]
  # mark every position from start to stop (inclusive) as genic
  isGene[geneRow$start:geneRow$stop] = 1
}

but surely there must be a better way? I'm dealing with real chromosomes here, not toy examples, and this kind of clumsy iteration eats up a lot of computing cycles.
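One commonly used alternative is a difference array. This is a sketch, not a definitive answer. Put +1 at each gene start and -1 just past each gene stop, then take a cumulative sum. Positions with a positive running total lie inside at least one gene. The work is roughly proportional to n plus the number of genes, with no per-base-pair loop in R code, and overlapping genes are handled naturally. `geneMask` is a hypothetical helper name. Coordinates are assumed to be 1-based, inclusive, and within 1..n.

```r
# Hypothetical helper: 0/1 gene mask via a difference array.
# Assumes integer-valued start <= stop, all within 1..n (1-based, inclusive).
geneMask <- function(start, stop, n) {
  # tabulate() counts how many genes begin at each position, and how many
  # end just before each position; duplicates are summed automatically.
  d <- tabulate(start, nbins = n + 1) - tabulate(stop + 1, nbins = n + 1)
  # Running total = number of genes covering each position.
  covered <- cumsum(d)[seq_len(n)]
  as.integer(covered > 0)
}

gene <- data.frame(start = c(2, 7), stop = c(5, 8))
isGene <- geneMask(gene$start, gene$stop, n = 10)
print(isGene)
```

Because the running total counts overlapping genes, `cumsum(d)[seq_len(n)]` on its own gives per-position coverage depth, in case that is ever more useful than a 0/1 flag.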

(x-posted to statisticians)