Does anyone know a simple formula for the expected value of the logarithm of the determinant of a Wishart-distributed matrix? It seems like there ought to be one, but my search so far has been frustrating.

(x-posted to statisticians)
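
For what it's worth, there is a standard closed form here: if W is a p-by-p Wishart matrix with n degrees of freedom (n > p - 1) and scale matrix V, then E[log|W|] = sum over i = 1..p of digamma((n + 1 - i)/2) + p*log(2) + log|V|. Below is a minimal R sketch that checks this against simulation. The helper name `expected_logdet_wishart` and the particular n, p, and V are my own illustrative choices, not anything from the post.

```r
# Closed form: E[log|W|] = sum_i digamma((n + 1 - i)/2) + p*log(2) + log|V|
# (helper name and example values below are illustrative assumptions)
expected_logdet_wishart <- function(n, V) {
  p <- nrow(V)
  stopifnot(n > p - 1)
  logdet_V <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  sum(digamma((n + 1 - seq_len(p)) / 2)) + p * log(2) + logdet_V
}

set.seed(1)
p <- 3
n <- 10
# An arbitrary positive definite scale matrix
V <- matrix(c(2.0, 0.5, 0.0,
              0.5, 1.0, 0.3,
              0.0, 0.3, 1.5), nrow = 3)

# Monte Carlo estimate: rWishart returns a p x p x N array of draws
draws <- rWishart(20000, df = n, Sigma = V)
mc_estimate <- mean(apply(draws, 3, function(w) {
  as.numeric(determinant(w, logarithm = TRUE)$modulus)
}))

exact <- expected_logdet_wishart(n, V)
print(c(monte_carlo = mc_estimate, closed_form = exact))
```

The two numbers should agree up to simulation error, which shrinks as the number of draws grows.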

Textbook Topics

I am currently working on an introductory textbook and some additional materials (including textbook/class materials for an intermediate course). I have a question for you.

Set aside any preconceptions about what must be in an introductory (or intermediate) book on statistics. What should be in such a book? Introductory stats books still suffer from being very 1950s-oriented in their topics, emphasis, lack of computing, etc. (If you think computing should still play no role in such a book, let me know that, too.) Statistics has changed a lot over the decades; shouldn't the books change, too?

If you are a statistician, what important topics are missing? If you are from another field, what do you find missing, or what should not be there? Please also let me know whether you are a statistician; that would help.

Any input would be great, and really appreciated! Thanks!

Crossposted at statisticians

Means and Standard Deviations

Hi all,

I'm looking at intraindividual variability as a section of my dissertation, and at different ways of capturing it. I know there are a lot of ways of doing so (e.g., range, SD, signal-to-noise ratio, coefficient of variation, pulse, spin), but the most common always seems to be the standard deviation (especially in the area I'm examining: mood lability).

As I've heard from other, more learned faculty, and seen in my own experience, the standard deviation often tends to be confounded with the mean. Because of this, I'm leaning toward the signal-to-noise ratio or the coefficient of variation. My issue right now is that I need a citation noting a strong correlation between means and SDs to help justify exploring other measures of intraindividual variation.

I'm figuring that the skewed nature of my data also play into this (4 items coded 0-3, so summed to 0-12, still positively skewed with a preponderance of 0's).  Does anyone here know of any good citations I can use to back this up?  Online searching for me has mostly yielded "this is what the standard deviation is, and this is how you calculate it" responses.

Any help will be greatly appreciated!
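
Not a citation, but the mean-SD coupling on a floor-bounded scale is easy to illustrate by simulation. Here is a rough R sketch under a strong simplifying assumption: each person's 0-12 sum score is drawn as Binomial(12, p) with a person-specific p kept low so that zeros dominate. The sample sizes and the range of p are made up for illustration and are not taken from the data described above.

```r
# Simulated illustration only: the binomial model, the sample sizes, and the
# range of p are assumptions, not a description of the real dataset.
set.seed(123)
n_people <- 200   # hypothetical number of participants
n_obs    <- 50    # hypothetical number of assessments per participant

# Person-specific endorsement probability, kept low so scores pile up near 0
p_person <- runif(n_people, min = 0.02, max = 0.30)

# scores is an n_obs x n_people matrix of 0-12 sum scores
scores <- sapply(p_person, function(p) rbinom(n_obs, size = 12, prob = p))

person_mean <- colMeans(scores)
person_sd   <- apply(scores, 2, sd)

# Correlation between each person's mean and their SD
r_mean_sd <- cor(person_mean, person_sd)
print(r_mean_sd)
```

Under this model the SD is roughly sqrt(12 * p * (1 - p)), which rises with the mean 12 * p whenever p is below 0.5, so a positive correlation is built in by the floor rather than by any real difference in lability.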

How many free parameters are there in a covariance matrix?

I'm trying to calculate BIC for a fairly complicated model which includes several covariance matrices, and it just occurred to me that I don't actually know how many free parameters this represents. If I have a D-by-D covariance matrix, the naive answer is that this represents D(D+1)/2 free parameters, because the matrix must be symmetric -- but the matrix must also be positive definite, which is a stronger condition, so I'm guessing that the actual number is something less than that. Any thoughts?

(x-posted to statisticians)
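
One way to see that D(D+1)/2 is still the right count: positive definiteness is an inequality constraint, so it restricts the symmetric matrices to an open region without removing any dimensions. The R sketch below makes that concrete with a log-Cholesky style parameterization, in which any unconstrained vector of length D(D+1)/2 maps to a valid covariance matrix. The function name `theta_to_cov` is hypothetical, and D = 4 is an arbitrary example.

```r
# Map an unconstrained vector of length D(D+1)/2 to a D x D covariance matrix
# through a lower-triangular factor L with an exponentiated diagonal.
# (theta_to_cov is an illustrative name, not a library function.)
theta_to_cov <- function(theta, D) {
  stopifnot(length(theta) == D * (D + 1) / 2)
  L <- matrix(0, D, D)
  L[lower.tri(L, diag = TRUE)] <- theta
  diag(L) <- exp(diag(L))   # force a strictly positive diagonal
  L %*% t(L)                # symmetric and positive definite by construction
}

D <- 4
n_free <- D * (D + 1) / 2   # number of free parameters to count in BIC

set.seed(42)
Sigma <- theta_to_cov(rnorm(n_free), D)
min_eig <- min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)
print(n_free)
print(min_eig > 0)
```

Constraints that tie entries together (a diagonal matrix, a shared variance, a low-rank factor structure) do reduce the count; positive definiteness alone does not.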

Dynamic Systems Analyses

Hi all,

I haven't been using LJ for a while, so not sure how active this community is anymore. Either way I thought I'd throw this out here.

I'm working on plotting out my dissertation -- analyzing ecological momentary assessment data. I don't fully understand what types of questions can be asked and answered with dynamic systems modeling. I have a pretty 'interesting' variability scheme. I'm looking at negative affect (NA), which understandably will have lots of 0s at intervals of 30-40 minutes. Further, increases in negative affect are (not surprisingly) usually followed by decreases a bit later. I also know that if you simply lag observations, higher NA at T-1 is associated with higher NA at T-2 (again not surprising). Given the ebb-and-flow nature of the phenomenon, I have a hard time picturing that a linear association would best capture the construct.

In my mind, I picture the function as a type of sinusoidal oscillation, with perturbations associated with exogenous events (in my case, child misbehaviors). The literature seems pretty complex, and I don't want to spend too much time chasing a dead end (advancement deadline fast approaches).

To synthesize, here's my question: can dynamic systems models be fit using non-linear, non-quadratic equations (in my case, oscillations that I picture with something like a sinusoidal parameterization)? And beyond that, can you use time-varying child misbehavior observations to predict fluctuations in this model? I see child misbehavior as one of many different things that could cause a perturbation to the system, but it's pretty much the big one that this dataset has.

Thanks for all your help in advance!

Stats Software for Intro Class (of non-statisticians)

I am slated to teach a class in intro stats to psych majors this upcoming semester, and I would like some software to use consistently throughout the course. I refuse to use commercial software; having myself been trained extensively in SAS only to find it unavailable due to my current job, I don't want to create that situation for my students. In short, I'd like to give them something they can keep using as long as they want to use it.

I'd use R, if they were science students (more comfortable with computing/etc.) or stats majors, but these people are from psychology. So I need something easy to install, and which can be made easy (easier) to use. Don't get me wrong, I have used R in intro work before, but it was not smooth.

I would consider R with a front end. Is R-commander better than it used to be a couple years ago?

I have already rejected PSPP, as it has no graphics (right?), and Statistical Lab (Statistiklabor) as, well, I am having trouble getting that working well. It also lacks the needed detailed support in English that my students might require.

Any other ideas? Any users of OpenEpi or Gretl? Would these work for general stats?

Ideally I'd like the following: some tools for simple non-parametrics, resampling (wishful thinking?), the usual suspects in normal-theory statistics with both one- and two-way ANOVA, Fisher's exact test (wishful thinking again?), lots of good graphics, relatively easy data transformations, and the ability to do simple simulations (this may have to be done somewhere other than the main package). Obviously I'd like it to be Free (beer) and/or free (libre) and/or open source. But I'll settle for anything the students can get for 0.00 USD legally and give to their friends.

I realize that what I want is simplified-R. Any ideas? Thanks in advance for any leads you can give me.

X-post-to: statisticians.

Survey on Short Courses and Tutorials in Biostatistics

Statisticians and Biostatisticians:

I invite you to take the following survey (it will only take a couple of minutes, I promise):


I am the sole graduate student on a planning committee for a major annual conference in biostatistics, and we are trying to gauge interest in specific topics for short courses and computer tutorials. You'll be doing me a HUGE favor by taking this survey!

Thanks so much, guys !!!

(cross-posted to statisticians)

Taking out selected rows of a data frame in R

I have the following dataframe in R, data1:
v1 v2  v3
a   1 1.1
a   3 1.6
a   4 4.8
b   1 4.1
b   5 2.6
c   2 3.2
c   6 5.4
c   2 1.8

I want to create a new dataframe with only the rows where v1 has the value "b". That is,
v1 v2  v3
b   1 4.1
b   5 2.6

There should be a straightforward, simple way to do that, right? But the only thing I can get to work at all is
v1 <- data1$v1[data1$v1 == "b"]
v2 <- data1$v2[data1$v1 == "b"]
v3 <- data1$v3[data1$v1 == "b"]
data2 <- data.frame(v1, v2, v3)

Can anyone help me out?
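
Two standard base R idioms cover this. The sketch below rebuilds data1 from the listing above (assuming v1 is a character column) and then selects rows by a logical condition. The names `data2` and `data2_alt` are just illustrative.

```r
# Rebuild the example data frame from the post
data1 <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c", "c"),
  v2 = c(1, 3, 4, 1, 5, 2, 6, 2),
  v3 = c(1.1, 1.6, 4.8, 4.1, 2.6, 3.2, 5.4, 1.8)
)

# Logical row indexing: the condition goes before the comma (rows),
# and leaving the part after the comma empty keeps all columns.
data2 <- data1[data1$v1 == "b", ]

# Equivalent with subset(), which evaluates the condition inside data1
data2_alt <- subset(data1, v1 == "b")

print(data2)
```

Both versions keep the original row names (4 and 5 here). If v1 is stored as a factor, the unused levels "a" and "c" remain attached to the result; `droplevels(data2)` removes them.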