the statistics discussion community's Journal

Below are the 20 most recent journal entries recorded in the statistics discussion community's LiveJournal:

Thursday, March 15th, 2012
8:31 am
Truncated(?) Negative Binomial

If I run Bernoulli trials until I get either r successes or a total of N trials, whichever comes first, what is the expected ratio of successes to the number of trials run?
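One way to get at this is to compute the expectation exactly and check it by simulation. Below is a minimal R sketch. It assumes independent trials with a common success probability p (a parameter the question does not name), r ≥ 1 and N ≥ r, and the function names are hypothetical. When sampling stops after T trials with S successes, the ratio S/T is r/T if the r-th success arrives on or before trial N. If only k < r successes occur in all N trials, the ratio is k/N.

```r
# Expected ratio E[S / T], where sampling stops at the r-th success or after
# N trials, whichever comes first. Assumes independent trials with common
# success probability p, r >= 1 and N >= r (p is not named in the question).
expected_ratio <- function(r, N, p) {
  t <- r:N
  # r-th success lands exactly on trial t (t - r failures before it): ratio r / t
  stop_early <- sum(dnbinom(t - r, size = r, prob = p) * r / t)
  # fewer than r successes in all N trials: ratio k / N
  k <- 0:(r - 1)
  ran_out <- sum(dbinom(k, size = N, prob = p) * k / N)
  stop_early + ran_out
}

# Monte Carlo version of the same quantity, as a sanity check
simulate_ratio <- function(r, N, p, reps = 1e5) {
  mean(replicate(reps, {
    x <- rbinom(N, 1, p)
    hit <- which(cumsum(x) == r)[1]  # trial of the r-th success, NA if none
    n_used <- if (is.na(hit)) N else hit
    sum(x[seq_len(n_used)]) / n_used
  }))
}

expected_ratio(3, 10, 0.3)
```

The two pieces cover disjoint events: the r-th success occurs by trial N exactly when there are at least r successes in N trials. In general the expectation need not equal p, which is why computing it directly can be more useful than assuming it.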

Saturday, November 26th, 2011
10:45 am
Does anyone know a simple formula for the expected value of the logarithm of the determinant of a Wishart-distributed matrix? It seems like there ought to be one, but my search so far has been frustrating.

(x-posted to statisticians)
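For reference, there is a standard closed form. If W ~ Wishart_p(n, Σ) with n > p - 1 degrees of freedom, then E[ln|W|] = Σ_{i=1..p} ψ((n + 1 - i)/2) + p ln 2 + ln|Σ|, where ψ is the digamma function. Below is a minimal R sketch that evaluates this formula and checks it against draws from base R's `rWishart()`. The function names are hypothetical.

```r
# Closed-form E[log det W] for W ~ Wishart_p(df, Sigma), df > p - 1
expected_logdet_wishart <- function(df, Sigma) {
  p <- nrow(Sigma)
  log_det_sigma <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  sum(digamma((df + 1 - seq_len(p)) / 2)) + p * log(2) + log_det_sigma
}

# Monte Carlo check: stats::rWishart returns a p x p x n_draws array
mc_logdet_wishart <- function(df, Sigma, n_draws = 5000) {
  draws <- rWishart(n_draws, df, Sigma)
  mean(apply(draws, 3, function(W) as.numeric(determinant(W)$modulus)))
}

Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
expected_logdet_wishart(5, Sigma)
```

For p = 1 this reduces to the familiar E[ln χ²_n] = ψ(n/2) + ln 2 (times σ², which adds ln σ²).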
Tuesday, October 4th, 2011
12:41 pm
Textbook Topics

I am currently working on an introductory textbook and some additional materials (including textbook/class materials for an intermediate course). I have a question for you.

Set aside any preconceptions about what must be in an introductory (or intermediate) book on statistics. What should be in such a book? Introductory stats books still suffer from being very 1950s-oriented in their topics, their emphasis, their lack of computing, etc. (If you think computing should still play no role in such a book, let me know that, too.) Statistics has changed a lot over the decades; shouldn't the books change, too?

If you are a statistician, what important material is missing? If you are from another field, what do you find missing, or what should not be there? Please also tell me whether or not you are a statistician; that would help.

Any input would be greatly appreciated! Thanks!

Crossposted at statisticians
Thursday, June 23rd, 2011
4:35 pm
Means and Standard Deviations
Hi all,

I'm looking at intraindividual variability as a section of my dissertation, and at different ways of capturing it. I know there are a lot of ways of doing so (e.g., range, SD, signal-to-noise ratio, coefficient of variation, pulse, spin), but the most common always seems to be the standard deviation (especially in the area I'm examining, mood lability).

As I've heard from other, more learned faculty, and through my own experience, the standard deviation often tends to be confounded with the mean. Because of this, I'm leaning toward the signal-to-noise ratio or the coefficient of variation. My issue right now is that I need a citation documenting a strong correlation between means and SDs to help justify my exploration of other measures of intraindividual variation.

I figure the skewed nature of my data also plays into this (4 items coded 0-3, summed to 0-12, still positively skewed with a preponderance of 0's). Does anyone here know of any good citations I can use to back this up? Online searching has mostly yielded "this is what the standard deviation is, and this is how you calculate it" responses.

Any help will be greatly appreciated!
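To show the mean-SD dependence in your own data, one option is to compute each person's mean, SD, and coefficient of variation across occasions, then correlate the means with the SDs. Below is a minimal R sketch. The argument names `score` and `id` are hypothetical, and the rank (Spearman) correlation is an assumption chosen because the data are skewed, not a requirement.

```r
# Per-person intraindividual summaries from long-format data
# score: repeated measurements; id: person identifier (hypothetical names)
iiv_summary <- function(score, id) {
  m <- tapply(score, id, mean)
  s <- tapply(score, id, sd)
  data.frame(id   = names(m),
             mean = as.numeric(m),
             sd   = as.numeric(s),
             cv   = as.numeric(s / m))  # NaN for anyone whose scores are all 0
}

# Strength of the mean-SD dependence across people
mean_sd_association <- function(summary_df) {
  cor(summary_df$mean, summary_df$sd, method = "spearman", use = "complete.obs")
}
```

With a preponderance of 0's, note that the coefficient of variation is undefined (NaN) for anyone whose scores are all 0. Those people would need separate handling whichever measure you settle on.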
Sunday, November 21st, 2010
5:48 pm
How many free parameters are there in a covariance matrix?
I'm trying to calculate BIC for a fairly complicated model which includes several covariance matrices, and it just occurred to me that I don't actually know how many free parameters this represents. If I have a D-by-D covariance matrix, the naive answer is that this represents D(D+1)/2 free parameters, because the matrix must be symmetric -- but the matrix must also be positive definite, which is a stronger condition, so I'm guessing that the actual number is something less than that. Any thoughts?

(x-posted to statisticians)
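For what it's worth, positive definiteness is a set of strict inequalities (for example, all leading principal minors positive). The positive definite matrices therefore form an open subset of the D(D+1)/2-dimensional space of symmetric matrices, and an open subset has the same dimension, so the BIC count stays at D(D+1)/2 per unconstrained covariance matrix. One way to see this is the Cholesky parameterization: every vector of D(D+1)/2 real numbers maps to a valid covariance matrix, and every covariance matrix arises from exactly one such vector. Below is a minimal R sketch. The function names are hypothetical.

```r
# Number of free parameters in an unconstrained D x D covariance matrix
n_cov_params <- function(D) D * (D + 1) / 2

# Map an unconstrained vector of length D(D+1)/2 to a covariance matrix
theta_to_cov <- function(theta, D) {
  stopifnot(length(theta) == n_cov_params(D))
  L <- matrix(0, D, D)
  L[lower.tri(L, diag = TRUE)] <- theta  # fill the lower triangle column by column
  diag(L) <- exp(diag(L))                # positive diagonal: valid Cholesky factor
  tcrossprod(L)                          # L %*% t(L) is symmetric positive definite
}

theta_to_cov(c(0, 0.5, 0), 2)
```

No point in the unconstrained space is ruled out, which is another way of saying that positive definiteness does not remove any degrees of freedom.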
Saturday, October 23rd, 2010
5:33 pm
Dynamic Systems Analyses
Hi all,

I haven't been using LJ for a while, so I'm not sure how active this community is anymore. Either way, I thought I'd throw this out here.

I'm working on plotting out my dissertation, which analyzes ecological momentary assessment data. I don't fully understand what types of questions can be asked and answered with dynamic systems modeling. I have a pretty 'interesting' variability scheme. I'm looking at negative affect (NA), which, understandably, will have lots of 0's when measured at intervals of 30-40 minutes. Further, increases in negative affect are (not surprisingly) usually followed by decreases a bit later. I also know that if you simply lag observations, higher NA at T-1 is associated with higher NA at T-2 (again, not surprising). Given the ebb-and-flow nature of the phenomenon, I have a hard time picturing a linear association as the best way to capture the construct.

In my mind, I picture the function as a kind of sinusoidal oscillation, with perturbations associated with exogenous events (in my case, child misbehaviors). The literature seems pretty complex, and I don't want to spend too much time chasing a dead end (my advancement deadline is fast approaching).

To synthesize, here's my question: can dynamic systems models be built from nonlinear, non-quadratic equations (in my case, the sinusoidal oscillations I picture)? And beyond that, can time-varying observations of child misbehavior be used to predict fluctuations in such a model? I see child misbehavior as one of many things that could perturb the system, but it's pretty much the big one this dataset has.

Thanks for all your help in advance!
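One common way to model ebb and flow in the dynamical-systems literature is not to fit a sine curve directly. Instead, you fit a second-order differential equation, a damped linear oscillator, whose solutions oscillate when η < 0, and time-varying covariates enter as a forcing term. This is a swapped-in technique, not a claim about what suits these data.

Below is a minimal R sketch under strong assumptions: evenly spaced, noise-free, simulated data (not real EMA data), derivatives estimated by finite differences with embedding lag `tau` (sometimes called local linear approximation), and hypothetical function names. Zero-inflated, unevenly spaced EMA data would need more than this.

```r
# Damped linear oscillator with an exogenous input (all values hypothetical):
#   x''(t) = eta * x(t) + zeta * x'(t) + b * u(t)
# eta < 0 gives oscillation; zeta < 0 damps it; u marks perturbing events.
simulate_dlo <- function(u, dt, eta, zeta, b, x0 = 1) {
  n <- length(u)
  x <- numeric(n); v <- numeric(n)
  x[1] <- x0
  for (i in 2:n) {
    a    <- eta * x[i - 1] + zeta * v[i - 1] + b * u[i - 1]
    v[i] <- v[i - 1] + a * dt      # semi-implicit Euler: update velocity first,
    x[i] <- x[i - 1] + v[i] * dt   # then position using the new velocity
  }
  x
}

# Finite-difference estimates of x, x', x'' with lag tau,
# then regress x'' on x, x', and the covariate (no intercept)
fit_dlo_lla <- function(x, u, dt, tau = 1) {
  i  <- (1 + tau):(length(x) - tau)
  d0 <- x[i]
  d1 <- (x[i + tau] - x[i - tau]) / (2 * tau * dt)
  d2 <- (x[i + tau] - 2 * x[i] + x[i - tau]) / (tau * dt)^2
  ui <- u[i]
  lm(d2 ~ 0 + d0 + d1 + ui)
}

u <- as.numeric(seq_len(1000) %% 100 == 0)  # hypothetical event every 100 steps
x <- simulate_dlo(u, dt = 0.1, eta = -0.5, zeta = -0.2, b = 1)
coef(fit_dlo_lla(x, u, dt = 0.1))
```

On this simulated series, the recovered coefficients come out close to (not exactly equal to) eta, zeta, and b, because the finite differences only approximate the derivatives. The coefficient on `ui` is what speaks to whether events such as child misbehavior push the system.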
Friday, July 23rd, 2010
2:08 pm
Stats Software for Intro Class (of non-statisticians)

I am slated to teach an intro stats class to psych majors this upcoming semester, and I would like some software to use consistently throughout the course. I refuse to use commercial software: having myself been trained extensively in SAS only to find it unavailable at my current job, I don't want to create that situation for my students. In short, I'd like to give them something they can keep using as long as they want to use it.

I'd use R if they were science students (more comfortable with computing, etc.) or stats majors, but these people are from psychology. So I need something easy to install, and which can be made easy (easier) to use. Don't get me wrong, I have used R in intro work before, but it was not smooth.

I would consider R with a front end. Is R-commander better than it was a couple of years ago?

I have already rejected PSPP, as it has no graphics (right?), and Statistical Lab (Statistiklabor) because, well, I am having trouble getting it to work well. It also lacks the detailed English-language support that my students might need.

Any other ideas? Any users of OpenEpi or Gretl? Would these work for general stats?

Ideally I'd like the following: some tools for simple nonparametrics, resampling (wishful thinking?), the usual suspects in normal-theory statistics, including one- and two-way ANOVA, Fisher's exact test (wishful thinking again?), lots of good graphics, relatively easy data transformations, and the ability to do simple simulations (this may have to be done outside the main package). Obviously I'd like it to be Free (as in beer) and/or free (as in libre) and/or open source. But I'll settle for anything the students can legally get for 0.00 USD and give to their friends.

I realize that what I want is simplified-R. Any ideas? Thanks in advance for any leads you can give me.

X-post-to: statisticians.
Monday, June 7th, 2010
2:01 pm
Survey on Short Courses and Tutorials in Biostatistics
If you are a statistician or a biostatistician (student or practicing), please take this survey!


Your responses will play a role in planning short courses and tutorials for a major upcoming conference, and, more importantly, they will really help me out! Thanks!
Wednesday, May 26th, 2010
2:39 am
Survey on Short Courses and Tutorials in Biostatistics
Statisticians and Biostatisticians:

I invite you to take the following survey (it will only take a couple of minutes, I promise):


I am the sole graduate student on a planning committee for a major annual conference in biostatistics, and we are trying to gauge interest in specific topics for short courses and computer tutorials. You'll be doing me a HUGE favor by taking this survey!

Thanks so much, guys!!!

(cross-posted to statisticians)
Thursday, May 13th, 2010
2:11 pm
Taking out selected rows of a data frame in R
I have the following dataframe in R, data1:
  v1 v2  v3
a  1 1.1
a  3 1.6
a  4 4.8
b  1 4.1
b  5 2.6
c  2 3.2
c  6 5.4
c  2 1.8

I want to create a new dataframe with only the rows where v1 has the value "b". That is,
v1 v2  v3
b  1 4.1
b  5 2.6

There should be a straightforward, simple way to do that, right? But the only thing I can get to work at all is
v1 <- data1$v1[data1$v1 == "b"]
v2 <- data1$v2[data1$v1 == "b"]
v3 <- data1$v3[data1$v1 == "b"]
data2 <- data.frame(v1, v2, v3)

Can anyone help me out?
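The usual approach is to index rows with a logical vector and leave the column slot empty, or to use base R's `subset()`. Below is a minimal sketch that rebuilds `data1` from the table above. The name `data2b` is hypothetical.

```r
data1 <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c", "c"),
  v2 = c(1, 3, 4, 1, 5, 2, 6, 2),
  v3 = c(1.1, 1.6, 4.8, 4.1, 2.6, 3.2, 5.4, 1.8)
)

# Logical row index; the empty slot after the comma keeps all columns
data2 <- data1[data1$v1 == "b", ]

# Equivalent, using base R's subset()
data2b <- subset(data1, v1 == "b")

# Both keep the original row names (4 and 5); reset them if you prefer 1, 2
rownames(data2) <- NULL
data2
```

One difference matters if `v1` ever contains NA: `[` returns a row of NAs for an NA in the logical index, while `subset()` treats NA as FALSE and drops the row.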
Monday, May 3rd, 2010
5:32 pm
yet another "how to do this in R" question
Okay, here's what I'm trying to do. This is a genomics-related question, but it's a general problem. I have a two-column matrix or data frame representing gene start and stop positions. I want a vector of length n, where n is the total number of base pair positions on the chromosome, where each element has a value of 1 if the position is in a gene and 0 otherwise. For a very simple example, suppose n = 10 (surely this organism has the smallest genome ever!) and I have the following data frame "gene":

start stop
2 5
7 8

and I want the vector "isGene":

0 1 1 1 1 0 1 1 0 0

Now, the mindlessly inefficient way to do this would be:

isGene = rep(0, n)
for (i in 1:nrow(gene)) {          # braces: both lines belong to the loop
  geneRow = gene[i, ]
  isGene[geneRow$start:geneRow$stop] = 1   # mark this gene's span
}

but surely there must be a better way? I'm dealing with real chromosomes here, not toy examples, and this kind of clumsy iteration eats up a lot of computing cycles.

(x-posted to statisticians)
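A vectorized alternative is the difference-array trick: add +1 at each start and -1 just past each stop, take a cumulative sum, and flag the positions where the running count is positive. Below is a minimal base-R sketch. It assumes integer, 1-based positions with 1 ≤ start ≤ stop ≤ n, and the function name is hypothetical.

```r
# Indicator of whether each position 1..n lies inside at least one gene
gene_indicator <- function(start, stop, n) {
  # +1 where a gene starts, -1 just past where it ends; tabulate() counts
  # repeats, so several genes starting at the same position are handled
  d <- tabulate(start, nbins = n + 1) - tabulate(stop + 1, nbins = n + 1)
  as.integer(cumsum(d)[seq_len(n)] > 0)  # running count > 0 means "in a gene"
}

gene <- data.frame(start = c(2, 7), stop = c(5, 8))
isGene <- gene_indicator(gene$start, gene$stop, 10)
isGene
```

This does work proportional to n plus the number of genes, with no R-level loop. Overlapping genes are fine, because the running count simply exceeds 1 where they overlap.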
Sunday, February 21st, 2010
12:32 am
Hi, I'm stuck on this step from a textbook:

Pr(C = t, X > C)
= d/dt { Int_0^t Int_v^inf f(u, v) du dv }

where f(x, c) is the joint density function of X and C. Int_0^t is the integral evaluated from 0 to t, likewise from v to infinity in the next one.

I'm not sure how the "d/dt" gets in there. I'm familiar with evaluating probabilities of the form P(X < a, Y < b) by a double integral over the joint density of X and Y, but I don't totally get what's going on in this case.

Can anyone explain? I can provide more information from the text (survival analysis by Klein) if I've left too much out.
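The step that seems to be missing is the fundamental theorem of calculus. The double integral, with v playing the role of C and u the role of X, is the ordinary joint probability Pr(C ≤ t, X > C). Differentiating it in t gives a density in t, and that density is what the shorthand Pr(C = t, X > C) stands for. The sketch below assumes f is continuous, so the inner integral is a continuous function of v.

```latex
\begin{aligned}
\Pr(C \le t,\ X > C) &= \int_0^t \left\{ \int_v^\infty f(u, v)\,du \right\} dv \\
\frac{d}{dt}\Pr(C \le t,\ X > C) &= \int_t^\infty f(u, t)\,du
  = \lim_{\Delta t \downarrow 0} \frac{\Pr(t \le C < t + \Delta t,\ X > C)}{\Delta t} \\
\text{e.g. } X, C \text{ independent, rates } \lambda, \mu: \quad
\int_t^\infty \lambda e^{-\lambda u}\, \mu e^{-\mu t}\,du &= \mu e^{-\mu t}\, e^{-\lambda t}
\end{aligned}
```

In the exponential example, the result is the density of C at t times Pr(X > t), which matches the intuition of "censored at time t while still event-free".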

Monday, February 8th, 2010
8:06 pm
Does anyone have THIS TEXTBOOK?
Hello all,

I'm in a statistics class that uses the book "Design and Analysis of Experiments" by Montgomery, 7th edition. I lent my book to someone who won't answer my calls or messages, and we have homework due tomorrow from it. If anyone has the book, could you tell me what these problems are?


3.16 (a)(b)(c)

3.18 (a)(b)(c)

I would be eternally grateful. THANK YOU!

Thursday, February 4th, 2010
10:46 pm
Method of Moments, MLEs, and Standard Errors
Hi, I have a question about standard errors in the context of this problem. Any help would be greatly appreciated:

    Suppose X is a discrete random variable with

    P(X=0) = 2y/3
    P(X=1) = y/3
    P(X=2) = 2(1-y)/3
    P(X=3) = (1-y)/3

    Where 0<=y<=1. The following 10 independent observations were taken from such a distribution: (3,0,2,1,3,2,1,0,2,1).

    Find the method of moments estimate of y, an approximate standard error for your estimate, the MLE of y, and an approximate standard error of the MLE.

I have found the method of moments estimate of y (5/12) and the MLE (0.5), but I'm not sure how to approximate the standard errors. For the SE of the first estimate, what I initially did was calculate the different y's implied by the observed proportions of the X values, add up the squared differences between them and 5/12, divide by 4, and take the square root, but that doesn't seem quite right. Sorry to ask such an elementary question, but I'm really puzzled as to how to do this. Thanks in advance!
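Both standard errors have closed forms here. First, E[X] = (7 - 6y)/3 and E[X²] = (17 - 16y)/3. The method-of-moments estimate is therefore ŷ = (7 - 3X̄)/6, a linear function of the sample mean, so Var(ŷ) = Var(X)/(4n).

Second, P(X ∈ {0, 1}) = y, so the likelihood is proportional to y^k (1 - y)^(n - k), where k is the number of observations equal to 0 or 1. The MLE is then the sample proportion k/n, and its approximate variance, the inverse Fisher information, is y(1 - y)/n.

Plugging the estimates in for the unknown y gives approximate SEs. Below is a minimal R sketch. The variable and function names are hypothetical.

```r
x <- c(3, 0, 2, 1, 3, 2, 1, 0, 2, 1)
n <- length(x)

# Method of moments: E[X] = (7 - 6y)/3, so y_hat = (7 - 3 * mean(x)) / 6
y_mom <- (7 - 3 * mean(x)) / 6
# Var(X) as a function of y: E[X^2] - E[X]^2
var_x <- function(y) (17 - 16 * y) / 3 - ((7 - 6 * y) / 3)^2
# y_mom is linear in mean(x), so Var(y_mom) = (3/6)^2 * Var(X) / n; plug in y_mom
se_mom <- sqrt(var_x(y_mom) / (4 * n))

# MLE: likelihood is proportional to y^k (1 - y)^(n - k), k = #{x in {0, 1}}
k <- sum(x <= 1)
y_mle <- k / n
# Inverse Fisher information: Var(y_mle) is approximately y (1 - y) / n
se_mle <- sqrt(y_mle * (1 - y_mle) / n)

c(y_mom = y_mom, se_mom = se_mom, y_mle = y_mle, se_mle = se_mle)
```

With these data this gives 5/12 with an SE of about 0.173 for the method of moments, and 0.5 with an SE of about 0.158 for the MLE.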
Sunday, January 3rd, 2010
12:44 pm
Hougaard-Weibull question
Okay, so I have a question about the Hougaard multivariate Weibull distribution that I'm hoping someone can help me answer. The distribution is as given in [1], and is most easily defined by the survival function. Let T = (T1, ..., Tn) be a vector of r.v.s with marginal Weibull distributions, and let t = (t1, ..., tn) be a vector of observations. Then the multivariate survival function is given by:

$$S(\mathbf{t}) = P(T_1 > t_1, \ldots, T_n > t_n) = \exp\Big\{-\Big(\sum_{i=1}^{n} \varepsilon_i t_i^{\gamma}\Big)^{\alpha}\Big\}$$

for constants α, γ, ε1, ..., εn > 0.

Hougaard claims that this is only a legitimate survival function with the additional constraint α ≤ 1. What I'm trying to understand is why. It seems to me that for any positive α, the expression obeys all the rules for a proper survival function: S(0, ..., 0) = 1, the limit as any ti goes to infinity is 0, and S is strictly decreasing in the ti's.

Now, I've read the derivation (partly in [1], partly in [2]) and I understand that S was derived via a positive stable frailty distribution, and that this derivation imposes the constraint. I also understand that in general, Archimedean copulas, of which this is an example, require concave generator functions, and although I haven't gone through the math I can guess that α > 1 might violate this requirement for some values of t. But again, looking at the specific S given above, I still don't see how any positive value of α can make it not be a legitimate survival function. Honestly, how it was derived seems kind of irrelevant to its legitimacy; once you've got the function, if it meets the requirements, why not use it?

Any insight that anyone can offer on this will be greatly appreciated.

[1] A Class of Multivariate Failure Time Distributions, P. Hougaard (1986), Biometrika 73(3):671-678

[2] Survival Models for Heterogeneous Populations Derived from Stable Distributions, P. Hougaard (1986), Biometrika 73(2):387-396

x-posted to statisticians
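For n ≥ 2, the properties listed (S = 1 at the origin, limits of 0, decreasing in each t_i) are necessary but not sufficient. A joint survival function must also assign nonnegative probability to every rectangle. Equivalently, for smooth S, the joint density (-1)^n ∂^n S / ∂t_1 ⋯ ∂t_n must be nonnegative.

Take n = 2, γ = 1 and ε_1 = ε_2 = 1, so S = exp{-s^α} with s = t_1 + t_2. The density is then [α² s^(2α-2) - α(α-1) s^(α-2)] e^(-s^α). For α > 1 the negative term dominates as s → 0, so the density is negative near the origin. For α ≤ 1 both terms are nonnegative.

Below is a minimal R check using rectangle probabilities. The function names are hypothetical.

```r
# Hougaard multivariate Weibull survival function
#   S(t) = exp(-(sum_i eps_i * t_i^gamma)^alpha)
hougaard_S <- function(t, alpha, gamma = 1, eps = rep(1, length(t))) {
  exp(-sum(eps * t^gamma)^alpha)
}

# P(t1 < T1 <= t1 + h, t2 < T2 <= t2 + h) implied by a bivariate S;
# a legitimate distribution needs this to be >= 0 for every rectangle
rect_prob <- function(t1, t2, h, ...) {
  hougaard_S(c(t1, t2), ...) - hougaard_S(c(t1 + h, t2), ...) -
    hougaard_S(c(t1, t2 + h), ...) + hougaard_S(c(t1 + h, t2 + h), ...)
}

rect_prob(0.1, 0.1, 0.1, alpha = 2)
rect_prob(0.1, 0.1, 0.1, alpha = 0.5)
```

With α = 2, the rectangle [0.1, 0.2] × [0.1, 0.2] gets a "probability" of about -0.015. With α = 0.5 or α = 1 the value is positive. So the expression with α > 1 is monotone in each coordinate but is not the survival function of any random vector.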
Tuesday, November 24th, 2009
4:54 pm
covariance structures for measures repeated at multiple levels
Hi! I'm here with another request for a reference.

I'm working with linear mixed modeling to analyze some tricky repeated measures data: Participants provided saliva samples three times each day, for three days, at three separate timepoints (baseline, 3 months, and 10 months). That is, we have samples nested within days nested within timepoints nested within participants.

I'm looking for advice on how, theoretically or in any piece of software, to specify a covariance structure for the repeated measures. The issue is that the model should predict different covariances depending on whether the saliva samples in question are within the same day, on different days, or at different timepoints.

More details below the cut...
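As a starting point, nested random intercepts (participant, timepoint within participant, day within timepoint) already imply covariances that differ by level. Any two samples from the same participant share the participant variance. They also share the timepoint variance if they come from the same timepoint, and the day variance if they come from the same day. In lme4 this corresponds to a formula like `y ~ ... + (1 | participant/timepoint/day)`, where the variable names are hypothetical.

Below is a minimal base-R sketch that builds the implied 27 × 27 covariance matrix for one participant, with samples ordered within days within timepoints. The function name is hypothetical.

```r
# Implied covariance of one participant's samples under nested random intercepts
# Ordering: samples within days within timepoints
nested_cov <- function(var_person, var_tp, var_day, var_resid,
                       n_tp = 3, n_day = 3, n_sample = 3) {
  J <- function(k) matrix(1, k, k)  # all-ones block
  I <- function(k) diag(k)          # identity
  n <- n_tp * n_day * n_sample
  var_person * J(n) +                                  # every pair of samples
    var_tp  * kronecker(I(n_tp), J(n_day * n_sample)) + # same timepoint
    var_day * kronecker(I(n_tp * n_day), J(n_sample)) + # same day
    var_resid * I(n)                                    # same sample only
}

S <- nested_cov(var_person = 1, var_tp = 2, var_day = 3, var_resid = 4)
S[1, c(1, 2, 4, 10)]
```

This is compound symmetry within each level. It does not model, for example, stronger correlation between samples taken closer together within a day; that would require a residual structure such as AR(1) within day on top of it.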
Sunday, November 15th, 2009
4:22 pm
I need help :(
Hey everyone,

I am a student currently taking a stats class, and I'm having trouble with test statistics. For some reason this stuff just does not compute in my brain. I was wondering if anyone knew, off the top of their head, the equations needed for a couple of problems I have to complete. I have a list of equations, but they are all for proportions, and I have a feeling they are not the ones I need (this is what I get for being out of class for a week with swine flu, eeek!).

Problem 1:
A study found that the mean number of hours of TV watched per day was 4.09 for black respondents (N=101, standard error = 0.3616) and 2.59 for white respondents (N=724, standard error = 0.0859).
a. What type of test should you run?
b. Construct Hypotheses
c. Conduct a significance test using an alpha-level of 0.01 and interpret.
d. Interpret your P-value.
e. Construct a confidence interval and interpret.
f. Interpret as a ratio.

Problem 2:
An experiment on responses in noise detection under 2 conditions used a sample of twelve 9-month-old children. The study found a sample mean difference of 70.1 and a standard deviation of 49.4 for the difference.
a. What type of test should you run?
b. Construct Hypotheses
c. Conduct a significance test using an alpha-level of 0.01 and interpret.
d. Interpret your P-value.
e. Construct a confidence interval and interpret.

If anyone could help me - it would be MUCH APPRECIATED!
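The formulas can be sketched from the summary statistics alone, but a few assumptions are needed because the problems leave them open.

- Problem 1 treats the two groups as independent samples. Given the large Ns, it uses a normal (z) reference distribution, combining the two standard errors as sqrt(SE1² + SE2²).
- Problem 2 treats both conditions as measured on the same children, so it is a one-sample t test on the differences with n - 1 degrees of freedom.
- Both use 99% intervals to match alpha = 0.01.

Below is a minimal R sketch. The function names are hypothetical.

```r
# Problem 1: independent groups, summary statistics only (large-sample z)
two_sample_z <- function(m1, se1, m2, se2, conf = 0.99) {
  diff <- m1 - m2
  se   <- sqrt(se1^2 + se2^2)          # SEs of independent means add in quadrature
  z    <- diff / se
  p    <- 2 * pnorm(-abs(z))           # two-sided P-value
  crit <- qnorm(1 - (1 - conf) / 2)
  list(diff = diff, se = se, z = z, p = p, ci = diff + c(-1, 1) * crit * se)
}

# Problem 2: paired design, one-sample t test on the differences
paired_t <- function(mean_d, sd_d, n, conf = 0.99) {
  se   <- sd_d / sqrt(n)
  t    <- mean_d / se
  df   <- n - 1
  p    <- 2 * pt(-abs(t), df)          # two-sided P-value
  crit <- qt(1 - (1 - conf) / 2, df)
  list(t = t, df = df, p = p, ci = mean_d + c(-1, 1) * crit * se)
}

two_sample_z(4.09, 0.3616, 2.59, 0.0859)
paired_t(70.1, 49.4, 12)
```

Part f of Problem 1 presumably refers to the ratio of the two means, 4.09 / 2.59 ≈ 1.58. Whether the course expects z or t for Problem 1 is the instructor's call; with these sample sizes the two give nearly the same answer.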
Tuesday, November 10th, 2009
10:34 am
Does anyone know of any good literature reviews on the consequences of treating Likert scale data as interval rather than ordinal?
Monday, November 9th, 2009
9:15 pm
Hello! My name is Lena, and I am a student writing my term paper on “associations with managers”. I need to gather public opinion. Please answer a few questions and you'll help me (just tell me what you personally think, your own associations):
1. To behave like a manager means... (please continue)
2. To work like a manager means... (please continue)
3. What features characterize a manager?
4. What are your associations with the word "manager"?
Friday, October 23rd, 2009
11:53 am
help out a journalist...
Hi guys and gals, I'm a music journalist, and I need help with a story. If someone could answer this stuff for me, you'd really make my day...

My hypothesis is that there is a large discrepancy between the standard deviation of music reviews and that of film reviews; that is, an individual record is more likely than a film to polarize critical response. I'm collecting the raw data from metacritic.com, and as a way of randomizing, I'm limiting the reviews I pull to certain date ranges.

My questions:
- How many reviews do I need to collect to give me a reasonable error rate?
- How big would the difference in standard deviation need to be to make it statistically significant?
- Is my selection method actually random?
- Are there any better statistics than the standard deviation that would make my case better?

Thanks for reading, hope to hear from you,
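One way to frame the comparison is to treat each album or film as the unit. Summarize the spread of its critics' scores with an SD, then compare the distribution of those per-item SDs between music and film. The test used here is a Wilcoxon rank-sum test, which is a swapped-in choice; a t test on the SDs or a permutation test would also be reasonable.

Below is a minimal R sketch. It assumes a data frame with hypothetical columns `item`, `medium` ("music" or "film"), and `score`, and it includes no Metacritic data. Items with only a handful of reviews give noisy SDs, so a minimum number of reviews per item may be worth imposing.

```r
# reviews: one row per critic review, with hypothetical columns
#   item   - album or film title
#   medium - "music" or "film"
#   score  - the critic's 0-100 score
compare_polarization <- function(reviews) {
  # One SD per album or film: the spread of critics' scores for that item
  per_item <- aggregate(score ~ item + medium, data = reviews, FUN = sd)
  names(per_item)[names(per_item) == "score"] <- "sd"
  # Rank-based comparison of per-item SDs between the two media
  test <- wilcox.test(sd ~ medium, data = per_item, exact = FALSE)
  list(per_item = per_item, test = test)
}
```

How many items you need depends on how large a difference in typical SD you want to detect. Running this function on simulated data with that difference built in is one way to estimate it.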