An important thing to know about statistical significance is that it depends on having a representative sample. If your corpus is not representative of whatever you want to generalize it to (the whole language, usually), you are simply not justified in generalizing, no matter what the significance tests say. I blogged about this:

That said, many conferences, journals and tenure committees just ignore the whole representativeness thing. Usually when I bring it up here on the corpora list, there's an embarrassed silence, and then a few people just go on talking about measuring the "significance" of non-representative samples. Feel free to do that as usual, everybody.

On the contrary (and I had to look at Wikipedia for this) chi-square is for frequencies, that is, counts of how often each category occurs, compared against the counts you would expect by chance. For averages where you expect a normal distribution, you can use Student's /t/-test.
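As a rough illustration of the chi-square case, here is a minimal sketch in plain Python of Pearson's chi-square on a 2x2 table. Note that it works from the raw counts in each cell, not from proportions. The function name and the counts are made up for the example, and it applies no continuity correction, so treat it as a sketch rather than a recommended procedure.

```python
import math

def chi_square_2x2(table):
    """Pearson's chi-square (no continuity correction) for a 2x2 table of raw counts.

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the upper tail of chi-square is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are two corpora, columns are two competing variants.
counts = [[30, 70],
          [50, 50]]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

Even a very small p-value here only describes the two samples; it says nothing about whether those samples represent the language as a whole.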

What you really want is the /envelope of variation/: how often does the phenomenon occur relative to how often it has a chance to occur? If predicative and attributive adjective phrases are the only possibilities, you can add up their frequencies and use that sum as your denominator. It all depends on your hypothesis. I wrote an article about this; if you don't have free access I can send you a copy:
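To make the envelope-of-variation idea concrete, here is a rough sketch in Python that contrasts the two denominators. The counts, the corpus size and the function name are all hypothetical, and it assumes that predicative and attributive are the only two variants in play.

```python
def rate_within_envelope(target_count, alternative_counts):
    """Share of the target variant among all contexts where it could have occurred."""
    denominator = target_count + sum(alternative_counts)
    if denominator == 0:
        raise ValueError("no contexts in which the phenomenon could occur")
    return target_count / denominator

# Hypothetical counts from one corpus.
predicative = 120
attributive = 380
corpus_tokens = 1_000_000

# Normalizing by corpus size mixes in how often adjective phrases occur at all.
per_million = predicative / corpus_tokens * 1_000_000

# Normalizing by the envelope asks: given an adjective phrase, how often is it predicative?
share = rate_within_envelope(predicative, [attributive])

print(f"{per_million:.0f} per million tokens")
print(f"{share:.0%} of adjective phrases are predicative")
```

Which denominator is right still depends on your hypothesis; the sketch only shows that the two choices answer different questions.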

Good luck, Georg!

