[Corpora-List] Where can I find an English for Children corpus?

chris brew cbrew at acm.org
Thu Mar 10 18:48:05 CET 2011


Another of Charniak's papers from that time has my favourite title ever:

@inproceedings{Charniak:1973:JJS:1624775.1624816,
  author = {Charniak, Eugene},
  title = {Jack and Janet in search of a theory of knowledge},
  booktitle = {Proceedings of the 3rd international joint conference on Artificial intelligence},
  year = {1973},
  location = {Stanford, USA},
  pages = {337--343},
  numpages = {7},
  url = {http://portal.acm.org/citation.cfm?id=1624775.1624816},
  acmid = {1624816},
  publisher = {Morgan Kaufmann Publishers Inc.},
  address = {San Francisco, CA, USA},
}

On Thu, Mar 10, 2011 at 11:18 AM, John F. Sowa <sowa at bestweb.net> wrote:


> On 3/10/2011 8:46 AM, Michael Israel wrote:
>
>> There is also a great deal of research based on this data showing that
>> the words and grammatical constructions which children learn are in many
>> (but not all) respects highly correlated with the frequency with which these
>> occur in the spoken input that the children hear. So, CHILDES might be
>> more relevant than you think.
>>
>
> An analysis of the stages of language learning may provide some useful
> clues to the underlying mechanisms.
>
> But stories written for children are notoriously difficult to interpret.
> The major problem is that they depend very heavily on background
> knowledge that is not easy to verbalize.
>
> Charniak discovered that point 40 years ago:
>
> Charniak 1972: Eugene Charniak, “Toward a Model Of Children's Story
> Comprehension,” PhD thesis 1972, MIT, MIT Artificial Intelligence
> Laboratory Technical Report TR-266. Also at
> ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-266.pdf
>
> A notorious example is the first story in the Dick & Jane series.
> Every page is filled with a picture and one line of text,
> such as "Oh, look." and "Oh, Oh, Oh." Eventually it reaches
> the level of "See Spot run."
>
> A machine learning system might learn simple grammatical patterns.
> But if it can't interpret pictures, it won't learn semantics.
>
> John Sowa
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>