Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures.
We compare translations of single words, made by bilingual speakers in a laboratory setting, with contextualized translation choices of the same items, made by professional translators and extracted from parallel language corpora. The translation choices in both cases show moderate convergence, demonstrating that decontextualized translation probabilities partially reflect bilinguals' life experience regarding the conditional distributions of alternative translations. Lexical attributes of the target word differ in their ability to predict translation probability: form similarity is a stronger predictor in decontextualized translation choice, whereas word frequency and semantic salience are stronger predictors for context-embedded translation choice.
To evaluate theoretical proposals regarding the course of child language acquisition, researchers often need to rely on the processing of large numbers of syntactically parsed utterances, both from children and from their parents. Because it is so difficult to do this by hand, there are currently no parsed corpora of child language input data. To automate this process, we developed a system that combined the MOR tagger, a rule-based parser, and statistical disambiguation techniques.