1. Introduction.- 2. Digital Speech Corpora.- 3. Language Corpora: Indian Scenario.- 4. Issues in Corpus Generation.- 5. Process of Corpus .- 6. Corpus Sanitation and Pre-Editing.- 7. Statistical Studies on Corpus.- 8. Corpus Text Processing.- 9. Corpus as Primary Resource for ELT.- 10. Corpus as Secondary Resource for ELT.- 11. Corpus and Lexicography.- 12. Corpus and Dialect Study.- 13. Corpus and Word Sense Disambiguation.- 14. Corpus and Language Technology.- 15. Corpus and Other Branches of Linguistics.- 16. Corpora: Future Indian Needs.
About the Author: Niladri Sekhar Dash, PhD, is an associate professor at the Linguistic Research Unit of the Indian Statistical Institute, Kolkata. For over two decades, his research interests have included corpus linguistics, language technology, natural language processing, language documentation and digitization, computational lexicography, computer-assisted language teaching, and manual and machine translation. He has published 15 research monographs and 160 research papers in peer-reviewed national and international journals, anthologies, and conference proceedings. He has delivered lectures and taught courses as an invited scholar at more than 30 universities and institutes in India and abroad, and has acted as a consultant for several organizations working in the field of Language Technology and Natural Language Processing. Dr. Dash is the principal investigator for 5 language technology projects funded by the Government of India and the Indian Statistical Institute, Kolkata. He is the editor-in-chief of the Journal of Advanced Linguistic Studies, an international peer-reviewed journal, and an editorial board member of 5 international journals. He is a member of several linguistics associations across the globe and a regular PhD thesis adjudicator for several Indian universities. Dr. Dash is currently working on a digital pronunciation dictionary for Bangla, Hindi-Bangla parallel translation corpus generation, endangered language documentation and digitization, POS tagging and chunking, word sense disambiguation, manual and machine translation, and computer-assisted language teaching.
L. Ramamoorthy, PhD, is the head of the Linguistic Data Consortium for Indian Languages (LDC-IL) at the Central Institute of Indian Languages (CIIL), Mysuru, Ministry of Human Resource Development, Government of India. He is one of the leading corpus linguists in India and is in charge of the Corpus Development Project for Indian Languages at the CIIL, a mega project on which more than 30 scholars have been working under his leadership. He is an active participant in several corpus-oriented projects in India and has conducted numerous workshops on computational and corpus linguistics at various universities and colleges in India. He has published 7 research monographs, edited 8 volumes, and published or presented 140 research papers at national and international seminars and conferences. He has guided more than 15 PhD scholars and has trained school, college, and university teachers in India and abroad in language technology, linguistics, and teaching methods. He also served for over four years as the director of the Pondicherry Institute of Linguistics and Culture, and has been editor-in-chief of the PILC Journal of Dravidic Studies and co-editor of Languages in India (e-Journal).