Impact | Sundayobserver.lk


DateLine Sunday, 10 February 2008


Researching Sri Lankan English through a corpus

by Prof Ryhana Raheem and Dr Dushyanthi Mendis

Corpus Linguistics is a field of Linguistics which analyses natural language data, in particular 'real' contemporary written and spoken data, by means of a 'corpus', which is a collection of electronic or digital texts.

On the highway of corpus linguistics

The objective of corpus linguistics is to discover patterns of language use, i.e., association patterns between particular words and phrases.
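One simple way to look for such association patterns is to count which words occur next to a given word. The Python sketch below does this for a few made-up sentences. The sample text and the function name `neighbours` are illustrative assumptions, and real corpus software works on far larger collections with more careful statistics.

```python
from collections import Counter
import re

def neighbours(text, keyword):
    """Count the words that appear immediately before or after keyword.

    Sentence boundaries are ignored here, which a real tool would not do.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        if i > 0:
            counts[tokens[i - 1]] += 1   # word to the left
        if i + 1 < len(tokens):
            counts[tokens[i + 1]] += 1   # word to the right
    return counts

# Hypothetical sample sentences, not drawn from any real corpus.
sample = "Strong tea is served at four. He likes strong tea. She drinks weak tea."

print(neighbours(sample, "tea").most_common(3))
```

In this toy sample, "strong" turns up beside "tea" more often than any other word, which is the kind of pattern a researcher would then check against a much larger body of text.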

The collected data is stored on a computer as text files, and specially designed software tools help researchers analyse the wealth of information that has been collected and collated from a wide variety of fields.

Corpus linguists usually start by searching for a particular word or phrase in a corpus, and then do a more detailed analysis to find out exactly how the word has been used in naturally occurring language situations.
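As a rough illustration of this search step, the Python sketch below lists every occurrence of a word together with a few words of context on either side, a view often called "keyword in context". The sample sentences, the function names `tokenize` and `concordance`, and the window size are assumptions made for illustration. They are not taken from the ICE-SL project or its tools.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Hypothetical sample text standing in for a real corpus file.
sample = (
    "She went to the junction to catch a bus. "
    "The bus was late, so she took a three-wheeler from the junction."
)

for left, word, right in concordance(tokenize(sample), "junction"):
    print(f"{left:>25} | {word} | {right}")
```

Reading down the lined-up hits lets a researcher see at a glance which words tend to surround the search term, before moving on to a more detailed analysis.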

This analysis is of immense value to teachers and learners, in fact to all users, because it gives us information on the ways in which the language we use is changing and evolving, and what new words and phrases are being formed. Patterns of contemporary use discovered by corpus linguists are often incorporated into books on grammar, dictionaries and other tools that aid teaching and learning.

Corpus linguistics research on English began in the 1960s. In the late 1990s, a project was begun to investigate the features of Sri Lankan English in this manner.

Initially supported by the British Council, a multi-university team of academics and researchers, along with Dr Chris Tribble (who was associated with the British Council at that time), commenced work on collecting data on Sri Lankan English.

At the moment, the Sri Lanka corpus team is chaired by Professor Ryhana Raheem, currently Director of the Post Graduate Institute of English at the Open University of Sri Lanka, and includes Dr Dushyanthi Mendis, Senior Lecturer, Department of English, University of Colombo; Dr Hemamala Ratwatte, formerly Head of the Department of Language Studies, Open University of Sri Lanka; Professor Manique Gunasekera of the University of Kelaniya; and other researchers from these universities.

When completed, the corpus of Sri Lankan English will become a part of a larger corpus of International English known as the ICE (the International Corpus of English) and will be available to students, teachers and researchers for research and study purposes.

The Sri Lankan corpus will be known as ICE-SL, or the International Corpus of English - Sri Lanka. Work on this component is almost half-completed, and is currently being directed by Prof. Dr. Joybrato Mukherjee, Chair of English Linguistics of Justus Liebig University in Giessen, Germany, who has obtained funding to complete the project.

Prof Mukherjee is scheduled to visit Sri Lanka in February to review the work done so far on ICE-SL, collect the remaining texts to complete the corpus, and set parameters for data collection and analysis for the future.

Professor Mukherjee is also scheduled to deliver lectures on corpus linguistics perspectives on Indian and Sri Lankan English for interested audiences at the University of Colombo (February 12th) and the Open University (February 13th).

Among the best-known research corpora are the Brown Corpus of American English; the British National Corpus (BNC) and the London-Lund Corpus of British English; the Kolhapur Corpus of Indian English; ACE, the Australian Corpus of English; and the Wellington Corpus compiled for New Zealand English.

Work on Sri Lankan English done by the ICE-SL team is therefore a timely effort to align the research on English in this country with international trends and norms, and to provide a concrete base for prescriptive (how the language should be used) and descriptive (how the language is actually used) practices in Sri Lanka.

****

More on corpus linguistics

A landmark in modern corpus linguistics was the publication by Henry Kucera and Nelson Francis of Computational Analysis of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources.

Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, language teaching, psychology, statistics, and sociology.

A further key publication was Randolph Quirk's 'Towards a description of English Usage' (1960, Transactions of the Philological Society, 40-61) in which he introduced The Survey of English Usage.

Shortly after the Kucera and Francis study appeared, Boston publisher Houghton-Mifflin approached Kucera to supply a million-word, three-line citation base for its new American Heritage Dictionary (AHD), the first dictionary to be compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used).

Other publishers followed suit. The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were compiled using the Bank of English.

The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), ACE (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English).

Other corpora represent many languages, varieties and modes. They include the International Corpus of English and the British National Corpus, a 100-million-word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library.
