15.487 Canadian research project funding; ELRA News

From: Humanist Discussion Group (by way of Willard McCarty <w.mccarty@btinternet.com>)
Date: Tue Feb 05 2002 - 03:14:01 EST

                   Humanist Discussion Group, Vol. 15, No. 487.
           Centre for Computing in the Humanities, King's College London

       [1] From: Angela Mattiacci <amattiac@uottawa.ca> (32)
             Subject: Canadian Century Research Infrastructure

       [2] From: Magali Duclaux <duclaux@elda.fr> (133)
             Subject: ELRA News

             Date: Tue, 05 Feb 2002 08:10:04 +0000
             From: Angela Mattiacci <amattiac@uottawa.ca>
             Subject: Canadian Century Research Infrastructure

    OTTAWA, January 31, 2002 - The Canadian Century Research Infrastructure
    (CCRI), a pan-Canadian research project, will benefit from the Canada
    Foundation for Innovation's latest round of funding.

    Industry Minister Allan Rock yesterday released a list of 280 Canadian
    projects that will receive CFI grants. To receive funding, applicants
    had to demonstrate the excellence and innovative nature of their
    projects and how they will benefit Canada.

    "Our recent success in the Innovation competitions coupled with our 100
    per cent success rate in the New Opportunities program clearly
    establishes the University of Ottawa as one of Canada's leading
    research-intensive universities," noted rector Gilles Patry.

    The Canadian Century Research Infrastructure will receive $5.2M. With
    matching funds from each province and contributions from our partners,
    a total of $13.4M will be invested in this project.

    The project leader of the CCRI is Chad Gaffield, Director of the
    Institute of Canadian Studies and Professor of History at the University
    of Ottawa. Headquarters for the CCRI will be located at the University
    of Ottawa, with partners at the following universities: Memorial
    University of Newfoundland, Université Laval, Université du Québec à
    Trois-Rivières, York University, University of Toronto, and University
    of Victoria.

    Canadian Century Research Infrastructure

    One of the largest social science projects ever funded by CFI, the
    Canadian Century Research Infrastructure will create a series of
    databases from census records covering a century of Canadian life. The
    databases will allow researchers to examine social structures, and how
    they have changed, at a level of detail that until now was simply not
    available. The CCRI will spark bold and creative new approaches to the
    study of Canada in universities across the country and around the world.

    For more information, please contact the Institute of Canadian Studies
    at canada@uottawa.ca or phone (613) 562-5111.

             Date: Tue, 05 Feb 2002 08:10:55 +0000
             From: Magali Duclaux <duclaux@elda.fr>
             Subject: ELRA News

    ELRA - European Language Resources Association

    We are pleased to announce some new resources
    available in our catalogue of language resources:

    S0119 Spanish SpeechDat Database for the Mobile Telephone Network
    W0032 Modern French Corpus including Anaphors Tagging
    W0033 CRATER 2

    A short description of these three new resources is given
    below. Please visit the online catalogue to get further details:

    S0119 Spanish SpeechDat Database for the Mobile Telephone Network
    The Spanish SpeechDat database for the mobile telephone network
    comprises 1066 Spanish speakers (526 males, 540 females) calling
    from GSM telephones and recorded over the fixed PSTN using an
    ISDN-BRI interface. The database was produced by Applied Technologies
    in Language and Speech S.L. (Spain). The MDB-1000 database is
    partitioned into 6 CDs in ISO 9660 format. This database follows the
    specifications given in the framework of the SpeechDat(II) project.
    Speech samples are stored as sequences of 8-bit 8 kHz A-law.
    Each prompted utterance is stored in a separate file. Each signal file
    is accompanied by an ASCII SAM label file which contains the relevant
    descriptive information.
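
    8-bit A-law at 8 kHz is the companding scheme defined in ITU-T G.711,
    so each stored byte expands to one signed 16-bit linear sample. The
    following is a minimal sketch of that expansion, assuming the signal
    files are headerless raw A-law (the announcement does not say so
    explicitly). The function names are illustrative and are not part of
    the SpeechDat distribution.

```python
def alaw_byte_to_linear(code):
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                      # undo the 0x55 inversion mask applied by the encoder
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment number (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data):
    """Decode a bytes object of A-law codes into a list of linear samples."""
    return [alaw_byte_to_linear(b) for b in data]


def duration_seconds(data, sample_rate=8000):
    """One byte per sample, so duration is byte count over sample rate."""
    return len(data) / sample_rate
```

    Under the headerless assumption, reading a signal file in binary mode
    and passing its bytes to decode_alaw() would yield the samples, and
    duration_seconds() gives the utterance length.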
    Each speaker uttered the following items:
            2 isolated digits.
            1 sequence of 10 isolated digits.
            4 connected digits: 1 sheet number (6 digits), 1 telephone number
    (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits).
            3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date
    (word style), 1 relative and general date expression.
            1 word spotting phrase using an application word (embedded).
            6 application words.
            3 spelled words: 1 spontaneous name (own forename), 1 city
    name, 1 real / artificial word for coverage.
            1 currency money amount.
            1 natural number.
            6 directory assistance names: 1 surname (set of 500), 1 city of
    birth / growing up, 1 most frequent cities (set of 500), 1 most frequent
    company / agency (set of 500), 1 forename surname (set of 150), 1
    spontaneous forename.
            2 questions including fuzzy yes / no: 1 predominantly Yes question,
    1 predominantly No question.
            9 phonetically rich sentences.
            2 time phrases: 1 time of day (spontaneous), 1 time phrase (word
    style).
            4 phonetically rich words.
            Call environment.
    The following age distribution has been obtained: 5 speakers are below 16
    years old, 543 speakers are between 16 and 30, 307 speakers are
    between 31 and 45, 202 speakers are between 46 and 60, 9 speakers are
    over 60. A pronunciation lexicon with a phonemic transcription in SAMPA is
    also included.
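
    The speaker figures quoted for S0119 can be cross-checked with a few
    lines of arithmetic. This sketch only restates the numbers given in
    the description; the dictionary keys are illustrative labels.

```python
# Figures as quoted in the S0119 description.
total_speakers = 1066
gender_counts = {"male": 526, "female": 540}
age_counts = {
    "under 16": 5,
    "16-30": 543,
    "31-45": 307,
    "46-60": 202,
    "over 60": 9,
}

# Both breakdowns should account for every speaker.
gender_total = sum(gender_counts.values())
age_total = sum(age_counts.values())

# Share of each age band, in percent, rounded to one decimal place.
age_shares = {
    band: round(100 * count / total_speakers, 1)
    for band, count in age_counts.items()
}
```

    Both totals come to 1066, so the gender and age breakdowns are
    consistent with the stated number of speakers.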

    W0032 Modern French Corpus including Anaphors Tagging
    This corpus, which includes the tagging of anaphors, was created by
    the CRISTAL-GRESEC team (Stendhal-Grenoble 3 University, France) and
    XRCE (Xerox Research Centre Europe, France) within the framework of
    the call launched by the DGLF-LF (the national institution for the
    French language and the languages spoken in France) for the creation
    of modern French corpora.
    Over 1 million words have been annotated. The corpora have been selected
    so that they represent a wide sampling of the French language (scientific
    and human science articles, extracts from newspapers and magazines,
    legal texts, etc.) and according to the points of interest of the teams working
    on the project. The processed corpora supplied by ELRA are listed below:
    - Two books edited by the CNRS: La protection des oeuvres scientifiques
    en droit d'auteur français, Xavier Strubel. Paris, CNRS Editions, 1997
    (77 591 words) and Cinquante ans de traction à la SNCF. Enjeux
    politiques, économiques et réponses techniques, Clive Lamming. Paris,
    CNRS Editions, 1997 (124 990 words).
    - 204 articles extracted from CNRS Info, a magazine which contains short
    popular scientific articles from the CNRS laboratories (201 280 words).
    - 14 articles dealing with Hermès Human Sciences (111 886 words).
    - 136 articles extracted from "Le Monde", dealing with economics (roughly
    180 760 words).
    - 13 booklets of the Official Journal of the European Communities
    (337 000 words).
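
    As a quick check on the "over 1 million words" figure, the word counts
    listed above can be summed. The sketch below uses only the counts
    quoted in the list (the Le Monde figure is given as approximate); the
    dictionary keys are shorthand labels, not official titles.

```python
# Word counts as quoted for each sub-corpus of W0032.
word_counts = {
    "La protection des oeuvres scientifiques (book)": 77_591,
    "Cinquante ans de traction à la SNCF (book)": 124_990,
    "CNRS Info articles": 201_280,
    "Hermès articles": 111_886,
    "Le Monde economics articles (approximate)": 180_760,
    "Official Journal of the European Communities": 337_000,
}

total_words = sum(word_counts.values())
exceeds_one_million = total_words > 1_000_000
```

    The listed components add up to 1 033 507 words, which agrees with the
    stated total of over 1 million annotated words.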

    The tagged anaphoric elements are listed below:
    - Personal pronouns: 3rd person pronoun, anaphoric.
    - Possessive determiners: 3rd person possessive determiner.
    - Demonstrative pronouns: anaphoric pronouns (celui, celle, ceux,
    - Indefinite pronouns: Aucun(e), chacun(e), certain(e)s, l'un(e), les
    tout(es), etc., when they are anaphoric.
    - "Proverbs": "le" + "faire".
    - Anaphoric and cataphoric adverbs: Dessus, dedans, dessous, when
    they have an anaphoric function.
    - Ellipsis of head nouns: Nominal adjectives or quantifiers determiners
    - Textual headers like "ce dernier": Ce dernier, le premier, etc.
    The annotation scheme was defined in XML format. The texts were divided
    into sections, paragraphs (<p>) and sentences (<s>). The sentence
    segmentation was carried out with NLP tools developed by XRCE; the
    annotation itself was done manually by two qualified linguists. A large
    subset of anaphoric phrases was automatically pre-annotated. The
    antecedents and the anaphoric relations were tagged manually, with
    editing tools (emacs, macros from Author/Editor software) used to make
    the task easier. 5% of the corpora were evaluated to assess the
    annotation reliability.
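
    To illustrate the paragraph and sentence structure described above,
    here is a minimal sketch that walks <p> and <s> elements with Python's
    standard xml.etree.ElementTree module. Only the <p> and <s> element
    names come from the description; the <text> root element and the
    sample sentences are invented for the example, and the corpus's actual
    anaphor markup is not shown because the announcement does not specify
    it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only <p> and <s> are taken from the description.
SAMPLE = """<text>
  <p><s>Le ministre a publié la liste.</s><s>Il l'a commentée hier.</s></p>
  <p><s>Ce dernier point reste ouvert.</s></p>
</text>"""


def sentences_by_paragraph(xml_text):
    """Return one list of sentence strings per <p> element."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter("p"):
        # itertext() also collects text inside any nested annotation tags.
        sentences = ["".join(s.itertext()).strip() for s in p.iter("s")]
        paragraphs.append(sentences)
    return paragraphs


paragraphs = sentences_by_paragraph(SAMPLE)
sentence_counts = [len(p) for p in paragraphs]
```

    Using itertext() rather than the element's text attribute keeps the
    sentence intact even if inline annotation elements are nested inside
    it.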

    W0033 CRATER 2
    The CRATER corpus was built upon the foundations of an earlier project,
    ET10/63, which was funded in the final phase of the Eurotra programme.
    The Corpus Resources and Terminology Extraction project (MLAP-93 20)
    extended the bilingual annotated English-French International
    Telecommunications Union corpus produced within ET10/63 to include
    Spanish.
    The CRATER 2 corpus was produced by the Department of Linguistics & Modern
    English Language, Lancaster University (United Kingdom) with funding from
    ELRA. The ELRA funding in turn was provided by the European Commission
    project LRsP&P (Language Resources Production & Packaging - LE4-8335).
    This project has enhanced the CRATER corpus, available under the reference
    ELRA-W0003 in the ELRA catalogue. CRATER 2 significantly expands the
    English/French component of the parallel corpus, from 1,000,000 words
    per language to approximately 1,500,000 tokens per language. CRATER 2
    is sold with CRATER in a single package.

    For further information, please contact:

    55-57 rue Brillat-Savarin
    F-75013 Paris, France

    Tel: +33 01 43 13 33 33
    Fax: +33 01 43 13 33 30

    E-mail: mapelli@elda.fr

    or visit our Web site:
    http://www.elda.fr
