
The FrameNet Process

An Informal Account of the FrameNet Process

Discovery of Frames

First, collect lists of words with similar meanings, where the similarity appears to arise because the words are all built on the same semantic frame.

Speaking: speak, say, tell, talk, inform, discuss, complain, report, assert, affirm

Judging: admire, appreciate, belittle, scorn, blame, commend, denigrate, deplore, disapprove, condemn, respect, evaluate, judge

Classifying: categorize, classify, define, interpret, depict, describe, regard, construe

Definition of Communication/Statement Frame

Characterize the frames and identify (and name) the actors and props in situations understood in terms of the frame: Communication/Statement (i.e., monologic, informing)

Frame Description:

A person (Speaker) produces some linguistic object (Message) while addressing some other person (Addressee) on some topic (Topic).

Frame Elements:

Refinements on "Message":

Definition of Cognition/Categorization Frame

Frame Description:

A person (the Cognizer) categorizes something (the Item). The Category into which the Item is placed may be expressed, as may the Criterion used as the basis for categorization.

Frame Elements:

Definition of Cognition/Judgment Frame

Frame Description:

A person (the Cognizer) makes a judgment about something or someone (the Evaluee). The judgment may be positive or negative. The target word may entail that the judgment is expressed verbally (e.g. scold) or it may not (e.g. blame). There may be a Reason for the judgment or a Role in which the Evaluee is evaluated.

Frame Elements:
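The three frame definitions above share one shape: a name, a prose description, and a set of named frame elements. The following is a minimal sketch of that shape in Python, not FrameNet's actual database schema. The class name `Frame` and its field names are hypothetical, and the frame element lists contain only the elements named in the descriptions above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical container for one frame definition."""
    name: str
    description: str
    frame_elements: Tuple[str, ...]  # only the FEs named in the description


STATEMENT = Frame(
    name="Communication/Statement",
    description=("A person (Speaker) produces some linguistic object (Message) "
                 "while addressing some other person (Addressee) on some topic (Topic)."),
    frame_elements=("Speaker", "Message", "Addressee", "Topic"),
)

CATEGORIZATION = Frame(
    name="Cognition/Categorization",
    description="A person (the Cognizer) categorizes something (the Item).",
    frame_elements=("Cognizer", "Item", "Category", "Criterion"),
)

JUDGMENT = Frame(
    name="Cognition/Judgment",
    description=("A person (the Cognizer) makes a judgment about something "
                 "or someone (the Evaluee)."),
    frame_elements=("Cognizer", "Evaluee", "Reason", "Role"),
)

# Index the frames by name for lookup.
FRAMES = {frame.name: frame for frame in (STATEMENT, CATEGORIZATION, JUDGMENT)}

# Frame elements that the two Cognition frames have in common.
shared = set(CATEGORIZATION.frame_elements) & set(JUDGMENT.frame_elements)
print(shared)
```

Keeping the frame elements as plain names makes it easy to check later that every label used in annotation belongs to the frame being annotated.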

Preliminary Exploration of Corpus Examples

Collect examples of each word on the list.

Recognize the possibility of polysemy: choose those examples in which the word has the sense being examined.

Speak, talk and discuss have both monologic and dialogic uses; for this frame we want only the monologic ones.

Hand Marking: Inform

Identify constituents in the example sentences and label them by the frame elements they realize.
Hand annotation of: They informed me that you were thinking of leaving.
 

Hand Marking: Say

Hand annotation of: She said something quite interesting about you.
 

Hand Marking: Discuss

Hand annotation of: The teacher discussed the next homework assignment.
 

Creating Subcorpora for Annotation

The Vanguard sets search parameters for a particular lemma; the CQP program then uses them to search the corpus and produce subcorpora. (See Resources used in FrameNet.) Lines from each subcorpus are selected and combined into a single file for annotation.
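To give a feel for what such a search looks like, here is a small Python sketch that assembles a CQP query string for a lemma. CQP describes each token as a bracketed set of attribute constraints, but the attribute names `lemma` and `pos`, the tag pattern `V.*`, and the helper `build_cqp_query` itself are assumptions made for illustration; the attributes actually available depend on how the corpus was encoded, and the parameters the Vanguard really sets are not listed here.

```python
from typing import Optional


def build_cqp_query(lemma: str,
                    pos_regex: Optional[str] = None,
                    subcorpus: Optional[str] = None) -> str:
    """Assemble a one-token CQP query (hypothetical helper).

    lemma      -- value matched against the assumed `lemma` attribute
    pos_regex  -- optional regular expression for the assumed `pos` attribute
    subcorpus  -- optional name under which CQP should store the result
    """
    # The lemma is inserted as-is; no escaping of regex characters is done.
    constraints = [f'lemma = "{lemma}"']
    if pos_regex is not None:
        constraints.append(f'pos = "{pos_regex}"')
    query = "[" + " & ".join(constraints) + "];"
    if subcorpus is not None:
        # Assigning the query to a name stores the matches as a subcorpus.
        query = f"{subcorpus} = {query}"
    return query


print(build_cqp_query("admire", pos_regex="V.*", subcorpus="Admire"))
```

The resulting string would be pasted into a CQP session or sent to it by a script; the sketch only builds the text and does not run CQP.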

Annotation with the Alembic Workbench

FrameNet's customized annotation software is not publicly available. However, a close approximation can be seen in the example sentences for a lemma such as admire in the FrameSQL interface.

SGML Labeling of Frame Elements

The following shows the annotation results at this stage, which include the SGML labeling of constituents with frame element tags.

<S TPOS="58295185"><C FE="Eval">He</C> led successful campaigns to
clear infestation on over forty worlds and was respected and <C
TARGET="y">admired</C> <C FE="Judge">by all of his colleagues in the
forces</C> .</S>

<S TPOS="7614466">He could coast along , asking nothing ; accepting
everything , like <C FE="Eval">the animals</C> <C FE="Judge">Vic</C>
<C TARGET="y">admired</C> <C FE="Degr">so much</C> , and staying out
of trouble .</S>

<S TPOS="26231339"><C FE="Judge">I</C> <C TARGET="y">admire</C> <C
FE="Eval">them</C> <C FE="Reas">for being so up front about their
religious activity</C> because it puts them right in the front line
against anti-Semitism . "</S>
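The labeled sentences above can be read back mechanically. The sketch below, which is not FrameNet's own tooling, uses regular expressions to pull the target word and the frame element labels out of the third example; the function name `parse_constituents` is hypothetical, and the approach assumes that `<C>` elements are never nested.

```python
import re

# One <C ...>...</C> element; the tag may be broken across lines.
C_TAG = re.compile(r'<C\s+([^>]*?)\s*>(.*?)</C>', re.DOTALL)
# One NAME="value" attribute inside the opening tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_constituents(sgml):
    """Return a list of (attributes, text) pairs, one per <C> element."""
    parsed = []
    for attr_text, body in C_TAG.findall(sgml):
        attrs = dict(ATTR.findall(attr_text))
        text = " ".join(body.split())  # collapse line breaks inside the element
        parsed.append((attrs, text))
    return parsed


sentence = '''<S TPOS="26231339"><C FE="Judge">I</C> <C TARGET="y">admire</C> <C
FE="Eval">them</C> <C FE="Reas">for being so up front about their
religious activity</C> because it puts them right in the front line
against anti-Semitism . "</S>'''

parsed = parse_constituents(sentence)
target = next(text for attrs, text in parsed if attrs.get("TARGET") == "y")
frame_elements = {attrs["FE"]: text for attrs, text in parsed if "FE" in attrs}

print(target)
print(frame_elements)
```

A real SGML parser would be more robust; the regular expressions are enough here only because the markup is flat and regular.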

SGML Syntactic Classification

At this stage, the FE-annotated file is passed through a constituent classifier that adds grammatical function (GF) and phrase type (PT) information. Later, the Rearguard hand-checks the information and makes corrections where needed.

<S TPOS="58295185"><C FE="Eval" GF="Ext" PT="NP">He</C> led successful
campaigns to clear infestation on over forty worlds and was respected
and <C TARGET="y">admired</C> <C FE="Judge" GF="Comp" PT="PP">by all
of his colleagues in the forces</C> .</S>


<S TPOS="7614466">He could coast along , asking nothing ; accepting
everything , like <C FE="Eval" GF="Obj" PT="NP">the animals</C> <C
FE="Judge" GF="Ext" PT="NP">Vic</C> <C TARGET="y">admired</C> <C
FE="Degr" GF="Adjunct" PT="Adv">so much</C> , and staying out of
trouble .</S>

<S TPOS="26231339"><C FE="Judge" GF="Ext" PT="NP">I</C> <C
TARGET="y">admire</C> <C FE="Eval" GF="Obj" PT="NP">them</C> <C
FE="Reas" GF="Comp" PT="PPing">for being so up front about their
religious activity</C> because it puts them right in the front line
against anti-Semitism . "</S>

Lexical Entry Preparation

Software is run to prepare the initial version of a lexical entry. The output shows the mappings between FEs and their syntactic realizations, along with example sentences. It also provides a summary of the valence patterns in which a lemma occurs.
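As a rough illustration of what such a summary involves, the sketch below tallies FE realizations and valence patterns from the three annotated admire sentences. It is not the FrameNet software and its output format is invented: the function `valence_pattern` is hypothetical, and the sentences are trimmed to their tagged constituents (untagged context is replaced by `...`).

```python
import re
from collections import Counter

C_TAG = re.compile(r'<C\s+([^>]*?)\s*>(.*?)</C>', re.DOTALL)
ATTR = re.compile(r'(\w+)="([^"]*)"')


def valence_pattern(sgml):
    """Return the sentence's (FE, PT, GF) triples as a sorted tuple."""
    triples = []
    for attr_text, _body in C_TAG.findall(sgml):
        attrs = dict(ATTR.findall(attr_text))
        if "FE" in attrs:  # skip the target word, which carries no FE
            triples.append((attrs["FE"], attrs.get("PT", "?"), attrs.get("GF", "?")))
    return tuple(sorted(triples))


sentences = [
    '<C FE="Eval" GF="Ext" PT="NP">He</C> ... <C TARGET="y">admired</C> '
    '<C FE="Judge" GF="Comp" PT="PP">by all of his colleagues in the forces</C>',
    '<C FE="Eval" GF="Obj" PT="NP">the animals</C> <C FE="Judge" GF="Ext" PT="NP">Vic</C> '
    '<C TARGET="y">admired</C> <C FE="Degr" GF="Adjunct" PT="Adv">so much</C>',
    '<C FE="Judge" GF="Ext" PT="NP">I</C> <C TARGET="y">admire</C> '
    '<C FE="Eval" GF="Obj" PT="NP">them</C> '
    '<C FE="Reas" GF="Comp" PT="PPing">for being so up front about their religious activity</C>',
]

# How often each whole pattern occurs, and how often each FE is realized a given way.
patterns = Counter(valence_pattern(s) for s in sentences)
realizations = Counter(triple for s in sentences for triple in valence_pattern(s))

for (fe, pt, gf), count in sorted(realizations.items()):
    print(f"{fe:6} {pt:6} {gf:8} {count}")
```

Even on three sentences the tally shows the kind of fact a lexical entry records, for example that the Judge of admire appears both as an NP external argument and as a PP complement.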

Making Generalizations

Sample generalizations for Communication/Statement verbs

Facing reality:

Actually it's hugely more complicated than what we've seen so far; we'll discuss null instantiation, frame inheritance, blending, etc. later.

Resources used in FrameNet

The Corpus

The British National Corpus (BNC) is a large sample of modern (British) English taken from a number of genres. It was created in the UK by a consortium of publishing houses, universities, and government agencies, was completed in 1994, and was made available to European researchers in 1995.

The Corpus comprises 90% written language and 10% transcribed speech, totalling over 100,000,000 running words.

The Corpus has been processed in certain ways by the Consortium. It has been tokenized, which means that the boundaries of sentences are indicated, contractions are separated, individual word tokens are assigned numbered locations, and punctuation marks are indexed. It has also been POS-tagged (each word is tagged for part of speech) using a refined system of 65 word classes.
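The tokenization steps just listed can be illustrated with a toy Python tokenizer. This is not the Consortium's tokenizer: the regular expression, the way contractions are split (for example do + n't), and the rule that `.`, `!`, and `?` end a sentence are all simplifying assumptions.

```python
import re
from typing import List, Tuple

# Alternatives are tried in order: a word stem before "n't", the clitic "n't",
# common apostrophe clitics, an ordinary word, then any single punctuation mark.
TOKEN = re.compile(r"\w+(?=n't\b)|n't\b|'(?:s|re|ve|ll|d|m)\b|\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> List[Tuple[int, int, str]]:
    """Return (position, sentence number, token) rows for the text."""
    rows = []
    sentence = 0
    for position, match in enumerate(TOKEN.finditer(text)):
        token = match.group()
        rows.append((position, sentence, token))
        if token in SENTENCE_END:
            sentence += 1  # the next token starts a new sentence
    return rows


rows = tokenize("They don't know. She's here!")
for row in rows:
    print(row)
```

Each token, including punctuation, receives its own numbered position, which is what lets later tools refer to corpus locations unambiguously.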

The version of the Corpus which we use has further been lemmatized, which means that inflectional (and dialect) variants are identified as instances of the same lemma. The lemmatizing was done at the University of Stuttgart. The lemmatized version was made available to us through the courtesy of the Institut fuer Maschinelle Sprachverarbeitung at the University of Stuttgart.

FrameNet has the use of this corpus by agreement with Oxford University Press, leader of the BNC Consortium.

Corpus Workbench: xkwic and cqp

The Corpus Query Processor (CQP) is a command-line tool that allows users to perform powerful regular expression searches over linguistically annotated corpora that have been pre-processed appropriately. Besides allowing the user to save search results to a file, CQP has many other capabilities used in the FrameNet process. With CQP one can, for instance, create and store subcorpora; set collocates; and sort the search results alphabetically based on the match, the collocate, or a specified position to the left or right of the match.

Xkwic is a graphical user interface for the Corpus Query Processor. More specifically, it is a key-word-in-context tool that lets users view the results of CQP corpus searches with the matches aligned on the screen. Xkwic integrates all the functionality of CQP and, in addition, allows the user to display the context of search results in ways that are not supported by basic CQP.
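The key-word-in-context idea, and sorting on a position to the left or right of the match, can be sketched in a few lines of Python. This illustrates the concept only and uses neither CQP nor Xkwic; the function names `kwic`, `sort_hits`, and `format_hits` are hypothetical, and the two token lists are fragments of the admire examples shown earlier.

```python
from typing import List, Set, Tuple

Hit = Tuple[List[str], str, List[str]]  # (left context, match, right context)


def kwic(sentences: List[List[str]], forms: Set[str], width: int = 3) -> List[Hit]:
    """Collect every token whose lowercased form is in `forms`, with context."""
    hits = []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token.lower() in forms:
                hits.append((tokens[max(0, i - width):i], token, tokens[i + 1:i + 1 + width]))
    return hits


def sort_hits(hits: List[Hit], offset: int = 0) -> List[Hit]:
    """Sort on the match (0) or on the word `offset` places right (+) or left (-)."""
    def key(hit: Hit) -> str:
        left, match, right = hit
        if offset == 0:
            return match.lower()
        context = right if offset > 0 else left[::-1]
        index = abs(offset) - 1
        return context[index].lower() if index < len(context) else ""
    return sorted(hits, key=key)


def format_hits(hits: List[Hit]) -> List[str]:
    """Right-align the left context so that all matches share one column."""
    lefts = [" ".join(left) for left, _, _ in hits]
    pad = max((len(text) for text in lefts), default=0)
    return [f"{text:>{pad}}  [{match}]  {' '.join(right)}"
            for text, (_, match, right) in zip(lefts, hits)]


sentences = [
    "I admire them for being so up front about their religious activity".split(),
    "like the animals Vic admired so much".split(),
]
hits = kwic(sentences, {"admire", "admired"})
by_right = sort_hits(hits, offset=1)   # sort on the first word after the match
by_left = sort_hits(hits, offset=-1)   # sort on the first word before the match
lines = format_hits(by_right)
print("\n".join(lines))
```

Sorting on a neighboring position groups lines with similar complements or subjects together, which makes recurring patterns around a target word easier to spot.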

Annotation Software

FrameNet I corpus data was annotated using the Alembic Workbench (MITRE). Information is available at http://www.mitre.org/technology/alembic-workbench/. FrameNet II data will be annotated using in-house software which is currently under development.