THIS IS A DRAFT THAT MORE OR LESS CORRESPONDS TO CJF'S LECTURES IN THE SECOND WEEK, ON IDIOMATICITY. LOTS OF REFERENCES ETC. ARE MISSING. I'M THINKING OF MOVING NOTES (HERE, PARENTHESIZED PARAGRAPHS) TO THE END, REACHABLE BY CLICK FROM THE TEXT, IF I CAN FIGURE OUT A WAY OF GETTING BACK TO THE PLACE IN THE TEXT WHERE THE FOOTNOTE GOT INTRODUCED. SUGGESTIONS AND CRITICISMS ARE WELCOME. firstname.lastname@example.org
It has sometimes been suggested that Construction Grammar is nothing more than the study of idioms. There are two possible reasons for this, one being that it's true - true in the sense that from our point of view, idioms are the irreducible units of description for the way a language works, and these are precisely the individual morphemes of the language (together with their combinatorics), and the constructions which license combinations of linguistic units into larger units, including, of course, those which require reference to one or more lexical items. In that sense, of course, almost every grammatical model is a theory of idioms, since the job is to isolate those principles which are in themselves irreducible, that is, which are not explained by other principles in the same language.
But if the charge is framed with the phrase "nothing but", it is intended to be trivializing. We might be thought of as interested in the careful study of idiomatic runs like "trip the light fantastic" or "kick the bucket", ignoring such global organizing principles as complementation patterns, long-distance dependencies, voice alternations, and the like. But in that sense, the charge is false.
Such an opinion might be based on the fact that in my own career, there was a period, especially in the early seventies, when I was convinced that almost everything that people produced in everyday talk consisted of assemblies of pre-fabricated sentence-structuring or discourse-structuring phrases, and that the role played by extremely general processes had been hugely exaggerated. [references] But actually studying what is idiomatic in a given language is the other side of the coin of studying what is general in that language. The right way to study a so-called idiom is to discover exactly what there is about the expressions that exemplify it that needs to be learned by linguistic convention, and in order to discover that, one needs a theory of what is regular or general in the language. What to the lay person is usually thought of as an "idiom" can be understood by considering the organization of information in a constructicon [glossary entry] [reference to Jurafsky], the inventory of all of the constructions in the language, together with a specification of the inheritance links connecting them.
In this constructicon, one will find a number of very simple constructions which are only bound [glossary entry], that is, which are inherited by lots of other constructions, but which do not themselves inherit any higher-level constructions. One of these might be a very general construction for English that requires that a lexical head precedes all of its complements. Thus, in English, verbs precede their complements in a phrasal VP ("showed the pictures to the sheriff"), prepositions precede their objects ("among the stars"), nouns precede their complements ("top of the hill"), adjectives precede their complements ("afraid of cats"), etc. Such highest-level constructions can be thought of as characterizing the typological nature of the language in question. Other constructions are not at that highest level, but are broadly necessary for accomplishing such things as providing specifiers, modifiers or complements for their heads, combining two constructs [glossary entry] of the same level and type into a larger construct of the same type, establishing coinstantiation [glossary entry], licensing [glossary entry] the grammatical realization of semantic arguments, etc., and these will correspond more or less to what people are thinking of when they speak of the core [glossary entry] properties of the language.
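As a rough illustration of what an inventory with inheritance links might look like as a data structure, here is a minimal sketch in Python. The construction names, the `Construction` class, and the particular links are assumptions made up for this example; they are not a claim about how any actual constructicon is organized.

```python
from dataclasses import dataclass, field


@dataclass
class Construction:
    """One entry in a toy constructicon (names are illustrative only)."""
    name: str
    inherits: list[str] = field(default_factory=list)  # names of parent constructions


# A hypothetical fragment; the inheritance links are assumptions for illustration.
CONSTRUCTICON = {
    c.name: c
    for c in [
        Construction("head-before-complements"),
        Construction("VP", inherits=["head-before-complements"]),
        Construction("PP", inherits=["head-before-complements"]),
        Construction("pull-someones-leg", inherits=["VP"]),
    ]
}


def ancestors(name: str) -> list[str]:
    """Collect every construction that `name` inherits, directly or indirectly."""
    found: list[str] = []
    stack = list(CONSTRUCTICON[name].inherits)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(CONSTRUCTICON[parent].inherits)
    return found


def top_level() -> list[str]:
    """Constructions that are inherited by others but inherit nothing themselves."""
    inherited = {p for c in CONSTRUCTICON.values() for p in c.inherits}
    return sorted(n for n, c in CONSTRUCTICON.items() if not c.inherits and n in inherited)
```

In this toy fragment, `top_level()` picks out the head-before-complements construction, and `ancestors("pull-someones-leg")` lists what the idiom gets "for free" from more general constructions.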
A very large number of constructions identify single lexical items, together with their semantic features and their combinatorial requirements. These include, of course, the ordinary lexical constructions. But some are more complicated than that, perhaps specifying one or more lexical constituents, contributing meanings that go beyond simple modification and complementation and argument realization, and so on. These could be referred to as the peripheral [glossary entry] constructions, if they are mainly syntactic in nature; and if they contain more than one lexical specification, they would correspond more or less to the kinds of expressions typically known as idioms - the kinds of expressions collected in idiom dictionaries.
It is probably obvious that these four or five categories of constructions can't meaningfully be sorted into separate boxes, because no matter where we put the boundaries, there are going to be unclear cases.
Negative definition of idioms.
We need to distinguish what it is that speakers of a language have to know outright from what it is that they have to be able to figure out on the basis of the other things that they know. Given such a distinction, we can give a kind of backwards definition of idiom. An expression is an idiom, if there are things that speakers know about its form, its meaning, or its use, which they wouldn't be able to figure out by simply knowing everything else about the language. By this definition, of course, we include the typical examples for which it is clearly possible to speak of the "arbitrariness" of the linguistic sign. [reference] We don't understand what the word "cat" means by computing its meaning from information about its shape or parts: that's something we just have to know, by linguistic convention. (A cooperative interpretation of this "backwards definition" will have to allow that for these purposes I don't include in knowing "everything else" about English the knowledge of the meaning of "cat food", "cat fur", "cat's claws" and the like.)
Now when we say that something like "You're pulling my leg" is an idiomatic construct, we have to recognize that there is much about such expressions that is not idiomatic. It has the form of an ordinary transitive active sentence. But that's not a separately "stored" property of the idiom; a passive form is equally possible. ("I think my leg is being pulled.") The possessor of "leg" is "my", but that particular word is not a part of the idiom: any possessive form is possible in this situation ("his leg", "the man's leg"). We might end up saying that the construction is reducible to a valence description involving the verb "pull", the requirement that the non-subject member of its valence set be headed by the noun "leg", and that its specifier be a possessive NP. Ordinarily the possessor will not be co-construed with the agent of the verb, but that might be a simple consequence of the verb's meaning and perhaps does not need to be stored as a property of the idiom. The idiom's meaning also has to be specified, and that will involve something about an agent playfully trying to get somebody (the individual associated with the "possessor" expression) to believe something that is not true, or to act in a way that is not in his or her best interests.
Encoding versus Decoding. We speak of encoding as the process by which speakers who have something to say choose the linguistic forms needed to express it; decoding is the process by which interpreters make sense of linguistic expressions they are exposed to. There can be both decoding and encoding idioms [Makkai reference]: an idiom of encoding is an expression which a speaker would not realize is a conventional way of saying what it means, without knowing that fact; an idiom of decoding is an expression whose interpretation could not be figured out by someone using only independently learned linguistic conventions. By these definitions, of course, every idiom is an encoding idiom; but we may allow ourselves, when speaking to people inclined to be cooperative, to use the term encoding idiom to refer to idioms that are not decoding idioms, i.e., to idioms which have transparent meanings. When speaking with people who demand more care, we will refer to these as encoding idioms which are not decoding idioms.
(A qualification may be needed here. In saying that the hearer would not be able to "figure out" what the expression means "by already known linguistic conventions" I intend to exclude such cases as figuring out what something means by its being situated in a meaningful context. Most of the vocabulary learning we accomplish during our lives takes place by contexted abductive reasoning: if I drop something that I haven't learned the name of and you say, "You dropped your comb", I will instantly know what "comb" means. [necessary to say the usual thing about gavagai?] Similarly, if you've been treating me in a certain unpleasant way for some time, and then, when you see how upset I'm becoming, you say something like, "Don't take it so seriously. We're just pulling your leg!", that might allow me to learn the idiom on one trial as well. We can think of these as examples of the occasions in which we acquire the linguistic conventions.)
"You're pulling my leg" is an example of a decoding idiom; something like "Let me be the first to congratulate you" is an example of an encoding idiom. (More carefully: an encoding idiom which is not a decoding idiom.) Interestingly, the adjective "idiomatic" is ambiguous according to whether the encoding or the decoding "direction" is intended. That is, if I say that Igor's English is not idiomatic, I might mean that he does not express himself in ways that an ordinary English-speaking native speaker would, that is, that he hasn't mastered the language community's most common or popular encoding idioms, the usual ways of saying things. But if I say that he doesn't understand much of what we say because he hasn't learned many idiomatic expressions, I would mean that there are lots of decoding idioms that he doesn't know.
The alternatives to idiomaticity include such explanatory principles as compositionality, pragmatic reasoning, and the anchoring of pragmatic indices. In considering the "backwards" definition of idiomaticity, we notice that somebody associates a meaning with a phrase that does not simply fall out from the meanings of the words themselves. Before we can decide what part of this might be idiomatic, we have to consider the alternatives.
Compositionality. Compositionality is the name given to the predictable relation between (1) the meaning of an expression and (2) the meanings of its parts, together with the ways in which those parts are grammatically organized. We can know by compositional principles what "blue box" means, simply by knowing the meaning of the adjective "blue" and the meaning of the noun "box" and by knowing the semantic force of the adjective + noun modification construction which this phrase is a construct of. [note about why this is an oversimplification?] In such cases we can say that the meaning of the phrase is a function of the meanings of its parts. Nobody has to "learn" separately the meaning of the phrase "blue box." The noun compound "greenhouse" is different. Knowing the meaning of the adjective and the noun, and knowing the familiar pattern by which adjectives can join nouns in a compounding process, does not alone prepare one for the knowledge that this is the name given to structures within which plant growth is facilitated or hastened.
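The contrast between "blue box" and "greenhouse" can be made concrete with a small sketch in Python. Everything here is a deliberately crude assumption for illustration: meanings are modeled as sets of made-up feature labels, and modification is modeled as set union, which is certainly an oversimplification of adjective + noun semantics.

```python
# Toy word meanings as sets of features (all entries are illustrative assumptions).
WORD_MEANINGS = {
    "blue": {"color:blue"},
    "green": {"color:green"},
    "box": {"kind:box"},
    "house": {"kind:house"},
}

# Meanings that must be learned by convention and cannot be computed.
STORED_MEANINGS = {
    "greenhouse": {"kind:structure", "purpose:plant-growth"},
}


def modify(adjective: str, noun: str) -> set[str]:
    """Adjective + noun modification: the phrase meaning is a function of its parts."""
    return WORD_MEANINGS[adjective] | WORD_MEANINGS[noun]


def meaning(expression: str) -> set[str]:
    """Use a stored conventional meaning if one exists; otherwise compose."""
    if expression in STORED_MEANINGS:
        return STORED_MEANINGS[expression]
    adjective, noun = expression.split()
    return modify(adjective, noun)
```

The point of the sketch is only that `meaning("blue box")` needs no entry of its own, while the stored entry for "greenhouse" differs from what `modify("green", "house")` would compute.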
The reason we need an understanding of compositionality in discussing idiomaticity is that some arguments against the treatment of an idiom as an idiom take the position that we are dealing merely with polysemy. Thus, if one of the meanings of "spill" is 'divulge', and one of the meanings of "beans" is 'secrets', then "spill the beans" is not an idiom: it is the compositional product of special meanings of each of the words.
Pragmatic Reasoning. In the case of sentential idioms, it is important to distinguish between the conventional meaning that a construct built on them might have and the kind of reasoning that is involved in cooperative conversational interaction. If a mother says, "I wonder who could have left their dirty socks on the middle of the floor", she probably expects her intended addressee to take this as a sarcastic request to pick the socks up and put them where they belong. A lot has been written about the mechanisms for this kind of reasoning; one reasonable view is that the mother expects what she says to be taken as the first part of a potentially continuable conversation that, given the relationships that hold between speaker and hearer, is going to lead to a specific conclusion; the cooperative child can anticipate this path and act on the inference without requiring the whole conversation to be played out. [references]
But now consider certain negative "why" questions, in particular, questions such as those exhibited here:
An attempted pragmatic reasoning explanation for these sentences might follow some such train as this: if someone asks me to explain a state of affairs that I am involved in, it might be that she thinks there's something wrong about that state of affairs, and making that inference might lead me to do something to change it. Such reasoning will perform quite well with certain kinds of questions, but I will claim that it doesn't work in the case of these sentences.
If I say to you, "Why aren't you wearing your shoes?", your natural inclination might be to think that I find this situation questionable and I am suggesting you should put your shoes on. Such an inference, however, does not depend on the question being negative in form: it would be called on just as well if my question had been "Why are you going barefoot?".
The argument that the first group of negative "why" questions make up a special construction, even though constructs built on it closely resemble ordinary questions, includes the following points:
(1) "Real" questions with "why" can generally be paraphrased as something like "situation S exists; explain that". Thus, "You are not wearing shoes; explain yourself." The "why" questions that are taken as suggestions cannot. "Why don't you be the leader?", for example, cannot be paraphrased as "You don't be the leader; explain!".
(2) Instances of the construction can use "do" with "be", which is true also of imperatives (obligatory in the negative, as in "don't be obtuse", and optional in the affirmative, as in the gushy "do be careful"). Notice the difference in interpretation between "Why aren't you the leader?" and "Why don't you be the leader?". The first of these does permit the two-part paraphrase ("You aren't the leader; tell me why").
(3) "Real" negative "why" questions are generally negative polarity contexts [references]; negative-why-question suggestions are not. In the following two sentences, notice the difference between the suggestion, with "something", and the ordinary question, with "anything".
Our conclusion, using the preceding observations and a few others, will have to be that there exists in English a way of expressing suggestions that has the form of a negative "why" question and has some of the internal trappings - the full story is a bit complicated - of a positive suggestion.
Anchoring of Pragmatic Indices. Other contextual effects are tied to linguistic conventions, but are separate and not tied to specific phrasal idioms. Here I refer to the fact that deictic [references to deixis literature?] references, of time, place, and person, generally need to be interpreted by appealing to such features of context as who is talking, where the conversation is taking place, where the interlocutors are located, what has become topical in the immediately preceding discourse, etc. In general, the nature of this particular sort of appeals to context does not figure in disputes about whether some expression is idiomatic or not, so nothing more will be said about it here.
Productivity. A grammatical process or pattern or rule (or "construction") can be said to be productive if the conditions of its applicability do not require the listing of exceptions. Actually, productivity is a notion of degree. All grammatical constructions have some constraints on their applicability, but the extent to which those constraints can themselves be formulated in general ways is the extent to which we can say that the construction is productive. Some constructions only work with monosyllabic words; some only with certain grammatical categories. But they are general to the extent that such non-lexical constraints involve general (boolean) conditions involving properties shared by classes of lexical items, rather than lists of specific words.
A common test of productivity is that if a new word is added to the language which meets whatever semantic or formal conditions might be imposed on the construction, speakers have no trouble using the construction with the new word. In English the deadjectival nominalization construction that provides a "-th" suffix is not productive ("truth", "width"); the one that adds "-ness" is ("kindness", "friendliness"). Or at least relatively so. The theoretical point of the use of the notions blocking and pre-emption [see below] can be seen as a way of preserving (the appearance of) productivity, since having a theory of blocking reduces the need to mention exceptions in the description of individual constructions.
"Coining". We can distinguish two kinds of "creativity" in language. In one case there is the ability of speakers, using existing resources in the language, to produce and understand novel expressions. In the other case, the one for which we use the term coining, a speaker uses existing patterns in the language for creating new resources.
In many cases there are clearly unproductive patterns which can nevertheless be exploited in the creation of new words. We distinguish coining from productivity in word formation by saying that coining requires the recognition of a historical act in which someone created the new word and assigned it a meaning. The two words "benign" and "malignant" form a simple contrast set within a semantic frame having to do with types of tumors. Not *benignant or *malign. But I recently heard somebody introduce (I've forgotten the context) the word "benignant". If this word catches on, that won't be evidence that the particular word-formational process was, after all, productive, but will just show that the creator of the word made use of an existing pattern for creating a new word. The special meaning assigned to that new word was not predictable from its pieces. The word was "coined" and not "generated".
There is a view of grammar according to which the grammar proper will identify only the productive processes. Since the ability to create new words, using non-productive processes, is clearly a linguistic ability, it is my opinion that a grammar of a language needs to identify constructions that exist for "coining" purposes as well. Technically, the coining constructions will simply be thought of as bound constructions, constructions that are "bound" to - inherited by - particular complex words. They will serve to motivate and represent the substructure of morphologically complex words and some idiomatic phrases. But they are also available for the coining of new words.
Motivation. We very frequently find that words and phrases that are idiomatic, by our definition, nevertheless have components that are related, sometimes in quite explicit and easily statable ways, to the meanings of their parts. This is true perhaps for most morphologically complex words, and most conventional nominal compounds. Thus, there are good reasons why a device used for certain purposes in the classroom is called a pointer, and good reasons why a hunting dog that has been bred for certain behavioral characteristics might also be called a pointer; but those reasons are not sufficient to enable a speaker of the language to know that the word has just those meanings.
Some words are motivated by the semantics of one or more of their constituents; we can speak of semantic motivation. Some words are motivated by certain features of their sound: clap, slap, rap, tap, flap, for example; here we can speak of phonological motivation. Many idiomatic phrases appeal to our recognition of underlying symbolisms. "My hat's off to you" alludes to a gesture which has a particular social meaning, even in a time when nobody wears hats. Here we are dealing with a special kind of symbolic motivation.
Pre-emption a.k.a. Blocking. Pre-emption is the name given to a state of affairs in which a ready-made expression fills a semantic gap which might have been filled by a productive process, but where the productive competitor is ungrammatical. "Curiosity" is said to block or pre-empt "curiousness", "suspicion" to block "suspiciousness". [References to blocking: Ted Briscoe, etc.] (We might notice that it would not be quite true to say that the morphological combinations "curiousness" and "suspiciousness" are simply prevented from occurring. Here one needs to pay attention to subtle matters of meaning. The adjective "curious" can refer to a property of a person (or other sentient being), or it can refer to a property of something observed by a person. Thus, I can say either "I was curious about the details of that situation" or "That situation is quite curious". Similarly, "suspicious" can refer to a property of an experiencer or to a property of some phenomenon or behavior. Thus, I can say either, "I was suspicious about your behavior" or "I found your behavior quite suspicious". The productive formation in "-ness" is in fact possible with the latter of these two senses, allowing such phrases as "the curiousness of my situation" or "the suspiciousness of his behavior".)
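One way to picture sense-sensitive blocking is as a lookup that consults stored forms before applying the productive "-ness" pattern. The following Python sketch is only a toy under stated assumptions: the sense labels "experiencer" and "stimulus" are names invented here for the two readings just described, the table holds only the two examples from the text, and the spelling rule is a crude stand-in.

```python
# Stored nominalizations, keyed by (adjective, sense). The sense labels
# "experiencer" and "stimulus" are assumed names for the two readings in the text.
STORED_NOMINALS = {
    ("curious", "experiencer"): "curiosity",
    ("suspicious", "experiencer"): "suspicion",
}


def nominalize(adjective: str, sense: str) -> str:
    """Return the stored noun if one pre-empts; otherwise apply productive -ness."""
    stored = STORED_NOMINALS.get((adjective, sense))
    if stored is not None:
        return stored
    # Toy spelling rule: final "y" becomes "i" before "-ness" (friendly -> friendliness).
    stem = adjective[:-1] + "i" if adjective.endswith("y") else adjective
    return stem + "ness"
```

Because the stored entry is keyed to a sense, `nominalize("curious", "stimulus")` still falls through to the productive pattern, which mirrors the observation that "the curiousness of my situation" is possible.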
Discussions of pre-emption are often limited to matters of word-formation; but there are fixed phrases that are also involved in pre-emption. Consider, for example, the (semi-)regular ways of describing calendric units of time that contain the moment of speaking. With the units week, month, and year, we find such expressions as these:
(the) X before last, last X, this X, next X, (the) X after next
You will see that X can be replaced by any of the three words "week", "month", and "year". But the word "day" cannot be fitted into these formulas. (There is, of course, a kind of ceremonial use of the phrase "this day".) Instead, we use the following special expressions:
the day before yesterday, yesterday, today, tomorrow, the day after tomorrow
These are not merely alternative ways of saying what they say, but they pre-empt the more regular means of expression. *"Last day" or *"day after next" are not English.
The layering of pre-emption in this semantic domain continues. The general ways of identifying portions of a day follow three patterns:
(1) "the DAYPORTION of DAYNAME", if the DAYNAME is expressed as a date or as a phrase with a definite article. Notice: "the morning of December 14th", "the afternoon of the day the baby was born", etc. This pattern applies to the day-names that are two away from "today": "the evening of the day before yesterday", "the afternoon of the day after tomorrow".
(2) "DAYNAME DAYPORTION", if the DAYNAME is expressed as a single word: "Christmas morning", "Thursday evening", but also "yesterday afternoon" and "tomorrow morning".
But in the case of day-portions of the day containing the moment of speaking, there is a new pattern: "this morning" (instead of *"today morning"), "this afternoon" (instead of *"today afternoon"), and "this evening" (instead of *"today evening"). However, this particular subregularity is itself pre-empted by "tonight" (in place of *"today night") and "last night" (in place of *"yesterday night").
In short, there can be layerings of pre-emption, where a productive pattern A is pre-empted by a semi-productive pattern B, but that pattern itself is pre-empted by a third pattern, C.
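The layering just described can be sketched as an ordered series of checks, with the most specific fixed expressions consulted first. This Python sketch is an illustration only: the function name `day_portion`, the offset encoding (0 for today, -1 for yesterday, and so on), and the restriction to the five day names listed above are all assumptions of the example.

```python
# Day names by offset from today, as listed in the text.
DAY_NAMES = {
    -2: "the day before yesterday",
    -1: "yesterday",
    0: "today",
    1: "tomorrow",
    2: "the day after tomorrow",
}

# Fixed expressions that pre-empt the more regular patterns.
FIXED = {(0, "night"): "tonight", (-1, "night"): "last night"}


def day_portion(offset: int, portion: str) -> str:
    """Name a portion of the day `offset` days from today (offsets -2..2 only)."""
    if (offset, portion) in FIXED:      # fixed expressions win over everything else
        return FIXED[(offset, portion)]
    if offset == 0:                     # subregularity: "this morning", not "today morning"
        return f"this {portion}"
    name = DAY_NAMES[offset]
    if " " in name:                     # phrasal day name: "the DAYPORTION of DAYNAME"
        return f"the {portion} of {name}"
    return f"{name} {portion}"          # single-word day name: "DAYNAME DAYPORTION"
```

The order of the checks carries the analysis: moving the `FIXED` lookup below the `offset == 0` test would wrongly produce "this night" for `day_portion(0, "night")`.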
"Extragrammatical" Form. Sometimes even the form itself is not predictable from the grammar. Phrasal idioms of this sort are what are called in the Fillmore/Kay/O'Connor article extragrammatical idioms [reference]. Examples: "by and large" "first off", "all of a sudden". The interpreter could not even know, without learning them separately, that these expressions are sayable in English. The grammar of English has no mechanism for conjoing a preposition ("by") with an adjective ("large").
Form-Meaning conventionalities. The meanings of monomorphemic words are the clearest and most numerous examples of special conventions linking form with meaning. Nothing will help you figure out what "butter" means from its form. The word "butterfly" differs in that it is motivated in ways that mirror its morphological structure; its head component makes it, perhaps, easy to remember: but one couldn't know what kind of creature the word designates from the name alone.
Meaning-Use conventionalities. For idioms that directly map forms with uses, their histories might be of the following sort. Let us suppose there was a time when it was conventional to express to someone about to embark on a journey one's wish that God accompany them. And let's suppose that this convention governed the relation between the occasion and the meaning of the wish, so that the greeting could be expressed in a number of ways. ("May God go with you", "Travel with God", "May God be your travelling companion", "Take God with you", "Go with God", etc.). At this stage what we have is a convention relating an occasion with a meaning.
Then let us suppose that after a number of decades there came to be a single preferred expression that carried that function: "May God be with you". At this point (1) a given form (2) expresses a given meaning, and (3) is used in a given social situation. The convention connects a meaning, a form, and a use.
Then let us suppose that a period of phonetic erosion takes place and this automatic expression comes to sound like "Goodbye". When this has happened, there is a conventional pairing of the form with the use, but the form no longer has anything to do with what the phrase is supposed to mean, and the users have no consciousness of its history. In a sense, it doesn't, strictly speaking, mean anything. Here we could say that the form is directly associated with the use, without any intervening semantics. [Jerry Morgan reference]
Form-Use conventionalities. With this last example, we can see how there can be conventions in which the form is associated with a use, even if nobody knows what the form means. In most people's use, this is certainly true of certain politeness formulas in English, such as "How do you do?", "You're welcome", and so on.
The lack of a semantics that independently motivates the automatisms in our language can be best understood by noticing occasions in which speakers of English have mislearned idioms. Perhaps the speakers have mislearned the idioms by assigning them some minimal sort of meaning, but it is at least clear that it is not its conventional meaning that gives speakers a reason for using the expression. Examples of such mistakes include "for all intensive purposes" for "for all intents and purposes", "a nominal egg" for "an arm and a leg" ("it'll cost you a nominal egg") [Morgan?], "by in large" for "by and large", etc. [Kay's collection?]
Form-Meaning-Use conventionalities. A very large number of formulaic expressions used in conversation, used in organizing a discourse, used in ceremonial occasions, etc., are of the type where there are special conventions linking a form with a meaning and these with occasions of use. There will probably be lots of occasions for discussing these later on in the course.
Idioms can be classified by a large number of criteria.
Category and Level. First, we might wish to group them according to their category and level. Lexical idioms (ignoring monomorphemic lexical items) can be nouns ("pointer", "cranberry"), verbs ("subvent", "react"), adjectives ("uncanny", "holistic"), for example. Phrasal idioms can be adjectival ("stark raving mad"), nominal ("notary public"), verbal ("come a cropper"), prepositional ("in a brown study"), or sentential ("it takes one to know one").
Function. For the idioms that are not syntactically dependent on other elements, we could classify them according to their function. Some formulaic expressions accompany acts ("upsy daisy", "this hurts me more than it hurts you"), some accomplish acts ("I declare the meeting adjourned"), some are comments on the ongoing discourse ("I wouldn't touch that with a ten-foot pole"), some are parentheticals, qualifying what is being said ("you might say"), and so on.
Sentence Type. Sentential idioms can be classified according to the sentence type. Some are imperatives ("knock on wood", "shut up"), some are conditionals ("if the shoe fits, wear it"), some are questions ("who knows?", "can the leopard change its spots?"), and some use certain special constructions ("the more the merrier", "the bigger they come, the harder they fall").
Gaps. Many idioms are not complete "runs" but have gaps in them. Some such gaps are complete sentences ("it's (about) time [you brushed your teeth]", where the sentence has to be in past tense form), some are verb phrases ("I wouldn't [marry Louise] for all the tea in China"), some are noun phrases ("play second fiddle to [Harry]"). Possessive gaps can be coreferential to the subject, in the case of verbal idioms ("to blow [one's] nose"), or referentially distinct ("to pull [someone's] leg"), and some can go either way ("to cook [(some)one's] goose").
Collocations. Collocations are phrases made up of two or more words, in some grammatical relation to each other, where it appears that one or both of the words has some special conventional association with the other. In some cases, one of the words only, or almost only, occurs in the phrase in question (the "blithering" of "blithering idiot", the "aspersions" of "cast aspersions"), sometimes each word occurs frequently elsewhere but the combination has a special sense or a special frequency of occurrence ("spontaneous combustion", "manual labor", "consenting adult"), and so on.
In many cases a dependent or modifying word fulfills a necessary function in respect to the other word, such as that of intensifying: "broad daylight", "dark red", "fancy footwork", "vast majority", etc. [reference to Mel'cuk?]
Cultures differ greatly in respect to their attitudes toward ready-made speech. In certain cultures - Japan or Turkey, say - knowing "the right thing to say" for a given occasion is extremely important. Although in everyday small-talk, and in ceremonial speech in American life, we depend a great deal on ready-made expressions, there is a general attitude here that when it is important to say something, one should use language that directly expresses what is on one's heart, rather than recite well-worn ready-made expressions.
It's as if the members of some cultures would say, "Why do you display your own linguistic cleverness at a time like this? Everybody knows what you're supposed to say, and by doing what you've just done, you've drawn attention to yourself!"; whereas other cultures would say, "How thoughtless of you just to recite phrases that you've memorized; at a time like this you, as an individual, should say what's on your heart!". In short, having lots of ready-made things to depend on can be thought of as a "groove", making your conversational life easier, or as a "rut", blocking your freedom of expression, getting you stuck in routines that don't fit your special occasion.
In any case, making use of ready-made expressions creates a sense of group solidarity, since each member of a language community can depend on the others to pick up allusions, to recognize familiar ways of thinking, and so on.
Obviously the first thing to discover is just which parts of the expressions you think of as instances of an idiom belong to the idiom, as opposed to those that belong to the "rest of the grammar". This topic is taken up in great detail in the Kay & Fillmore paper, "What's X doing Y?".
Suppose you are thinking of the collocation "ripe old age". One question to ask is: where does this idiom end? In the initial set of examples you've seen, it always had the indefinite article in it ("he lived to a ripe old age"), so that might appear to be a part of it; but then you find a new example, "at the ripe old age of 83" or the like. At first you might notice that most of the expressions contain the word "live" ("live to a ripe old age"), but then you find "attained a ripe old age" or "died at the ripe old age of 95". Should such information be brought into the description, or will that fall out from the meaning of the expression? All of these are issues that need to be taken up in determining the nature of an idiomatic expression.
Or suppose you've been looking at the phrase "full well". Here you might end up deciding that the full idiom is "know full well", since there doesn't seem to be much variability in the use of this adverbial expression. Then you might think that "by heart" is another of these, since "know by heart" is a common phrase; but then you'll notice other expressions like "learn by heart", "recite by heart", "play by heart", etc., and you will think of "by heart" as the full extent of the idiom, while wanting to describe its semantics in such a way that it will turn out to be a common companion of the larger class of expressions in which it is found.
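The kind of check described here, looking at what accompanies a phrase to decide where the idiom ends, can be sketched as a small count over example sentences. This is only an illustration: the function name `left_neighbors` and the six sentences in `toy_corpus` are invented for the purpose and are not drawn from any real corpus, and a single invariable left neighbor is at best a hint that the neighbor belongs to the idiom.

```python
from collections import Counter

def left_neighbors(sentences, phrase):
    """Count the word immediately preceding each occurrence of `phrase`.

    Hypothetical helper: whitespace tokenization, lowercased, no lemmatization.
    Occurrences at the very start of a sentence have no left neighbor
    and are skipped.
    """
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words) - n + 1):
            if words[i:i + n] == target:
                counts[words[i - 1]] += 1
    return counts

# Invented example sentences (an assumption, not attested data).
toy_corpus = [
    "you know full well what I mean",
    "they know full well that it failed",
    "we know by heart every line",
    "children learn by heart long poems",
    "she can recite by heart the whole speech",
    "he will play by heart tonight",
]

full_well = left_neighbors(toy_corpus, "full well")
by_heart = left_neighbors(toy_corpus, "by heart")

print(dict(full_well))   # one verb only: suggests "know full well"
print(dict(by_heart))    # several verbs: suggests "by heart" alone
```

On these invented sentences, "full well" is always preceded by "know", while "by heart" follows four different verbs, which mirrors the contrast drawn in the paragraph above.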
For expressions with gaps, it is important to study the degree of flexibility of the expressions that can fill that gap. Sometimes this will result in a description of a family of related idioms, rather than simply an idiom with a free gap. For the idiom "cut a figure" there is a favorite expansion "cut a fine figure", but other possibilities also exist, such as "not cut much of a figure".
It sometimes takes a lot of work to figure out just what the idiom means. In the case of "spic and span", one might first propose that it means "very clean" or "completely clean" or the like. But then you may notice that it seems to be applicable mainly to shiny things in a kitchen. One would never say, "when we got the drapes back from the dry cleaners, they were spic and span", or "I want your hands spic and span when you come to the dinner table". Should the final description identify "selectional restrictions", specifying the kinds of nouns that can occur as the subject of "spic and span" as a predicate, or should it be explained that this expression seems to be dedicated to talk about certain kinds of smooth metal and glass surfaces of the sort found in kitchens and bathrooms? These questions are sometimes difficult to answer.
Here are some idiomatic expressions to play with in carrying out such inquiries. In some cases I have added comments on what is special about them.
lift a finger (requires negative polarity context)
deader than a doornail (fixed phrase with "doornail" also found in "as dead as a doornail"; but "deader" is an irregular comparative)
what, Jim get married? (incredulity response construction [references] )
now watch me drop it (tempting fate construction)
it's about time we did something about this (embedded sentence has to be in past tense)
look what the cat dragged in (not "look at")
the thing is is (the "double IS construction" [references])
the bigger the better [references]
how big of a box (*how big boxes do you need? *how hot soup do you like?)
let alone [Fillmore, Kay & O'Connor reference]
number words (nine+ty seven; cf. Fr. quatre-vingt-dix-sept = four-twenty ten-seven)
time-telling formulas (quarter past seven; sechs Minuten vor dreiviertelsechs = six minutes before three-quarters six [5:39])
cousin terminology (third cousin twice removed [Kay reference])
money value (UK: two pounds fifteen, US: *two dollars fifteen)
a philosopher's philosopher
coffee coffee ("instant coffee or coffee coffee?")
my every wish ("fulfills my every wish", "my every dream came true", *"my every son became a doctor")
day in and day out ("week in and week out", *"week in and day out")
boys will be boys ("war is war" [Wierzbicka reference])
tag questions ("that was fun, wasn't it?" - polarity reversing; "you did your homework, did you?" - polarity preserving)
snow is to white as blood is to red
what a fool I've been
hungry though we were
may he rest in peace
what did you go to the store and buy (extraction from first conjunct [Lakoff reference])
what can you eat and not get cancer (extraction from second conjunct [reference])
what the heck did you see ("the hell", "the deuce", "on earth", "in heaven" [references])
boy was I tired ("man, was he stupid" - *"girl ...", *"woman ...")
just because I live in Berkeley doesn't mean I'm a revolutionary [reference]
such that (the logician's relative clause)
with palm outstretched ("with hat in hand"; paraphrase with "have")
what's x doing y? [reference]
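The arithmetic implied by the number-word and time-telling entries in the list above can be spelled out. The following is a minimal sketch with hypothetical helper names (`clock`, `dreiviertel`); it only reproduces the glosses given in the list and makes no claim about how speakers process these formulas.

```python
def clock(total_minutes):
    """Format minutes after twelve as h:mm on a twelve-hour clock."""
    hours, minutes = divmod(total_minutes % (12 * 60), 60)
    return f"{hours or 12}:{minutes:02d}"

def dreiviertel(hour):
    """German 'dreiviertel H': three quarters of the way to hour H,
    i.e. a quarter to H (hypothetical helper)."""
    return (hour - 1) * 60 + 45

# Number words: the same value built on different bases.
english_97 = 9 * 10 + 7        # nine+ty seven
french_97 = 4 * 20 + 10 + 7    # quatre-vingt-dix-sept: four-twenty ten-seven

# Time-telling formulas.
quarter_past_seven = 7 * 60 + 15
sechs_vor_dreiviertelsechs = dreiviertel(6) - 6   # six minutes before 5:45

print(english_97, french_97)
print(clock(quarter_past_seven))
print(clock(sechs_vor_dreiviertelsechs))
```

The point of writing it out is that the parts combine by different operations (multiplication, addition, subtraction from a reference point that is itself defined relative to the coming hour), and which operation applies is fixed by convention rather than by the words themselves.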