An Empirical Problem for Conceptual Analysis
Conceptual analysis works something like this: philosophers interested in discerning the true essence of some concept C present candidate definitions to a tribunal of virtues. For one, it is virtuous for a definition of C to include all those things that intuitively instance C and exclude all those things that intuitively don’t. Thus, the conceptual analyst may discriminate between candidate definitions by probing their susceptibility to intuitive counterexamples, as Gettier does when he formulates his counterexamples to JTB analyses of knowledge. It is also virtuous for a definition of C to be simple. William Ramsey writes that, “[i]f an analysis yields a definition that is highly disjunctive, heavily qualified or involves a number of conditions, a common sentiment is that the philosopher hasn't gotten it right yet. … To borrow a technical phrase from Jerry Fodor, analyses of this complex sort are commonly regarded as ‘yucky’” (Ramsey 1992, p. 2). Equipped with tools like these, conceptual analysts aim to chisel out “an illuminating set of necessary and sufficient conditions for the (correct) application of a concept” (Audi 1983, p. 90).
The purpose of this post is to argue that this methodology is fundamentally flawed. I will draw primarily from William Ramsey’s empirical critiques before deferring to the Churchlands to sketch an alternative picture.
Empirical Problems with Conceptual Analysis
Conceptual analysis as described above presupposes facts about our cognition. For example, if we suppose a correct analysis of C must meet the intuitiveness condition, then we are betting that “our intuitions will nicely converge upon a set whose members are all and only those things which possess some particular collection of features” (Ramsey 1992). Consider this the weaker empirical presupposition of conceptual analysis: that our intuitions about which cases instance C will lead us to a clear set of conditions. The stronger claim presents itself when one tries to explain the weaker one. The most straightforward account is that our internal representations of C in some manner contain the necessary and sufficient conditions for instancing C, and that our intuitions by some mechanism exploit these conditions. In other words, when we intuit that a case is an instance of C, this intuition is actually applying the internalized necessary and sufficient conditions for instancing C. Call this the classical view.
On the classical view, conceptual analysis is telling us something about ourselves: it tells us what internalized set of conditions generates intuitions of the sort [x is an example of C]. In doing so we find out what necessary and sufficient conditions we tacitly understand, and eventually what the correct conditions are for the application of C. Ramsey shows that neither the weaker claim nor the classical view of internal representations looks promising in light of evidence from cognitive psychology. Because I am too stupid and lazy to go through the evidence study by study, I will briefly summarize the findings and recommend that interested readers look through the multitude of studies discussed in Churchland 1989, Bishop 1992, and Ramsey 1992.
One finding discussed in the above sources is that for both highly abstract and prosaic concepts, membership is often understood in gradations rather than as an all-or-nothing matter. For example, a robin is often seen as a better example of a bird or more “bird-like” than an ostrich. Furthermore, subjects tend to recognize more “typical” instances of concepts faster, learn concepts faster when presented with typical rather than atypical examples, and list typical instances first and atypical instances later when asked to list examples of a concept. When subjects are asked to list attributes relevant to a concept, typical instances of that concept tend to have more of those listed attributes (even when these attributes seem not to align at all with the necessary and sufficient conditions for category membership). Additionally, the significance of listed attributes to categorization shifts between contexts.
There is severe tension between the classical view, on which concepts are internally represented by strict criteria for instantiation, and evidence that intuitions are swayed by context and by how closely examples resemble prototypical instances. That is, if subjects internally represented things with strict necessary and sufficient conditions, we would expect all instances of a concept to have equal claim to category membership, because they all fit the criteria. But subjects tend to identify and learn some instances faster, list them first, and view them as better examples because of the (often inessential) attributes they share with prototypical instances.
Influential cognitive psychologist Eleanor Rosch concludes that the internal “representation of categories in terms of necessary and sufficient attributes alone would probably be incapable of handling all of the presently known facts …” (Rosch 1978). Ramsey doubles down with these alarming remarks:
Because of these and similar considerations, many psychologists have become increasingly skeptical about the classical view. … as is often the case in science, the situation here demands an appeal to the best explanation, and efforts to save the classical theory have largely been dampened by the success cognitive psychologists have had in constructing alternative accounts that comport much better with the data. It's safe to say that, with a few exceptions, psychologists today believe that the classical picture of concept representation is no longer in the running.
Some general problems start to take shape. For one, we no longer have strong reason to think even the ideal deployment of our intuitions will yield a set of necessary and sufficient conditions. If our intuitions are not guided by our grasp of criteria for category membership, then why would we expect our intuitive judgements to eventually point us to one? And if our concepts are variably realized, why should we expect simple definitions? I will come back to these questions after sketching a positive account of internal representations.
Alternative Theories and Connectionism
A variety of theories have been proposed in place of the classical view. What unites many of them is their rejection of the notion that intuitive categorizations are generated by the application of necessary and sufficient conditions. Rather, many of these accounts suppose that intuitive categorization is based on the degree to which examples resemble prototypical instances or instances stored in memory. A consequence of these theories is that there are multiple ways for an example to be categorized as C, and that not all examples deemed instances of C are united by a particular small set of properties.
One such view I’d like to dive deeper into is connectionism. I’ll specifically engage with the Churchlands’ connectionism because it is a helpful example of how a concept might be internally represented, if not by a set of conditions. I will now give a very brief sketch of the view.
The Churchlands build their connectionist model with the neuron as the basic building block. A neuron receives input signals, processes them, and sends a signal down its axon, which terminates in end bulbs that form synaptic connections with the cell bodies or dendrites of other neurons. Every neuron is part of such a network and receives its inputs through synaptic connections from a large number of other neurons. Thus, Paul Churchland explains that “the level of activation induced [in a neuron] is a function of the number of connections, of their size or weight, of their polarity (stimulatory or inhibitory), and of the strength of the incoming signals” (Churchland 1989, p. 160). Neurons are organized into populations, and a neuron in one population connects to the next through the synaptic connections formed at the end branches of its axon. Inputs to the first population or layer of neurons are processed and then passed on through variably weighted synaptic connections that amplify or dampen the strength of the signals received by the neurons in the subsequent layer. There are three main types of layers in this model: input layers that receive the initial inputs, output layers whose activation levels are the end result of all of the previous computations, and the hidden layers that lie in between to perform intermediate processing.
Activation levels in a given population of neurons are represented as an n-dimensional vector, where each component corresponds to the level of activation of a particular neuron. Thus a layer of 3 neurons might have an activation vector of <0.5, 0.6, 0.3> at a time t, and each of these components picks out the activation level of one of the 3 neurons. Every neuron in this first layer forms weighted synaptic connections with the neurons in the succeeding population, and these connections transform the signal as it passes through the synapses. In summary, “[the network] is just a device for transforming any given input-level activation vector into a uniquely corresponding output-level activation vector. And what determines the character of the global transformation effected is the peculiar set of values possessed by the many [synaptic] connection weights” (Ibid, p. 163).
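To make the vector talk concrete, here is a minimal sketch in Python with NumPy (my choice of notation, not the Churchlands’). The layer sizes, weight values, and the sigmoid squashing function are illustrative assumptions; the point is just the core operation described above, namely pushing an activation vector through a matrix of synaptic weights to get the next layer’s activation vector.

```python
import numpy as np

def sigmoid(x):
    """Squash a summed input into an activation level between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

# Activation vector of a 3-neuron layer at time t (the <0.5, 0.6, 0.3> above).
input_activations = np.array([0.5, 0.6, 0.3])

# Synaptic weights connecting the 3-neuron layer to a 2-neuron layer.
# Positive weights stand in for stimulatory connections, negative for inhibitory.
weights = np.array([[ 0.8, -0.4,  0.3],
                    [-0.2,  0.9,  0.5]])

# Each downstream neuron's activation is a function of the incoming signals
# and the weights on its synaptic connections.
output_activations = sigmoid(weights @ input_activations)
print(output_activations)  # the 2-component output-level activation vector
```

Stacking several such transformations, with hidden layers in between, gives the global input-to-output mapping the quote describes, and the character of that mapping is fixed entirely by the values of the weights.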
A crucial detail is that the configuration of synaptic weights in a network is highly plastic and constantly adjusted in light of new inputs. (Additionally, it’s highly implausible that our genomes directly code in some way for any of our synaptic weights (Churchland 2007, pp. 139–140).) Through a process of training such as Hebbian learning, the space in which information is processed is gradually sculpted so that inputs are properly “sorted” into a variety of partitions in the space of possible values of the output layer. Sometimes, as in the example I’m about to give, synaptic weights are instead adjusted with the help of a program which propagates reports of errors back through the system. The end result of both unsupervised and supervised learning is that the network adjusts its synaptic weights such that inputs of similar character are “funneled” into the proper partition and end up with similar output values. This is the foundation for state-space semantics. Say that the 3-neuron layer mentioned above is an output layer, and its activation vector at t is <0.5, 0.6, 0.3>. This vector can be modeled as a point in a three-dimensional state space (imagine a coordinate system where each axis measures the level of activation of one of the three neurons). Outputs of sufficient similarity will be proximal to each other in this output state space. Those proximally clustered vectors contour the structure of a prototype region, at whose center of gravity lies the prototypical point: an analog to the network’s personal “Platonic ideal” which those proximal activations resemble. (Is anyone still reading this shit or did you all alt-tab back to Twitter?)
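For the unsupervised case, here is a minimal sketch of a simple rate-based Hebbian update (the learning rate, layer sizes, and activation values are my own illustrative assumptions, not anything drawn from the Churchlands). The only point it illustrates is the mechanism of plasticity: a connection is strengthened in proportion to how active its two ends are together, and repeated exposure to similarly structured inputs thereby sculpts the weight configuration over time.

```python
import numpy as np

def hebbian_update(weights, pre, post, learning_rate=0.1):
    """Strengthen each connection in proportion to the coincidence of
    pre- and post-synaptic activity ("neurons that fire together wire together")."""
    return weights + learning_rate * np.outer(post, pre)

weights = np.zeros((2, 3))         # 3 pre-synaptic neurons feeding 2 post-synaptic ones
pre = np.array([0.5, 0.6, 0.3])    # activation of the earlier layer
post = np.array([0.9, 0.1])        # activation of the later layer

weights = hebbian_update(weights, pre, post)
print(weights)  # connections onto the strongly active output neuron have grown most
```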
All of this is very abstract, so I’ll summarize an example drawn from Gorman and Sejnowski 1988 and Churchland 1989 to help bridge the gap between this description and internal conceptual representation. Imagine a team of submarine engineers trying to create a neural network to discriminate between rocks and sea mines. Sonar echoes are fed into a frequency analyzer that outputs the energy levels of the sonar echo at 60 different frequencies. The output of the frequency analyzer is then sent to the input layer of a network composed of 60 neurons. These values are transmitted through progressively smaller layers of the network, the original signal being transformed by the synaptic connections between populations. Eventually the signal reaches an output layer composed of only 2 neurons. The engineers want to train the system such that when the echo of a rock is inputted into the system, the output vector will be near <0, 1>, and when the echo of a sea mine is inputted, the output vector will be near <1, 0>. Call these our prototypical rock and prototypical mine. (This network is pictured in figure 1.)
(Figure 1: Depiction of the sonar analysis network architecture. “Cylinder” is used instead of “mine” because metallic cylinders were what was actually used in the experiment. Shamelessly stolen without edit from Gorman and Sejnowski 1988.)
To train the network, the engineers feed it a handful of mine echoes and a handful of rock echoes as training samples. The engineers then take the outputs and run them through a program that compares them to the desired or correct outputs and then adjusts the weights of the synaptic connections identified as most responsible for the errors. As this process is repeated over and over again (about 300 times in the real experiment), the network gradually decreases the frequency of its errors until it is capable of sorting the echoes into rock and mine echoes with reasonable accuracy.
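Here is a toy reconstruction of that training loop in Python/NumPy. It is emphatically not Gorman and Sejnowski’s code or data: the synthetic “echoes”, the single 12-neuron hidden layer, the learning rate, and the particular error-propagation rule are all assumptions made for illustration. What it preserves from the example is the shape of the setup: 60 input values per echo, a 2-neuron output layer aiming at <1, 0> for mines and <0, 1> for rocks, roughly 300 passes over the training samples, and weights nudged in whatever direction shrinks the output error.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def make_echoes(n):
    """Synthetic stand-ins for sonar returns: 60 energy levels per echo.
    Mines get an energy bump at low frequencies, rocks at high ones, plus noise."""
    freqs = np.linspace(0, 1, 60)
    mines = np.exp(-((freqs - 0.3) ** 2) / 0.02) + 0.1 * rng.random((n, 60))
    rocks = np.exp(-((freqs - 0.7) ** 2) / 0.02) + 0.1 * rng.random((n, 60))
    echoes = np.vstack([mines, rocks])
    targets = np.vstack([np.tile([1.0, 0.0], (n, 1)),   # prototypical mine
                         np.tile([0.0, 1.0], (n, 1))])  # prototypical rock
    return echoes, targets

train_x, train_y = make_echoes(50)

# A 60 -> 12 -> 2 network; the hidden-layer size is an arbitrary choice.
w1 = rng.normal(0, 0.1, (12, 60))
w2 = rng.normal(0, 0.1, (2, 12))
lr = 0.5

for epoch in range(300):                      # roughly the number of passes reported
    for x, t in zip(train_x, train_y):
        # Forward pass: input layer -> hidden layer -> 2-neuron output layer.
        h = sigmoid(w1 @ x)
        y = sigmoid(w2 @ h)
        # Backward pass: push the output error back through the weights,
        # adjusting most the connections most responsible for the error.
        err_out = (y - t) * y * (1 - y)
        err_hid = (w2.T @ err_out) * h * (1 - h)
        w2 -= lr * np.outer(err_out, h)
        w1 -= lr * np.outer(err_hid, x)

# After training, fresh echoes should land near <1, 0> (mine) or <0, 1> (rock).
test_x, test_y = make_echoes(10)
preds = sigmoid(w2 @ sigmoid(w1 @ test_x.T)).T
accuracy = np.mean(preds.argmax(axis=1) == test_y.argmax(axis=1))
print(f"held-out accuracy: {accuracy:.2f}")
```

On synthetic data like this the classes separate trivially; the real network’s achievement was finding regularities in actual sonar returns that generalize to echoes outside the training samples, which is what the next paragraph describes.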
At this point the network has begun to grasp the characteristics of rock and mine echoes. It is put to the test by exposing it to echoes from outside of the original samples. While rocks and sea mines vary in shape and size and their sonar returns were sampled at different angles and distances, there is some kind of enduring regularity or disjunction of regularities to be found in the echoes such that the network is capable of adjusting its synaptic weights to reliably output something near the desired prototypical vector. To zoom out for a second, I’ve described a network that is capable of interpreting “sensory” inputs and sorting them into categories on the basis of shared attributes. Furthermore, these categories are revised in the face of new data. What I have described is a neural network which possesses a (very simple) theory or conceptual framework consisting of two concepts: rock echoes and mine echoes. Categorization in this network happens without any discursive or symbolic representation at all, let alone an internalized set of criteria for conceptual instantiation.
Recall the way that outputs can be modeled as vectors on a coordinate plane. The prototypical rock echo is the vector <0, 1>. The synaptic weights of the system are configured to give that prototypical point a kind of gravitational pull: input vectors with certain characteristics (namely those of rock echoes) are pulled closer and closer to that prototypical point as they flow through the synaptic connections. This is (very) crudely how categorization can be understood without necessary and sufficient conditions. (Obviously, this does not map one-to-one onto our actual neurobiology, but there is a lot of research exploring this mapping or updating connectionist models like this to more easily accommodate our actual anatomy.) Returning to robins, a connectionist would say that there is a prototypical point or region in a subject’s output state space that basically represents birds. A robin is more readily identified as a bird by the subject because when their sensory neurons are activated by a robin, the activations are transformed into a vector that lies closer to the nucleus of the prototypical bird region in the state space than the output produced when the subject perceives some fucked up bird like an ostrich. We could perhaps explain this by noticing that robins and birds with similar attributes were more present in the initial construction of that subject’s internal representation of birds: people in the subject’s cultural context are more often exposed to and taught about pigeons and hummingbirds and crows than they are ostriches, so we’d expect their bird region to be biased towards those attributes that common birds share.
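To make the “gravitational pull” picture concrete, here is a minimal sketch of reading an output layer in state-space fashion: an output vector gets labeled by whichever prototypical point it lands nearest, and its distance from that point serves as a rough measure of how good an example it is (the robin/ostrich asymmetry, in miniature). The Euclidean distance measure and the sample output vectors are my assumptions; the prototypical points are just the <1, 0> and <0, 1> targets from the sonar example.

```python
import numpy as np

# Prototypical points in the 2-dimensional output state space.
prototypes = {"mine": np.array([1.0, 0.0]),
              "rock": np.array([0.0, 1.0])}

def categorize(output_vector):
    """Label an output activation vector by the prototype region whose center of
    gravity it lands closest to; Euclidean distance is an illustrative choice."""
    distances = {label: np.linalg.norm(output_vector - proto)
                 for label, proto in prototypes.items()}
    label = min(distances, key=distances.get)
    return label, distances[label]

# An echo whose output falls well inside the mine region, and a more
# ambiguous one that lies further from either prototypical point.
print(categorize(np.array([0.9, 0.1])))   # ('mine', ~0.14): a "good" example
print(categorize(np.array([0.6, 0.5])))   # ('mine', ~0.64): a "worse" example
```

Nothing in this procedure consults necessary and sufficient conditions; membership is a matter of proximity to a prototype, and it comes in degrees.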
I will now discuss some important aspects of this view and what they mean for conceptual analysis. The first thing to note is how this model accounts for divergent intuitions. Our internal representations are plastic and shaped by our experiences. No two people have exactly the same set of experiences, thus no two people will form exactly the same configuration of synaptic weights, and thus no two people will have exactly the same conceptual frameworks. Of course, because there are enduring regularities in nature and in the way things are taught, we would expect congruence in the partitioning of our state spaces. (See Churchland 2007, chapter 8 for an exploration of conceptual translation on this view.) But as we move away from things like birds and into domains like ethics and epistemology, enduring regularities in local social reality become more salient factors in the construction of our internal representations. To whatever degree our concepts of justice or (epistemic) justification are plastic, they are largely sculpted by our ethical and epistemic experiences, and this admits endless contingencies into the “programming” of the relevant intuitions. These contingencies include our upbringings, our cultural norms, socioeconomic factors, our traumas, our linguistic communities, etc. Philosophical consensus by way of conceptual analysis thus looks hard to attain for these concepts so long as there is sufficient diversity among analysts. Put more strongly, we should not expect philosophers to ever agree on what the intuitive necessary and sufficient conditions for justice or justified belief are. These considerations generate similar problems for analyzing many other concepts deployed in philosophical investigation, like knowledge, beauty, goodness, god, womanhood, explanation, etc. (And all of this undersells the ways that differences in language greatly affect intuitions. See Stich et al. 2017 for an exploration of the way unduly focusing on English words like “knowledge” impacts epistemological investigation.)
(Sandin 2006 contends that conceptual analysis may still be worthwhile despite this, since we may end up with definitions that are better than the ones we started with. I don’t dispute that, but I would like to distinguish between conceptual analysis as defined by Audi and Ramsey and conceptual engineering or explication, which I take to be less vulnerable to critiques like the ones I produce in this post. For a little explanation of the difference, check out my previous post on conceptual engineering and the definition of ‘woman’.)
There is another problem for conceptual analysts. Recall the two general conditions for a successful analysis of C: intuitiveness and simplicity. We could not expect these to be jointly satisfied under this connectionist theory of internal representations, nor under any theory in its neighborhood: if C is instantiated for subjects by several cases which do not share in C’s purported essential features, the conceptual analyst might have to embrace disjuncts for the sake of our intuitions or ignore some intuitive counterexamples to save on ink. Either way, assuming our internal representations are more like families than laws, conceptual analysis as defined above will be ineffective at carving out real definitions.
Works Cited
Audi, Robert (1983). “The Applications of Conceptual Analysis.” Metaphilosophy 14 (2): 87–106.
Bishop, Michael (1992). “The Possibility of Conceptual Clarity in Philosophy.” American Philosophical Quarterly 29 (3): 267–277.
Churchland, Paul M. (1989). A Neurocomputational Perspective: The Nature of Mind and the Structure of Science. MIT Press.
Churchland, Paul M. (2007). Neurophilosophy at Work. Cambridge University Press.
Gorman, Paul & Sejnowski, Terrence (1988). “Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets.” Neural Networks 1: 75–89.
Ramsey, William (1992). “Prototypes and Conceptual Analysis.” Topoi 11 (1): 59–70.
Rosch, Eleanor (1978). “Principles of Categorization.” In Cognition and Categorization. Lawrence Erlbaum.
Sandin, Per (2006). “Has Psychology Debunked Conceptual Analysis?” Metaphilosophy 37 (1): 26–33.
Stich, Stephen, Mizumoto, Masaharu & McCready, Eric (eds.) (2017). Epistemology for the Rest of the World. Oxford University Press.